Replicating SQL windowing functions in Power Query.
I’m sure you’ve all heard of windowing functions in SQL Server? But what if you need to replicate these windowing functions in Power Query?
Here I have a sample CarSales.csv dataset of car sales per brand, per year, per month and the number of units sold. I will use this dataset to demonstrate how we can replicate the windowing LAG function in Power Query.
In my example I’d like to calculate the month on month sales growth per car brand, per year. Using SQL this would be relatively simple using the LAG function to access the previous row value and then simply subtract the output of the LAG function from the column unit_sold with the use of a CTE.
Which would then generate the output below.
However, what if you found yourself in the situation where you couldn’t use SQL? Perhaps you’ve connected to an Analysis Cube or perhaps just a .CSV file in an Azure Storage Account. What do you do then…?
I’ll now show you how you use Power Query to help you solve your problem!
To start open Power BI Desktop and click “Home” > “Get Data” > “Text/CSV” and then select and open your .csv file. In my case it was CarSales.csv.
Now instead of just opening the data, you’ll want to click the option to “Transform Data” which will then open Power Query.
Once Power Query is open you’ll then need to sort the columns you want to partition by and order by. In my case it was “brand”, “year” and “month” which I sorted in ascending order. To do this click on the small down arrow to the right of each of the columns and click “Sort Ascending”.
This then produces the output/preview shown below.
The next step is to add an index column. To do this click on “Add Column” in the ribbon, click “Index Column” and then click “From 0”. This will then create a new column called “Index” which will increment by 1 starting from 0.
Once done you’ll then need to create a new column which will add 1 to this “Index” column. To do this click on the “Index” column to select it and then click on “Add Column” from the ribbon, then click “Standard” and finally click “Add”.
This will then launch a new window where you’ll be able to specify the value to add. For this example as we just want to access the previous row to get the previous months sales data we’ll enter the value 1. This step essentially replicates the [offset] argument in the LAG function. If instead you wanted to replicate the LEAD function you could instead use the “Subtract” option to subtract 1 away from this index column.
Once done, click “OK” and this will create a new column called “Addition”. To make things clearer for the purpose of this blog we’ll rename this to “Offset_Index”.
Next using the newly created “Index” and “Offset_Index” columns we want to left join this table back to itself. To do this click “Home” in the ribbon and then click “Merge Queries”.
This will then open a new window. You now need to select the 2 columns you want to partition by plus the index column. This will form the basis of your join criteria. To do this control + click the columns you want to join on. In my example these are “brand”, “year” and “index”.
Next in the empty drop down select the same table as the table you want to merge to. It will have “(Current)” tagged onto the end of the name. Now ensure that the join kind is set to “Left Outer”.
Next, control + click the 2 columns you want to partition in the new table plus the “Offset_Index” column we created earlier. In my case these were “brand”, “year” and “Offset_Index”.
You’ll now see the following join logic all joined by an “and” condition:
- “brand” from “CarSales” left joins to “brand” from “CarSales (Current)” indicated by the “1” next to the column heading each of the columns.
- “year” from “CarSales” left joins to “year” from “CarSales (Current)” indicated by the “2” next to the column heading each of the columns.
- “Index” from “CarSales” left joins to “Offset_Index” from “CarSales (Current)” indicated by the “3” next to the column heading each of the columns.
Next click OK and this will create a new column called “Renamed Columns” (which was just the name from the previous power query step) but essentially contains all of the columns from the newly joined (or merged) table.
Next expanded the “Renamed Columns” column by clicking the double-sided arrow and select the column you want to access from the previous row. In my case it was the “units_sold” column from the “CarSales (Current)” table.
Now you’ll want to rename this column to something that’s more meaningful to the context of your data. In my example I renamed this column to “units_sold_previous_month”.
Finally to solve the original problem we just needed to calculate the sales growth per month, per brand, per year. To do this click “Add Column” in the ribbon and then click “Custom Column”. When the new window opens call the new column something meaningful to your data and enter a custom formula to solve your problem. In my example I called this column “sales_growth” and entered a custom formula of [units_sold] – [units_sold_previous_month].
This then created a new column called “sales_growth”.
Finally the 3 newly created columns we created in the steps earlier can now be deleted. In my example these were “Index”, “Offset_Index” and “units_sold_previous_month”. To do this control + click the columns you want to remove, right click and then left click “remove columns”.
And that’s it! These steps have successfully replicated the LAG function in Power Query!
I hope this step by step guides helps your solve your windowing function needs in Power Query.