I’m sure you’ve all heard of windowing functions in SQL Server? But what if you need to replicate these windowing functions in Power Query?
Here I have a sample CarSales.csv dataset of car sales per brand, per year, per month and the number of units sold. I will use this dataset to demonstrate how we can replicate the windowing LAG function in Power Query.
In my example I’d like to calculate the month on month sales growth per car brand, per year. Using SQL this would be relatively simple using the LAG function to access the previous row value and then simply subtract the output of the LAG function from the column unit_sold with the use of a CTE.
Which would then generate the output below.
However, what if you found yourself in the situation where you couldn’t use SQL? Perhaps you’ve connected to an Analysis Cube or perhaps just a .CSV file in an Azure Storage Account. What do you do then…?
I’ll now show you how you use Power Query to help you solve your problem!
To start open Power BI Desktop and click “Home” > “Get Data” > “Text/CSV” and then select and open your .csv file. In my case it was CarSales.csv.
Now instead of just opening the data, you’ll want to click the option to “Transform Data” which will then open Power Query.
Once Power Query is open you’ll then need to sort the columns you want to partition by and order by. In my case it was “brand”, “year” and “month” which I sorted in ascending order. To do this click on the small down arrow to the right of each of the columns and click “Sort Ascending”.
This then produces the output/preview shown below.
The next step is to add an index column. To do this click on “Add Column” in the ribbon, click “Index Column” and then click “From 0”. This will then create a new column called “Index” which will increment by 1 starting from 0.
Once done you’ll then need to create a new column which will add 1 to this “Index” column. To do this click on the “Index” column to select it and then click on “Add Column” from the ribbon, then click “Standard” and finally click “Add”.
This will then launch a new window where you’ll be able to specify the value to add. For this example as we just want to access the previous row to get the previous months sales data we’ll enter the value 1. This step essentially replicates the [offset] argument in the LAG function. If instead you wanted to replicate the LEAD function you could instead use the “Subtract” option to subtract 1 away from this index column.
Once done, click “OK” and this will create a new column called “Addition”. To make things clearer for the purpose of this blog we’ll rename this to “Offset_Index”.
Next using the newly created “Index” and “Offset_Index” columns we want to left join this table back to itself. To do this click “Home” in the ribbon and then click “Merge Queries”.
This will then open a new window. You now need to select the 2 columns you want to partition by plus the index column. This will form the basis of your join criteria. To do this control + click the columns you want to join on. In my example these are “brand”, “year” and “index”.
Next in the empty drop down select the same table as the table you want to merge to. It will have “(Current)” tagged onto the end of the name. Now ensure that the join kind is set to “Left Outer”.
Next, control + click the 2 columns you want to partition in the new table plus the “Offset_Index” column we created earlier. In my case these were “brand”, “year” and “Offset_Index”.
You’ll now see the following join logic all joined by an “and” condition:
- “brand” from “CarSales” left joins to “brand” from “CarSales (Current)” indicated by the “1” next to the column heading each of the columns.
- “year” from “CarSales” left joins to “year” from “CarSales (Current)” indicated by the “2” next to the column heading each of the columns.
- “Index” from “CarSales” left joins to “Offset_Index” from “CarSales (Current)” indicated by the “3” next to the column heading each of the columns.
Next click OK and this will create a new column called “Renamed Columns” (which was just the name from the previous power query step) but essentially contains all of the columns from the newly joined (or merged) table.
Next expanded the “Renamed Columns” column by clicking the double-sided arrow and select the column you want to access from the previous row. In my case it was the “units_sold” column from the “CarSales (Current)” table.
Now you’ll want to rename this column to something that’s more meaningful to the context of your data. In my example I renamed this column to “units_sold_previous_month”.
Finally to solve the original problem we just needed to calculate the sales growth per month, per brand, per year. To do this click “Add Column” in the ribbon and then click “Custom Column”. When the new window opens call the new column something meaningful to your data and enter a custom formula to solve your problem. In my example I called this column “sales_growth” and entered a custom formula of [units_sold] – [units_sold_previous_month].
This then created a new column called “sales_growth”.
Finally the 3 newly created columns we created in the steps earlier can now be deleted. In my example these were “Index”, “Offset_Index” and “units_sold_previous_month”. To do this control + click the columns you want to remove, right click and then left click “remove columns”.
And that’s it! These steps have successfully replicated the LAG function in Power Query!
I hope this step by step guides helps your solve your windowing function needs in Power Query.
Have you ever been faced with the issue where Power BI automatically adds a sigma symbol to the front of your column if it detects it’s a data type of integer or decimal?
Now this isn’t too much of an issue if you’ve imported only a few columns but what if you’ve imported lots? Going through this column by column can become tiresome!
Luckily there is a hack, not only to turn summarization to “None” on multiple columns at once, but also to quickly change the formatting too.
As an example, I have imported a table from Excel into Power BI with 27 columns (“Column A” through to “Column Z”) and a “Key” column. Power BI detected that all of my columns contain whole numbers (or integers) and applied a summarization of “sum” against each column automatically.
So how do we quickly change this? Click the “Model” tab on the left hand side. Open the table up to expose all the columns within a table you want to change multiple columns within. Next select the inital column you want to change and then either select a range of fields using Shift+Click or select multipile fields using Control+Click.
Once you’ve selected the group of columns you want to change find the advanced drop down in the properties window and change the “Summarize by” option to “None”.
That’s it! There are other formatting properties in here too so feel free to have a play and change multiple columns at once!
Have you ever wanted to export a table from Power BI Desktop into Excel just to make sure the DAX you’ve written is performing as expected but ran into this error message: “Data exceeds the limit – Your data is too large. Some data sampling may occur”?
Most probably this has occurred because you’ve got more than 30,000 rows of data in your table that you’re trying to export. In the example shown below I’ve actually got 30,001 rows of data – a row containing column headers plus 30,000 rows of actual data.
If I now change the row count in my “Table To Export” table to 29,999 rows of actual data using a simple filter I have no issues exporting the data. This brings the grand total to 30,000 rows (29,999 rows of actual data plus a row containing columns headers).
But what if you really need to export more than 30,000 rows of data? Well there is a way…
Firstly make sure you’ve got DAX Studio installed on your machine.
Now within Power BI Desktop click “View” and then “Performance analyzer”.
This will open a new window pane called “Performance analyzer”.
Now click “Start Recording” and then click “Refresh Visuals”. Next within the Performance analyzer window locate the name of the visualisation which in my case is “Table To Export” and click the small “+” symbol next to it.
Next go ahead and grab the DAX query which has been used to construct the table. Click on the link “Copy query” which will copy the DAX query to your clipboard.
Next you’ll need to launch DAX Studio from within Power BI Desktop.
To do this on the top ribbon in Power BI Desktop click “External Tools” and then click “DAX Studio”.
Now within DAX Studio paste the copied query from your clipboard into the query window.
Next remove the variable which limits the number of rows which will be returned (TOPN) and replace the “EVALUATE” statement with “_DS0Core” as per the snip below.
Next change the output of the query to be executed from “Grid” to “File”.
Finally click the “Run” button and let DAX Studio Export your data containing at least 30,000 rows to a CSV file. Simple!
I hope this helps with any debugging you might need to perform in Power BI Desktop!
Have you ever come across an issue where your Power BI matrix column widths just aren’t the same width and visually just don’t look right?
Unfortunately (as of April 21’) there is no easy way to make all column widths equal in the format pane of a matrix visual.
However there is a hack to set the width of your all columns in a matrix so that they are all equal and pixel perfect with DAX!
How do you do this I hear you ask?
Firstly create a new measure called ‘Set Column Width’ and enter a string value equal to the length of your longest column title. In my case my longest title is “Front Derailleur Cage” and this has a length of 21 characters including spaces. Therefore I need to set my DAX expression to be a string which is 21 characters long. In my example I’ve just created a string of 21 asterixis wrapped in speech marks – but this can be any combinations of characters you like!
The next job is to go to the format pane of your matrix and set the “Show on rows” toggle equal to ‘On’.
Next make sure the ‘Auto-size column width’ is set equal to ‘On’.
Now drag your newly created DAX expression (in my case ‘Set Column Width’) on to the values field of your matrix.
You’ll then notice that your matrix will look similar to the snip below – a little bit of a mess! But not to worry this all part of the plan!
Next go back the format pane of your matrix and set ‘Auto-size column width’ to ‘Off’.
Now remove your ‘Set Column Width’ measure from the visual by clicking the ‘X’ symbol on the field pane.
Finally increase the width and height of your matrix visual to accommodate the increased column widths.
You now have pixel perfect column widths which are all equal to each other!
A huge thanks to the brilliant MVP Ruth Pozuelo Martinez (@ruthpozuelo) from curbal.com for this hack! It’s a been a massive help for my Power BI reports here at Purple Frog! Hopefully the Power BI team will release a proper solution in the matrix format pane soon!
Whilst carrying out some work for a client using Azure Data Factory I was presented with the challenge of triggering different activities depending on the result of a stored procedure.
In fact the challenge posed was to…
- Execute ‘Copy A’ activity if the result of a stored procedure returned (A),
- Execute ‘Copy B’ activity if the result of a stored procedure returned (B),
- Execute ‘Copy C’ activity if the result of a stored procedure returned (C)
- Finally execute an error stored procedure if the result of the stored procedure returned (D).
Prior to the switch statement I could achieve this using 4 ‘IF’ activities connected to a lookup activity as shown in the snip below using my ‘Wait’ example pipeline.
However a neater solution is to use the ‘Switch’ activity to do this work instead. I’ll now jump straight into a worked example to show you how I achieved this.
I created a demo.RandomNumberGenerator stored procedure in our Purple Frog Demo database which declares a variable and sets it equal to a random number between 1 and 4. The stored procedure then uses this variable in a case statement to return a string. So if the @randomnumber variable returns 1 the case statement returns ‘Wait 1 Minute’.
I then used a lookup activity in ADF to return the result of this stored procedure which I then used to ‘feed’ my ‘Switch’ activity.
When I preview this lookup activity it just returns the result of the stored procedure. When this is run time and time again it just returns a random ‘ADFAction’ as named in my case statement generated by my rand() SQL function. So in the scenario above it returned an action of ‘Wait 1 Minute’. If I were to preview the stored procedure again it might return ‘Wait 4 Minutes’ instead.
I can then connect my switch activity to this lookup activity using the ‘success’ green output connector.
I now need to add some dynamic content to refer to the output of my lookup activity. I don’t need to create dynamic nested IF content to say if my stored procedure equals ‘Wait 1 Minute’ then do this, if my stored procedure equals ‘Wait 2 Minutes’ then do this… all I need to do is refer to the output of the connected lookup activity ‘Random Number Lookup Activity’.
So in my example this is simply just @activity(‘Random Number Lookup Activity’).output.firstrow.adfaction. ADF will then match the output of the stored procedure to the case statements I provide it. The default activity is just a set of activities that are executed when the expression evaluation isn’t satisfied. I.e. the output of my the stored procedure doesn’t match any of the case statements I provide it.
In the snip below I have provided 4 case statements which match all of the possible outputs from my stored procedure. Essentially there isn’t any need for default activity as my stored procedure will always return 1 of 4 results but it’s useful to know for future reference.
Within each case statement there is an activity for which I would like ADF to perform if it matches the result of my stored procedure. In my example these are just ‘Wait’ activities. So if the stored procedure returns ‘Wait 1 Minute’ I’ve put a ‘Wait’ activity within the case statement to wait 1 minute (or 60 seconds).
When I first ran this pipeline in debug mode I could see that the output of the stored procedure was ‘Wait 2 Minutes’. The switch activity then matched this to the case statements I provided and performed the activity within the ‘Wait 2 Minutes’ case statement and triggered my ‘Wait 2 Minutes’ activity.
When I debugged another run my stored procedure returned ‘Wait 4 Minutes’. The switch activity then executed my ‘Wait 4 Minutes’ activity. Perfect!
So it’s pretty much as simple that!
I hope this helps all you ADF’ers out there!
There is now a new tenant setting for Power BI admins to prevent users from creating classic workspaces in Power BI! This includes from places like MS Teams! Hurrah!
If you’re a Power BI Admin you can now enable ‘Block classic workspace creation’.
To do this go to the powerbi.com portal and click the ‘settings’ icon in the top right hand corner of the screen.
Click on ‘Admin portal’ and then locate and click ‘Tenant settings’.
Next under the ‘Workspace settings’ click ‘Block classic workspace creation’.
Now change the toggle from ‘Disabled’ to ‘Enabled’. This will now block the creation of new classic workspaces in your Power BI portal from places like MS Teams.
This will also automatically remove classic workspaces from your Power BI workspace list, but only if they were created by Teams, and only if they’ve never been accessed. If they have been accessed then they’ll be left there for you to deal with manually.
Now you’re probably thinking well I’ll just delete any old remaining workspaces which have been accessed but are no longer being used! Woah hold on…If you do this you’ll end up deleting the underlying office 365 modern group (and the associated Team!) which still may be used.
A better way of doing this is to upgrade the workspace to the new experience and then delete it.
To do this, click on the ellipsis next to the filters button within the workspace you want to delete and click ‘Edit workspace’. Next go to the advanced section and under ‘Upgrade this workspace (preview)’ select ‘Upgrade now’.
A small window will appear in the middle of your screen.
Check the ‘I’m ready to upgrade this workspace’ option and then click ‘Upgrade’.
Now, once the workspace has been upgraded to the new experience you can then click on ‘Settings’ icon and click ‘Delete Workspace’. This will now leave the office 365 modern group untouched but remove it from your workspace. Simple!
As always a big thanks to the guys from at Guy In A Cube (Twitter: https://twitter.com/GuyInACube) on You Tube for sharing this knowledge!
Check out their video here: https://www.youtube.com/watch?v=T2PAL4D2SvU.
Here’s the link to the Microsoft Power BI team’s announcement on this new feature: https://powerbi.microsoft.com/en-us/blog/announcing-you-can-now-block-classic-workspace-creation/
In this blog post we’ll take a quick look at using ConcatenateX function to view a concatenated string of dates where the max daily sales occurred for a given month.
I came across this function whilst going through the excellent “Mastering DAX 2nd Edition Video Course” by the guys from SQLBI.com. So credit to Marco and Alberto for sharing this.
So how does it work? If we had a list of dates ranging from 01/01/2020 to 31/12/2020 and we wanted to see which days we achieved maximum sales for each given month in a year we could use the ConcatenateX function to return these dates in a single row per month.
As we can see in the screenshot below, the left hand table shows the month of June where we achieved maximum sales for in June on both 18/06/2020 and 25/06/2020 of 99. In the table to the right we can see those two dates presented on a single row for the month of June in the column “What were the max days?”. This was column was created using the ConcatenateX function!
So let us first look at what the maximum daily sales were per month. To do this we’ll use the MAXX function to create “Max Daily Sales”. This returns the maximum daily sales rate achieved for each given month as a single value. So for the month of June this would be 99. The problem with this is we are not sure which days these max sales were achieved on without drilling down into the data. Was this just one day or was it multiple days? All we can see is a figure of 99.
So let us create a new measure to work out on which days this figure of 99 occurred on.
The variable at point (a) returns a table with a single column which lists all of the unique dates in our Sales_2020 table.
The variable at point (b) returns the max sales for the given filter context in this case month.
The variable at point (c) uses the filter function to filter out only the days where the max sales were achieved by setting the total quantity sold measure to the max daily sales variable. For example in June we achieved max daily sales of 99 on 18/06/2020 and 25/06/2020. Therefore the variable at point (c) would filter out the ListOfDays table variable to just 18/06/2020 and 25/06/2020 only.
If we just had one max sales day per month we could simply return MaxDaysOnly. However we may have multiple days per month where max sales were achieved. Hence we use the ConcatenateX function to create a string of dates.
The variable at point (d) creates a string of concatenated dates separated by the delimiter “,” which can be used against a single row in a table.
Wrapping it all up returns us this table below, which shows us which days max sales were achieved per given month! Pretty cool eh?
Check out https://www.sqlbi.com/articles/mastering-dax-video-course-2nd-edition/ for more information regarding the online DAX course as well as https://twitter.com/marcorus and https://twitter.com/ferrarialberto on Twitter!
By Nick Edwards
In this blog post we’ll take a quick look at creating a self-generating calendar table using DAX.
Dates are important if we want to perform time intelligence reporting on our data i.e. yearly sales, monthly sales, weekly sales, year to date sales or previous year sales.
We’ll be using the calendar function to create our date table, but there are other methods to do this such as CALENDARAUTO or GENERATESERIES.
Here is the syntax we’ll be using to generate our date table.
The calendar function returns a single column called “Date” which generates a continuous series of dates from the specified start date to the specified end date. So if we specified the start date to be the 01/01/2020 and the end date to be 31/12/2020 the function would generate 366 rows of distinct date data.
We can then use the add columns function to expand our calendar table further to add specific columns we wish to slice our data by i.e. year, month, quarter, week number, day…
Here is the syntax we will be using to expand our calender table.
Let’s use an example to further explore how these functions work in practice using some sample adventure works sales data. Here we use the “Get Data” icon to directly query our adventure works database and bring over a sample of sales data, named “AW_Sales”.
Now, to create our calendar table we need to click “New Table” in the modeling tab and enter the following function.
Note: I like to use variables in my date table just to keep the DAX looking clean and fuss free, but this isn’t necessary.
For the start date parameter we have used the FIRSTDATE() and for the end Date parameter we have used the LASTDATE() function. This is so we can extract the first and last “OrderDate” from our AW_Sales table. We could have also used the MIN() and MAX() function to deliver the same results using the newly created “Date” column. As we can see this has generated a sequential date list from the 01/01/2012 to the 31/12/2013 with 731 distinct rows of data.
Now we want to expand our calendar table using the newly created “Date” column with new columns which can slice and dice our data. To do this I use the ADDCOLUMNS function. Here I have added Year, Quarter, Year Month, Month Number, Month Name, Day Of Year, Day Of Month, Day Of Week and Day Name as columns.
Once we’ve done this, we will mark our newly created table as a date table to allow Power BI to recognize date hierarchies and time intelligence functions.
We can now view our newly generated calender table in the data view. Instantly when any new data enters our model from “AW_Sales”, the calendar table will expand accordingly due to the last date function used above.
Now it’s just a case of creating a one to many relationship between our new calendar table and our “AW_Sales” table. We’ll create a one to many relationship between “Calendar[Date]” and “AW_Sales[OrderDate]” as shown below.
Congratulations we have now created a fully fledged calendar table that can slice and dice our “AW_Sales” tale by any of the columns we have created in our calendar table, as shown in the example below.