I’m sure you’ve all heard of windowing functions in SQL Server? But what if you need to replicate these windowing functions in Power Query?
Here I have a sample CarSales.csv dataset of car sales per brand, per year, per month and the number of units sold. I will use this dataset to demonstrate how we can replicate the windowing LAG function in Power Query.
In my example I’d like to calculate the month on month sales growth per car brand, per year. Using SQL this would be relatively simple using the LAG function to access the previous row value and then simply subtract the output of the LAG function from the column unit_sold with the use of a CTE.
Which would then generate the output below.
However, what if you found yourself in the situation where you couldn’t use SQL? Perhaps you’ve connected to an Analysis Cube or perhaps just a .CSV file in an Azure Storage Account. What do you do then…?
I’ll now show you how you use Power Query to help you solve your problem!
To start open Power BI Desktop and click “Home” > “Get Data” > “Text/CSV” and then select and open your .csv file. In my case it was CarSales.csv.
Now instead of just opening the data, you’ll want to click the option to “Transform Data” which will then open Power Query.
Once Power Query is open you’ll then need to sort the columns you want to partition by and order by. In my case it was “brand”, “year” and “month” which I sorted in ascending order. To do this click on the small down arrow to the right of each of the columns and click “Sort Ascending”.
This then produces the output/preview shown below.
The next step is to add an index column. To do this click on “Add Column” in the ribbon, click “Index Column” and then click “From 0”. This will then create a new column called “Index” which will increment by 1 starting from 0.
Once done you’ll then need to create a new column which will add 1 to this “Index” column. To do this click on the “Index” column to select it and then click on “Add Column” from the ribbon, then click “Standard” and finally click “Add”.
This will then launch a new window where you’ll be able to specify the value to add. For this example as we just want to access the previous row to get the previous months sales data we’ll enter the value 1. This step essentially replicates the [offset] argument in the LAG function. If instead you wanted to replicate the LEAD function you could instead use the “Subtract” option to subtract 1 away from this index column.
Once done, click “OK” and this will create a new column called “Addition”. To make things clearer for the purpose of this blog we’ll rename this to “Offset_Index”.
Next using the newly created “Index” and “Offset_Index” columns we want to left join this table back to itself. To do this click “Home” in the ribbon and then click “Merge Queries”.
This will then open a new window. You now need to select the 2 columns you want to partition by plus the index column. This will form the basis of your join criteria. To do this control + click the columns you want to join on. In my example these are “brand”, “year” and “index”.
Next in the empty drop down select the same table as the table you want to merge to. It will have “(Current)” tagged onto the end of the name. Now ensure that the join kind is set to “Left Outer”.
Next, control + click the 2 columns you want to partition in the new table plus the “Offset_Index” column we created earlier. In my case these were “brand”, “year” and “Offset_Index”.
You’ll now see the following join logic all joined by an “and” condition:
- “brand” from “CarSales” left joins to “brand” from “CarSales (Current)” indicated by the “1” next to the column heading each of the columns.
- “year” from “CarSales” left joins to “year” from “CarSales (Current)” indicated by the “2” next to the column heading each of the columns.
- “Index” from “CarSales” left joins to “Offset_Index” from “CarSales (Current)” indicated by the “3” next to the column heading each of the columns.
Next click OK and this will create a new column called “Renamed Columns” (which was just the name from the previous power query step) but essentially contains all of the columns from the newly joined (or merged) table.
Next expanded the “Renamed Columns” column by clicking the double-sided arrow and select the column you want to access from the previous row. In my case it was the “units_sold” column from the “CarSales (Current)” table.
Now you’ll want to rename this column to something that’s more meaningful to the context of your data. In my example I renamed this column to “units_sold_previous_month”.
Finally to solve the original problem we just needed to calculate the sales growth per month, per brand, per year. To do this click “Add Column” in the ribbon and then click “Custom Column”. When the new window opens call the new column something meaningful to your data and enter a custom formula to solve your problem. In my example I called this column “sales_growth” and entered a custom formula of [units_sold] – [units_sold_previous_month].
This then created a new column called “sales_growth”.
Finally the 3 newly created columns we created in the steps earlier can now be deleted. In my example these were “Index”, “Offset_Index” and “units_sold_previous_month”. To do this control + click the columns you want to remove, right click and then left click “remove columns”.
And that’s it! These steps have successfully replicated the LAG function in Power Query!
I hope this step by step guides helps your solve your windowing function needs in Power Query.
Variables can simplify your DAX code, help with debugging and help improve performance. To use variables in your DAX measure you need to declare a variable using the VAR keyword, give the variable a name, and then assign an expression to the variable. You can use multiple variables in a measure but when using variables you must use a RETURN statement to specify the final output to be returned.
To show how to make use of variables I’ll go through a simple measure and rewrite it using variables, I’ll be using the Adventure Works Internet Sales tabular cube for this.
A measure called Total Sales YoY Growth % (shown below) has been created which calculates sales for current year and the previous year, then works out the percentage increase/decrease from the previous year to the current year.
You can see there is repeated logic for the CALCULATE block and this means it is evaluated twice, notice that SUM(‘Internet Sales'[Sales Amount]) is referenced 3 times. Using variables for the repeated logic gives the following code:
In this version the result of SUM(‘Internet Sales'[Sales Amount]) has been assigned to the variable TotalSales and the result of the CALCULATE section has been assigned to the variable TotalSalesPP. The CALCULATE section is now evaluated only once and assigned to a variable, this is a basic example so performance gain is insignificant but this method will help the performance of more complex measures.
To simplify the code further a new variable called TotalSalesVariance is created, and a final variable called Result is created to store the final calculation to be returned. The final code then becomes:
This version then allows you to debug the measure by using any of the variables in the RETURN statement, making it possible to verify the values for each of the variables:
The one thing to look out for with variables is filter context, once a variable has been declared and a value assigned to it you cannot use a CALCULATE statement to override the filter context and get a new value using the variable.
For example, TotalSalesPP = CALCULATE(TotalSales, PARALLELPERIOD(‘Date'[Date], -12, MONTH)) would return the same value as the variable TotalSales, hence TotalSalesPP = CALCULATE(SUM(‘Internet Sales'[Sales Amount]), PARALLELPERIOD(‘Date'[Date], -12, MONTH)) is the correct way to write this.
Hopefully this blog will help you introduce variables into your DAX code!
Have you ever been faced with the issue where Power BI automatically adds a sigma symbol to the front of your column if it detects it’s a data type of integer or decimal?
Now this isn’t too much of an issue if you’ve imported only a few columns but what if you’ve imported lots? Going through this column by column can become tiresome!
Luckily there is a hack, not only to turn summarization to “None” on multiple columns at once, but also to quickly change the formatting too.
As an example, I have imported a table from Excel into Power BI with 27 columns (“Column A” through to “Column Z”) and a “Key” column. Power BI detected that all of my columns contain whole numbers (or integers) and applied a summarization of “sum” against each column automatically.
So how do we quickly change this? Click the “Model” tab on the left hand side. Open the table up to expose all the columns within a table you want to change multiple columns within. Next select the inital column you want to change and then either select a range of fields using Shift+Click or select multipile fields using Control+Click.
Once you’ve selected the group of columns you want to change find the advanced drop down in the properties window and change the “Summarize by” option to “None”.
That’s it! There are other formatting properties in here too so feel free to have a play and change multiple columns at once!
The UNICHAR() DAX function is a text function that takes a numerical Unicode value and displays its associated character. For example, UNICHAR(128515) will display as:
90% of the information the human brain processes is visual and we process images up to 60,000 times faster than text, so it makes perfect sense to use icons where possible to enhance reports. This scarcely used DAX function opens-up that option.
The below stacked column chart uses Unicode emoticons to enhance the readability of the ‘Genre’ axis labels.
So, how do we achieve this?
To produce this you will need to edit the query. In the ‘Data’ view, right click the relevant table and select “Edit Query”
First, duplicate the existing column you want Unicode characters for (genre in this case). Then use the ‘Replace Values’ option to substitute in the relevant Unicode numbers for each genre.
(this can be hidden from the report view as it contains nothing meaningful).
Next, create a second calculated column that uses a simple measure:
IconColumn = (UNICHAR(UnicodeNumberColumn))
This new ‘Icon’ column can now be used in reports the same way as any other text column.
Note how in the stacked column chart above, the original names have been included, this is good practice for two main reasons. One is clarity, a clown denotes comedy to most users, but could indicate horror to others, including the label removes the ambiguity.
The other reason is due to possible compatibility issues. It is worth pointing out here that the Unicode characters will only display when the character exists in the chosen font. In most cases this will be fine, especially for emoji characters, but just in case there are display issues it is worth including the full label.
Staying with the movie topic, the below chart shows movie ratings both numerically and visually created by a custom measure:
Stars = REPT(UNICHAR(11088), AVERAGE('IMDB 1000'[10 Star Rating]))
A measure that uses the UNICHAR() function will always be a text field and as such, normal formatting applies, in the example above we can set colours to be gold on a black background.
The previous examples do help readability but don’t really add anything meaningful to the report. The below table shows that the UNICHAR() function can add worthwhile content with customisable KPIs by combining it with conditional formatting.
There are 143,859 Unicode characters available, everything from emojis, symbols, shapes and braille patterns to dice and playing cards. Whether you want to offer further insight into your data, enhance the user experience or simply create something sublimely ridiculous, with so many icons at your fingertips, the possibilities are only limited by your imagination.
Further information on the UNICHAR() function can be found here: UNICHAR function (DAX) – DAX | Microsoft Docs
A list of Unicode characters and their respective numerical values can be found here: Huge List of Unicode Characters
Have you ever wanted to export a table from Power BI Desktop into Excel just to make sure the DAX you’ve written is performing as expected but ran into this error message: “Data exceeds the limit – Your data is too large. Some data sampling may occur”?
Most probably this has occurred because you’ve got more than 30,000 rows of data in your table that you’re trying to export. In the example shown below I’ve actually got 30,001 rows of data – a row containing column headers plus 30,000 rows of actual data.
If I now change the row count in my “Table To Export” table to 29,999 rows of actual data using a simple filter I have no issues exporting the data. This brings the grand total to 30,000 rows (29,999 rows of actual data plus a row containing columns headers).
But what if you really need to export more than 30,000 rows of data? Well there is a way…
Firstly make sure you’ve got DAX Studio installed on your machine.
Now within Power BI Desktop click “View” and then “Performance analyzer”.
This will open a new window pane called “Performance analyzer”.
Now click “Start Recording” and then click “Refresh Visuals”. Next within the Performance analyzer window locate the name of the visualisation which in my case is “Table To Export” and click the small “+” symbol next to it.
Next go ahead and grab the DAX query which has been used to construct the table. Click on the link “Copy query” which will copy the DAX query to your clipboard.
Next you’ll need to launch DAX Studio from within Power BI Desktop.
To do this on the top ribbon in Power BI Desktop click “External Tools” and then click “DAX Studio”.
Now within DAX Studio paste the copied query from your clipboard into the query window.
Next remove the variable which limits the number of rows which will be returned (TOPN) and replace the “EVALUATE” statement with “_DS0Core” as per the snip below.
Next change the output of the query to be executed from “Grid” to “File”.
Finally click the “Run” button and let DAX Studio Export your data containing at least 30,000 rows to a CSV file. Simple!
I hope this helps with any debugging you might need to perform in Power BI Desktop!
Which date format styles should we use if we are building a report that is being consumed internationally?
Remember, 01/12/2021 is December 1st or January 12th depending in which part of the world it is being read.
The decision may be taken from our hands if there is a company policy in place. If the company is based in the USA, for example, they may choose to use US formatted date fields as a standard for reporting across the entire business, however, if the field needs to be truly dynamic depending on the consumers location, the answer lies in this tool tip:
There are 2 formats in the selection that are prefixed with an asterisk:
There are 2 variables that the Power BI Service checks when loading reports in the service.
First it will check the language setting of the user account in the service. This is set under ‘Settings >> General >> Language’. There is a dropdown option that acts as both a language and regional setting, this drives how dates are formatted when dynamic date formats are used.
If this is set to ‘Default (browser language)’ the second variable, the browser’s default language setting, will take effect.
In Edge this is set under ‘Settings >> Language’, when multiple languages are set, the topmost one is considered the default.
In Chrome it is set under ‘Settings >> Advanced >> Language’, this uses the same system as Edge where the topmost language is used as default.
Here is an example of a table loaded in a browser using both English UK and English US:
This example shows that not only does the format of the date itself change (day and month have switched) but there are also visual connotations to account for. The US format uses a 12-hour clock by default and the addition of the AM/PM suffix changes the column width and drastically alters the readability of the table and potentially the entire report. It is these occurrences we need to be aware of when developing reports for international consumption.
This issue can easily be avoided by using the ‘Auto-size column width’ setting under ‘Column Headers’ on the formatting tab of the visual, or by allowing for the growth when setting manual column widths. (For a great guide on manually setting equal column widths, please read this helpful post by my colleague, Nick Edwards)
Unfortunately, this post comes with a caveat, at the time of writing it would seem there is a bug in Power BI. Remember this from earlier?
As you can see below, both fields use the UK format of DD/MM/YYYY when the browser language is set to English UK.
However, when the browser settings are changed to English US, only the *‘General Date’ format has changed, the *’DD/MM/YYYY’ format is still showing in the UK format even though there is an asterisk next to it in the selection list.
Hopefully once this issue is addressed, the use of regionally dynamic date formats will be available for both long and short formats.
In Power BI Power Query there is an option to enable or disable whether a table is loaded into the report. The option ‘Enable load’ can be found by right clicking on the table. Typically, by default, the load is already enabled.
There is also an option ‘Include in report refresh’ which lets a user stop a table from refreshing when they refresh the entire report. This maybe useful for static tables or tables that are large which take a long time to refresh and a user wants to concentrate on how other tables are refreshing.
Once a user disables the ‘Enable load’ option, the table name turns italic which is an easy way for users to determine whether a table will be loaded or not.
After applying these changes, no data has been loaded into the report.
To re-enable the load, jump back into Power Query, right click on the table and ‘Enable load’.
Finally, some scenarios where it might be useful to disable loading a table:
– Disable loading tables in Power Query that were only ever stepping stones to create other tables
– See how removing a table effects your report before deleting it
– Removing a table that might be required again in the future
Have you ever come across an issue where your Power BI matrix column widths just aren’t the same width and visually just don’t look right?
Unfortunately (as of April 21’) there is no easy way to make all column widths equal in the format pane of a matrix visual.
However there is a hack to set the width of your all columns in a matrix so that they are all equal and pixel perfect with DAX!
How do you do this I hear you ask?
Firstly create a new measure called ‘Set Column Width’ and enter a string value equal to the length of your longest column title. In my case my longest title is “Front Derailleur Cage” and this has a length of 21 characters including spaces. Therefore I need to set my DAX expression to be a string which is 21 characters long. In my example I’ve just created a string of 21 asterixis wrapped in speech marks – but this can be any combinations of characters you like!
The next job is to go to the format pane of your matrix and set the “Show on rows” toggle equal to ‘On’.
Next make sure the ‘Auto-size column width’ is set equal to ‘On’.
Now drag your newly created DAX expression (in my case ‘Set Column Width’) on to the values field of your matrix.
You’ll then notice that your matrix will look similar to the snip below – a little bit of a mess! But not to worry this all part of the plan!
Next go back the format pane of your matrix and set ‘Auto-size column width’ to ‘Off’.
Now remove your ‘Set Column Width’ measure from the visual by clicking the ‘X’ symbol on the field pane.
Finally increase the width and height of your matrix visual to accommodate the increased column widths.
You now have pixel perfect column widths which are all equal to each other!
A huge thanks to the brilliant MVP Ruth Pozuelo Martinez (@ruthpozuelo) from curbal.com for this hack! It’s a been a massive help for my Power BI reports here at Purple Frog! Hopefully the Power BI team will release a proper solution in the matrix format pane soon!
A drill through in Power BI allows the reader to see secondary data related to the original page with the context of a specific data point applied, for example, drilling through on sales data can display the demographic information of the relevant customers for those sales.
One limitation of the drill through functionality is that it only allows users to drill through on a single data point. If more than one is selected, the drill through function will be disabled. Using the above example, this means that a reader can drill through to the demographic of the sales of one product at a time, but not a combination of two or three.
You can see this when using a drill through button, the button only works when one data point is selected.
If you select multiple points the button is greyed out and if you hover over the it, you get the following tool tip appear:
“To drill through to [page name], select a single data point from [page name]“
Curiously, since native drill throughs on card visualisations were introduced back in September 2020, Power BI considers a card to be a single data point, regardless of the number of filters applied to it.
If you drill through on the card with multiple data points selected, the drill through page will have all of the relevant filters applied.
Currently there is no method of getting the button to function with multiple data points selected, even though the above behaviour suggests there is scope to do so. At the time of writing, Microsoft have confirmed that this behaviour is intended functionality for the drill through button.
So to conclude, if you need to allow drill throughs for a multi-select scenario, currently your only option at the moment is to replace Buttons for Cards and perhaps include a tip for the reader to know its there, hopefully this may change in the future.
I have created a Power BI report which provides detail on the state of processing in a Tabular Cube.
The report uses the cube’s dynamic management views to provide information about the cube’s partitions.
To use the tabular cube processing report, you need to insert the following information:
- Database (Cube Name)
Once entered and assuming the connection is fine you need to accept some native queries. These statements are select statements and will not alter the cube. That should give you a report similar to the one below, I have used an adventure works tabular model as an example.
This report is inspired by one I found for general tabular cube documentation by Data Savvy, which I highly recommend. That report includes information on: tables, columns, relationships, measure and security: