I’m sure you’ve all heard of windowing functions in SQL Server? But what if you need to replicate these windowing functions in Power Query?
Here I have a sample CarSales.csv dataset of car sales per brand, per year, per month and the number of units sold. I will use this dataset to demonstrate how we can replicate the windowing LAG function in Power Query.
In my example I’d like to calculate the month on month sales growth per car brand, per year. Using SQL this would be relatively simple using the LAG function to access the previous row value and then simply subtract the output of the LAG function from the column unit_sold with the use of a CTE.
Which would then generate the output below.
However, what if you found yourself in the situation where you couldn’t use SQL? Perhaps you’ve connected to an Analysis Cube or perhaps just a .CSV file in an Azure Storage Account. What do you do then…?
I’ll now show you how you use Power Query to help you solve your problem!
To start open Power BI Desktop and click “Home” > “Get Data” > “Text/CSV” and then select and open your .csv file. In my case it was CarSales.csv.
Now instead of just opening the data, you’ll want to click the option to “Transform Data” which will then open Power Query.
Once Power Query is open you’ll then need to sort the columns you want to partition by and order by. In my case it was “brand”, “year” and “month” which I sorted in ascending order. To do this click on the small down arrow to the right of each of the columns and click “Sort Ascending”.
This then produces the output/preview shown below.
The next step is to add an index column. To do this click on “Add Column” in the ribbon, click “Index Column” and then click “From 0”. This will then create a new column called “Index” which will increment by 1 starting from 0.
Once done you’ll then need to create a new column which will add 1 to this “Index” column. To do this click on the “Index” column to select it and then click on “Add Column” from the ribbon, then click “Standard” and finally click “Add”.
This will then launch a new window where you’ll be able to specify the value to add. For this example as we just want to access the previous row to get the previous months sales data we’ll enter the value 1. This step essentially replicates the [offset] argument in the LAG function. If instead you wanted to replicate the LEAD function you could instead use the “Subtract” option to subtract 1 away from this index column.
Once done, click “OK” and this will create a new column called “Addition”. To make things clearer for the purpose of this blog we’ll rename this to “Offset_Index”.
Next using the newly created “Index” and “Offset_Index” columns we want to left join this table back to itself. To do this click “Home” in the ribbon and then click “Merge Queries”.
This will then open a new window. You now need to select the 2 columns you want to partition by plus the index column. This will form the basis of your join criteria. To do this control + click the columns you want to join on. In my example these are “brand”, “year” and “index”.
Next in the empty drop down select the same table as the table you want to merge to. It will have “(Current)” tagged onto the end of the name. Now ensure that the join kind is set to “Left Outer”.
Next, control + click the 2 columns you want to partition in the new table plus the “Offset_Index” column we created earlier. In my case these were “brand”, “year” and “Offset_Index”.
You’ll now see the following join logic all joined by an “and” condition:
- “brand” from “CarSales” left joins to “brand” from “CarSales (Current)” indicated by the “1” next to the column heading each of the columns.
- “year” from “CarSales” left joins to “year” from “CarSales (Current)” indicated by the “2” next to the column heading each of the columns.
- “Index” from “CarSales” left joins to “Offset_Index” from “CarSales (Current)” indicated by the “3” next to the column heading each of the columns.
Next click OK and this will create a new column called “Renamed Columns” (which was just the name from the previous power query step) but essentially contains all of the columns from the newly joined (or merged) table.
Next expanded the “Renamed Columns” column by clicking the double-sided arrow and select the column you want to access from the previous row. In my case it was the “units_sold” column from the “CarSales (Current)” table.
Now you’ll want to rename this column to something that’s more meaningful to the context of your data. In my example I renamed this column to “units_sold_previous_month”.
Finally to solve the original problem we just needed to calculate the sales growth per month, per brand, per year. To do this click “Add Column” in the ribbon and then click “Custom Column”. When the new window opens call the new column something meaningful to your data and enter a custom formula to solve your problem. In my example I called this column “sales_growth” and entered a custom formula of [units_sold] – [units_sold_previous_month].
This then created a new column called “sales_growth”.
Finally the 3 newly created columns we created in the steps earlier can now be deleted. In my example these were “Index”, “Offset_Index” and “units_sold_previous_month”. To do this control + click the columns you want to remove, right click and then left click “remove columns”.
And that’s it! These steps have successfully replicated the LAG function in Power Query!
I hope this step by step guides helps your solve your windowing function needs in Power Query.
Variables can simplify your DAX code, help with debugging and help improve performance. To use variables in your DAX measure you need to declare a variable using the VAR keyword, give the variable a name, and then assign an expression to the variable. You can use multiple variables in a measure but when using variables you must use a RETURN statement to specify the final output to be returned.
To show how to make use of variables I’ll go through a simple measure and rewrite it using variables, I’ll be using the Adventure Works Internet Sales tabular cube for this.
A measure called Total Sales YoY Growth % (shown below) has been created which calculates sales for current year and the previous year, then works out the percentage increase/decrease from the previous year to the current year.
You can see there is repeated logic for the CALCULATE block and this means it is evaluated twice, notice that SUM(‘Internet Sales'[Sales Amount]) is referenced 3 times. Using variables for the repeated logic gives the following code:
In this version the result of SUM(‘Internet Sales'[Sales Amount]) has been assigned to the variable TotalSales and the result of the CALCULATE section has been assigned to the variable TotalSalesPP. The CALCULATE section is now evaluated only once and assigned to a variable, this is a basic example so performance gain is insignificant but this method will help the performance of more complex measures.
To simplify the code further a new variable called TotalSalesVariance is created, and a final variable called Result is created to store the final calculation to be returned. The final code then becomes:
This version then allows you to debug the measure by using any of the variables in the RETURN statement, making it possible to verify the values for each of the variables:
The one thing to look out for with variables is filter context, once a variable has been declared and a value assigned to it you cannot use a CALCULATE statement to override the filter context and get a new value using the variable.
For example, TotalSalesPP = CALCULATE(TotalSales, PARALLELPERIOD(‘Date'[Date], -12, MONTH)) would return the same value as the variable TotalSales, hence TotalSalesPP = CALCULATE(SUM(‘Internet Sales'[Sales Amount]), PARALLELPERIOD(‘Date'[Date], -12, MONTH)) is the correct way to write this.
Hopefully this blog will help you introduce variables into your DAX code!
The UNICHAR() DAX function is a text function that takes a numerical Unicode value and displays its associated character. For example, UNICHAR(128515) will display as:
90% of the information the human brain processes is visual and we process images up to 60,000 times faster than text, so it makes perfect sense to use icons where possible to enhance reports. This scarcely used DAX function opens-up that option.
The below stacked column chart uses Unicode emoticons to enhance the readability of the ‘Genre’ axis labels.
So, how do we achieve this?
To produce this you will need to edit the query. In the ‘Data’ view, right click the relevant table and select “Edit Query”
First, duplicate the existing column you want Unicode characters for (genre in this case). Then use the ‘Replace Values’ option to substitute in the relevant Unicode numbers for each genre.
(this can be hidden from the report view as it contains nothing meaningful).
Next, create a second calculated column that uses a simple measure:
IconColumn = (UNICHAR(UnicodeNumberColumn))
This new ‘Icon’ column can now be used in reports the same way as any other text column.
Note how in the stacked column chart above, the original names have been included, this is good practice for two main reasons. One is clarity, a clown denotes comedy to most users, but could indicate horror to others, including the label removes the ambiguity.
The other reason is due to possible compatibility issues. It is worth pointing out here that the Unicode characters will only display when the character exists in the chosen font. In most cases this will be fine, especially for emoji characters, but just in case there are display issues it is worth including the full label.
Staying with the movie topic, the below chart shows movie ratings both numerically and visually created by a custom measure:
Stars = REPT(UNICHAR(11088), AVERAGE('IMDB 1000'[10 Star Rating]))
A measure that uses the UNICHAR() function will always be a text field and as such, normal formatting applies, in the example above we can set colours to be gold on a black background.
The previous examples do help readability but don’t really add anything meaningful to the report. The below table shows that the UNICHAR() function can add worthwhile content with customisable KPIs by combining it with conditional formatting.
There are 143,859 Unicode characters available, everything from emojis, symbols, shapes and braille patterns to dice and playing cards. Whether you want to offer further insight into your data, enhance the user experience or simply create something sublimely ridiculous, with so many icons at your fingertips, the possibilities are only limited by your imagination.
Further information on the UNICHAR() function can be found here: UNICHAR function (DAX) – DAX | Microsoft Docs
A list of Unicode characters and their respective numerical values can be found here: Huge List of Unicode Characters
Have you ever wanted to export a table from Power BI Desktop into Excel just to make sure the DAX you’ve written is performing as expected but ran into this error message: “Data exceeds the limit – Your data is too large. Some data sampling may occur”?
Most probably this has occurred because you’ve got more than 30,000 rows of data in your table that you’re trying to export. In the example shown below I’ve actually got 30,001 rows of data – a row containing column headers plus 30,000 rows of actual data.
If I now change the row count in my “Table To Export” table to 29,999 rows of actual data using a simple filter I have no issues exporting the data. This brings the grand total to 30,000 rows (29,999 rows of actual data plus a row containing columns headers).
But what if you really need to export more than 30,000 rows of data? Well there is a way…
Firstly make sure you’ve got DAX Studio installed on your machine.
Now within Power BI Desktop click “View” and then “Performance analyzer”.
This will open a new window pane called “Performance analyzer”.
Now click “Start Recording” and then click “Refresh Visuals”. Next within the Performance analyzer window locate the name of the visualisation which in my case is “Table To Export” and click the small “+” symbol next to it.
Next go ahead and grab the DAX query which has been used to construct the table. Click on the link “Copy query” which will copy the DAX query to your clipboard.
Next you’ll need to launch DAX Studio from within Power BI Desktop.
To do this on the top ribbon in Power BI Desktop click “External Tools” and then click “DAX Studio”.
Now within DAX Studio paste the copied query from your clipboard into the query window.
Next remove the variable which limits the number of rows which will be returned (TOPN) and replace the “EVALUATE” statement with “_DS0Core” as per the snip below.
Next change the output of the query to be executed from “Grid” to “File”.
Finally click the “Run” button and let DAX Studio Export your data containing at least 30,000 rows to a CSV file. Simple!
I hope this helps with any debugging you might need to perform in Power BI Desktop!
Which date format styles should we use if we are building a report that is being consumed internationally?
Remember, 01/12/2021 is December 1st or January 12th depending in which part of the world it is being read.
The decision may be taken from our hands if there is a company policy in place. If the company is based in the USA, for example, they may choose to use US formatted date fields as a standard for reporting across the entire business, however, if the field needs to be truly dynamic depending on the consumers location, the answer lies in this tool tip:
There are 2 formats in the selection that are prefixed with an asterisk:
There are 2 variables that the Power BI Service checks when loading reports in the service.
First it will check the language setting of the user account in the service. This is set under ‘Settings >> General >> Language’. There is a dropdown option that acts as both a language and regional setting, this drives how dates are formatted when dynamic date formats are used.
If this is set to ‘Default (browser language)’ the second variable, the browser’s default language setting, will take effect.
In Edge this is set under ‘Settings >> Language’, when multiple languages are set, the topmost one is considered the default.
In Chrome it is set under ‘Settings >> Advanced >> Language’, this uses the same system as Edge where the topmost language is used as default.
Here is an example of a table loaded in a browser using both English UK and English US:
This example shows that not only does the format of the date itself change (day and month have switched) but there are also visual connotations to account for. The US format uses a 12-hour clock by default and the addition of the AM/PM suffix changes the column width and drastically alters the readability of the table and potentially the entire report. It is these occurrences we need to be aware of when developing reports for international consumption.
This issue can easily be avoided by using the ‘Auto-size column width’ setting under ‘Column Headers’ on the formatting tab of the visual, or by allowing for the growth when setting manual column widths. (For a great guide on manually setting equal column widths, please read this helpful post by my colleague, Nick Edwards)
Unfortunately, this post comes with a caveat, at the time of writing it would seem there is a bug in Power BI. Remember this from earlier?
As you can see below, both fields use the UK format of DD/MM/YYYY when the browser language is set to English UK.
However, when the browser settings are changed to English US, only the *‘General Date’ format has changed, the *’DD/MM/YYYY’ format is still showing in the UK format even though there is an asterisk next to it in the selection list.
Hopefully once this issue is addressed, the use of regionally dynamic date formats will be available for both long and short formats.
In Power BI Power Query there is an option to enable or disable whether a table is loaded into the report. The option ‘Enable load’ can be found by right clicking on the table. Typically, by default, the load is already enabled.
There is also an option ‘Include in report refresh’ which lets a user stop a table from refreshing when they refresh the entire report. This maybe useful for static tables or tables that are large which take a long time to refresh and a user wants to concentrate on how other tables are refreshing.
Once a user disables the ‘Enable load’ option, the table name turns italic which is an easy way for users to determine whether a table will be loaded or not.
After applying these changes, no data has been loaded into the report.
To re-enable the load, jump back into Power Query, right click on the table and ‘Enable load’.
Finally, some scenarios where it might be useful to disable loading a table:
– Disable loading tables in Power Query that were only ever stepping stones to create other tables
– See how removing a table effects your report before deleting it
– Removing a table that might be required again in the future
A drill through in Power BI allows the reader to see secondary data related to the original page with the context of a specific data point applied, for example, drilling through on sales data can display the demographic information of the relevant customers for those sales.
One limitation of the drill through functionality is that it only allows users to drill through on a single data point. If more than one is selected, the drill through function will be disabled. Using the above example, this means that a reader can drill through to the demographic of the sales of one product at a time, but not a combination of two or three.
You can see this when using a drill through button, the button only works when one data point is selected.
If you select multiple points the button is greyed out and if you hover over the it, you get the following tool tip appear:
“To drill through to [page name], select a single data point from [page name]“
Curiously, since native drill throughs on card visualisations were introduced back in September 2020, Power BI considers a card to be a single data point, regardless of the number of filters applied to it.
If you drill through on the card with multiple data points selected, the drill through page will have all of the relevant filters applied.
Currently there is no method of getting the button to function with multiple data points selected, even though the above behaviour suggests there is scope to do so. At the time of writing, Microsoft have confirmed that this behaviour is intended functionality for the drill through button.
So to conclude, if you need to allow drill throughs for a multi-select scenario, currently your only option at the moment is to replace Buttons for Cards and perhaps include a tip for the reader to know its there, hopefully this may change in the future.
I have created a Power BI report which provides detail on the state of processing in a Tabular Cube.
The report uses the cube’s dynamic management views to provide information about the cube’s partitions.
To use the tabular cube processing report, you need to insert the following information:
- Database (Cube Name)
Once entered and assuming the connection is fine you need to accept some native queries. These statements are select statements and will not alter the cube. That should give you a report similar to the one below, I have used an adventure works tabular model as an example.
This report is inspired by one I found for general tabular cube documentation by Data Savvy, which I highly recommend. That report includes information on: tables, columns, relationships, measure and security:
There is a very serious limitation in the behaviour of calculation groups when using the SELECTEDMEASURE or ISSELECTEDMEASURE functions, and we recommend not using them. Why? If a user creates their own custom calculations within their Power BI report (or composite model) then the value of SELECTEDMEASURE changes, breaking your calculation group logic.
Let me explain with an example:
In a tabular cube we may look to use calculation groups to apply additional filters when calculating measures. In our example we will be using a retail scenario, with budgets as well as ‘Like for Like’ (LFL) budgets. For those not in retail, LFL represents stores that were also open this time last year. We start with the measure “Budget” with the following definition:
We also have a hidden measure called “Budget LFL” with the following definition:
The measure “Budget LFL” is a variation of the “Budget” measure with additional filters applied at source. We also have a calculation group called “LFL Filter”, this is used to return the hidden “Budget LFL” measure based on the option selected in this filter and has the following definition:
This functionality is demonstrated below where the two visuals from Power BI show the same measure with and without the LFL Filter applied:
A problem arises when you try to create a custom measure (in the cube or within a Power BI report) and filter by the calculation group, for example we create a measure called “CustomBudget” which is a copy of the “Budget” measure and has the definition:
Adding this custom measure to the visual shown earlier we can see that the calculation group “LFL Filter” has no affect on the custom measure:
This is because the SELECTEDMEASURE() is now [CustomBudget] and not [Budget], therefore the logic in the calculation group doesn’t recognise the selected measure, and therefore doesn’t switch to the LFL measure.
To get around this we move the bulk of the logic from the calculation group to the measure itself. The measure now changes its behaviour by looking at the selected value of the LFL filter, instead of the calculation group managing it centrally:
We refresh Power BI and find that the results are now as expected, the original and custom measures now match when using the calculation group as a filter:
In this blog post we’ll take a quick look at using ConcatenateX function to view a concatenated string of dates where the max daily sales occurred for a given month.
I came across this function whilst going through the excellent “Mastering DAX 2nd Edition Video Course” by the guys from SQLBI.com. So credit to Marco and Alberto for sharing this.
So how does it work? If we had a list of dates ranging from 01/01/2020 to 31/12/2020 and we wanted to see which days we achieved maximum sales for each given month in a year we could use the ConcatenateX function to return these dates in a single row per month.
As we can see in the screenshot below, the left hand table shows the month of June where we achieved maximum sales for in June on both 18/06/2020 and 25/06/2020 of 99. In the table to the right we can see those two dates presented on a single row for the month of June in the column “What were the max days?”. This was column was created using the ConcatenateX function!
So let us first look at what the maximum daily sales were per month. To do this we’ll use the MAXX function to create “Max Daily Sales”. This returns the maximum daily sales rate achieved for each given month as a single value. So for the month of June this would be 99. The problem with this is we are not sure which days these max sales were achieved on without drilling down into the data. Was this just one day or was it multiple days? All we can see is a figure of 99.
So let us create a new measure to work out on which days this figure of 99 occurred on.
The variable at point (a) returns a table with a single column which lists all of the unique dates in our Sales_2020 table.
The variable at point (b) returns the max sales for the given filter context in this case month.
The variable at point (c) uses the filter function to filter out only the days where the max sales were achieved by setting the total quantity sold measure to the max daily sales variable. For example in June we achieved max daily sales of 99 on 18/06/2020 and 25/06/2020. Therefore the variable at point (c) would filter out the ListOfDays table variable to just 18/06/2020 and 25/06/2020 only.
If we just had one max sales day per month we could simply return MaxDaysOnly. However we may have multiple days per month where max sales were achieved. Hence we use the ConcatenateX function to create a string of dates.
The variable at point (d) creates a string of concatenated dates separated by the delimiter “,” which can be used against a single row in a table.
Wrapping it all up returns us this table below, which shows us which days max sales were achieved per given month! Pretty cool eh?
Check out https://www.sqlbi.com/articles/mastering-dax-video-course-2nd-edition/ for more information regarding the online DAX course as well as https://twitter.com/marcorus and https://twitter.com/ferrarialberto on Twitter!