Monthly Archives: September 2021
To understand how a filter can affect performance, first we need to understand what a filter is.
A filter is a temporary table that is created on demand, held in memory, and then used as a reference for the function being filtered. The rows of the temporary filter table are all DISTINCT. This is because for the purposes of filtering, the engine doesn’t care whether there are 1, 2, 3 or 250 occurrences of a value in table, only whether it exists or not.
Let’s take a look at some DAX:
This measure is adding together all the values of Sales Amount, but only where the List Price of the product being sold is greater than 100.
What we are asking the DAX engine to do is:
- Create a temporary table of product list prices where list price is > 100.
- Iterate through the FactInternetSales table and grab all the SalesAmount values from those records where the related product has a list price that appears in the temporary table.
- Add all the SalesAmount values together.
While this measure will work and produce the expected results, it isn’t the most efficient way of doing things.
This FILTER function is populating the temporary table with every column per row of DimProduct that meets the criteria of DimProduct[ListPrice] > 100, that table will look something like this:
For our predicate of DimProduct[ListPrice] > 100 to work, we only need to check one column, List Price, yet we are pulling every column into memory unnecessarily. As we are including every column, including the ProductKey, every row will be distinct regardless of whether a specific list price has already been found on another record.
This means the table will contain more columns and rows than we need. The wider and longer this table, the more memory we are taking up with data we don’t need to perform the filter.
So, is there a better approach?
The use of ALL in the FILTER (Line 5 above) means we can specify the column(s) we want to filter by rather than the entire table, so the temporary table held in memory is now only 1 column wide.
The predicate still has everything it needs to function, a temporary table of distinct list prices with which to cross-reference DimProduct.
Remember, a temporary filter table always contains distinct rows. Now that we only have one column, where there are duplicate list prices we will only have one row for each. A shorter and narrower table will consume a lot less memory than the wider, longer one created by the previous query.
Below is a table showing both measures produce the same results.
A smaller temporary table means less memory usage and less to scan through, which in turn equates to more speed as seen here in the Performance Analyzer:
What if there are global filters in place?
The use of ALL means that we are removing any global filters from the measure’s filter context, so any page filters or slicers will be ignored. If you want these to remain in effect, simply wrap the filter in a KEEPFILTERS function, this will allow the global filters to remain but still allow you to only pull one column into memory.
While this may seem trivial at first glance, the performance analyser shows the speed increase between the two queries to be more than double. If your dataset contains very wide tables with millions of rows and your report pages contain a lot of visuals with lots of measures (CALCULATE being one of the most popularly used functions), this speed increase will scale to be meaningful and noticeable by your end users.
Variables can simplify your DAX code, help with debugging and help improve performance. To use variables in your DAX measure you need to declare a variable using the VAR keyword, give the variable a name, and then assign an expression to the variable. You can use multiple variables in a measure but when using variables you must use a RETURN statement to specify the final output to be returned.
To show how to make use of variables I’ll go through a simple measure and rewrite it using variables, I’ll be using the Adventure Works Internet Sales tabular cube for this.
A measure called Total Sales YoY Growth % (shown below) has been created which calculates sales for current year and the previous year, then works out the percentage increase/decrease from the previous year to the current year.
You can see there is repeated logic for the CALCULATE block and this means it is evaluated twice, notice that SUM(‘Internet Sales'[Sales Amount]) is referenced 3 times. Using variables for the repeated logic gives the following code:
In this version the result of SUM(‘Internet Sales'[Sales Amount]) has been assigned to the variable TotalSales and the result of the CALCULATE section has been assigned to the variable TotalSalesPP. The CALCULATE section is now evaluated only once and assigned to a variable, this is a basic example so performance gain is insignificant but this method will help the performance of more complex measures.
To simplify the code further a new variable called TotalSalesVariance is created, and a final variable called Result is created to store the final calculation to be returned. The final code then becomes:
This version then allows you to debug the measure by using any of the variables in the RETURN statement, making it possible to verify the values for each of the variables:
The one thing to look out for with variables is filter context, once a variable has been declared and a value assigned to it you cannot use a CALCULATE statement to override the filter context and get a new value using the variable.
For example, TotalSalesPP = CALCULATE(TotalSales, PARALLELPERIOD(‘Date'[Date], -12, MONTH)) would return the same value as the variable TotalSales, hence TotalSalesPP = CALCULATE(SUM(‘Internet Sales'[Sales Amount]), PARALLELPERIOD(‘Date'[Date], -12, MONTH)) is the correct way to write this.
Hopefully this blog will help you introduce variables into your DAX code!