
Joe Billingham

The SQL Query Alias Conundrum – Order of Execution


So, you have just written a query, hit execute, and encountered an error: Invalid column name '[column name]'.

The column you’ve used in your WHERE clause cannot be identified by its alias. You’ve defined it at the top of the query and used it fine previously as your ORDER BY condition, so why can’t the engine recognise it?
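For illustration, a query along these lines (the table and column names are invented) triggers exactly that error, because the alias is referenced in the WHERE clause; a working alternative is sketched further down:

-- Fails with: Invalid column name 'VAT'
SELECT  SalesAmount * 0.2 AS VAT
FROM    dbo.Sales
WHERE   VAT > 100;       -- alias referenced before the SELECT stage has assigned it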

If you’ve ever written a SQL query, this should look familiar:

  1. SELECT
  2. DISTINCT
  3. FROM
  4. JOIN
  5. ON
  6. WHERE
  7. GROUP BY
  8. HAVING
  9. ORDER BY

It makes sense that we read the query from top to bottom / left to right (especially for those in the West), but contrary to how it may look, this isn’t how the query is executed. The order of execution is:

  1. FROM / JOIN
  2. WHERE / ON
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. DISTINCT
  7. ORDER BY

When you stop and think about what is happening, logically, this order makes sense.

FROM / JOIN – First and foremost, we need to determine the location of the data set that we are querying.

WHERE / ON – We need to narrow that data set down with predicates to keep only the records we care about.

GROUP BY – If you need to aggregate the results, this is done next to give us an even more concise result set.

HAVING – WHERE 2.0! This is a predicate over the aggregated result set.

SELECT – Okay, now we have a result set containing only the rows we need, so we can finally select which columns we want from it.

DISTINCT – Now we know which columns we want from the result set, we can strip out any duplicates.

ORDER BY – Finally, we have whittled the result set down to only the rows and columns we need, so we can order it however we want.


What is the value in knowing this?

It’s important to understand the difference between the logical and lexical order of a query. Knowing this will help you troubleshoot any syntax errors you may encounter.

Going back to our original issue of using an alias in a WHERE clause, you can now see why you get an error about an invalid column.

Look back at the order of execution: aliases are assigned to columns during the SELECT stage, and WHERE is evaluated before SELECT. The engine is trying to resolve the column by its alias before that alias has been assigned. ORDER BY is evaluated after SELECT, so the same issue doesn't arise there.
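Sticking with the invented names from the sketch above, you can work around it either by repeating the expression in the WHERE clause or by pushing the aliased SELECT into a derived table or CTE, so that its SELECT is evaluated before the outer WHERE:

-- Option 1: repeat the expression instead of the alias
SELECT  SalesAmount * 0.2 AS VAT
FROM    dbo.Sales
WHERE   SalesAmount * 0.2 > 100
ORDER BY VAT;            -- ORDER BY runs after SELECT, so the alias is valid here

-- Option 2: wrap the aliased SELECT in a CTE; its SELECT runs before the outer WHERE
WITH s AS (
    SELECT SalesAmount * 0.2 AS VAT
    FROM   dbo.Sales
)
SELECT VAT
FROM   s
WHERE  VAT > 100
ORDER BY VAT;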

It also explains why IntelliSense doesn't autocomplete or suggest column names until you've added the FROM clause. Until you tell the engine where the data is coming FROM, it doesn't know which columns it can SELECT.

Combining Queries from Multiple Sources in Power BI using Merge and Append

It is always good practice to do as much data preparation as close to the sources as you can before importing or connecting them to your Power BI reports, but what if there are circumstances where this isn’t possible?

I had an issue recently where a third-party application had been updated and both the new and legacy versions were being used side-by-side. Logging data from both versions was being written to two separate Azure SQL databases.

The customer needed a Power BI report showing both old and new logging data sets as a single source. Had both been databases on a regular SQL Server instance, I could have written a view with a cross-database join, imported that into Power BI and thought no more about it. However, with the two sources being separate Azure SQL Databases, there was no easy way to join the tables, which caused an issue.

This is where the Merge and Append functions in Power Query come in.

The first step in solving my issue was to create a view in each of the Azure databases, identical in structure (you'll see why later on), and import both into Power BI.

Now that these data sources (referred to as 'Queries' by Power BI) have both been imported into the data model, we have two options for combining them: 'Append' and 'Merge'.

Merge
Although I didn't use the 'Merge' function, I have included some information about it here as it is still relevant. 'Merge' is useful when you have columns from one source that you would like to add to another. The simplest way to think about it is that it works in the same way a JOIN works in SQL; in fact, when you enter the 'Merge' wizard there is a 'Join Kind' option.

In essence, 'Merge' bolts the chosen columns from the second query onto the first wherever the key columns match.
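As a rough SQL sketch of the same idea (the table and column names are purely illustrative):

-- 'Merge' behaves like a SQL join: matching columns from the second query are added to the first
SELECT      n.LogID,
            n.LogDate,
            n.Message,
            l.LegacyStatus
FROM        NewLogging    AS n
LEFT JOIN   LegacyLogging AS l
        ON  l.LogID = n.LogID;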

Append
To solve my issue (explained above), I used the 'Append' function. While it still combines two sources, it concatenates one query with another; its SQL equivalent would be a UNION ALL. 'Append' gives you the option to combine as many tables as you wish, regardless of structure. If a column exists in one query but not another, that column is filled with NULLs where applicable, which is why it was important to create the two identical views at the very beginning.

In essence, 'Append' stacks the rows of one query underneath the rows of the other.
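In SQL terms, this is what 'Append' did with my two identically structured views (again, the view and column names are illustrative):

-- 'Append' behaves like a UNION ALL: the rows of one view are stacked underneath the other
SELECT LogID, LogDate, Message FROM vw_LegacyLogging
UNION ALL
SELECT LogID, LogDate, Message FROM vw_NewLogging;

Because both views share the same structure, no columns need padding out with NULLs.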

When I appended the two sources, I chose the option to append 'as New' so I could hide the original two queries and be left with a nice new table, which I could then rename, tidy up and make a bit more user friendly, ready for the report consumer to use.

As previously mentioned, data preparation should be done as close to the source as possible, but in situations where this is difficult or simply isn’t possible, it’s important to know about these Power BI functions and how you can use them to your advantage.


More information can be found here:
Append queries – Power Query | Microsoft Docs
Merge queries overview – Power Query | Microsoft Docs

Making the DAX Engine Work for You

In my last blog post, which can be found here, I demonstrated that it's important to limit your DAX filters to specific columns rather than entire tables. Now we know that we can and should do it, let's explore why and how it works.

There are numerous ways to write an expression that will give the same result set, but why are some ways more efficient than others? To answer that we need to know a little about what happens under the hood, inside the DAX engine(s).

A lot goes into processing a DAX expression but the focus of this blog post will be the Formula Engine (FE) and the Storage Engine (SE).

Both engines do different jobs: think of the FE as the brain (it provides the instructions) and the SE as the muscles (it does the fetching and carrying).

The FE takes a DAX expression and turns it into a set of sequential instructions (joins, aggregations, filters, etc.), which it then sends to the SE. Because communication from the FE to the SE is single-threaded, the instructions are sent one at a time, so the more instructions sent, the longer the entire process takes.

The SE takes these instructions and executes them against the data source(s), collects what is requested and delivers it back to the FE. Unlike the FE, the SE uses multi-threading so can process the FE’s requests very quickly.

So now that we know a little about the FE and SE and what roles they perform, how can we use it to our advantage?

The SE can do a lot of complex work very quickly, but the fewer instructions it has to carry out, the quicker the overall process becomes. This is best shown with an example.

*The queries below are run in DAX Studio so that the stats can be recorded alongside the result set.

Here is a simple expression to return the number of sales orders per region, taken from everyone's favourite imaginary online bike store:

DEFINE
    MEASURE Sales[Count] =
        CALCULATE(
            DISTINCTCOUNT('Sales Order'[Sales Order])
        )
EVALUATE
CALCULATETABLE(
    SUMMARIZECOLUMNS(Customer[Country-Region], "Sales by Region", Sales[Count])
)

As you can see, the FE takes the DAX expression and turns it into 1 single query. The instruction is sent, the SE does the hard work and passes the result back. The entire query took 44ms to return 7 rows, one per region. Nice and fast.

However, if we add a filter to the query and use the entire Sales table in that filter, look what happens:

DEFINE
    MEASURE Sales[Count] =
        CALCULATE(
            DISTINCTCOUNT('Sales Order'[Sales Order]),
            FILTER(
                Sales,
                Sales[Sales Amount] > 350
                    && Sales[OrderDateKey] > 20200101
            )
        )
EVALUATE
CALCULATETABLE(
    SUMMARIZECOLUMNS(Customer[Country-Region], "Sales by Region", Sales[Count])
)

The expression now takes 8 times longer to complete. The reason is that the FE sends the SE 8 queries which, remember, must be processed sequentially.

1x – The SE fetches the table used in the filter (Sales) and trims it down to only the rows that meet the filter criteria:
  a. Sales Amount > 350
  b. OrderDateKey > 20200101

7x – That filtered table, now held in memory, is applied across seven more queries, one per region in the Customer table.

There are 8 times as many instructions sent to the SE and the whole process takes around 8 times as long to complete.

With a little know-how we can rewrite the expression in a more efficient way, using only the specific columns we need in the filter. This enables the FE to parse it into a single instruction and ping it off to the SE.

DEFINE
    MEASURE Sales[Count] =
        CALCULATE(
            DISTINCTCOUNT('Sales Order'[Sales Order]),
            FILTER(ALL(Sales[Sales Amount]), Sales[Sales Amount] > 350),
            FILTER(ALL(Sales[OrderDateKey]), Sales[OrderDateKey] > 20200101)
        )
EVALUATE
CALCULATETABLE(
    SUMMARIZECOLUMNS(Customer[Country-Region], "Sales by Region", Sales[Count])
)

The SE receives 1 instruction, produces the same result set and the entire process takes a fraction of the time.

Although the technology behind DAX is very quick, learning the most efficient ways to write expressions can give that technology a helping hand and make your expressions, and therefore your Power BI reports, just that bit quicker.

If you need any help or advice with your DAX expressions or Power BI reports in general, feel free to leave a comment or drop me or any of the team an email with your queries, we would love to hear from you.

Increasing DAX Filter Speed and Efficiency

To understand how a filter can affect performance, first we need to understand what a filter is.

A filter is a temporary table that is created on demand, held in memory, and then used as a reference for the function being filtered. The rows of the temporary filter table are all DISTINCT. This is because, for the purposes of filtering, the engine doesn't care whether there are 1, 2, 3 or 250 occurrences of a value in a table, only whether it exists or not.

Let’s take a look at some DAX:
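The measure in question looks something like this (the measure name is illustrative; the table and column names are the ones discussed below):

Sales over 100 List Price =
CALCULATE(
    SUM(FactInternetSales[SalesAmount]),
    FILTER(
        DimProduct,                     -- the whole DimProduct table is used in the filter
        DimProduct[ListPrice] > 100
    )
)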

This measure is adding together all the values of Sales Amount, but only where the List Price of the product being sold is greater than 100.

What we are asking the DAX engine to do is:

  1. Create a temporary table of product list prices where list price is > 100.
  2. Iterate through the FactInternetSales table and grab all the SalesAmount values from those records where the related product has a list price that appears in the temporary table.
  3. Add all the SalesAmount values together.

While this measure will work and produce the expected results, it isn’t the most efficient way of doing things.

This FILTER function is populating the temporary table with every column of every row of DimProduct that meets the criteria of DimProduct[ListPrice] > 100.

For our predicate of DimProduct[ListPrice] > 100 to work, we only need to check one column, List Price, yet we are pulling every column into memory unnecessarily. As we are including every column, including the ProductKey, every row will be distinct regardless of whether a specific list price has already been found on another record.

This means the table will contain more columns and rows than we need. The wider and longer this table, the more memory we are taking up with data we don’t need to perform the filter.

So, is there a better approach?
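Rewriting the measure to filter on just the one column we care about might look something like this (again, the measure name is illustrative):

Sales over 100 List Price v2 =
CALCULATE(
    SUM(FactInternetSales[SalesAmount]),
    FILTER(
        ALL(DimProduct[ListPrice]),     -- only the List Price column is pulled into memory
        DimProduct[ListPrice] > 100
    )
)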

The use of ALL in the FILTER means we can specify the column(s) we want to filter by rather than the entire table, so the temporary table held in memory is now only one column wide.

The predicate still has everything it needs to function, a temporary table of distinct list prices with which to cross-reference DimProduct.

Remember, a temporary filter table always contains distinct rows. Now that we only have one column, where there are duplicate list prices we will only have one row for each. A shorter and narrower table will consume a lot less memory than the wider, longer one created by the previous query.

Running both measures side by side confirms they produce exactly the same results.

A smaller temporary table means less memory usage and less to scan through, which in turn equates to more speed, as the Performance Analyzer confirms.

What if there are global filters in place?

The use of ALL means that we are removing any global filters from the measure's filter context, so any page filters or slicers will be ignored. If you want these to remain in effect, simply wrap the filter in a KEEPFILTERS function. This allows the global filters to remain while still only pulling one column into memory.
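A minimal sketch of that pattern, reusing the illustrative names from above:

Sales over 100 List Price v3 =
CALCULATE(
    SUM(FactInternetSales[SalesAmount]),
    KEEPFILTERS(                        -- keeps any page filters and slicers in play
        FILTER(
            ALL(DimProduct[ListPrice]),
            DimProduct[ListPrice] > 100
        )
    )
)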

While this may seem trivial at first glance, the Performance Analyzer shows the speed difference between the two queries to be more than double. If your dataset contains very wide tables with millions of rows and your report pages contain a lot of visuals with lots of measures (CALCULATE being one of the most popular functions), this speed increase will scale to become meaningful and noticeable to your end users.

How the UNICHAR() DAX Function Enhances Power BI Reports

The UNICHAR() DAX function is a text function that takes a numerical Unicode value and displays its associated character. For example, UNICHAR(128515) displays the smiling face emoji 😃.

90% of the information the human brain processes is visual and we process images up to 60,000 times faster than text, so it makes perfect sense to use icons where possible to enhance reports. This scarcely used DAX function opens up that option.

The below stacked column chart uses Unicode emoticons to enhance the readability of the ‘Genre’ axis labels.

So, how do we achieve this?

To produce this you will need to edit the query. In the 'Data' view, right-click the relevant table and select 'Edit Query'.

First, duplicate the existing column you want Unicode characters for (genre in this case). Then use the 'Replace Values' option to substitute the relevant Unicode number for each genre. This new number column can be hidden from the report view, as it contains nothing meaningful on its own.

Next, create a second calculated column that uses a simple expression:

IconColumn = UNICHAR([UnicodeNumberColumn])

This new ‘Icon’ column can now be used in reports the same way as any other text column.

Note how in the stacked column chart above the original genre names have been included alongside the icons. This is good practice for two main reasons. The first is clarity: a clown denotes comedy to most users but could suggest horror to others; including the label removes the ambiguity.

The other reason is possible compatibility issues. It is worth pointing out that a Unicode character will only display when it exists in the chosen font. In most cases this will be fine, especially for emoji characters, but just in case there are display issues it is worth including the full label.

Staying with the movie topic, the chart below shows movie ratings both numerically and visually, with the star icons created by a custom measure:

Stars = REPT(UNICHAR(11088), AVERAGE('IMDB 1000'[10 Star Rating]))

A measure that uses the UNICHAR() function always returns a text value and, as such, normal formatting applies; in the example above the stars are set to gold on a black background.

The previous examples help readability but don't really add anything meaningful to the report. Combined with conditional formatting, however, the UNICHAR() function can add worthwhile content in the form of customisable KPIs.
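As a sketch of the sort of measure that can drive such a KPI, assuming a pair of hypothetical measures [Sales This Year] and [Sales Last Year]:

KPI Icon =
VAR Diff = [Sales This Year] - [Sales Last Year]    -- hypothetical measures
RETURN
    SWITCH(
        TRUE(),
        Diff > 0, UNICHAR(9650),    -- black up-pointing triangle
        Diff < 0, UNICHAR(9660),    -- black down-pointing triangle
        UNICHAR(9644)               -- black rectangle for 'no change'
    )

Conditional formatting can then colour the resulting text field green, red or grey as appropriate.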

There are 143,859 Unicode characters available, everything from emojis, symbols, shapes and braille patterns to dice and playing cards. Whether you want to offer further insight into your data, enhance the user experience or simply create something sublimely ridiculous, with so many icons at your fingertips, the possibilities are only limited by your imagination.

Further information on the UNICHAR() function can be found here: UNICHAR function (DAX) – DAX | Microsoft Docs
A list of Unicode characters and their respective numerical values can be found here: Huge List of Unicode Characters

Dynamic Date Formats in Power BI

Which date format styles should we use if we are building a report that is being consumed internationally?

Remember, 01/12/2021 is December 1st or January 12th depending on which part of the world it is being read in.

The decision may be taken out of our hands if there is a company policy in place. If the company is based in the USA, for example, they may choose to use US-formatted date fields as a standard for reporting across the entire business. However, if the field needs to be truly dynamic depending on the consumer's location, the answer lies in this tool tip:

Explanation of dynamic date formats

There are 2 formats in the selection that are prefixed with an asterisk:

Selection of dynamic date formats
* We shall use ‘General Date’ in the examples throughout this post for reasons explained later

There are 2 settings that the Power BI Service checks when loading reports.

First it will check the language setting of the user account in the service. This is set under 'Settings >> General >> Language'. There is a dropdown option that acts as both a language and regional setting, and this drives how dates are formatted when dynamic date formats are used.

Power BI service language settings

If this is set to ‘Default (browser language)’ the second variable, the browser’s default language setting, will take effect.

In Edge this is set under 'Settings >> Language'; when multiple languages are set, the topmost one is considered the default.

Language settings in Edge

In Chrome it is set under 'Settings >> Advanced >> Language', which uses the same system as Edge: the topmost language is used as the default.

Language settings in Chrome

Here is an example of a table loaded in a browser using both English UK and English US:

English UK
English US

This example shows that not only does the format of the date itself change (day and month have switched) but there are also visual implications to account for. The US format uses a 12-hour clock by default, and the addition of the AM/PM suffix changes the column width, which drastically alters the readability of the table and potentially the entire report. It is these occurrences we need to be aware of when developing reports for international consumption.

This issue can easily be avoided by using the ‘Auto-size column width’ setting under ‘Column Headers’ on the formatting tab of the visual, or by allowing for the growth when setting manual column widths. (For a great guide on manually setting equal column widths, please read this helpful post by my colleague, Nick Edwards)

Unfortunately, this post comes with a caveat: at the time of writing there appears to be a bug in Power BI. Remember this from earlier?

Explanation of dynamic date formats
Selection of dynamic date formats

As you can see below, both fields use the UK format of DD/MM/YYYY when the browser language is set to English UK.

Settings set to UK
UK dates

However, when the browser settings are changed to English US, only the *'General Date' field changes; the *'DD/MM/YYYY' field is still showing the UK format, even though it has an asterisk next to it in the selection list.

Settings set to US
Erroneous mix of US and UK dates

Hopefully once this issue is addressed, the use of regionally dynamic date formats will be available for both long and short formats.

Power BI Drill Through using Multiple Data Points

A drill through in Power BI allows the reader to see secondary data related to the original page with the context of a specific data point applied, for example, drilling through on sales data can display the demographic information of the relevant customers for those sales.

One limitation of the drill through functionality is that it only allows users to drill through on a single data point. If more than one is selected, the drill through function will be disabled. Using the above example, this means that a reader can drill through to the demographic of the sales of one product at a time, but not a combination of two or three.

You can see this when using a drill through button: the button only works when one data point is selected.

Single data point selected - Button active

If you select multiple points, the button is greyed out, and if you hover over it, the following tooltip appears:

Multiple data points selected - Button greyed out

"To drill through to [page name], select a single data point from [page name]."

Curiously, since native drill throughs on card visualisations were introduced back in September 2020, Power BI considers a card to be a single data point, regardless of the number of filters applied to it.

If you drill through on the card with multiple data points selected, the drill through page will have all of the relevant filters applied.

Select multiple data points and right click the card to drill through
Filters showing both selected data points have been applied

Currently there is no method of getting the button to function with multiple data points selected, even though the above behaviour suggests there is scope to do so. At the time of writing, Microsoft have confirmed that this behaviour is intended functionality for the drill through button.

So, to conclude: if you need to allow drill throughs in a multi-select scenario, your only option at the moment is to replace buttons with cards, and perhaps include a tip so the reader knows it's there. Hopefully this may change in the future.
