Monthly Archives: May 2022
So, you have just written a query, hit execute and run into an error: Invalid column name ‘[column name]’.
The column you’ve used in your WHERE clause cannot be identified by its alias. You’ve defined it at the top of the query and used it fine previously as your ORDER BY condition, so why can’t the engine recognise it?
If you’ve ever written a SQL query, this should look familiar:
- SELECT
- FROM / JOIN
- WHERE
- GROUP BY
- HAVING
- ORDER BY
It makes sense that we read the query from top to bottom / left to right (especially for those in the West), but contrary to how it may look, this isn’t how the query is executed. The order of execution is:
- FROM / JOIN
- WHERE / ON
- GROUP BY
- HAVING
- SELECT
- DISTINCT
- ORDER BY
When you stop and think about what is happening, logically, this order makes sense.
FROM / JOIN – First and foremost, we need to determine the location of the data set that we are querying.
WHERE / ON – We need to narrow that data set down with predicates to keep only the records we care about.
GROUP BY – If you need to aggregate the results, this is done next to give us an even more concise result set.
HAVING – WHERE 2.0! This is a predicate over the aggregated result set.
SELECT – Okay, now we have our final result set, containing ONLY our required rows. Now, finally, we can select which columns we want from that dataset.
DISTINCT – Now we know which columns we want from the result set, we can strip out any duplicates.
ORDER BY – Finally, we have whittled the result set down to only the rows and columns we need, so we can order it however we want.
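To make the two orders easier to compare, here is a minimal sketch of a query annotated with the logical step at which each clause is processed. The table dbo.Orders and its columns are hypothetical, used only for illustration.

```sql
-- Hypothetical table: dbo.Orders (CustomerId, OrderDate, Amount)
-- The numbers show the logical processing order, not the written order.
SELECT DISTINCT                       -- 5. SELECT, then 6. DISTINCT
       o.CustomerId,
       SUM(o.Amount) AS TotalSpent
FROM dbo.Orders AS o                  -- 1. FROM / JOIN
WHERE o.OrderDate >= '20220101'       -- 2. WHERE
GROUP BY o.CustomerId                 -- 3. GROUP BY
HAVING SUM(o.Amount) > 1000           -- 4. HAVING
ORDER BY TotalSpent DESC;             -- 7. ORDER BY (the alias exists by now)
```

Note that HAVING has to repeat SUM(o.Amount) because the alias TotalSpent is not assigned until the SELECT step, whereas ORDER BY can use it.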
What is the value in knowing this?
It’s important to understand the difference between the logical and lexical order of a query. Knowing this will help you troubleshoot any syntax errors you may encounter.
Going back to our original issue of using an alias in a WHERE clause, you can now see why you get an error about an invalid column.
Look back at the order of execution: aliases are assigned to columns during the SELECT stage, and WHERE comes before SELECT. The engine is trying to process the column by its alias before it’s been assigned. ORDER BY comes after SELECT, so the same issue doesn’t arise there.
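As a rough illustration, the sketch below shows the failing pattern (commented out) and two common ways around it. The table dbo.Orders and the alias LineTotal are made up for this example.

```sql
-- Hypothetical table: dbo.Orders (OrderId, Quantity, UnitPrice)

-- Fails: LineTotal has not been assigned yet when WHERE is processed.
-- SELECT o.OrderId, o.Quantity * o.UnitPrice AS LineTotal
-- FROM dbo.Orders AS o
-- WHERE LineTotal > 100;            -- Invalid column name 'LineTotal'.

-- Option 1: repeat the expression in the WHERE clause.
SELECT o.OrderId, o.Quantity * o.UnitPrice AS LineTotal
FROM dbo.Orders AS o
WHERE o.Quantity * o.UnitPrice > 100
ORDER BY LineTotal;                  -- fine: ORDER BY is processed after SELECT

-- Option 2: assign the alias in a derived table, so it already
-- exists by the time the outer WHERE is processed.
SELECT t.OrderId, t.LineTotal
FROM (
    SELECT o.OrderId, o.Quantity * o.UnitPrice AS LineTotal
    FROM dbo.Orders AS o
) AS t
WHERE t.LineTotal > 100
ORDER BY t.LineTotal;
```

Option 2 works because the inner query's SELECT has been processed before the outer query's WHERE is reached.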
It also explains why IntelliSense doesn’t autocomplete or suggest column names until you’ve added the FROM clause. Until you say where the data is coming FROM, it doesn’t know which columns you can SELECT.
At the time of writing, it is not possible to write a query using a common table expression (CTE) in the source of a dataflow. However, there are a few options for dealing with this limitation:
- rewrite the query using subqueries instead of CTEs
- use a stored procedure that contains the query and reference the stored procedure in the source of the dataflow
- write the query as a view and reference the view in the source of the dataflow (this is my preferred method and the one I will demo here)
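For the first option, a simple CTE can usually be moved into the FROM clause as a derived table. The sketch below uses a hypothetical dbo.Orders table and is not the query from this post; it only shows the shape of the rewrite.

```sql
-- Hypothetical table: dbo.Orders (CustomerId, Amount)

-- CTE form: this is the shape the dataflow source rejects.
WITH CustomerTotals AS (
    SELECT CustomerId, SUM(Amount) AS TotalSpent
    FROM dbo.Orders
    GROUP BY CustomerId
)
SELECT CustomerId, TotalSpent
FROM CustomerTotals
WHERE TotalSpent > 1000;

-- Derived-table form: same logic, with no WITH keyword.
SELECT ct.CustomerId, ct.TotalSpent
FROM (
    SELECT CustomerId, SUM(Amount) AS TotalSpent
    FROM dbo.Orders
    GROUP BY CustomerId
) AS ct
WHERE ct.TotalSpent > 1000;
```

This rewrite becomes harder to read when one CTE is referenced several times or when CTEs are chained, which is one reason a view can be the tidier choice.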
I will use the following query purely as an example to demo this:
This query produces the following output:
If I write this query directly inside the source of the dataflow as shown below, I get the error message Incorrect syntax near the keyword ‘WITH’ when trying to import the schema or preview the data.
However, if I create a view for this query and reference the view in the dataflow instead, this works and I can preview the data:
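As a minimal sketch of the view approach, the statement below wraps a CTE-based query in a view. The view name dbo.vw_CustomerTotals, the table dbo.Orders and the columns are all hypothetical stand-ins, not the objects used in this post.

```sql
-- Hypothetical objects: dbo.Orders (CustomerId, Amount), dbo.vw_CustomerTotals
-- CREATE VIEW must be the first statement in its batch.
CREATE VIEW dbo.vw_CustomerTotals
AS
WITH CustomerTotals AS (
    SELECT CustomerId, SUM(Amount) AS TotalSpent
    FROM dbo.Orders
    GROUP BY CustomerId
)
SELECT CustomerId, TotalSpent
FROM CustomerTotals;
```

The CTE stays inside the view definition, so a source that reads from the view never has to handle the WITH keyword itself.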
Note that in the source you can also write a query referencing the view (shown below). This is useful if you require additional logic on top of the view.
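A source query of that kind might look something like the following, assuming a hypothetical view dbo.vw_CustomerTotals that exposes CustomerId and TotalSpent columns.

```sql
-- Hypothetical view: dbo.vw_CustomerTotals (CustomerId, TotalSpent)
-- Extra filtering is layered on top of the view, with no WITH keyword.
SELECT CustomerId, TotalSpent
FROM dbo.vw_CustomerTotals
WHERE TotalSpent > 1000;
```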
As mentioned earlier, you can also write the query in a stored procedure and reference this in a similar way to the above. You should now be able to use this simple method to use CTEs in the source for a dataflow.
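For completeness, here is a rough sketch of the stored procedure alternative. The procedure name dbo.usp_GetCustomerTotals and the table dbo.Orders are hypothetical.

```sql
-- Hypothetical objects: dbo.Orders (CustomerId, Amount), dbo.usp_GetCustomerTotals
CREATE PROCEDURE dbo.usp_GetCustomerTotals
AS
BEGIN
    SET NOCOUNT ON;  -- the statement before WITH must end with a semicolon

    WITH CustomerTotals AS (
        SELECT CustomerId, SUM(Amount) AS TotalSpent
        FROM dbo.Orders
        GROUP BY CustomerId
    )
    SELECT CustomerId, TotalSpent
    FROM CustomerTotals;
END;
```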