

The SQL Query Alias Conundrum – Order of Execution


So, you’ve just written a query, hit execute and encountered an error: Invalid column name ‘[column name]’.

The column you’ve used in your WHERE clause cannot be identified by its alias. You’ve defined it at the top of the query and used it fine previously as your ORDER BY condition, so why can’t the engine recognise it?

If you’ve ever written a SQL query, this should look familiar:

  1. SELECT
  2. DISTINCT
  3. FROM
  4. JOIN
  5. ON
  6. WHERE
  7. GROUP BY
  8. HAVING
  9. ORDER BY

It makes sense that we read the query from top to bottom / left to right (especially for those in the West), but contrary to how it may look, this isn’t how the query is executed. The order of execution is:

  1. FROM / JOIN
  2. WHERE / ON
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. DISTINCT
  7. ORDER BY

When you stop and think about what is happening, logically, this order makes sense.

FROM / JOIN – First and foremost, we need to determine the location of the data set that we are querying.

WHERE / ON – We need to narrow that data set down with predicates to keep only the records we care about.

GROUP BY – If you need to aggregate the results, this is done next to give us an even more concise result set.

HAVING – WHERE 2.0! This is a predicate over the aggregated result set.

SELECT – Okay, now we have our final result set, containing ONLY the rows we need. Finally, we can select which columns we want from that data set.

DISTINCT – Now we know which columns we want from the result set, we can strip out any duplicates.

ORDER BY – Finally, we have whittled the result set down to only the rows and columns we need, so we can order it however we want.
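
For example, here is a sketch of a query against hypothetical Orders and Customers tables, annotated with the stage at which each clause is processed:

SELECT o.CustomerId, COUNT(*) AS OrderCount --5. SELECT
FROM dbo.Orders o --1. FROM
JOIN dbo.Customers c ON c.CustomerId = o.CustomerId --1. JOIN / 2. ON
WHERE o.OrderDate >= '20230101' --2. WHERE
GROUP BY o.CustomerId --3. GROUP BY
HAVING COUNT(*) > 5 --4. HAVING
ORDER BY OrderCount; --7. ORDER BY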


What is the value in knowing this?

It’s important to understand the difference between the logical and lexical order of a query. Knowing this will help you troubleshoot any syntax errors you may encounter.

Going back to our original issue of using an alias in a WHERE clause, you can now see why you get an error about an invalid column.

Look back at the order of execution: aliases are assigned to columns during the SELECT stage, and WHERE comes before SELECT. The engine is trying to process the column by its alias before the alias has been assigned. ORDER BY comes after SELECT, so the same issue doesn’t arise there.
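
To see this in practice, here is a minimal sketch against a hypothetical Sales table:

--Fails with: Invalid column name 'LineTotal'.
--WHERE is evaluated before the SELECT stage that defines the alias:
SELECT Quantity * UnitPrice AS LineTotal
FROM dbo.Sales
WHERE LineTotal > 100
ORDER BY LineTotal; --no error here, ORDER BY runs after SELECT

--Repeat the expression in the WHERE clause instead:
SELECT Quantity * UnitPrice AS LineTotal
FROM dbo.Sales
WHERE Quantity * UnitPrice > 100
ORDER BY LineTotal;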

It also explains why IntelliSense doesn’t autocomplete or suggest column names until you’ve added the FROM clause. Until you tell the engine where the data is coming FROM, it doesn’t know which columns it can SELECT.

Automate changing SSIS connection at runtime

Recently a customer came to us with an issue: “We have an Always On availability group with primary and secondary replicas, and we want the data warehouse ETL to always pull from the secondary read-only replica. When the primary fails over, how can we detect this and repoint the ETL to the new secondary?”

This led us to the following qualification steps:

  1. Is the secondary server accessible?
     a. Yes – Use it.
     b. No – Try the primary server.
  2. Is the primary server accessible?
     a. Yes – Use it.
     b. No – Fail the SQL Agent job.

This sounds simple enough, but how do we handle this in a SQL Agent job? Using the built-in SSISDB Stored Procedures! (Links at the end for further reading on the Stored Procedures used.)

Step 0: Ensure that your connection has a parameterised option for the ServerName.

Step 1: Set the SQL connection manager to the SECONDARY server. (This is done to always try this server first).

DECLARE @var sql_variant = N'1.1.1.1' --Replace 1.1.1.1 with your SECONDARY destination IP/Servername
EXEC [SSISDB].[catalog].[set_object_parameter_value]
@object_type=20,
@parameter_name=N'CM.XXX.ServerName', --Replace XXX with the name of your connection
@object_name=N'XXX', --Replace XXX with the name of your SSIS Package or Project for the level of connection
@folder_name=N'XXX', --Replace XXX with the name of the folder your SSIS Project is deployed into
@project_name=N'XXX', --Replace XXX with the name of your SSIS Project
@value_type='V', --'V' = literal value, 'R' = referenced value
@parameter_value=@var
GO

Step 2: Check the connection can access the database. (This can be any query with a pass/fail outcome. In this instance we want the ETL to read from a read-only replica, so the check fails if the database is writeable; it could equally be a simple query that fails if the database is unavailable.)

IF EXISTS(SELECT 1 FROM master.sys.databases WHERE [name] = 'XXX') --Replace XXX with the database name
BEGIN
IF DATABASEPROPERTYEX('XXX', 'Updateability') = 'READ_WRITE' --Replace XXX with the database name
THROW 51000, 'This is a writeable replica, not the read-only secondary', 1;
END

Step 3: Set the SQL connection manager to the PRIMARY server.

DECLARE @var sql_variant = N'1.1.1.1' --Replace 1.1.1.1 with your PRIMARY destination IP/Servername
EXEC [SSISDB].[catalog].[set_object_parameter_value]
@object_type=20,
@parameter_name=N'CM.XXX.ServerName', --Replace XXX with the name of your connection
@object_name=N'XXX', --Replace XXX with the name of your SSIS Package or Project for the level of connection
@folder_name=N'XXX', --Replace XXX with the name of the folder your SSIS Project is deployed into
@project_name=N'XXX', --Replace XXX with the name of your SSIS Project
@value_type='V', --'V' = literal value, 'R' = referenced value
@parameter_value=@var
GO

Step 4: Check the connection can access the database. (The same check again: after a failover, the server we configured as the PRIMARY should now be hosting the read-only secondary replica, so it should pass this check.)

IF EXISTS(SELECT 1 FROM master.sys.databases WHERE [name] = 'XXX') --Replace XXX with the database name
BEGIN
IF DATABASEPROPERTYEX('XXX', 'Updateability') = 'READ_WRITE' --Replace XXX with the database name
THROW 51000, 'This is a writeable replica, not the read-only secondary', 1;
END

Step 5: Send a notification email to record that the primary server was used.

DECLARE @EmailBody NVARCHAR(1024)
SET @EmailBody = CONVERT(varchar(10), GETDATE(), 103) + ' – the ETL was pointed to the primary server when the server check was run'

EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'XXX', --Replace XXX with the name of your SQL mail profile
@recipients = 'XXX', --Replace XXX with the list of email addresses you'd like to notify
@body = @EmailBody,
@subject = 'XXX'; --Replace XXX with the desired email subject

Step 6: Continue the SQL Agent job as normal.

The flow for this would then be as follows:

STEP 1
Success – Go to Step 2 (Connection set successfully, so test the connection).
Failure – Fail Job

STEP 2
Success – Go to Step 6 (Successfully used the secondary replica, so continue with the SQL Agent Job).
Failure – Go to Step 3 (Unable to use the replica, so try the primary).

STEP 3
Success – Go to Step 4 (Connection set successfully, so test the connection).
Failure – Fail Job

STEP 4
Success – Go to Step 5 (Successfully used the primary, so send notification email).
Failure – Fail Job

STEP 5
Success – Go to Step 6 (Continue SQL Agent Job).
Failure – Go to Step 6 (Because if the email notification fails, we still want to run the rest of the SQL Agent Job).
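
If you are scripting the job rather than wiring this up in the SQL Agent UI, the same flow can be set with msdb’s sp_update_jobstep; a minimal sketch for Step 2, where the job name and step ids are assumptions:

EXEC msdb.dbo.sp_update_jobstep
@job_name = N'DW ETL', --Hypothetical job name
@step_id = 2,
@on_success_action = 4, --4 = go to the step specified below
@on_success_step_id = 6,
@on_fail_action = 4,
@on_fail_step_id = 3;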

Microsoft Docs for: set_object_parameter_value

Microsoft Docs for: sp_send_dbmail

Any questions? Let me know in the comments below or contact me on Twitter: @PurpleFrogReiss

SQL LAST_VALUE() wrong results

You may have come across the following scenario when using the LAST_VALUE() function.

You want to find the first or last value loaded in a result set, so you use the FIRST_VALUE() and LAST_VALUE() functions like below:
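
A minimal sketch of such a query, assuming a hypothetical #Letters table holding the values A to E:

CREATE TABLE #Letters (Letter CHAR(1));
INSERT INTO #Letters (Letter) VALUES ('A'), ('B'), ('C'), ('D'), ('E');

SELECT Letter,
FIRST_VALUE(Letter) OVER (ORDER BY Letter) AS FirstValue,
LAST_VALUE(Letter) OVER (ORDER BY Letter) AS LastValue
FROM #Letters;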

You expect to get “A” for every record in the FirstValue column and “E” for every record in the LastValue column.
However, you actually see the following output, where LastValue simply returns the current row’s value:
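
Letter   FirstValue   LastValue
A        A            A
B        A            B
C        A            C
D        A            D
E        A            E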

LAST_VALUE() on its own only looks from the first row up to the current row; without an explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. For example:

  • Row 1 – The only value in scope is A.
    Hence, FIRST_VALUE() & LAST_VALUE() both return A.
  • Row 2 – The values now in scope are A & B.
    Hence, FIRST_VALUE() returns A and LAST_VALUE() returns B.
  • Row 3 – The values now in scope are A, B, C.
    Hence, FIRST_VALUE() returns A and LAST_VALUE() returns C.
  • And so on…

To return the actual last value, add the additional clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This ensures that for each row the LAST_VALUE() function looks at all rows, from the very first row to the very last row. Now you have the following code (the original query with an extra column using the additional clause):
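
Using the same hypothetical #Letters table as above:

SELECT Letter,
FIRST_VALUE(Letter) OVER (ORDER BY Letter) AS FirstValue,
LAST_VALUE(Letter) OVER (ORDER BY Letter) AS LastValue,
LAST_VALUE(Letter) OVER (ORDER BY Letter ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastValue2
FROM #Letters;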

This gives you the following output; LastValue2 shows the true last value for the result set:
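
Letter   FirstValue   LastValue   LastValue2
A        A            A           E
B        A            B           E
C        A            C           E
D        A            D           E
E        A            E           E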

SQL Server 2016 Licensing Changes, SP1, v.Next CTP1, etc.

By now you’ve probably heard the exciting news announced at today’s Microsoft Connect conference: SQL Server licensing is changing, significantly, from SQL Server 2016 SP1.

The functionality of the SQL Server Standard, Web and Express editions is being brought up to [almost, 99%] match that of Enterprise Edition. This has a number of critical impacts:

  1. Developers can write applications using a single set of SQL Server functionality, and can now make use of previously Enterprise only features without imposing the cost on their customers.
  2. The customer now decides whether to use Express, Web, Std or Ent based on scalability, not functionality.
  3. It follows the pricing model already used by most cloud services; pay for usage rather than features.
  4. This brings security to the masses – possibly one of the biggest impacts is that now we can use Row Level Security, Dynamic Data Masking and Always Encrypted in ALL editions of the engine – huge news.

There are some limitations to this though:

  1. It currently only applies to the SQL engine, not to SSIS, SSAS or SSRS – Please Microsoft, follow suit with the BI tools asap!!!
  2. Some features are not [yet] available in the Express and/or Local DB editions, including Change Data Capture, In Memory OLTP and Polybase to name a few.
  3. The trigger to move to Enterprise will now be [almost] purely driven by size and performance requirements.

As if that isn’t enough, other huge announcements include:

  • SQL Server 2016 SP1 is now available
  • New functionality in 2016 SP1, including
    • CREATE OR ALTER functionality for Stored Procs, Views, Functions, etc. (see the sketch after this list)
    • DBCC CLONEDATABASE – To easily clone a database schema without the data (love this!!)
    • NUMA support for Analysis Services Tabular – I’ll be checking this out in more detail soon and will blog my findings
  • SQL Server v.Next CTP1 (Community Technology Preview) is now available, which includes SQL Server running on Linux! Note that only the engine is available on Linux at this time, not the BI tools, but watch this space; the plan is to offer matching functionality on both
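
As a quick illustration of the first two of those SP1 additions (the object and database names below are made up):

--CREATE OR ALTER: creates the proc if it doesn't exist, alters it if it does
CREATE OR ALTER PROCEDURE dbo.MyProc
AS
SELECT 1 AS Result;
GO

--DBCC CLONEDATABASE: copy a database's schema (and statistics) without its data
DBCC CLONEDATABASE (MyDatabase, MyDatabase_Clone);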

 

For full details, check out the Microsoft blog post here

/Frog-Blog-Out

SQL Server Art using Spatial Data

This post follows on from an earlier post on drawing with SQL Server, and explains how to create much more complex drawings using a couple of neat tricks and SQL Server spatial data.

Firstly, apologies to those at my session at SQL Bits to whom I promised this blog post, I did say I’d try and get it posted in a week, and it’s been a month – but it’s here now!

So, what are we trying to do? In my earlier post I demonstrated how to recreate a block drawing by tracing around the points on the edges and converting the results to SQL spatial data coordinates. This is fine if the image is a simple logo, but what if it’s more complex, like a photo or a sketch? It would take an age to trace, so we need a more automated approach.

At this point I’ll make my second apology, to Simon Sabin, who must by now think that Alastair Aitchison and I are stalking him. This post (and my associated lightning talk at SQLBits) derived from finding that Alastair had drawn the SQLBits logo over a month before I did mine. Feeling a little dejected I needed a new project. Simon set us a challenge of improving on it. One thing led to another and both Alastair and I started drawing portraits of Simon, and here we are.

So, let’s start with the picture. I chose Simon’s Twitter profile pic.

The first step is to convert the bitmap image into a vector image. A bitmap image is a collection of dots whereas a vector image is a collection of lines, better suited to drawing in SQL Server. There’s a great website that takes care of this for you, vectormagic.com and you get two free conversions to try it out. Upload the image, and then download the converted file in EPS format. EPS is ideal for our purposes as it’s a simple text file containing one instruction per line. We can then convert each line into a SQL spatial line.

I found the easiest way of converting the EPS lines into SQL spatial queries is with an Excel spreadsheet (download here). Paste the full contents of the EPS file into column A of the first sheet, the expressions in columns B to N strip out the coordinates and build them into SQL “geometry::STGeomFromText” commands.

You may notice that we’re converting the ‘curveto’ commands in the EPS file into ‘linestring’ commands in SQL. This does result in an approximation of the curves, but this is barely noticeable in drawings like this. Worth noting that SQL Denali is planned to support curves, so we will be able to make it more accurate.

Filter column O to remove the blanks (using the filter in cell O1), then copy the whole column to the clipboard. Then paste it into SQL Server Management Studio.

Then we have to do a little tweaking to make it work.

First we need to define a table variable at the very top of the query window. The first few lines of the query should look like this.

DECLARE @drawing TABLE (geom geometry);

INSERT INTO @drawing (geom) VALUES

(geometry::STGeomFromText('LINESTRING (0.00 960.00, 640.00 960.00)',0))
,(geometry::STGeomFromText('LINESTRING (640.00 960.00, 640.00 800.00)',0))

A single INSERT can only include 1000 rows in its VALUES clause, so we need to break the insert statement up. Every 900 rows or so, add a new INSERT statement such as

,(geometry::STGeomFromText('LINESTRING (199.39 416.98, 199.07 422.49)',0))
,(geometry::STGeomFromText('LINESTRING (252.58 421.98, 252.87 424.99)',0))

INSERT INTO @drawing (geom) VALUES

(geometry::STGeomFromText('LINESTRING (256.22 430.24, 257.03 426.03)',0))
,(geometry::STGeomFromText('LINESTRING (256.19 417.52, 258.07 416.85)',0))

Then at the end, we just need to select all rows from the table variable

SELECT * FROM @drawing

If all goes well, you should end up with something like this


You can download the finished SQL script here

SQL Server art at work!

SQL Server User Group – Birmingham

It’s only a week to go until the first Midlands SQL Server User Group, being held on March 10th 2011 at the Old Joint Stock pub in Birmingham.

We’ve got two of the best speakers in the UK lined up, Allan Mitchell and Neil Hambly, we’re putting food on for you (pork pies and chip butties!) and it’s being held in a pub so beer will also be involved.

SQL Server, pork pies and beer – all in one place?! Go on, how can you resist?!

Register for FREE here

SQL Server 2008 R2 – PowerPivot and Master Data Services

Purple Frog spent a very interesting day at Microsoft last week, at one of their many events promoting the launch of SQL Server 2008 R2. Rafal Lukawiecki presented an entertaining (as always!) and informative series of talks covering the release, focusing on the enhanced Business Intelligence tools available.

The primary changes to note are

  • PowerPivot – An in-memory, client-side add-in for Excel that allows users to create virtual cubes on their desktop and analyse over 100m records of data virtually instantly
  • DAX – A new expression language, designed for non-technical (probably more realistically, semi-technical) users to extend pivot tables and power pivot tables without having to learn MDX
  • Report Components – In a report consisting of a couple of tables, a chart and a few gauges (gauges, sparklines & maps are all new features of SSRS), you can save each element as a component and re-use it in different reports. This should result in much less duplication of work.
  • Report Builder 3 – A thin-client tool allowing end users to create Reporting Services reports. This is a big enhancement over its predecessor as it is finally fully compatible with reports created in the Business Intelligence Development Studio (BIDS), including report components.
  • Master Data Services – A centralised tool and database intended to provide governance of your organisation’s master data (centralised list of products, fiscal calendar, regions etc.).

The enhancements to Reporting Services (SSRS) are very welcome, and should be of huge benefit to anyone currently using SSRS or considering it. I firmly believe that no comparable web-based reporting engine comes close for SME organisations when looking at the whole picture, including cost of implementation, ease of use, flexibility and capability.

Master Data Services as a concept has been around for a long time, but there has never been a tool available for organisations to implement it effectively. This is Microsoft’s first proper stab at delivering a workable solution, and although I’m a big fan of the concept, and have no doubt of its benefit to an SME, I’m yet to be convinced that the tool is ready for a large-scale corporate environment. Time will tell how scalable and manageable the system is, and credit has to go to Microsoft for starting the ball rolling.

The most impressive addition is without a doubt PowerPivot. In a nutshell, it’s a user defined OLAP cube wrapped up within Excel 2010, running entirely in memory on a user’s workstation. If you’ve not yet played with it or seen a demo, I’ll try and elaborate for you… Think about loading Excel with 1 million rows, and then imagine sorting and filtering a number of those columns [cue going out to lunch whilst waiting for Excel to catch up]. With PowerPivot, you can sort and filter over 100 million rows of data almost in an instant – it’s very impressive indeed!

That’s the snazzy demo bit, but to reduce it to a glorified spreadsheet is very harsh indeed. It allows a user to import multiple data sources and combine them into a single dimensional data model; PowerPivot will create your own personal cube, without you having to build a warehouse and without knowing anything about MDX, dimension hierarchies, attribute relationships, granularity etc.

Microsoft’s vision and reason for creating this tool is self-service BI, allowing users to create their own cubes, data analysis environments and reporting systems. And this is where I start to have a problem…

I can’t remember the last time I designed a data warehouse where I did not find significant data quality problems: conflicting data, missing data, duplicated data etc. I also find it hard to think of a situation where an end user (even a power user) is sufficiently clued up about the intricacies of a source OLTP database to be able to extract the right data and know what to do with it. Or if they are, a dozen other people in different departments have a different idea about how things work, resulting in many different versions of the truth.

I’m therefore (for now!) sticking with the opinion that it is still absolutely vital for an organisation to provide a clean, consistent, dimensionally modelled data warehouse as the basis for their BI/MI infrastructure. Tools like PowerPivot then sit very nicely on top to provide an incredibly powerful and beneficial user experience, but to try and use the emergence of self-service BI tools to usher in a new ‘non-data warehouse’ era is a very dangerous route which I hope people will avoid.

In summary – this release brings with it a fantastic host of new tools, but with great power comes great responsibility…

SQLBits and Microsoft BI conferences

Following the launch of SQL Server 2008 R2 this week, the ever-dedicated SQL Bits team are putting on the next instalment tomorrow with SQL Bits VI – ‘The Sixth Sets’. The Purple Froggers will of course be there, hopefully in time for a pre-conference bacon butty!

Also worth mentioning is ‘Microsoft Solutions for Business Intelligence’, a seminar that Microsoft are holding at their Reading campus on 12th May 2010. Rafal Lukawiecki is speaking; anyone who has seen him before will know what an entertaining and knowledgeable presenter he is. Whether you are currently implementing a BI solution or are just ‘BI curious’, a technical developer or a business manager, the day will provide an insight into the direction BI is heading in, and what we can expect in the years to come.

We hope to see you at either/both!

SQL Bits 6 – Registration Open

For anyone interested in SQL Server, the unmissable SQL Bits conference is returning for round six. “SQL Bits – The 6th Sets” is being held in central London for the first time, and unlike previous conferences it will be for one day only on Friday 16th April 2010.

It’s being held as part of the SQL Server 2008 R2 release, also happening that week in London.

Registration is now open, you can find out more at the SQL Bits website.

We highly recommend the event, hope to see you there…

Data Warehouse Presentation

Purple Frog is presenting a session on data warehouse design concepts at the South Wales SQL Server User Group on Thursday 25th February 2010.

If you’re in the area and want to come along you can register for free here. Eversheds are hosting the event in their Cardiff office.

Adam Morton will be demonstrating an ETL control framework in SSIS, and Alex Whittles will be discussing the concepts of data warehousing and the fundamental differences in architecture between OLTP and OLAP systems.

Hope to see you there!
