
The SQL Query Alias Conundrum – Order of Execution

So, you have just written a query, hit execute and encountered an error: Invalid column name '[column name]'.

The column you’ve used in your WHERE clause cannot be identified by its alias. You’ve defined it at the top of the query and used it fine previously as your ORDER BY condition, so why can’t the engine recognise it?

If you’ve ever written a SQL query, this should look familiar:

  1. SELECT
  2. DISTINCT
  3. FROM
  4. JOIN
  5. ON
  6. WHERE
  7. GROUP BY
  8. HAVING
  9. ORDER BY

It makes sense that we read the query from top to bottom / left to right (especially for those in the West), but contrary to how it may look, this isn’t how the query is executed. The order of execution is:

  1. FROM / JOIN
  2. WHERE / ON
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. DISTINCT
  7. ORDER BY

When you stop and think about what is happening, logically, this order makes sense.

FROM / JOIN – First and foremost, we need to determine the location of the data set that we are querying.

WHERE / ON – We need to narrow that data set down with predicates to keep only the records we care about.

GROUP BY – If you need to aggregate the results, this is done next to give us an even more concise result set.

HAVING – WHERE 2.0! This is a predicate over the aggregated result set.

SELECT – Okay, now we have our final result set, containing ONLY our required rows. Now, finally, we can select which columns we want from that dataset.

DISTINCT – Now we know which columns we want from the result set, we can strip out any duplicates.

ORDER BY – Finally, we have whittled the result set down to only the rows and columns we need, so we can order it however we want.

What is the value in knowing this?

It’s important to understand the difference between the logical and lexical order of a query. Knowing this will help you troubleshoot any syntax errors you may encounter.

Going back to our original issue of using an alias in a WHERE clause, you can now see why you get an error about an invalid column.

Look back at the order of execution: aliases are assigned to columns during the SELECT stage, and WHERE comes before SELECT. The engine is trying to process the column by its alias before the alias has been assigned. ORDER BY comes after SELECT, so the same issue doesn't arise there.
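As a quick sketch of the difference (using a hypothetical dbo.Orders table), the first query below fails while the second works:

-- Fails: WHERE is evaluated before SELECT, so the alias doesn't exist yet
-- (Invalid column name 'VATAmount')
SELECT OrderValue * 0.2 AS VATAmount
FROM dbo.Orders
WHERE VATAmount > 100;

-- Works: ORDER BY is evaluated after SELECT, so the alias is available
SELECT OrderValue * 0.2 AS VATAmount
FROM dbo.Orders
ORDER BY VATAmount;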

It also explains why IntelliSense doesn't autocomplete or suggest column names until you've added the FROM clause. Until you tell the engine where the data is coming FROM, it doesn't know which columns it can SELECT.

Combining Queries from Multiple Sources in Power BI using Merge and Append

It is always good practice to do as much data preparation as close to the sources as you can before importing or connecting them to your Power BI reports, but what if there are circumstances where this isn’t possible?

I had an issue recently where a third-party application had been updated and both the new and legacy versions were being used side-by-side. Logging data from both versions was being written to two separate Azure SQL databases.

The customer needed a Power BI report showing both old and new logging data sets as a single source. If both databases were SQL Server databases, I could have written a view with a cross-database join, imported that into Power BI and thought no more about it. However, the two sources being Azure SQL Databases, with no easy way to join the tables, caused an issue.

This is where the Merge and Append functions in Power Query come in.

The first step to solve my issue was to create a view in each of the Azure databases, identical in structure (you'll see why later on), and import these into Power BI.

Now that these data sources (referred to as 'Queries' by Power BI) have both been imported into the data model, we have two options for combining them: 'Append' and 'Merge'.

Merge
Although I didn't use the 'Merge' function, I have included some information about it here as it is still relevant. 'Merge' is useful when you have columns from one source that you would like to add to another. The simplest way to think about it is that it works in the same way a JOIN works in SQL; in fact, when you enter the 'Merge' wizard there is a 'Join Kind' option:

This is how the ‘Merge’ function works:
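The wizard screenshots aren't reproduced here, but as a rough SQL analogy (view and column names below are hypothetical), a 'Merge' with a 'Left Outer' join kind behaves much like a LEFT JOIN:

SELECT n.LogID, n.LogDate, l.LegacyStatus
FROM dbo.vw_LoggingNew AS n
LEFT JOIN dbo.vw_LoggingLegacy AS l
    ON l.LogID = n.LogID;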

Append
To solve my issue (explained above), I used the 'Append' function. While still combining two sources, it concatenates one query with another; its SQL equivalent would be a UNION ALL. 'Append' gives you the option to combine as many tables as you wish, regardless of structure. If a column exists in one query but not another, the column will be filled with NULLs where applicable, which is why it was important to create the two identical views at the very beginning.

This is how the ‘Append’ function works:
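Again the diagram isn't reproduced here, but in SQL terms appending the two identical logging views is roughly equivalent to (view and column names are hypothetical):

SELECT LogID, LogDate, LogMessage FROM dbo.vw_LoggingLegacy
UNION ALL
SELECT LogID, LogDate, LogMessage FROM dbo.vw_LoggingNew;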

When I appended the two sources, I chose the option to create the result 'as New' so that I could hide the original two queries and be left with a nice new table (as seen below), which I can rename, tidy up and make a bit more user friendly, ready for the report consumer to use.

As previously mentioned, data preparation should be done as close to the source as possible, but in situations where this is difficult or simply isn’t possible, it’s important to know about these Power BI functions and how you can use them to your advantage.


More information can be found here:
Append queries – Power Query | Microsoft Docs
Merge queries overview – Power Query | Microsoft Docs

Execute SQL Task in ADF

If you’re new to ADF or Synapse pipelines and looking for a way to execute SQL commands in pipelines, unfortunately there is no equivalent to the Execute SQL Task in SSIS, but I have found some alternatives which work well for me.

For something as simple as a SELECT statement where you want to return results to be used later, you can use a Lookup activity:

You can then reference the output of this activity in other activities using dynamic content such as @activity('LookUp').output.firstRow.DateKey.

Although you can also use the Lookup activity for DML commands, it is not designed for this, so I've found another way. First, create the following stored procedure in the database:
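The screenshot of the procedure isn't reproduced here; a minimal sketch based on the description (the procedure name is my own choice):

CREATE PROCEDURE dbo.ExecuteSQLCommand
    @sql NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;
    -- Run whatever SQL command the pipeline passes in
    EXEC sp_executesql @sql;
END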

This stored procedure will execute the SQL command passed in via the @sql parameter. Now you can use the Stored procedure activity to point to this newly created stored procedure and pass in a SQL command, as shown here:

This activity will now execute the SQL command stated in the parameter value. Using this approach helps keep things consistent across your pipelines for SQL commands as they all use the Stored procedure activity, and the only difference is the SQL command used in the parameters section.

You can also use dynamic content to build up the SQL command that is passed into the stored procedure. This allows you to use the same configuration for multiple Stored procedure activities but with different outcomes. For example, if you had pipeline variables, the above TRUNCATE TABLE command could be replaced with @concat('TRUNCATE TABLE ',variables('SchemaName'),'.',variables('TableName')), and this would truncate a different table per pipeline based on the variable values.

Note: there is a limit on the number of characters the parameter value can take, so if you have a large SQL command it is best to create a stored procedure specifically for that command and to reference that stored procedure in the activity.

Pretty simple in the end, I hope this helps!

Automate changing SSIS connection at runtime

Recently a customer came to us with an issue: “We have a primary/secondary AlwaysOn failover cluster, we want the data warehouse ETL to always pull from the secondary read-only replica. When the primary fails over, how can we detect this and repoint the ETL to the new secondary?”

This led us to the following qualification steps:
1. Is the secondary server accessible?
a. Yes – Use it.
b. No – Try the primary server.
2. Is the primary server accessible?
a. Yes – Use it.
b. No – Fail SQL Agent Job.

This sounds simple enough, but how do we handle this in a SQL Agent Job? Using the built-in SSISDB Stored Procedures! (Links at the end for further reading on the Stored Procedures used).

Step 0: Ensure that your Connection has a parameterised option for the ServerName

Step 1: Set the SQL connection manager to the SECONDARY server. (This is done to always try this server first).

DECLARE @var sql_variant = N'1.1.1.1' --Replace 1.1.1.1 with your SECONDARY destination IP/Servername
EXEC [SSISDB].[catalog].[set_object_parameter_value]
    @object_type=20,
    @parameter_name=N'CM.XXX.ServerName', --Replace XXX with the name of your connection
    @object_name=N'XXX', --Replace XXX with the name of your SSIS Package or Project for the level of connection
    @folder_name=N'XXX', --Replace XXX with the name of the folder your SSIS Project is deployed into
    @project_name=N'XXX', --Replace XXX with the name of your SSIS Project
    @value_type=N'V',
    @parameter_value=@var
GO

Step 2: Check the connection can access the database. (This can be any query with a pass/fail. In this instance, we want to check whether the database is a writeable replica, but this could be a simple query that will fail if not available).

IF EXISTS(SELECT 1 FROM master.sys.databases WHERE [name] = 'XXX') --Replace XXX with the database name
BEGIN
    IF NOT (DATABASEPROPERTYEX('XXX', 'Updateability') <> 'READ_WRITE') --Replace XXX with the database name
        THROW 51000, 'This is not a writeable replica', 1;
END

Step 3: Set the SQL connection manager to the PRIMARY server.

DECLARE @var sql_variant = N'1.1.1.1' --Replace 1.1.1.1 with your PRIMARY destination IP/Servername
EXEC [SSISDB].[catalog].[set_object_parameter_value]
    @object_type=20,
    @parameter_name=N'CM.XXX.ServerName', --Replace XXX with the name of your connection
    @object_name=N'XXX', --Replace XXX with the name of your SSIS Package or Project for the level of connection
    @folder_name=N'XXX', --Replace XXX with the name of the folder your SSIS Project is deployed into
    @project_name=N'XXX', --Replace XXX with the name of your SSIS Project
    @value_type=N'V',
    @parameter_value=@var
GO

Step 4: Check the connection can access the database. (Again this primary server should now be a writeable replica in our case).

IF EXISTS(SELECT 1 FROM master.sys.databases WHERE [name] = 'XXX') --Replace XXX with the database name
BEGIN
    IF NOT (DATABASEPROPERTYEX('XXX', 'Updateability') <> 'READ_WRITE') --Replace XXX with the database name
        THROW 51000, 'This is not a writeable replica', 1;
END

Step 5: Email to say primary server used for information.

DECLARE @EmailBody NVARCHAR(1024)
SET @EmailBody = CONVERT(varchar(10), GETDATE(), 103) + ' - the ETL was pointed to the primary version when the server check was run'

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'XXX', --Replace XXX with the name of your SQL mail Profile
    @recipients = 'XXX', --Replace XXX with the list of email addresses you'd like to notify
    @body = @EmailBody,
    @subject = 'XXX'; --Replace XXX with the desired email title

Step 6: Continue SQL Agent job as normal

The flow for this would then be as follows:

STEP 1
Success – Go to Step 2 (Connection set successfully, so test the connection).
Failure – Fail Job

STEP 2
Success – Go to Step 6 (Successfully used the secondary replica, so continue with the SQL Agent Job).
Failure – Go to Step 3 (Unable to use the replica, so try the primary).

STEP 3
Success – Go to Step 4 (Connection set successfully, so test the connection).
Failure – Fail Job

STEP 4
Success – Go to Step 5 (Successfully used the primary, so send notification email).
Failure – Fail Job

STEP 5
Success – Go to Step 6 (Continue SQL Agent Job).
Failure – Go to Step 6 (Because if the email notification fails, we still want to run the rest of the SQL Agent Job).

Microsoft Docs for: set_object_parameter_value

Microsoft Docs for: sp_send_dbmail

Any questions? Let me know in the comments below or contact me on twitter: @PurpleFrogReiss

Azure Synapse Series: Hash Distribution and Shuffle

For this post I’m going to presume you’ve already taken a look at distributing your data using a hash column, and you’re not experiencing the performance you’re expecting. (If you’re not already aware of what this is, take a look at the following link to learn the basics of what a distributed table is and why you need it in Azure Synapse. I’ll be here when you get back.)

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute

So now that we're on the same page, one of the common issues encountered is understanding what Hash Distribution actually means and how it functions. In basic terms, the column you choose to distribute by gets converted into a hash using a deterministic hash function, which produces the same value for any identical values passed in. Rows whose hash values match are stored on the same compute node, so when you join or aggregate on the hashed column(s) the data doesn't need to be passed between compute nodes, which massively increases query performance.
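For reference, declaring a hash distributed table looks something like this (table and column names are illustrative):

CREATE TABLE dbo.FactSales
(
    CustomerKey varchar(50)    NOT NULL,
    SaleDate    date           NOT NULL,
    SaleAmount  decimal(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);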

Great! So using the knowledge above I've created two tables, both hash distributed, and the column I'm hashing by holds the same values in each table. Let's run an estimated execution plan to see how the joins work!

Shuffle 100%?

Oh dear. Shuffle. On the surface, shuffle is not necessarily a bad thing and may even be expected, but not in this scenario. So what is shuffle and why can it be bad?

Shuffle occurs when part of a distributed table has to be moved to a different node during query execution. To do this, a hash value is computed using the join columns, the node that owns that hash value is found, and the row is sent to that node for processing. If the tables you're joining have different hash distributions you'd expect this to happen (hence why it's not always a bad thing!), but in this case I've used the same values to hash both of my tables, so what's gone wrong?

Two words: Data Types

In the query below I have created two variables of the same data type (varchar(50)), given them the same text value ('Test Hash Text'), and then run the SHA2_256 hashing function over them to see the results. Running this query against your own database will give you the same result every time when using the same hashing function.

declare @varchar1 varchar(50), @varchar2 varchar(50)
set @varchar1 = 'Test Hash Text'
set @varchar2 = 'Test Hash Text'

select @varchar1 as [Value], hashbytes('SHA2_256',@varchar1) as [Hash]
union all
select @varchar2 as [Value], hashbytes('SHA2_256',@varchar2) as [Hash]

As you can see here, the resulting Hash is the same for both text values.

Matching Hash = Good

But what if I change the data type? Here I've added a third variable with an nvarchar(50) data type instead, but with the same text value.

declare @varchar1 varchar(50), @varchar2 varchar(50), @nvarchar nvarchar(50)
set @varchar1 = 'Test Hash Text'
set @varchar2 = 'Test Hash Text'
set @nvarchar = 'Test Hash Text'

select @varchar1 as [Value], hashbytes('SHA2_256',@varchar1) as [Hash]
union all
select @varchar2 as [Value], hashbytes('SHA2_256',@varchar2) as [Hash]
union all
select @nvarchar as [Value], hashbytes('SHA2_256',@nvarchar) as [Hash]

As you can see here, the resulting Hash is different for the different data type even though the value is the same.

Different Hash = Bad

So what does this mean for our Synapse distribution? Even if you're distributing your tables on the same values, a data type mismatch means you'll end up with a different hash and, almost certainly, a different node for your data. (You may land on the same node by chance, as the data is spread across a fixed set of 60 distributions, but that won't be true for the majority of your data.)

To fix this, create a new column in your table in Synapse with the data type that you want to use across all of the tables sharing this column, and Hash Distribute by that new column. The easiest way to do this is with the Create Table As Select (CTAS) syntax: create a new copy of the table with all of the data, add the new column in the SELECT statement with a CAST/CONVERT to the correct data type, and use that column for the Hash Distribution. Additional reading on CTAS is available here:

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse
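As a rough sketch of the fix (table and column names are illustrative, and varchar(50) is just an example target type):

CREATE TABLE dbo.FactSales_Fixed
WITH
(
    DISTRIBUTION = HASH(CustomerKeyHash),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    CAST(CustomerKey AS varchar(50)) AS CustomerKeyHash, -- align the data type with the other table
    SaleDate,
    SaleAmount
FROM dbo.FactSales;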

So I’ve now put this in place and what does my execution plan look like?

No Shuffle!

That’s much better! No shuffle where there shouldn’t be one and query times against this large amount of data take <5% of the time they took before.
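If you want to check for shuffle without using an estimated execution plan in SSMS, dedicated SQL pools also support the EXPLAIN command, which returns the distributed plan as XML; a SHUFFLE_MOVE operation in that XML is the tell-tale sign. A sketch, reusing the hypothetical table names from above plus an equally hypothetical dimension table:

EXPLAIN
SELECT f.CustomerKeyHash, COUNT(*) AS SalesCount
FROM dbo.FactSales_Fixed AS f
JOIN dbo.DimCustomer AS c
    ON c.CustomerKeyHash = f.CustomerKeyHash
GROUP BY f.CustomerKeyHash;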

Any questions? Let me know in the comments below or contact me on twitter: http://www.twitter.com/PurpleFrogReiss

ADF Switch Activity – A neat solution for nested IFs.

Whilst carrying out some work for a client using Azure Data Factory I was presented with the challenge of triggering different activities depending on the result of a stored procedure.

In fact the challenge posed was to…

  • Execute ‘Copy A’ activity if the result of a stored procedure returned (A),
  • Execute ‘Copy B’ activity if the result of a stored procedure returned (B),
  • Execute ‘Copy C’ activity if the result of a stored procedure returned (C)
  • Finally execute an error stored procedure if the result of the stored procedure returned (D).

Prior to the 'Switch' activity, I could achieve this using 4 'IF' activities connected to a lookup activity, as shown in the snip below using my 'Wait' example pipeline.

However a neater solution is to use the ‘Switch’ activity to do this work instead. I’ll now jump straight into a worked example to show you how I achieved this.

I created a demo.RandomNumberGenerator stored procedure in our Purple Frog Demo database which declares a variable and sets it equal to a random number between 1 and 4. The stored procedure then uses this variable in a case statement to return a string. So if the @randomnumber variable returns 1 the case statement returns ‘Wait 1 Minute’.
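The screenshot of the procedure isn't reproduced here; a sketch of what it might look like, based on the description above:

CREATE PROCEDURE demo.RandomNumberGenerator
AS
BEGIN
    -- Random integer between 1 and 4
    DECLARE @randomnumber INT = FLOOR(RAND() * 4) + 1;

    SELECT CASE @randomnumber
               WHEN 1 THEN 'Wait 1 Minute'
               WHEN 2 THEN 'Wait 2 Minutes'
               WHEN 3 THEN 'Wait 3 Minutes'
               WHEN 4 THEN 'Wait 4 Minutes'
           END AS ADFAction;
END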

I then used a lookup activity in ADF to return the result of this stored procedure which I then used to ‘feed’ my ‘Switch’ activity.

When I preview this lookup activity it just returns the result of the stored procedure. Run time and time again, it returns a random 'ADFAction' (as named in my case statement) generated by the RAND() SQL function. So in the scenario above it returned an action of 'Wait 1 Minute'; if I were to preview the stored procedure again it might return 'Wait 4 Minutes' instead.

I can then connect my switch activity to this lookup activity using the ‘success’ green output connector.

I now need to add some dynamic content to refer to the output of my lookup activity. I don’t need to create dynamic nested IF content to say if my stored procedure equals ‘Wait 1 Minute’ then do this, if my stored procedure equals ‘Wait 2 Minutes’ then do this… all I need to do is refer to the output of the connected lookup activity ‘Random Number Lookup Activity’.

So in my example this is simply @activity('Random Number Lookup Activity').output.firstRow.ADFAction. ADF will then match the output of the stored procedure to the case statements I provide it. The default activity is just a set of activities that are executed when the expression evaluation isn't satisfied, i.e. the output of my stored procedure doesn't match any of the case statements I provide it.

In the snip below I have provided 4 case statements which match all of the possible outputs from my stored procedure. Essentially there isn't any need for a default activity as my stored procedure will always return 1 of 4 results, but it's useful to know about for future reference.

Within each case statement there is an activity for which I would like ADF to perform if it matches the result of my stored procedure. In my example these are just ‘Wait’ activities. So if the stored procedure returns ‘Wait 1 Minute’ I’ve put a ‘Wait’ activity within the case statement to wait 1 minute (or 60 seconds).

When I first ran this pipeline in debug mode I could see that the output of the stored procedure was ‘Wait 2 Minutes’. The switch activity then matched this to the case statements I provided and performed the activity within the ‘Wait 2 Minutes’ case statement and triggered my ‘Wait 2 Minutes’ activity.

When I debugged another run my stored procedure returned ‘Wait 4 Minutes’. The switch activity then executed my ‘Wait 4 Minutes’ activity. Perfect!

So it's pretty much as simple as that!

I hope this helps all you ADF’ers out there!

My experience in obtaining an MCSE

Background

I have recently completed my MCSE in Data Management and Analytics, and I wanted to share my experience of working towards and passing the exams that have led to me getting this MCSE. In this post I will cover some of the challenges I faced and provide some tips and advice, hopefully others following the same certification path, or other paths, will find this useful.

I am a business intelligence developer at Purple Frog with several years of experience working with data: I started off as a data analyst, then moved into reporting, and more recently have been working on ETL, data warehousing and cubes. I have been working with SQL throughout my various roles, and therefore the best place for me to start my certification path was with the exam 70-761: Querying Data with Transact-SQL. The path I've taken up to this point is:

MCSA: SQL 2016 Database Development
– 70-761: Querying Data with Transact-SQL
– 70-762: Developing SQL Databases

MCSE: Data Management & Analytics
– 70-767: Implementing a SQL Data Warehouse (from the Elective Exam Pool)

The learning material

Although the first exam (70-761) fitted in quite well with my SQL background (I probably knew about 75% of the material beforehand), there was still some work required to get to the stage where I felt confident of passing the exam. For me, the best and primary resource for learning the material has been the Exam Ref books, for example "Exam Ref 70-761 Querying Data with Transact-SQL". These books are structured so that the content is split into the sections covered by the exam: each book contains a chapter for every skill measured in the exam.

The one downside to the Exam Ref books is that at times it can feel quite wordy if you’re relying on the book alone, so what I found really useful was to supplement this with videos and demos on the topics where I needed a greater understanding. In addition to this, practice and doing exercises helped me to further understand the different concepts as I was able to try what I had learnt and see where I was going wrong.

The final resource that I found useful was Microsoft Docs (https://docs.microsoft.com/en-us/). This is a really good reference point for key facts; for example, I found the page on Change Data Capture (CDC) really useful for my latest exam (70-767).

The exam

There are the obvious tips such as sleep early the night before, get to the exam centre with time to spare and so on, but I wanted to share some of the exam techniques I found useful while doing these exams.

My top tip is to check that you have covered, and are comfortable with, all the skills measured in the exam. The skills measured for each exam can be found in the "Browse Certifications and Exams" section on the Microsoft Learning website (example shown below for exam 70-761). The skills are also stated in the Exam Ref books, and as mentioned before the chapters in the books are based on the skills measured in the exam.

This was taken from https://docs.microsoft.com/en-us/learn/certifications/exams/70-761

What's useful about the skills measured shown above is that they show the weighting of questions per skill in the exam, so you can work out whether you need to focus on a weaker area if that area makes up a big part of the exam.

Time shouldn't be an issue in the exam if you've prepared well; however, some questions are not worded in the best way and can catch you out, so do take the time to read each question properly, and keep an eye on the time remaining every 5-10 questions.

You have the option to flag questions and review them again later (note that some questions cannot be flagged), so make use of these flags for questions you are unsure of. This can be particularly useful if a later question gives you a clue or reminds you of the answer to a question flagged earlier. Alternatively, you should be provided with a pen and wipeable board where you can make notes, so note down the question number and topic and come back to it later.

Next steps

I am currently studying towards the exam 70-768: Developing SQL Data Models. This will help develop my understanding and knowledge of data modelling and working with cubes, and will also help me get the MCSA: SQL 2016 BI Development certification. With these current certifications being retired in the near future, the next plan is to work towards the Microsoft Certified: Azure Data Engineer Associate certification.

I hope you have found this blog useful and that you can use some of the tips mentioned in your study plans, all the best!

SQL LAST_VALUE() wrong results

You may have come across the following scenario when using the LAST_VALUE() function.

You want to find the first or last value loaded in a result set, so you use the FIRST_VALUE() and LAST_VALUE() functions like below:
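The original query isn't shown here; a sketch of the sort of query meant (table and column names are illustrative, with dbo.Letters holding the values A to E loaded in that order):

SELECT
    LetterValue,
    FIRST_VALUE(LetterValue) OVER (ORDER BY LoadOrder) AS FirstValue,
    LAST_VALUE(LetterValue)  OVER (ORDER BY LoadOrder) AS LastValue
FROM dbo.Letters;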

You expect to get “A” for every record in the FirstValue column and “E” for every record in the LastValue column.
However you see the following output:

By default, LAST_VALUE() operates over a window frame that runs from the first row up to and including the current row only, for example:

  • Row 1 – The only value in scope is A.
    Hence, FIRST_VALUE() & LAST_VALUE() both return A.
  • Row 2 – The values now in scope are A & B.
    Hence, FIRST_VALUE() returns A and LAST_VALUE() returns B.
  • Row 3 – The values now in scope are A, B, C.
    Hence, FIRST_VALUE() returns A and LAST_VALUE() returns C.
  • And so on…

To return the actual last value, add the additional clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This ensures that for each row the LAST_VALUE() function looks at all rows, from the very first row to the very last row. Now you have the following code (the original code with an extra column using the additional clause):
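Again the original query isn't shown here, so this is a sketch using the same illustrative names:

SELECT
    LetterValue,
    FIRST_VALUE(LetterValue) OVER (ORDER BY LoadOrder) AS FirstValue,
    LAST_VALUE(LetterValue)  OVER (ORDER BY LoadOrder) AS LastValue,
    LAST_VALUE(LetterValue)  OVER (ORDER BY LoadOrder
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastValue2
FROM dbo.Letters;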

This gives you the following output, LastValue2 shows the true last value for the result set:

SQL Server 2016 Licensing Changes, SP1, v.Next CTP1, etc.

By now you've probably heard the exciting news announced at today's Microsoft Connect conference: SQL Server licensing is changing, significantly, from SQL 2016 SP1.

The functionality of the SQL Server Standard, Web and Express editions is being brought up to [almost, 99%] match that of Enterprise Edition. This has a number of critical impacts:

  1. Developers can write applications using a single set of SQL Server functionality, and can now make use of previously Enterprise only features without imposing the cost on their customers.
  2. The customer now decides whether to use Express, Web, Std or Ent based on scalability, not functionality.
  3. It follows the pricing model already used by most cloud services; pay for usage rather than features.
  4. This brings security to the masses – possibly one of the biggest impacts is that now we can use Row Level Security, Dynamic Data Masking and Always Encrypted in ALL editions of the engine – huge news.

There are some limitations to this though:

  1. It currently only applies to the SQL engine, not to SSIS, SSAS or SSRS – Please Microsoft, follow suit with the BI tools asap!!!
  2. Some features are not [yet] available in the Express and/or Local DB editions, including Change Data Capture, In Memory OLTP and Polybase to name a few.
  3. The trigger to move to Enterprise will now be [almost] purely driven by size and performance requirements.

As if that isn’t enough, other huge announcements include:

  • SQL Server 2016 SP1 is now available
  • New functionality in 2016 SP1, including
    • CREATE OR ALTER functionality for Stored Procs, Views, Functions, etc.
    • DBCC CLONEDATABASE – To easily clone a database schema without the data (love this!!)
    • NUMA support for Analysis Services Tabular – I’ll be checking this out in more detail soon and will blog my findings
  • SQL Server v.Next CTP1 (Community Technology Preview) is now available, which includes SQL Server running on Linux! Note that only the engine is available on Linux at this time, not the BI tools but watch this space, the plan is to offer matching functionality on both

 

For full details, check out the Microsoft blog post here

/Frog-Blog-Out

Excel doesn’t open CSV files correctly from sp_send_dbmail

A nice little nugget today for anyone who uses SQL dbmail to send data in csv attachments.

Sometimes, depending on your language settings, CSV files don't open correctly in Excel. Excel ignores the ',' and puts all the data in column A. You then have to go through the hassle of closing the file and opening it through the flat file wizard.

 


There's a very nice solution to this that I found in this thread: insert a new row at the top of the file containing "sep=,". This forces Excel to understand that it is a comma delimited file, and ensures that it will open correctly.

The only problem: how do you insert a header line when using dbmail?…

[Caution – ugly hack alert…]

You can alter the name of the first column to include this header text. We simply rename "Column1" to "sep=,{CR}{LF}Column1". Then, when dbmail prints out the column headers in the file, the Column1 name will be split over two lines, preceded by "sep=,".


Excel treats this first row as an instruction and does not display it; it just uses it to make sure it formats the data correctly.

So how do we do this in SQL?

DECLARE @Column1Name VARCHAR(255)
DECLARE @Query VARCHAR(2048)
SET @Column1Name = '[sep=,' + CHAR(13) + CHAR(10) + 'Column1]'
SET @Query = 'SELECT Column1 AS ' + @Column1Name + ', Column2, Column3 FROM myTable'

EXEC msdb.dbo.sp_send_dbmail
@profile_name='XXX'
,[Other parameters as required]
,@query=@Query
,@attach_query_result_as_file=1
,@query_attachment_filename='QueryResults.csv'
,@query_result_separator=',' --enforce csv
,@query_result_no_padding=1 --trim
,@query_result_width=32767  --stop wordwrap

And there you go. It’s a bit of a hack, but works well.

Frog-Blog-Out
