
Automate changing SSIS connection at runtime

Recently a customer came to us with an issue: “We have a primary/secondary AlwaysOn failover cluster, we want the data warehouse ETL to always pull from the secondary read-only replica. When the primary fails over, how can we detect this and repoint the ETL to the new secondary?”

This led us to the following qualification steps:
1. Is the secondary server accessible?
a. Yes – Use it.
b. No – Try the primary server.
2. Is the primary server accessible?
a. Yes – Use it.
b. No – Fail SQL Agent Job.

This sounds simple enough, but how do we handle this in a SQL Agent Job? Using the built-in SSISDB Stored Procedures! (Links at the end for further reading on the Stored Procedures used).

Step 0: Ensure that your Connection has a parameterised option for the ServerName.

Step 1: Set the SQL connection manager to the SECONDARY server. (This is done to always try this server first).

DECLARE @var sql_variant = N'1.1.1.1' --Replace 1.1.1.1 with your SECONDARY destination IP/Servername
EXEC [SSISDB].[catalog].[set_object_parameter_value]
@object_type=20, --20 = project-level connection; use 30 for a package-level connection
@parameter_name=N'CM.XXX.ServerName', --Replace XXX with the name of your connection
@object_name=N'XXX', --Replace XXX with the name of your SSIS Package or Project for the level of connection
@folder_name=N'XXX', --Replace XXX with the name of the folder your SSIS Project is deployed into
@project_name=N'XXX', --Replace XXX with the name of your SSIS Project
@value_type=N'V', --V = literal value (R = referenced environment variable)
@parameter_value=@var
GO

Step 2: Check the connection can access the database. (This can be any query with a pass/fail. In this instance we fail the step if the database is a writeable replica, because that would mean the server is now the primary rather than the read-only secondary we want; it could equally be a simple query that fails if the server is not available).

IF EXISTS(SELECT 1 FROM master.sys.databases WHERE [name] = 'XXX') --Replace XXX with the database name
BEGIN
    IF DATABASEPROPERTYEX('XXX', 'Updateability') = 'READ_WRITE' --Replace XXX with the database name
        THROW 51000, 'This is a writeable replica, not the read-only secondary', 1;
END

Step 3: Set the SQL connection manager to the PRIMARY server.

DECLARE @var sql_variant = N'1.1.1.1' --Replace 1.1.1.1 with your PRIMARY destination IP/Servername
EXEC [SSISDB].[catalog].[set_object_parameter_value]
@object_type=20, --20 = project-level connection; use 30 for a package-level connection
@parameter_name=N'CM.XXX.ServerName', --Replace XXX with the name of your connection
@object_name=N'XXX', --Replace XXX with the name of your SSIS Package or Project for the level of connection
@folder_name=N'XXX', --Replace XXX with the name of the folder your SSIS Project is deployed into
@project_name=N'XXX', --Replace XXX with the name of your SSIS Project
@value_type=N'V', --V = literal value (R = referenced environment variable)
@parameter_value=@var
GO

Step 4: Check the connection can access the database. (The same check again: after a failover, the original primary server should now be the read-only secondary, so this confirms it can be used).

IF EXISTS(SELECT 1 FROM master.sys.databases WHERE [name] = 'XXX') --Replace XXX with the database name
BEGIN
    IF DATABASEPROPERTYEX('XXX', 'Updateability') = 'READ_WRITE' --Replace XXX with the database name
        THROW 51000, 'This is a writeable replica, not the read-only secondary', 1;
END

Step 5: Send a notification email to say the primary server was used.

DECLARE @EmailBody NVARCHAR(1024)
SET @EmailBody = CONVERT(varchar(10), GETDATE(), 103) + ' - the ETL was pointed to the primary server when the server check was run'

EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'XXX', --Replace XXX with the name of your SQL mail Profile
@recipients = 'XXX', --Replace XXX with the list of email addresses you'd like to notify
@body = @EmailBody,
@subject = 'XXX'; --Replace XXX with the desired email title

Step 6: Continue SQL Agent job as normal

The flow for this would then be as follows:

STEP 1
Success – Go to Step 2 (Connection set successfully, so test the connection).
Failure – Fail Job

STEP 2
Success – Go to Step 6 (Successfully used the secondary replica, so continue with the SQL Agent Job).
Failure – Go to Step 3 (Unable to use the replica, so try the primary).

STEP 3
Success – Go to Step 4 (Connection set successfully, so test the connection).
Failure – Fail Job

STEP 4
Success – Go to Step 5 (Successfully used the primary, so send notification email).
Failure – Fail Job

STEP 5
Success – Go to Step 6 (Continue SQL Agent Job).
Failure – Go to Step 6 (Because if the email notification fails, we still want to run the rest of the SQL Agent Job).

Microsoft Docs for: set_object_parameter_value

Microsoft Docs for: sp_send_dbmail

Any questions? Let me know in the comments below or contact me on Twitter: @PurpleFrogReiss

Increasing DAX Filter Speed and Efficiency

To understand how a filter can affect performance, first we need to understand what a filter is.

A filter is a temporary table that is created on demand, held in memory, and then used as a reference for the function being filtered. The rows of the temporary filter table are all DISTINCT. This is because, for the purposes of filtering, the engine doesn’t care whether there are 1, 2, 3 or 250 occurrences of a value in a table, only whether it exists or not.

Let’s take a look at some DAX:
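(A sketch of the measure described below, using AdventureWorks-style names; the measure name itself is an assumption.)

Sales Over 100 =
CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    FILTER ( DimProduct, DimProduct[ListPrice] > 100 )
)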

This measure is adding together all the values of Sales Amount, but only where the List Price of the product being sold is greater than 100.

What we are asking the DAX engine to do is:

  1. Create a temporary table of product list prices where list price is > 100.
  2. Iterate through the FactInternetSales table and grab all the SalesAmount values from those records where the related product has a list price that appears in the temporary table.
  3. Add all the SalesAmount values together.

While this measure will work and produce the expected results, it isn’t the most efficient way of doing things.

This FILTER function populates the temporary table with every column of every row of DimProduct that meets the criteria of DimProduct[ListPrice] > 100; that table will look something like this:

For our predicate of DimProduct[ListPrice] > 100 to work, we only need to check one column, List Price, yet we are pulling every column into memory unnecessarily. As we are including every column, including the ProductKey, every row will be distinct regardless of whether a specific list price has already been found on another record.

This means the table will contain more columns and rows than we need. The wider and longer this table, the more memory we are taking up with data we don’t need to perform the filter.

So, is there a better approach?
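One option is to build the filter table from a single column using ALL. A sketch of the optimised measure (the measure name again being an assumption):

Sales Over 100 Optimised =
CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    FILTER (
        ALL ( DimProduct[ListPrice] ),
        DimProduct[ListPrice] > 100
    )
)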

The use of ALL inside the FILTER means we can specify the column(s) we want to filter by rather than the entire table, so the temporary table held in memory is now only 1 column wide.

The predicate still has everything it needs to function: a temporary table of distinct list prices with which to cross-reference DimProduct.

Remember, a temporary filter table always contains distinct rows. Now that we only have one column, where there are duplicate list prices we will only have one row for each. A shorter and narrower table will consume a lot less memory than the wider, longer one created by the previous query.

Below is a table showing both measures produce the same results.

A smaller temporary table means less memory usage and less to scan through, which in turn equates to more speed as seen here in the Performance Analyzer:

What if there are global filters in place?

The use of ALL means that we are removing any global filters from the measure’s filter context, so any page filters or slicers will be ignored. If you want these to remain in effect, simply wrap the filter in a KEEPFILTERS function; this allows the global filters to remain while still only pulling one column into memory.
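For example, building on the sketch above:

Sales Over 100 Keep Filters =
CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    KEEPFILTERS (
        FILTER (
            ALL ( DimProduct[ListPrice] ),
            DimProduct[ListPrice] > 100
        )
    )
)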

While this may seem trivial at first glance, the Performance Analyzer shows the optimised query to be more than twice as fast. If your dataset contains very wide tables with millions of rows and your report pages contain a lot of visuals with lots of measures (CALCULATE being one of the most popularly used functions), this speed increase will scale to become meaningful and noticeable to your end users.

How to create the Fibonacci Sequence in Python

This blog post will go through how to create the famous Fibonacci sequence in Python.

Each number in the Fibonacci sequence is the sum of the previous two terms.

The sequence starts: 0, 1, 1, 2, 3, 5, 8, 13, … and carries on infinitely.

The Python code I used is:
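(Reconstructed here as a minimal sketch matching the description below; the variable names are assumptions.)

fib = [0, 1]     # the first two terms of the sequence
num_terms = 50   # how many terms we want in total

# each new term is the sum of the previous two
for i in range(2, num_terms):
    fib.append(fib[i - 1] + fib[i - 2])

print(fib)       # prints the first fifty terms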

The first two terms are entered into our list, then we specify the number of terms wanted in our sequence.

Next, we loop from 2 to the number of terms we want, appending the sum of the previous two terms each time.

Printing our list gives the first fifty terms of the Fibonacci sequence.

How the UNICHAR() DAX Function Enhances Power BI Reports

The UNICHAR() DAX function is a text function that takes a numerical Unicode value and displays its associated character. For example, UNICHAR(128515) will display as: 😃

90% of the information the human brain processes is visual and we process images up to 60,000 times faster than text, so it makes perfect sense to use icons where possible to enhance reports. This scarcely used DAX function opens up that option.

The below stacked column chart uses Unicode emoticons to enhance the readability of the ‘Genre’ axis labels.

So, how do we achieve this?

To produce this you will need to edit the query. In the ‘Data’ view, right-click the relevant table and select “Edit Query”.

First, duplicate the existing column you want Unicode characters for (genre in this case). Then use the ‘Replace Values’ option to substitute in the relevant Unicode numbers for each genre.

(this can be hidden from the report view as it contains nothing meaningful).

Next, create a second calculated column that uses a simple expression:

IconColumn = UNICHAR([UnicodeNumberColumn])

This new ‘Icon’ column can now be used in reports the same way as any other text column.

Note how in the stacked column chart above, the original names have been included. This is good practice for two main reasons. The first is clarity: a clown denotes comedy to most users but could indicate horror to others; including the label removes the ambiguity.

The other reason is due to possible compatibility issues. It is worth pointing out here that the Unicode characters will only display when the character exists in the chosen font. In most cases this will be fine, especially for emoji characters, but just in case there are display issues it is worth including the full label.

Staying with the movie topic, the below chart shows movie ratings both numerically and visually created by a custom measure:

Stars = REPT(UNICHAR(11088), AVERAGE('IMDB 1000'[10 Star Rating]))

A measure that uses the UNICHAR() function will always be a text field and, as such, normal formatting applies; in the example above we can set the colours to be gold on a black background.

The previous examples do help readability but don’t really add anything meaningful to the report. The below table shows that the UNICHAR() function can add worthwhile content with customisable KPIs by combining it with conditional formatting.
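As a sketch of the idea (the [Total Sales] and [Sales Target] measures here are assumptions), a measure like this returns a different icon depending on performance against a target:

KPI Icon =
SWITCH (
    TRUE (),
    [Total Sales] >= [Sales Target], UNICHAR ( 9989 ),       -- white heavy check mark
    [Total Sales] >= [Sales Target] * 0.9, UNICHAR ( 9888 ), -- warning sign
    UNICHAR ( 10060 )                                        -- cross mark
)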

There are 143,859 Unicode characters available, everything from emojis, symbols, shapes and braille patterns to dice and playing cards. Whether you want to offer further insight into your data, enhance the user experience or simply create something sublimely ridiculous, with so many icons at your fingertips, the possibilities are only limited by your imagination.

Further information on the UNICHAR() function can be found here: UNICHAR function (DAX) – DAX | Microsoft Docs
A list of Unicode characters and their respective numerical values can be found here: Huge List of Unicode Characters

Join 2 Python lists together using nested loops

In this blog post I will show you how to join two 2D Python lists together.

The code is shown below.
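(A sketch with assumed sample data; the line numbers referenced below correspond to this listing.)

capitals = [[1, 'London'], [2, 'Paris'], [3, 'Madrid']]
countries = [[1, 'United Kingdom'], [2, 'France'], [3, 'Spain']]
joined = []                                    # the output list
for i in capitals:                             # outer loop over the first list
    for j in countries:                        # nested loop over the second list
        if i[0] == j[0]:                       # do the first items (index 0) match?
            joined.append([i[0], i[1], j[1]])  # append Key, Capital, Country

for row in joined:                             # print the output record by record
    print(row)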

Lines 1 – 2 are the two lists that are going to be joined; line 3 is an empty list that the output will be appended to.

Lines 4 – 5 are two loops (one nested inside the other) which cycle through the records in both lists. Line 6 checks whether the first items (index 0) of the two records currently held in i and j match. If yes: the Key, Capital and Country are appended to our new list.

Lines 9 – 10 show the output record by record, demonstrating the join has worked successfully.

This code can be expanded to 3 lists; the code would need a 3rd for loop and an extra check in the if statement to find the correct record in the 3rd list to join, as sketched below.
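For example, a self-contained sketch with an assumed third list of populations:

capitals = [[1, 'London'], [2, 'Paris'], [3, 'Madrid']]
countries = [[1, 'United Kingdom'], [2, 'France'], [3, 'Spain']]
populations = [[1, '67m'], [2, '68m'], [3, '47m']]
joined = []
for i in capitals:
    for j in countries:
        for k in populations:                  # 3rd loop for the 3rd list
            if i[0] == j[0] and i[0] == k[0]:  # extra check on the 3rd key
                joined.append([i[0], i[1], j[1], k[1]])

for row in joined:
    print(row)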

Joining lists together in Python is useful when there is data in different lists and it would be beneficial if it were combined.

Dynamic Date Formats in Power BI

Which date format styles should we use if we are building a report that is being consumed internationally?

Remember, 01/12/2021 is December 1st or January 12th depending in which part of the world it is being read.

The decision may be taken out of our hands if there is a company policy in place. If the company is based in the USA, for example, they may choose to use US-formatted date fields as a standard for reporting across the entire business. However, if the field needs to be truly dynamic depending on the consumer’s location, the answer lies in this tool tip:

Explanation of dynamic date formats

There are 2 formats in the selection that are prefixed with an asterisk:

Selection of dynamic date formats
* We shall use ‘General Date’ in the examples throughout this post for reasons explained later

There are 2 variables that the Power BI Service checks when loading reports.

First it will check the language setting of the user account in the service. This is set under ‘Settings >> General >> Language’. There is a dropdown option that acts as both a language and regional setting; this drives how dates are formatted when dynamic date formats are used.

Power BI service language settings

If this is set to ‘Default (browser language)’ the second variable, the browser’s default language setting, will take effect.

In Edge this is set under ‘Settings >> Language’; when multiple languages are set, the topmost one is considered the default.

Language settings in Edge

In Chrome it is set under ‘Settings >> Advanced >> Language’; this uses the same system as Edge, where the topmost language is used as the default.

Language settings in Chrome

Here is an example of a table loaded in a browser using both English UK and English US:

English UK
English US

This example shows that not only does the format of the date itself change (day and month have switched) but there are also visual implications to account for. The US format uses a 12-hour clock by default, and the addition of the AM/PM suffix changes the column width and drastically alters the readability of the table and potentially the entire report. It is these occurrences we need to be aware of when developing reports for international consumption.

This issue can easily be avoided by using the ‘Auto-size column width’ setting under ‘Column Headers’ on the formatting tab of the visual, or by allowing for the growth when setting manual column widths. (For a great guide on manually setting equal column widths, please read this helpful post by my colleague, Nick Edwards)

Unfortunately, this post comes with a caveat: at the time of writing there appears to be a bug in Power BI. Remember this from earlier?

Explanation of dynamic date formats
Selection of dynamic date formats

As you can see below, both fields use the UK format of DD/MM/YYYY when the browser language is set to English UK.

Settings set to UK
UK dates

However, when the browser settings are changed to English US, only the *‘General Date’ format has changed; the *’DD/MM/YYYY’ format is still showing in the UK format even though there is an asterisk next to it in the selection list.

Settings set to US
Erroneous mix of US and UK dates

Hopefully once this issue is addressed, the use of regionally dynamic date formats will be available for both long and short formats.

Power BI – Enable Load

In Power BI Power Query there is an option to enable or disable whether a table is loaded into the report. The option ‘Enable load’ can be found by right clicking on the table. Typically, by default, the load is already enabled.

There is also an option ‘Include in report refresh’ which lets a user stop a table from refreshing when they refresh the entire report. This may be useful for static tables, or for large tables that take a long time to refresh, when a user wants to concentrate on how other tables are refreshing.

Once a user disables the ‘Enable load’ option, the table name turns italic, which is an easy way for users to determine whether a table will be loaded or not.

After applying these changes, no data has been loaded into the report.

To re-enable the load, jump back into Power Query, right-click on the table and select ‘Enable load’.

Finally, some scenarios where it might be useful to disable loading a table:
– Disable loading tables in Power Query that were only ever stepping stones to create other tables
– See how removing a table affects your report before deleting it
– Remove a table that might be required again in the future

How to make your matrix column widths all equal to each other in Power BI using DAX

Have you ever come across an issue where your Power BI matrix column widths just aren’t the same width and visually just don’t look right?

Unfortunately (as of April ’21) there is no easy way to make all column widths equal in the format pane of a matrix visual.

However, there is a hack to set the width of all your columns in a matrix so that they are all equal and pixel perfect with DAX!

How do you do this I hear you ask?

Firstly, create a new measure called ‘Set Column Width’ and enter a string value equal to the length of your longest column title. In my case my longest title is “Front Derailleur Cage”, which has a length of 21 characters including spaces. Therefore I need to set my DAX expression to be a string which is 21 characters long. In my example I’ve just created a string of 21 asterisks wrapped in speech marks – but this can be any combination of characters you like!
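For example:

Set Column Width = "*********************"  // a string of 21 asterisks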

The next job is to go to the format pane of your matrix and set the “Show on rows” toggle equal to ‘On’.

Next make sure the ‘Auto-size column width’ is set equal to ‘On’.

Now drag your newly created DAX expression (in my case ‘Set Column Width’) on to the values field of your matrix.

You’ll then notice that your matrix will look similar to the snip below – a little bit of a mess! But not to worry, this is all part of the plan!

Next go back to the format pane of your matrix and set ‘Auto-size column width’ to ‘Off’.

Now remove your ‘Set Column Width’ measure from the visual by clicking the ‘X’ symbol on the field pane.

Finally increase the width and height of your matrix visual to accommodate the increased column widths.

You now have pixel perfect column widths which are all equal to each other!

A huge thanks to the brilliant MVP Ruth Pozuelo Martinez (@ruthpozuelo) from curbal.com for this hack! It’s been a massive help for my Power BI reports here at Purple Frog! Hopefully the Power BI team will release a proper solution in the matrix format pane soon!

Tabular Cube Processing Report

I have created a Power BI report which provides detail on the state of processing in a Tabular Cube.
The report uses the cube’s dynamic management views to provide information about the cube’s partitions.
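For example, one of the DMVs behind it can be queried along these lines (a sketch; the exact columns used by the report are assumptions) to list each partition’s processing state and last refresh time:

SELECT [Name], [State], [RefreshedTime]
FROM $SYSTEM.TMSCHEMA_PARTITIONS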

To use the tabular cube processing report, you need to insert the following information:

  • Server
  • Database (Cube Name)

Once entered, and assuming the connection is fine, you need to accept some native queries. These statements are select statements and will not alter the cube. That should give you a report similar to the one below; I have used an Adventure Works tabular model as an example.

Click here to download the report

This report is inspired by one I found for general tabular cube documentation by Data Savvy, which I highly recommend. That report includes information on tables, columns, relationships, measures and security:

Click here to view tabular cube documentation blog post

My Experience of SQLBits 2020

A few weeks ago, Purple Frog attended SQLBits 2020 (the largest data conference in Europe). This year the event had a change of pace as the whole conference was hosted online due to the current global pandemic. We converted the office into our own small venue with almost every screen being used over the week to display sessions. With the help of plenty of social distancing, plenty of pizza and the odd bacon sandwich… it worked quite well!

Here is what the frogs had to say about the event:

SQLBits 2020 had literally hundreds of sessions to choose from. In most cases it was a tough call which session to watch next! Sessions ranged from hardcore data-related sessions to softer creative/design/career sessions. I enjoyed seeing sessions from big names such as Christian Wade, Chris Webb, Brent Ozar and not forgetting the Italian DAX experts Marco & Alberto! I also gained lots of insight from Paul Andrew’s (a Purple Frog alumnus) training day session on Azure Data Factory, picking up lots of hints and tips that we could apply to the live projects I’m working on.

As some of the sessions were pre-recorded, the presenter was on hand to answer any questions along the way using the chat window, which was super useful and interactive. Finally, the organisers hosted a pub quiz on Thursday night, which was a great laugh, as well as lunchtime entertainment on Thursday & Friday from the guys from ‘Festival of the Spoken Nerd’! All in all a fantastic week of remote training and insight! Be sure to get yourself booked on for next year! – Nick Edwards (BI Developer)


I had the pleasure of attending SQLBits 2019 last year, so to see such a large event be totally migrated online was rather impressive. While the lunchtime entertainment and the pub quiz were a lot of fun, with Steve Mould performing a comedy routine on Venn diagrams, my highlight was attending Andy Leonard’s “From Zero to Azure Data Factory” training day. Andy is a great presenter and very knowledgeable about this area; he provided many versatile examples during the day that I found very useful. While I would have preferred to attend a live event, given the circumstances I feel SQLBits 2020 was a success and has really set the mark for what a virtual conference can achieve. – Liam McGrath (Support Engineer)


SQLBits was very different this year! The atmosphere of a virtual room is never going to compete with an in-person event, but that doesn’t mean that the quality or content was any lesser for it! In fact, the ability to ask questions directly to the speakers via the chat windows or Q&A during the pre-recorded presentations allowed them to fully answer your question in depth and have a good conversation around it. Sitting in on sessions with industry giants Itzik Ben-Gan, Brent Ozar and Pinal Dave, plus Simon Whitely and Terry McCann from Advancing Analytics, was incredibly valuable, as expected.

There were loads of incredibly thought-provoking sessions, ranging from Machine Learning in Power BI to Databricks vs Synapse, and from unlocking your LinkedIn profile to T-SQL performance tuning. If you were in any way interested in data, there was something for you to learn! Overall I had a great time, and I can’t wait to see what the wonderful team of volunteers and speakers come up with next year! Bring on SQLBits 2021! – Reiss McSporran (BI Developer)


I found the conference really useful and quite a fun way of learning. I got some really useful knowledge out of the sessions and got to see different ways of doing things. The range of sessions is really good and there are so many options that you get a good mix of topics and therefore aren’t overwhelmed with information. The lunchtime entertainment and quiz were a really fun addition, so it’s not just all about the learning. – Jeet Kainth (BI Developer)


SQLBits 2020 was the first SQLBits event that I have attended. I enjoyed lots of different sessions on numerous topics such as DAX, Power BI, Azure Machine Learning, SQL Server and even one about brand building.
The sessions were very informative and I learnt a lot. I would definitely attend again even if the event was to be virtual. – Jon Fletcher (BI Developer)


Having attended many SQLBits conferences over the years, I was excited to be accepted to deliver a talk this year on LinkedIn. As the conference was virtual, we had to pre-record our sessions, but this meant that whilst the recordings were playing we were able to directly interact with the delegates in the chat to answer questions. It was great to see a soft skills topic as part of the SQLBits agenda and, given the positive response, I hope that we start to see more of these sessions scheduled at future events. Also, it was funny that I had slightly more people in my session than Alex Whittles! – Hollie Whittles (Speaker)
