0845 643 64 63

Business Intelligence

Azure Synapse Series: What is Workload Management (part2)

So in part 1 we found out what Workload Management was and why we wanted to use it. (If you haven’t read that part, please click here and read that post before you carry on, as we’re jumping in at the deep end!) In this part we’ll be learning how to configure it.

Continuing from the end of the previous post, you have two main options to configure this. First is to assign a user or role to one of the existing system defined workload groups, second is to create a custom workload group and assign a user or role to this instead.

First let’s look at adding a user to an existing workload group. Choose the Workload Group you’d like to add to and click the number in the Classifiers column.

Click the Blue “1”
Then “Add classifier”
This is the complicated part!

Name – Give the Classifier a name (This can be anything)

Member – Select the user/role that you’d like this to apply to. (Run, SELECT USER_NAME() ,against your SQL Pool to find the actual username authenticated with. If you have sa access, this may be replaced with “sa”).

Request importance – This holds 5 levels, that can be used to prioritise conflicting queries. (Think about how important it is for different roles within your environment to access data quickly and who should go first with conflicts).

Label – When you run a query you can add, OPTION (LABEL = ‘Your_Label_Name’) ,to the end of your queries and it will apply whatever Label Name you put in to your query. If the label matches any of the classifiers then it will use that resource class. (Think about having an ‘ETL’ label or ‘ML’ or something similar).

Start and End Time – The time that the Classifier will be active for. (You may only want to grant increased resource usage during particular times. (Think about an overnight process needing a large amount of resource, compared to the same user/role only needing a reduced amount of resource during the day due to conflicting priorities).

Session Context – When you make a connection to your SQL Pool, rather than having to apply a label to every query, you can instead apply a Session Context that will last for that entire connection. This is a session bound variable rather than the usual batch bound variable. You can create this by running, EXEC sys.sp_set_session_context @key = ‘wlm_context’, @value = ‘Your_Session_Name’ ,at the start of your connection. The session name you enter will then apply to this Classifier.

These can all be combined a variety of ways, but each of them have specific weighting:

Weights

These add together, so if you have a user in two different classifiers, one with a label (64 + 16), and the other with a context and time (64 + 8 +4), then the one with the label would be chosen as it has the higher total. 80 to 76.

Secondly creating your own custom Workload Group allows you to be more precise with the allocation of system resources. To create one of these click on the “New workload group” button. There are three suggestions and a custom option. The three suggestions apply some values as a guide, whilst the Custom option just provides default values.

Suggestions and Custom

After this is created, you add Classifiers exactly the same as above.

This is a very complex thing to get right, so here are my final thoughts:

Think about system usage, volume of users vs performance of queries. You can’t assign every user 100% of resources. You’ll cause waits all over the place. How many users are querying at once? How complex are the queries?

Create Custom Workload Groups and Assign Classifiers. The system-defined groups are a great starting point to get away from smallrc, but not great for flexibility.

Assign roles to workload groups with different priorities. How important is it that the report loads quickly for an end user vs a manager vs the board..?

Assign times for specific workloads, overnight ELT, large ML query etc. 100% is sometimes okay… maybe… Scale Up, use 100% resource for intensive Transformation or ML query, Scale Down for everyday usage?

Test, monitor, amend, monitor, amend, monitor, amend… Constant Development Cycle. Listen to users, find out what is going wrong, monitor the usage stats during heavy load on the server, change the configuration to improve things.

Additional reading can be done on the Microsoft website: Here

Any questions? Let me know in the comments below or contact me on twitter: @PurpleFrogReiss

Azure Synapse Series: What is Workload Management (part1)

So you’ve started using Synapse SQL Pools, you’ve scaled up to improve performance, but your queries aren’t going as fast as you’d hoped. You take a look at resource utilisation and you see that as you scale up, your total resource utilisation per query goes down. But didn’t we scale up to allow our query to use more resource and run quicker? What’s going on here? A little thing called Workload Management.

What is Workload Management? At a very high level, Workload Management allows you to control how your workload uses system resources. The main problem here is defaults!

If you navigate to the Workload Management tab of your SQL Pool and click the “User-defined workload groups : All” button you’ll be able to select the “System-defined workload groups” option.

This will allow you to see the pre-created groups and the relevant stats for what resources they’ll utilise.

Great! So what’s the problem? smallrc is the problem. smallrc is the default Workload Group used for all queries against your SQL Pool. If you look at the chart above, you’ll see that the minimum and maximum resources allocated per query in the Effective Values part of the table show 25.00. This is 25% of the total resources available. This means a single query will never be able to utilise the full performance of the SQL Pool. When we scale, this gets worse. The number in brackets will be the new value. For now we’ll focus on smallrc and xlargerc.

Scaled to DW500 only allows 5% resources for smallrc queries
Scaled to DW3000 only allows 3% resources for smallrc queries.

To test this is doing what we expect, I started with my SQL Pool at DW100, then DW500 then DW3000. I ran a single query without storing query results, under each of the xlargerc and then smallrc Workload Groups. The results below show the total utilised resources for each of these queries. As you scale up you can see the disparity in performance can be quite alarming.

xlargerc then smallrc for each DW100, then DW500, then DW3000.

The performance of these queries returned a resultset over a set period of time, the larger the returned resultset, the better the performance. smallrc increased slightly with each scaling, whereas the xlargerc increased massively. This graph above and the resulting data I received, show that scaling without changing Workload Management settings will not give you the performance you’re hoping for.

So now we’ve proven the premise and shown the results, how do we actually get this to work? You have two main options, first is to assign a user or role to one of the existing system defined workload groups, (which is what I did for the tests above), second is to create a custom workload group and assign a user or role to this instead.

Part 2 of this blog post goes into more detail on how to configure workload management and the details of the different options and is available: Here. Additional reading can be done on the Microsoft website: Here

Any questions? Let me know in the comments below or contact me on twitter: @PurpleFrogReiss

Power BI – Enable Load

In Power BI Power Query there is an option to enable or disable whether a table is loaded into the report. The option ‘Enable load’ can be found by right clicking on the table. Typically, by default, the load is already enabled.

There is also an option ‘Include in report refresh’ which lets a user stop a table from refreshing when they refresh the entire report. This maybe useful for static tables or tables that are large which take a long time to refresh and a user wants to concentrate on how other tables are refreshing.

Once a user disables the ‘Enable load’ option, the table name turns italic which is an easy way for users to determine whether a table will be loaded or not.

After applying these changes, no data has been loaded into the report.

To re-enable the load, jump back into Power Query, right click on the table and ‘Enable load’.

Finally, some scenarios where it might be useful to disable loading a table:
– Disable loading tables in Power Query that were only ever stepping stones to create other tables
– See how removing a table effects your report before deleting it
– Removing a table that might be required again in the future

ADF Switch Activity – A neat solution for nested IFs.

Whilst carrying out some work for a client using Azure Data Factory I was presented with the challenge of triggering different activities depending on the result of a stored procedure.

In fact the challenge posed was to…

  • Execute ‘Copy A’ activity if the result of a stored procedure returned (A),
  • Execute ‘Copy B’ activity if the result of a stored procedure returned (B),
  • Execute ‘Copy C’ activity if the result of a stored procedure returned (C)
  • Finally execute an error stored procedure if the result of the stored procedure returned (D).

Prior to the switch statement I could achieve this using 4 ‘IF’ activities connected to a lookup activity as shown in the snip below using my ‘Wait’ example pipeline.

However a neater solution is to use the ‘Switch’ activity to do this work instead. I’ll now jump straight into a worked example to show you how I achieved this.

I created a demo.RandomNumberGenerator stored procedure in our Purple Frog Demo database which declares a variable and sets it equal to a random number between 1 and 4. The stored procedure then uses this variable in a case statement to return a string. So if the @randomnumber variable returns 1 the case statement returns ‘Wait 1 Minute’.

I then used a lookup activity in ADF to return the result of this stored procedure which I then used to ‘feed’ my ‘Switch’ activity.

When I preview this lookup activity it just returns the result of the stored procedure. When this is run time and time again it just returns a random ‘ADFAction’ as named in my case statement generated by my rand() SQL function. So in the scenario above it returned an action of ‘Wait 1 Minute’. If I were to preview the stored procedure again it might return ‘Wait 4 Minutes’ instead.

I can then connect my switch activity to this lookup activity using the ‘success’ green output connector.

I now need to add some dynamic content to refer to the output of my lookup activity. I don’t need to create dynamic nested IF content to say if my stored procedure equals ‘Wait 1 Minute’ then do this, if my stored procedure equals ‘Wait 2 Minutes’ then do this… all I need to do is refer to the output of the connected lookup activity ‘Random Number Lookup Activity’.

So in my example this is simply just @activity(‘Random Number Lookup Activity’).output.firstrow.adfaction. ADF will then match the output of the stored procedure to the case statements I provide it. The default activity is just a set of activities that are executed when the expression evaluation isn’t satisfied. I.e. the output of my the stored procedure doesn’t match any of the case statements I provide it.

In the snip below I have provided 4 case statements which match all of the possible outputs from my stored procedure. Essentially there isn’t any need for default activity as my stored procedure will always return 1 of 4 results but it’s useful to know for future reference.

Within each case statement there is an activity for which I would like ADF to perform if it matches the result of my stored procedure. In my example these are just ‘Wait’ activities. So if the stored procedure returns ‘Wait 1 Minute’ I’ve put a ‘Wait’ activity within the case statement to wait 1 minute (or 60 seconds).

When I first ran this pipeline in debug mode I could see that the output of the stored procedure was ‘Wait 2 Minutes’. The switch activity then matched this to the case statements I provided and performed the activity within the ‘Wait 2 Minutes’ case statement and triggered my ‘Wait 2 Minutes’ activity.

When I debugged another run my stored procedure returned ‘Wait 4 Minutes’. The switch activity then executed my ‘Wait 4 Minutes’ activity. Perfect!

So it’s pretty much as simple that!

I hope this helps all you ADF’ers out there!

Tabular Cube Processing Report

I have created a Power BI report which provides detail on the state of processing in a Tabular Cube.
The report uses the cube’s dynamic management views to provide information about the cube’s partitions.

To use the tabular cube processing report, you need to insert the following information:

  • Server
  • Database (Cube Name)

Once entered and assuming the connection is fine you need to accept some native queries. These statements are select statements and will not alter the cube. That should give you a report similar to the one below, I have used an adventure works tabular model as an example.

Click here to download the report

This report is inspired by one I found for general tabular cube documentation by Data Savvy, which I highly recommend. That report includes information on: tables, columns, relationships, measure and security:

Click here to view tabular cube documentation blog post

How to block the creation of a classic workspace in powerbi.com when a new MS Team is created!

There is now a new tenant setting for Power BI admins to prevent users from creating classic workspaces in Power BI! This includes from places like MS Teams! Hurrah!

If you’re a Power BI Admin you can now enable ‘Block classic workspace creation’.

To do this go to the powerbi.com portal and click the ‘settings’ icon in the top right hand corner of the screen.

Click on ‘Admin portal’ and then locate and click ‘Tenant settings’.

Next under the ‘Workspace settings’ click ‘Block classic workspace creation’.

Now change the toggle from ‘Disabled’ to ‘Enabled’. This will now block the creation of new classic workspaces in your Power BI portal from places like MS Teams.

This will also automatically remove classic workspaces from your Power BI workspace list, but only if they were created by Teams, and only if they’ve never been accessed. If they have been accessed then they’ll be left there for you to deal with manually.

Now you’re probably thinking well I’ll just delete any old remaining workspaces which have been accessed but are no longer being used! Woah hold on…If you do this you’ll end up deleting the underlying office 365 modern group (and the associated Team!) which still may be used.

A better way of doing this is to upgrade the workspace to the new experience and then delete it.

To do this, click on the ellipsis next to the filters button within the workspace you want to delete and click ‘Edit workspace’. Next go to the advanced section and under ‘Upgrade this workspace (preview)’ select ‘Upgrade now’.

A small window will appear in the middle of your screen.

Check the ‘I’m ready to upgrade this workspace’ option and then click ‘Upgrade’.

Now, once the workspace has been upgraded to the new experience you can then click on ‘Settings’ icon and click ‘Delete Workspace’. This will now leave the office 365 modern group untouched but remove it from your workspace. Simple!

As always a big thanks to the guys from at Guy In A Cube (Twitter: https://twitter.com/GuyInACube) on You Tube for sharing this knowledge!

Check out their video here: https://www.youtube.com/watch?v=T2PAL4D2SvU.

Here’s the link to the Microsoft Power BI team’s announcement on this new feature: https://powerbi.microsoft.com/en-us/blog/announcing-you-can-now-block-classic-workspace-creation/

My experience in obtaining an MCSE

Background

I have recently completed my MCSE in Data Management and Analytics, and I wanted to share my experience of working towards and passing the exams that have led to me getting this MCSE. In this post I will cover some of the challenges I faced and provide some tips and advice, hopefully others following the same certification path, or other paths, will find this useful.

I am a business intelligence developer at Purple Frog, I have several years of experience working with data, I started off as a data analyst and then went into reporting and more recently have been working on ETLs, data warehousing and cubes. I have been working with SQL throughout my various roles and therefore the best place for me to start my certification path was with the exam 70-761: Querying Data with Transact-SQL. The path I’ve taken up to this point is:

MCSA: SQL 2016 Database Development
– 70-761: Querying Data with Transact-SQL
– 70-762: Developing SQL Databases

MCSE: Data Management & Analytics
– 70-767: Implementing a SQL Data Warehouse (from the Elective Exam Pool)

The learning material

Although the first exam (70-761) fitted in quite well with my SQL background (I probably knew about 75% of the material beforehand), there was still some work required for me to get to the stage where I felt I was confident in passing the exam. For me, the best resource and my primary resource for learning the material has been the Exam Ref books, so for example “Exam Ref 70-761 Querying Data with Transact-SQL”. These books are structured in a way that the content is split into the sections covered by the exam, for example the books contain a chapter for every skill covered in the exam.

The one downside to the Exam Ref books is that at times it can feel quite wordy if you’re relying on the book alone, so what I found really useful was to supplement this with videos and demos on the topics where I needed a greater understanding. In addition to this, practice and doing exercises helped me to further understand the different concepts as I was able to try what I had learnt and see where I was going wrong.

The final resource that I found useful was Microsoft Docs (https://docs.microsoft.com/en-us/), this is a really good reference point for key facts, for example I found the page on CDC really useful for my latest exam (70-767).

The exam

There are the obvious tips such as sleep early the night before, get to the exam centre with time to spare and so on, but I wanted to share some of the exam techniques I found useful while doing these exams.

My top tip is check that you have covered off and are comfortable with all the skills measured in the exam, the skills measured for each exam can be found in the “Browse Certifications and Exams” section on the Microsoft Learning website (example shown below for exam 70-761). The skills are also stated in the Exam Ref books and as mentioned before the chapters in the book are based on the skills measured in the exam.

This was taken from https://docs.microsoft.com/en-us/learn/certifications/exams/70-761

What’s useful about the skills measured shown above is that it shows the weight of questions per skill in the exam. This is useful because you can work out if you need to focus on a weaker area if that area is a big part of the exam.

Time shouldn’t be an issue in the exam if you’ve prepared well, however some questions are not worded in the best way and can catch you out so do take the time to read each question properly, and do keep an eye on the time remaining after every 5-10 questions.

You have the option to flag questions and review them again later (note some questions cannot be flagged), make use of these flags for questions you are unsure of. This can be particularly useful if you’ve flagged a question and then a later question gives you a clue or reminds you of the answer for the question flagged earlier. Alternatively, you should be provided with a pen and wipeable board where you can make notes so note down the question number and topic so that you can come back to it later.

Next steps

I am currently studying towards the exam 70-768: Developing SQL Data Models, this will help develop my understanding and knowledge of data modelling and working with cubes and will also help me get the certification for MCSA: SQL 2016 BI Development. With these current certifications being retired in the near future the next plan is to work towards the certification Microsoft Certified: Azure Data Engineer Associate.

I hope you have found this blog useful and that you can use some of the tips mentioned in your study plans, all the best!

ADF: Where have my connections gone?

If you’ve been doing any development work in ADF this week you might have noticed that “Connections” has moved. But where has it gone?

The old location

When you click onto “Connections” now you’ll receive the following message:

Management Hub?

Clicking this button takes you to the new Management Hub, the new fourth icon which goes alongside the existing three.

New Manage (Management Hub)

Within this area you now have access to your Linked Services, Integration Runtimes and Triggers. These all have the same options as before, they’ve just moved!

Moved options and New options

You do now have access to more detailed Git configuration options and also the ability to change the Parameterization templates for your ARM templates. This is only accessible when you have a Git repository set up and the Parameter file within it.

You can currently still access Triggers from the main Author window, but I would recommend getting used to finding them in the Management Hub to keep everything together.

Release details of the Management Hub from Microsoft can be found here: https://docs.microsoft.com/en-us/azure/data-factory/author-management-hub

Using ConcatenateX in PowerBI to return multiple values.

In this blog post we’ll take a quick look at using ConcatenateX function to view a concatenated string of dates where the max daily sales occurred for a given month.

I came across this function whilst going through the excellent “Mastering DAX 2nd Edition Video Course” by the guys from SQLBI.com. So credit to Marco and Alberto for sharing this.

So how does it work? If we had a list of dates ranging from 01/01/2020 to 31/12/2020 and we wanted to see which days we achieved maximum sales for each given month in a year we could use the ConcatenateX function to return these dates in a single row per month.

As we can see in the screenshot below, the left hand table shows the month of June where we achieved maximum sales for in June on both 18/06/2020 and 25/06/2020 of 99. In the table to the right we can see those two dates presented on a single row for the month of June in the column “What were the max days?”. This was column was created using the ConcatenateX function!

So let us first look at what the maximum daily sales were per month. To do this we’ll use the MAXX function to create “Max Daily Sales”. This returns the maximum daily sales rate achieved for each given month as a single value. So for the month of June this would be 99. The problem with this is we are not sure which days these max sales were achieved on without drilling down into the data.  Was this just one day or was it multiple days? All we can see is a figure of 99.

So let us create a new measure to work out on which days this figure of 99 occurred on.

The variable at point (a) returns a table with a single column which lists all of the unique dates in our Sales_2020 table.

The variable at point (b) returns the max sales for the given filter context in this case month.

The variable at point (c) uses the filter function to filter out only the days where the max sales were achieved by setting the total quantity sold measure to the max daily sales variable. For example in June we achieved max daily sales of 99 on 18/06/2020 and 25/06/2020. Therefore the variable at point (c) would filter out the ListOfDays table variable to just 18/06/2020 and 25/06/2020 only.

If we just had one max sales day per month we could simply return MaxDaysOnly. However we may have multiple days per month where max sales were achieved. Hence we use the ConcatenateX function to create a string of dates.

The variable at point (d) creates a string of concatenated dates separated by the delimiter “,” which can be used against a single row in a table.

Wrapping it all up returns us this table below, which shows us which days max sales were achieved per given month! Pretty cool eh?

Check out https://www.sqlbi.com/articles/mastering-dax-video-course-2nd-edition/ for more information regarding the online DAX course as well as https://twitter.com/marcorus and https://twitter.com/ferrarialberto on Twitter!

Creating calendar tables with DAX using Power BI

By Nick Edwards

In this blog post we’ll take a quick look at creating a self-generating calendar table using DAX.

Dates are important if we want to perform time intelligence reporting on our data i.e. yearly sales, monthly sales, weekly sales, year to date sales or previous year sales.

We’ll be using the calendar function to create our date table, but there are other methods to do this such as CALENDARAUTO or GENERATESERIES.

Here is the syntax we’ll be using to generate our date table.

CALENDAR(<start_date>,<end_date>)

The calendar function returns a single column called “Date” which generates a continuous series of dates from the specified start date to the specified end date. So if we specified the start date to be the 01/01/2020 and the end date to be 31/12/2020 the function would generate 366 rows of distinct date data.

We can then use the add columns function to expand our calendar table further to add specific columns we wish to slice our data by i.e. year, month, quarter, week number, day…

Here is the syntax we will be using to expand our calender table.

ADDCOLUMNS(<table>,<name>,<expression>,[<name>,<expression>]…)

Let’s use an example to further explore how these functions work in practice using some sample adventure works sales data. Here we use the “Get Data” icon to directly query our adventure works database and bring over a sample of sales data, named “AW_Sales”.

Now, to create our calendar table we need to click “New Table” in the modeling tab and enter the following function.

Note: I like to use variables in my date table just to keep the DAX looking clean and fuss free, but this isn’t necessary.

For the start date parameter we have used the FIRSTDATE() and for the end Date parameter we have used the LASTDATE() function. This is so we can extract the first and last “OrderDate” from our AW_Sales table. We could have also used the MIN() and MAX() function to deliver the same results using the newly created “Date” column. As we can see this has generated a sequential date list from the 01/01/2012 to the 31/12/2013 with 731 distinct rows of data.

Now we want to expand our calendar table using the newly created “Date” column with new columns which can slice and dice our data. To do this I use the ADDCOLUMNS function. Here I have added Year, Quarter, Year Month, Month Number, Month Name, Day Of Year, Day Of Month, Day Of Week and Day Name as columns.

Once we’ve done this, we will mark our newly created table as a date table to allow Power BI to recognize date hierarchies and time intelligence functions.

We can now view our newly generated calender table in the data view. Instantly when any new data enters our model from “AW_Sales”, the calendar table will expand accordingly due to the last date function used above.

Now it’s just a case of creating a one to many relationship between our new calendar table and our “AW_Sales” table. We’ll create a one to many relationship between “Calendar[Date]” and “AW_Sales[OrderDate]” as shown below.

Congratulations we have now created a fully fledged calendar table that can slice and dice our “AW_Sales” tale by any of the columns we have created in our calendar table, as shown in the example below.

Power BI Sentinel
The Frog Blog

Team Purple Frog specialise in designing and implementing Microsoft Data Analytics solutions, including Data Warehouses, Cubes, SQL Server, SSIS, ADF, SSAS, Power BI, MDX, DAX, Machine Learning and more.

This is a collection of thoughts, ramblings and ideas that we think would be useful to share.

Authors:

Alex Whittles
(MVP)
Reiss McSporran
Jeet Kainth
Jon Fletcher
Nick Edwards
Joe Billingham
Microsoft Gold Partner

Data Platform MVP

Power BI Sentinel
Frog Blog Out
twitter
rssicon