So you’ve started using Synapse SQL Pools and scaled up to improve performance, but your queries aren’t going as fast as you’d hoped. You take a look at resource utilisation and see that as you scale up, the resource utilisation per query actually goes down. But didn’t we scale up to allow our queries to use more resources and run quicker? What’s going on here? A little thing called Workload Management.
What is Workload Management? At a very high level, Workload Management allows you to control how your workload uses system resources. The main problem here is defaults!
If you navigate to the Workload Management tab of your SQL Pool and click the “User-defined workload groups : All” button, you’ll be able to select the “System-defined workload groups” option.
This will allow you to see the pre-created groups and the relevant stats for what resources they’ll utilise.
Great! So what’s the problem? smallrc is the problem. smallrc is the default Workload Group used for all queries against your SQL Pool. If you look at the chart above, you’ll see that the minimum and maximum resources allocated per query in the Effective Values part of the table both show 25.00. This is 25% of the total resources available, which means a single query will never be able to utilise the full performance of the SQL Pool. When we scale, this gets worse: the number in brackets shows the new effective value after scaling. For now we’ll focus on smallrc and xlargerc.
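If you prefer T-SQL to the portal, you can see the same effective values by querying the workload management catalog view in your dedicated SQL pool. This is a sketch: it assumes the `sys.workload_management_workload_groups` catalog view and its standard percentage columns.

```sql
-- List all workload groups (system- and user-defined) with their
-- resource allocations, expressed as a percentage of the pool's
-- total resources. smallrc should show its per-request grant here.
SELECT name,
       min_percentage_resource,
       cap_percentage_resource,
       request_min_resource_grant_percent,
       request_max_resource_grant_percent
FROM   sys.workload_management_workload_groups
ORDER BY name;
```

Running this before and after a scale operation is a quick way to watch the effective per-query grants change.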
To test that this behaves as expected, I started with my SQL Pool at DW100, then scaled to DW500 and then DW3000. At each level I ran a single query, without storing query results, under the xlargerc and then the smallrc Workload Group. The results below show the total utilised resources for each of these queries. As you scale up, you can see the disparity in performance can be quite alarming.
Each query ran for a set period of time and performance was measured by the size of the resultset returned: the larger the resultset, the better the performance. smallrc increased slightly with each scaling step, whereas xlargerc increased massively. The graph above and the resulting data show that scaling without changing Workload Management settings will not give you the performance you’re hoping for.
So now we’ve proven the premise and shown the results, how do we actually get this to work? You have two main options: the first is to assign a user or role to one of the existing system-defined workload groups (which is what I did for the tests above); the second is to create a custom workload group and assign a user or role to that instead.
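Both options can be done in T-SQL. The sketch below assumes a user called `ReportingUser` (a placeholder name) and illustrative percentage values; the built-in resource classes are database roles in a dedicated SQL pool, which is why `sp_addrolemember` works for the first option.

```sql
-- Option 1: map an existing user to a built-in resource class.
-- The system resource classes (smallrc, mediumrc, largerc, xlargerc)
-- are database roles, so standard role membership applies.
EXEC sp_addrolemember 'xlargerc', 'ReportingUser';

-- Option 2: create a custom workload group and a classifier that
-- routes the user's queries to it. Percentages are illustrative.
CREATE WORKLOAD GROUP wgReporting
WITH (
    MIN_PERCENTAGE_RESOURCE            = 0,    -- no reserved slice for the group
    CAP_PERCENTAGE_RESOURCE            = 100,  -- group may use the whole pool
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 50    -- each query gets at least 50%
);

CREATE WORKLOAD CLASSIFIER wcReporting
WITH (
    WORKLOAD_GROUP = 'wgReporting',
    MEMBERNAME     = 'ReportingUser',
    IMPORTANCE     = NORMAL
);
```

The classifier route is generally preferable for new work, as resource classes are the older mechanism and a classifier gives you finer control (importance, labels, timeouts).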
Part 2 of this blog post goes into more detail on how to configure workload management and the details of the different options, and is available: Here. Additional reading can be done on the Microsoft website: Here
Any questions? Let me know in the comments below or contact me on twitter: @PurpleFrogReiss
About workload management: have you ever tested the CPU allocation (or just the real performance) when assigning smallrc or xlargerc to a user?
Based on what I have tested, workload management can only control the memory allocation and cannot control CPU usage. As a result, if a query is CPU-bound, workload management has no effect on it.
I can also see in your image that the width (the time taken to complete a query) is the same for both smallrc and xlargerc in the 100cc and 3000cc cases.
So am I missing something here?
Thanks for the questions!
Firstly, CPU and Memory are both joined together within Synapse under the DWU banner: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/what-is-a-data-warehouse-unit-dwu-cdwu
You should find that increasing the number of DWUs available (by scaling up) also increases the amount of CPU power that you have available. This is where workload management comes in, as it is a percentage of the DWU resources that is applied to each query, not a specific number of CPUs or GB of RAM. Continue testing once you have workload management in place above “smallrc” with your CPU-intensive query and you should see better results 🙂
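One way to confirm that a query actually picked up the intended resource class or workload group is to check the request DMV after running it. This sketch assumes the `sys.dm_pdw_exec_requests` DMV and its `resource_class` and `group_name` columns, which report the classification each request ran under.

```sql
-- Inspect recent requests to verify which resource class or
-- workload group each one was classified into.
SELECT request_id,
       [status],
       resource_class,   -- populated when a built-in resource class applied
       group_name,       -- populated when a workload group classifier applied
       command
FROM   sys.dm_pdw_exec_requests
ORDER BY submit_time DESC;
```

If your CPU-intensive query still shows smallrc here after you have assigned the user elsewhere, the classification has not taken effect (note that classification happens at login, so reconnect after changing assignments).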
In terms of the time to complete queries in the graphs, these were purely to illustrate the amount of resource utilisation by each group rather than a change in runtime performance. For these queries I let each of them run for a set amount of time before cancelling and re-scaling, to best show the differences. In real terms, you will get a mostly linear progression in performance when the following three are lined up: a well-structured query that is bottlenecked by system resources only; scaling up to increase available DWUs; and correct workload management applied to allow usage of those DWUs.
Good Luck, Reiss