Monthly Archives: June 2021
In this blog post I will show you how to join two 2D Python lists together.
The code is in the screenshot below.
Lines 1 – 2 are two lists that are going to be joined, line 3 is an empty list where the output will be appended to.
Lines 4 – 5 are two loops (one nested inside the other) which cycle through the records in both lists, line 6 checks whether the first items (index 0) in the two records from each list that are currently i & j match. If yes: the Key, Capital and Country are appended to our new list.
Lines 9 – 10 show the output record by record showing the join has worked successfully.
This code can be expanded for 3 lists, the code would have a 3rd for loop and an extra check in the if statement to find the correct record in the 3rd list to join.
Joining lists together in Python is useful when there is data in different lists and it would be beneficial if it were combined.
Have you ever wanted to export a table from Power BI Desktop into Excel just to make sure the DAX you’ve written is performing as expected but ran into this error message: “Data exceeds the limit – Your data is too large. Some data sampling may occur”?
Most probably this has occurred because you’ve got more than 30,000 rows of data in your table that you’re trying to export. In the example shown below I’ve actually got 30,001 rows of data – a row containing column headers plus 30,000 rows of actual data.
If I now change the row count in my “Table To Export” table to 29,999 rows of actual data using a simple filter I have no issues exporting the data. This brings the grand total to 30,000 rows (29,999 rows of actual data plus a row containing columns headers).
But what if you really need to export more than 30,000 rows of data? Well there is a way…
Firstly make sure you’ve got DAX Studio installed on your machine.
Now within Power BI Desktop click “View” and then “Performance analyzer”.
This will open a new window pane called “Performance analyzer”.
Now click “Start Recording” and then click “Refresh Visuals”. Next within the Performance analyzer window locate the name of the visualisation which in my case is “Table To Export” and click the small “+” symbol next to it.
Next go ahead and grab the DAX query which has been used to construct the table. Click on the link “Copy query” which will copy the DAX query to your clipboard.
Next you’ll need to launch DAX Studio from within Power BI Desktop.
To do this on the top ribbon in Power BI Desktop click “External Tools” and then click “DAX Studio”.
Now within DAX Studio paste the copied query from your clipboard into the query window.
Next remove the variable which limits the number of rows which will be returned (TOPN) and replace the “EVALUATE” statement with “_DS0Core” as per the snip below.
Next change the output of the query to be executed from “Grid” to “File”.
Finally click the “Run” button and let DAX Studio Export your data containing at least 30,000 rows to a CSV file. Simple!
I hope this helps with any debugging you might need to perform in Power BI Desktop!
So you’ve started using Synapse SQL Pools, you’ve scaled up to improve performance, but your queries aren’t going as fast as you’d hoped. You take a look at resource utilisation and you see that as you scale up, your total resource utilisation per query goes down. But didn’t we scale up to allow our query to use more resource and run quicker? What’s going on here? A little thing called Workload Management.
What is Workload Management? At a very high level, Workload Management allows you to control how your workload uses system resources. The main problem here is defaults!
If you navigate to the Workload Management tab of your SQL Pool and click the “User-defined workload groups : All” button you’ll be able to select the “System-defined workload groups” option.
This will allow you to see the pre-created groups and the relevant stats for what resources they’ll utilise.
Great! So what’s the problem? smallrc is the problem. smallrc is the default Workload Group used for all queries against your SQL Pool. If you look at the chart above, you’ll see that the minimum and maximum resources allocated per query in the Effective Values part of the table show 25.00. This is 25% of the total resources available. This means a single query will never be able to utilise the full performance of the SQL Pool. When we scale, this gets worse. The number in brackets will be the new value. For now we’ll focus on smallrc and xlargerc.
To test this is doing what we expect, I started with my SQL Pool at DW100, then DW500 then DW3000. I ran a single query without storing query results, under each of the xlargerc and then smallrc Workload Groups. The results below show the total utilised resources for each of these queries. As you scale up you can see the disparity in performance can be quite alarming.
The performance of these queries returned a resultset over a set period of time, the larger the returned resultset, the better the performance. smallrc increased slightly with each scaling, whereas the xlargerc increased massively. This graph above and the resulting data I received, show that scaling without changing Workload Management settings will not give you the performance you’re hoping for.
So now we’ve proven the premise and shown the results, how do we actually get this to work? You have two main options, first is to assign a user or role to one of the existing system defined workload groups, (which is what I did for the tests above), second is to create a custom workload group and assign a user or role to this instead.
Part 2 of this blog post goes into more detail on how to configure workload management and the details of the different options and is available: Here. Additional reading can be done on the Microsoft website: Here
Any questions? Let me know in the comments below or contact me on twitter: @PurpleFrogReiss