I have created a Power BI report which provides detail on the state of processing in a Tabular Cube.
The report uses the cube’s dynamic management views to provide information about the cube’s partitions.
To use the tabular cube processing report, you need to enter the following information:
- Database (Cube Name)
Once entered, and assuming the connection is fine, you will need to accept some native queries. These statements are SELECT statements and will not alter the cube. That should give you a report similar to the one below; I have used an Adventure Works tabular model as an example.
This report is inspired by one I found for general tabular cube documentation by Data Savvy, which I highly recommend. That report includes information on tables, columns, relationships, measures and security:
A few weeks ago, Purple Frog attended SQLBits 2020 (the largest data conference in Europe). This year the event had a change of pace as the whole conference was hosted online due to the current global pandemic. We converted the office into our own small venue with almost every screen being used over the week to display sessions. With the help of plenty of social distancing, plenty of pizza and the odd bacon sandwich… it worked quite well!
Here is what the frogs had to say about the event:
SQLBits 2020 had literally hundreds of sessions to choose from. In most cases it was a tough call which session to watch next! Sessions ranged from hardcore data-related sessions to softer creative/design/career sessions too. I enjoyed seeing sessions from big names such as Christian Wade, Chris Webb, Brent Ozar and not forgetting the Italian DAX experts Marco & Alberto! I also gained lots of insight from Paul Andrew’s (a Purple Frog alumnus) training day session on Azure Data Factory and picked up lots of hints and tips that we could apply to the live projects that I’m working on.
As some of the sessions were pre-recorded, the presenter was on hand to answer any questions along the way using the interactive chat window, which was super useful. Finally, the organisers hosted a pub quiz on Thursday night, which was a great laugh, as well as lunchtime entertainment on Thursday & Friday from the guys from the ‘Festival of the Spoken Nerd‘! All in all a fantastic week of remote training and insight! Be sure to get yourself booked on for next year! – Nick Edwards (BI Developer)
I had the pleasure of attending SQLBits 2019 last year, so to see such a large event be totally migrated online was rather impressive. While the lunchtime entertainment and the pub quiz were a lot of fun, with Steve Mould performing a comedy routine on Venn diagrams, my highlight was attending Andy Leonard’s “From Zero to Azure Data Factory” training day. Andy is a great presenter and very knowledgeable about this area, and he provided many versatile examples during the day that I found very useful. While I would have preferred to attend a live event, given the circumstances I feel SQLBits 2020 was a success and has really set the mark for what a virtual conference can achieve. – Liam McGrath (Support Engineer)
SQLBits was very different this year! The atmosphere of a virtual room is never going to compete with an in-person event, but that doesn’t mean the quality or content was any lesser for it! In fact, the ability to ask questions directly to the speakers via the chat windows or Q&A during the pre-recorded presentations allowed them to fully answer your question in depth and have a good conversation around it. Sitting in on sessions with industry giants Itzik Ben-Gan, Brent Ozar, Pinal Dave, and Simon Whitely and Terry McCann from Advancing Analytics was incredibly valuable, as expected.
There were loads of incredibly thought-provoking sessions, ranging from Machine Learning in Power BI to Databricks vs Synapse, and from unlocking your LinkedIn profile to T-SQL performance tuning. If you were in any way interested in data, there was something for you to learn! Overall I had a great time, and I can’t wait to see what the wonderful team of volunteers and speakers come up with next year! Bring on SQLBits 2021! – Reiss McSporran (BI Developer)
I found the conference really useful and quite a fun way of learning. I got some really useful knowledge out of the sessions and got to see different ways of doing things. The range of sessions is really good and there are so many options, you get a good mix of topics and therefore aren’t overwhelmed with information. The lunchtime entertainment and quiz were a really fun addition, so it’s not just all about the learning. – Jeet Kainth (BI Developer)
SQLBits 2020 was the first SQLBits event that I have attended. I enjoyed lots of different sessions on numerous topics such as: DAX, Power BI, Azure Machine Learning, SQL Server and even one about brand building.
The sessions were very informative and I learnt a lot. I would definitely attend again even if the event was to be virtual. – Jon Fletcher (BI Developer)
Having attended many SQLBits conferences over the years, I was excited to be accepted to deliver a talk this year on LinkedIn. As the conference was virtual, we had to pre-record our sessions, but this meant that whilst the recordings were playing we were able to interact directly with the delegates in the chat to answer questions. It was great to see a soft skills topic as part of the SQLBits agenda, and given the positive response I hope that we start to see more of these sessions scheduled at future events. Also, it was funny that I had slightly more people in my session than Alex Whittles! – Hollie Whittles (Speaker)
In this blog post, I will run through 3 different ways to execute a delayed Python loop. In these examples, we will aim to run the loop once every minute.
To show the delay, we will print out the current datetime using the datetime module.
1 – Sleep
The sleep function from Python’s time module pauses the Python execution for the number of seconds given as input. The example below pauses the script for 60 seconds.
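A minimal sketch of this approach (the function wrapper and names are my own additions; the final call uses a 1-second delay for a quick demonstration, swap in 60 to run once a minute):

```python
import time
from datetime import datetime

def delayed_loop(iterations, delay_seconds):
    """Print the current datetime, then pause, once per iteration."""
    timestamps = []
    for _ in range(iterations):
        now = datetime.now()
        print(now)              # show the current datetime
        timestamps.append(now)
        time.sleep(delay_seconds)  # pause before the next iteration
    return timestamps

# delayed_loop(10, 60) would run once a minute, ten times;
# a 1-second delay keeps this demonstration quick:
stamps = delayed_loop(3, 1)
```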
The above script has a 60-second delay between the end of one loop and the start of the next. However, what if we wanted to execute the next loop 60 seconds after the start of the previous loop?
In other words, how do we start the loop at the same time every minute?
2 – Calculate Sleep
To do this we calculate how long the loop should sleep for.
We will print the datetime every time the minute changes. At the start of the loop we look up the number of seconds that have passed so far this minute.
The number of seconds that have passed this minute is calculated from date_time[-2:]. Subtracting this from 60 gives the length of time in seconds for which the loop should sleep in order to execute when the next minute starts.
Once the loop has slept for the required number of seconds, we look up the datetime again and print it out.
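The calculation above can be sketched as follows (the helper function name is my own; the loop itself is commented out, since running it would take ten minutes):

```python
import time
from datetime import datetime

def seconds_until_next_minute(date_time: str) -> int:
    """Given a datetime string ending in seconds, e.g. '2020-10-05 14:05:37',
    return how long to sleep so we wake at the start of the next minute."""
    seconds_passed = int(date_time[-2:])  # seconds elapsed so far this minute
    return 60 - seconds_passed

# Print the datetime at the start of every minute, ten times:
# for _ in range(10):
#     now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
#     time.sleep(seconds_until_next_minute(now))
#     print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
```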
3 – Task Scheduler
The previous two options are good for executing a loop a few times, ten in our case. If we wanted to execute a Python script continuously, without expiring, we could wrap the above examples in an infinite loop.
However, if one iteration errors, the script will stop. Therefore, we may instead want to execute the entire Python script once a minute using an external trigger. This is where we can use Task Scheduler.
Task Scheduler can execute a Python script directly, but it is often easier to use a batch file. The batch file includes the location of the Python application (python.exe) and the location of the Python script (.py). For more detail on using Task Scheduler and batch files to run Python scripts, please see the following Data to Fish post – https://datatofish.com/python-script-windows-scheduler
Our batch file is:
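As described, the batch file simply calls the Python interpreter with the path of the script. A minimal sketch with placeholder paths (substitute your own python.exe and script locations; the file name is hypothetical):

```batch
@echo off
"C:\Python38\python.exe" "C:\Scripts\write_csv.py"
```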
To demonstrate Task Scheduler, I’m going to run the following Python code every minute.
This code uses Pandas to produce a blank CSV file, but the name of CSV file is the datetime the script was run.
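A sketch of that script (the exact datetime format is an assumption on my part; colons are not valid in Windows file names, so the time parts are separated with full stops):

```python
from datetime import datetime
import pandas as pd

# Name the CSV after the datetime the script was run.
# Colons are not allowed in Windows file names, so use full stops instead.
file_name = datetime.now().strftime('%Y-%m-%d %H.%M.%S') + '.csv'

# Write an empty DataFrame, producing a blank CSV file
pd.DataFrame().to_csv(file_name, index=False)
print('Created ' + file_name)
```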
These following screenshots show the triggers and actions used.
This produced the following CSV files; we can see that each file takes 1 to 4 seconds to create.
In summary, we have seen three different ways to delay a Python loop: two using loops inside Python and one using Task Scheduler. Which one to use depends on what kind of delay is best for your scenario.
A common request that is raised by clients is how to sort a table in Power BI by multiple columns, in the same way you can in Excel.
For a while, there was no way (at least no easy way) to do this until the Power BI March 2020 update.
I learnt this tip from the following YouTube video:
Full credit to Dhruvin Shah; check out his video.
Below I have a Power BI table displaying fruit sales, currently unsorted.
To sort the table by Fruit click on the column header Fruit.
The table is now sorted by Fruit in alphabetical order.
To add a secondary sort on Sales, hold the Shift key and click on the column header Sales.
The table is now sorted by:
– Fruit name in alphabetical order
– Sales in descending order
Some extras to note:
- There is no limit on the number of columns that can be used to sort a table; just hold the Shift key and keep choosing columns.
- This feature is not available for matrices.
- To switch the sorting from ascending to descending, or vice versa, continue to hold Shift and click on the column header again.
This post shows how you can capture and store the number of records inserted, updated or deleted by a T-SQL Merge statement.
This is in response to a question on an earlier post about using Merge to load SCDs in a Data Warehouse.
You can achieve this by using the OUTPUT clause of a merge statement, including the $Action column that OUTPUT returns.
The basic syntax is:
INSERT INTO XXXX
SELECT [Action]
FROM (
    MERGE XXXX AS Target
    USING XXXX AS Source
        ON XXXX = XXXX
    WHEN MATCHED AND XXXX <> XXXX THEN
        UPDATE SET XXXX = XXXX
    WHEN NOT MATCHED THEN
        INSERT ( XXXX )
        VALUES ( XXXX )
    OUTPUT $action AS [Action]
) MergeOutput
You wrap the Merge statement up as a sub-query, adding the OUTPUT clause to return details about what happened in the merge. Note that you can’t just select from this sub-query; there has to be an INSERT INTO statement.
One row will be returned for each row touched by the merge process.
The $action column will contain either INSERT, UPDATE or DELETE, to indicate what happened to that row.
You can also include Source.* in order to include the source column values in the output dataset.
You can also include DELETED.*, which returns the values of any updated records before they were updated, and INSERTED.*, which shows the values after the update. In reality the records are not deleted or inserted, just updated; DELETED/INSERTED is simply the terminology for the old/new values either side of the update. When inserting a new record, the DELETED values will be NULL.
    ...
    OUTPUT $action AS [Action], Source.*, DELETED.*, INSERTED.*
) MergeOutput
You can then refer to this ‘MergeOutput’ result set at the top of the query by selecting from this sub-query.
There is a limitation though: you can’t aggregate the output directly. So if we want to summarise the actions into a single row of insert, update and delete counts, we have to use a temporary table, as in the sample code below.
CREATE TABLE #MergeActions ([Action] VARCHAR(10))

INSERT INTO #MergeActions ([Action])
SELECT [Action]
FROM (
    MERGE [dim].[Date] AS Target
    USING [etl].[DimDate] AS Source
        ON ISNULL(Target.[DateKey],0) = ISNULL(Source.[DateKey],0)
    WHEN MATCHED AND
        (  Target.[Date]    <> Source.[Date]
        OR Target.[Month]   <> Source.[Month]
        OR Target.[Quarter] <> Source.[Quarter]
        OR Target.[Year]    <> Source.[Year]
        ) THEN
        UPDATE SET
             [Date]      = Source.[Date]
            ,[Month]     = Source.[Month]
            ,[Quarter]   = Source.[Quarter]
            ,[Year]      = Source.[Year]
            ,LastUpdated = GETDATE()
    WHEN NOT MATCHED THEN
        INSERT (
             [DateKey]
            ,[Date]
            ,[Month]
            ,[Quarter]
            ,[Year]
            ,LastUpdated
        )
        VALUES (
             Source.[DateKey]
            ,Source.[Date]
            ,Source.[Month]
            ,Source.[Quarter]
            ,Source.[Year]
            ,GETDATE()
        )
    OUTPUT $action AS [Action]
) MergeOutput;

SELECT
     SUM(CASE WHEN [Action] = 'INSERT' THEN 1 ELSE 0 END) AS InsertCount
    ,SUM(CASE WHEN [Action] = 'UPDATE' THEN 1 ELSE 0 END) AS UpdateCount
    ,SUM(CASE WHEN [Action] = 'DELETE' THEN 1 ELSE 0 END) AS DeleteCount
FROM #MergeActions;

DROP TABLE #MergeActions;
We all know and love Power BI because of its power and ease of use; there’s a good reason why it’s number 1 in the Gartner Magic Quadrant. But how do we avoid ‘Power BI Hell’, the new version of our old nemesis ‘Excel Hell’?
How do we keep track of which data sources are being used by which datasets and reports? How do we perform an impact assessment when updating or changing a dataset? What happens if someone deletes a Power BI report? There has been no easy solution to this, until now.
Step forward a new SaaS tool, Power BI Sentinel
Sentinel is a new Software as a Service (i.e. hosted in the cloud) tool which sits alongside your PowerBI.com account, helping you manage things better. Core features include:
- Data Lineage documentation – visibility of which reports are using which datasets, and which data sources. Plus showing which downstream datasets and reports will be impacted by deleting or changing a data source or dataset
- Automated Backups – backing up your PBIX files directly to your own Azure storage. Allowing you to roll back, recover deleted files, and see what the report/data looked like in the past for debug/investigation
- Automated Dataset Documentation – Generate online or PDF documentation of your Power BI datasets, the model, tables, DAX measures, calculated columns, etc.
- Report Change Tracking – If someone’s broken a report by adding/removing a filter, changing a visual, etc. the change tracking will show you what has changed, and when, allowing you to roll back to the correct version prior to the change
The diagram above shows a sample Data Lineage diagram of a Power BI Dataset, showing which data sources (databases, files, websites, etc.) are used by the dataset, and which reports then consume this dataset.
Scenario 1: Power BI Governance, Compliance, GDPR
You have a data source, let’s say an Oracle database, containing sensitive customer details (PII). When auditing your organisation’s usage of this data, which of your 600 Power BI reports do you need to review to assess whether that PII data is exposed or not, and to whom?
The Power BI Data Lineage explorer allows you to select a data source and immediately see which datasets and reports consume data from that source, including those hidden away in users’ personal ‘My Workspace’ folders – straight away refining the list of sensitive reports to a handful.
Scenario 2: Power BI Disaster Recovery
Your well behaved development teams obviously follow good DevOps practice for all their development work (hopefully!), but Power BI is a self service tool, allowing your business users to create their own reports. And you’re almost certainly not going to get business users using source control and good testing practices on their reports.
Self service is a very powerful feature which significantly enhances the benefit of Power BI to your organisation. But if ‘Mary from Finance’ creates a report that becomes business critical, and that report gets corrupted, broken, deleted or changed, there is no way to recover or roll back to the previous version of that report.
Power BI Sentinel not only gives you visibility of whether these critical reports are hidden in personal workspaces, but also performs regular automated backups of the reports, allowing you to recover and roll back if required.
Power BI is a very powerful tool, but with great power comes great responsibility.
Power BI Sentinel is a great way of providing transparency over your Power BI tenant, and protecting you if anything should ever go wrong.
Sign up for a licence at PowerBISentinel.com
I think it’s about time for a good old rant. Nothing to do with Business Intelligence, but…
Anyone out there who uses Sky Anytime (and I believe the BBC iPlayer) will probably have noticed that they use a nasty peer-to-peer file sharing application called Kontiki, which they install automatically without giving you any control over it. My problem is that I have a good 8Mb download speed but my upload is still a rather paltry 256K (I know, it’s almost embarrassing). I use Sky Anytime to download a film probably no more than once a month; the rest of the time Kontiki sits there on my PC hogging my precious upload bandwidth. They don’t give you any control to restrict it or turn it off – hence my rant.
I’m happy to leave it running for a bit when it’s not causing a problem, but I want to easily be able to turn it off. So, the solution? Create a Kontiki-Killer batch file that disables it. When you next launch Sky Anytime it will automatically load again, but when you’re finished, you can run this batch file to kill the little blighter.
It kills the KHost.exe process, stops the KService service, sets the KService to start manually not automatically (so it will only start when you launch Sky Anytime), and removes KHost.exe from the startup options in the registry (Sky will add them back in when it is next launched).
I use this on my XP Pro SP2 box; I’ve not tried it on any other OS, so I cannot accept any responsibility for any problems it may cause – use entirely at your own risk.
Copy this into a blank text file, and save it as KillKontiki.bat
@Echo off
Echo.
Echo.
Echo Kontiki-Killer...
Echo. - In progress...
Echo.
Echo Killing KHost.exe process...
taskkill /IM KHost.exe /T /f
Echo.
Echo Stopping KService service...
NET Stop KService
Echo.
Echo Making the KService service on demand not on startup
sc config KService start= demand
Echo.
Echo Removing from Registry startup....
REG DELETE HKLM\Software\Microsoft\Windows\CurrentVersion\Run /v kdx /f
REG DELETE HKCU\Software\Microsoft\Windows\CurrentVersion\Run /v kdx /f
Echo.
Echo.
Echo Kontiki blitz complete.
Echo.
Echo (C) Purple Frog Systems
Echo.
pause
You can also download it here
Well, what can I say – the very first Frog-Blog post.
Keep checking back often for new and (hopefully) useful info…