0845 643 64 63

Uncategorized

Capturing Insert and Update Counts from Merge

This post shows hows how you can capture and store the number of records inserted, updated or deleted from a T-SQL Merge statement.

This is in response to a question on an earlier post about using Merge to load SCDs in a Data Warehouse.

You can achieve this by using the OUTPUT clause of a merge statement, including the $Action column that OUTPUT returns.

The basic syntax is:

 
INSERT INTO XXXX
SELECT [Action]
 FROM
 (
 MERGE      XXXX AS Target
      USING XXXX AS Source
         ON XXXX=XXXX
      WHEN MATCHED
         AND XXXX <> XXXX
     THEN UPDATE SET
         XXXX=XXXX
     WHEN NOT MATCHED THEN
           INSERT (
              XXXX
           ) VALUES (
              XXXX
           )
     OUTPUT $action AS [Action]
     ) MergeOutput

You wrap the Merge statement up as a sub-query, adding the OUTPUT clause to return details about what happened in the merge. Note that you can’t just select from this sub-query, there has to be an INSERT INTO statement.

One row will be returned for each row touched by the merge process.

The $action column will contain either INSERT, UPDATE or DELETE, to indicate what happened to that row.

You can also include Source.* in order to include the source column values in the output dataset.

You can also include DELETED.*, which returns the values of any updated records before they were updated, and INSERTED.* to show the values after the updated. In reality the records are not deleted or inserted, just updated, but DELETED/INSERTED is used as the terminology for old/new values either side of the update. When inserting a new record, DELETED values will be NULL.

 ... OUTPUT $action AS [Action], Source.*, DELETED.*, INSERTED.*
     ) MergeOutput 

You can then refer to this ‘MergeOutput’ result set at the top of the query by selecting from this sub-query.

There is a limitation, though, you can’t aggregate the table. So if we want to summarise the actions into a single row of insert, update and delete counts, we have to use a temporary table such as in the sample code below.

 
CREATE TABLE #MergeActions ([Action] VARCHAR(10)) 

 INSERT INTO #MergeActions ([Action])
 SELECT [Action]
 FROM
 (
 MERGE      [dim].[Date] AS Target
      USING [etl].[DimDate] AS Source
         ON ISNULL(Target.[DateKey],0) = ISNULL(Source.[DateKey],0)
      WHEN MATCHED
         AND (Target.[Date] <> Source.[Date]
          OR Target.[Month] <> Source.[Month]
          OR Target.[Quarter] <> Source.[Quarter]
          OR Target.[Year] <> Source.[Year]
         )
     THEN UPDATE SET
         [Date] = Source.[Date]
        ,[Month] = Source.[Month]
        ,[Quarter] = Source.[Quarter]
        ,[Year] = Source.[Year]
        ,LastUpdated = GetDate()
     WHEN NOT MATCHED THEN
           INSERT (
              [DateKey]
             ,[Date]
             ,[Month]
             ,[Quarter]
             ,[Year]
             ,LastUpdated
           ) VALUES (
              Source.[DateKey]
             ,Source.[Date]
             ,Source.[Month]
             ,Source.[Quarter]
             ,Source.[Year]
             ,GetDate()
           )
     OUTPUT $action AS [Action]
     ) MergeOutput
 ;

 SELECT
      SUM(CASE WHEN [Action]='INSERT' THEN 1 ELSE 0 END) AS InsertCount
     ,SUM(CASE WHEN[Action]='UPDATE' THEN 1 ELSE 0 END) AS UpdateCount
     ,SUM(CASE WHEN [Action]='DELETE' THEN 1 ELSE 0 END) AS DeleteCounts
 FROM #MergeActions
 GROUP BY [Action]

 DROP TABLE #MergeActions 

</Frog-Blog Out>

Power BI Sentinel

We all know and love Power BI, because of it’s power and ease of use, there’s a good reason why it’s number 1 in the Gartner Quadrant. But how to we avoid ‘Power BI Hell’, the new version of our old nemesis ‘Excel Hell’?

How do we keep track of which data sources are being used by which datasets and reports? How to we perform an impact assessment of updating/changing a dataset? What happens if someone deletes a Power BI report? There has been no easy solution to this, until now.

Step forward a new SaaS tool, Power BI Sentinel

Sentinel is a new Software as a Service (i.e. hosted in the cloud) tool which sits alongside your PowerBI.com account, helping you manage things better. Core features include:

  • Data Lineage documentation – visibility of which reports are using which datasets, and which data sources. Plus showing which downstream datasets and reports will be impacted by deleting or changing a data source or dataset
  • Automated Backups – backing up your PBIX files directly to your own Azure storage. Allowing you to roll back, recover deleted files, and see what the report/data looked like in the past for debug/investigation
  • Automated Dataset Documentation – Generate online or PDF documentation of your Power BI datasets, the model, tables, DAX measures, calculated columns, etc.
  • Report Change Tracking – If someone’s broken a report by adding/removing a filter, changing a visual, etc. the change tracking will show you what has changed, and when, allowing you to roll back to the correct version prior to the change
Visualise data lineage from your Power BI tenant

The diagram above shows a sample Data Lineage diagram of a Power BI Dataset, showing which data sources (databases, files, websites, etc.) are used by the dataset, and which reports then consume this dataset.

Scenario 1: Power BI Governance, Compliance, GDPR

You have a data source, lets say an Oracle database, containing sensitive customer details (PII). When auditing your organisation’s usage of this data, which of your 600 Power BI reports do you need to review to assess whether that PII data is exposed or not, and to who?

The Power BI Data Lineage explorer allows you to select a datasource, and immediately see which datasets and reports consume data from that source, including those hidden away in users’ personal ‘My Workspace’ folders. Straight away refining the list of sensitive reports to a handful.

Scenario 2: Power BI Disaster Recovery

Your well behaved development teams obviously follow good DevOps practice for all their development work (hopefully!), but Power BI is a self service tool, allowing your business users to create their own reports. And you’re almost certainly not going to get business users using source control and good testing practices on their reports.

Self service is a very powerful feature which significantly enhances the benefit of Power BI to your organisation. But if ‘Mary from Finance’ creates a report that becomes business critical, if that report gets corrupted, broken, deleted or changed, there is no recovery or rollback to the previous version of that report.

Power BI Sentinel not only gives you visibility of whether these critical reports are hidden in personal workspaces, but also performs regular automated backups of the reports, allowing you to recover and roll back if required.

Summary

Power BI is a very powerful tool, but with great power comes great responsibility.

Power BI Sentinel is a great way of providing transparency over your Power BI tenant, and protecting you if anything should ever go wrong.

Sign up for a licence at PowerBISentinel.com

Sky Anytime – Killing Kontiki

I think it’s about time for a good old rant. Nothing to do with Business Intelligence, but…

Anyone out there who uses Sky Anytime (and I believe the BBC iPlayer) will have probably noticed that they use a nasty peer-to-peer file sharing application called Kontiki, which they install automatically without giving you any control over. My problem is that I have a good 8Mb download speed but my upload is still a rather paltry 256K (I know, it’s almost embarrasing). I use Sky Anytime to download a film probably no more than once a month, the rest of the time Kontiki sits there on my PC hogging my precious upload bandwith. They don’t give you any control over restrict it or turning it off – hense my rant.

I’m happy to leave it running for a bit when it’s not causing a problem, but I want to easily be able to turn it off. So, the solution? Create a Kontiki-Killer batch file that disables it. When you next launch Sky Anytime it will automatically load again, but when you’re finished, you can run this batch file to kill the little blighter.

It kills the KHost.exe process, stops the KService service, sets the KService to start manually not automatically (so it will only start when you launch Sky Anytime), and removes KHost.exe from the startup options in the registry (Sky will add them back in when it is next launched).

I use this on my XP Pro SP2 box, I’ve not tried it on any other OS, so I can not accept any responsibility for any problems is may cause – use entirely at your own risk.
Copy this into a blank text file, and save it as KillKontiki.bat

@Echo off
Echo.
Echo.
Echo    Kontiki-Killer...
Echo.    - In progress...
Echo.
Echo    Killing KHost.exe process...
taskkill /IM KHost.exe /T /f
Echo.
Echo    Stopping KService service...
NET Stop KService
Echo.
Echo.   Making the KService service on demand not on startup
sc config KService start= demand
Echo.
Echo    Removing from Registry startup....
REG DELETE HKLM\Software\Microsoft\Windows\CurrentVersion\Run /v kdx /f
REG DELETE HKCU\Software\Microsoft\Windows\CurrentVersion\Run /v kdx /f
Echo.
Echo.
Echo    Kontiki blitz complete.
Echo.
Echo.                                        (C) Purple Frog Systems
Echo.
pause


You can also download it here

The Frog-Blog is open for business!

Well, what can I say – the very first Frog-Blog post.

Keep checking back often for new and (hopefully) useful info…

Alex

Power BI Sentinel
The Frog Blog

Team Purple Frog specialise in designing and implementing Microsoft Data Analytics solutions, including Data Warehouses, Cubes, SQL Server, SSIS, ADF, SSAS, Power BI, MDX, DAX, Machine Learning and more.

This is a collection of thoughts, ramblings and ideas that we think would be useful to share.

Authors:

Alex Whittles
(MVP)
Reiss McSporran
Jeet Kainth
Jon Fletcher

Data Platform MVP

Power BI Sentinel
Frog Blog Out
twitter
rssicon