
Jeet Kainth

ADF Dataflow CTE workaround

At the time of writing, it is not possible to write a query using a CTE in the source of a dataflow. However, there are a few options to deal with this limitation:

  • re-write the query using subqueries instead of CTEs
  • use a stored procedure that contains the query and reference the stored proc in the source of the dataflow
  • write the query as a view and reference the view in the source of the dataflow (this is my preferred method and the one I will demo here)

I will use the following query purely as an example to demo this:
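The original example query is only shown as a screenshot, so here is a stand-in sketch of the same shape, with hypothetical table and column names; any query that starts with a CTE will hit the same limitation:

    -- Hypothetical example only: table and column names are illustrative
    WITH SalesCTE AS
    (
        SELECT  CustomerID,
                SUM(SalesAmount) AS TotalSales
        FROM    dbo.FactInternetSales
        GROUP BY CustomerID
    )
    SELECT  c.CustomerID,
            c.TotalSales
    FROM    SalesCTE AS c
    WHERE   c.TotalSales > 1000;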

This query produces the following output:

If I write this query directly inside the source of the dataflow as shown below, then when trying to import the schema or preview the data I get the error message "Incorrect syntax near the keyword 'WITH'".

However, if I create a view for this query and reference the view in the dataflow instead, this works and I can preview the data:
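For reference, wrapping the earlier example query in a view might look something like this (carrying over the hypothetical names from the sketch above):

    CREATE VIEW dbo.vw_SalesCTE
    AS
    WITH SalesCTE AS
    (
        SELECT  CustomerID,
                SUM(SalesAmount) AS TotalSales
        FROM    dbo.FactInternetSales
        GROUP BY CustomerID
    )
    SELECT  c.CustomerID,
            c.TotalSales
    FROM    SalesCTE AS c
    WHERE   c.TotalSales > 1000;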

Note that in the source you can also write a query referencing the view (shown below); this is useful if you require additional logic on top of the view.
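For example, additional logic on top of the hypothetical view might look like this:

    -- Hypothetical example: extra filtering layered on top of the view
    SELECT  v.CustomerID,
            v.TotalSales
    FROM    dbo.vw_SalesCTE AS v
    WHERE   v.TotalSales > 5000;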

As mentioned earlier, you can also write the query in a stored procedure and reference this in a similar way to the above. You should now be able to use this simple method to use CTEs in the source for a dataflow.

Execute SQL Task in ADF

If you're new to ADF or Synapse pipelines and looking for a way to execute SQL commands in pipelines, unfortunately there is no equivalent to the Execute SQL Task in SSIS. However, I have found some alternatives which work well for me.

For something as simple as a SELECT statement where you want to return results to be used later, you can use a Lookup activity:
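The query in the screenshot isn't reproduced here; as an illustration, a Lookup source query returning a single row might look like the following (DimDate and DateKey are hypothetical, matching the dynamic content example below):

    -- Hypothetical Lookup source query returning a single row
    SELECT MAX(DateKey) AS DateKey
    FROM   dbo.DimDate;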

You can then reference the output of this activity in other activities using dynamic content such as @activity('LookUp').output.firstRow.DateKey.

Although you can also use the Lookup activity for DML commands, it is not designed for this, so I've found another way. First, create the following stored procedure in the database:
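The stored procedure in the screenshot isn't reproduced here; a minimal sketch of the idea (the procedure name is hypothetical) is a procedure that simply executes whatever command is passed in:

    CREATE PROCEDURE dbo.ExecuteSQL   -- hypothetical name
        @sql NVARCHAR(MAX)
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Execute whatever SQL command is passed in via the @sql parameter
        EXEC sp_executesql @sql;
    END;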

This stored procedure will execute the SQL command passed in via the @sql parameter. Now you can use the Stored procedure activity to point to this newly created stored procedure and pass in a SQL command as shown here:
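As an illustration, the parameter value passed in might be a simple command such as the following (the table name is hypothetical):

    TRUNCATE TABLE dbo.StagingTable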

This activity will now execute the SQL command stated in the parameter value. Using this approach helps keep things consistent across your pipelines for SQL commands as they all use the Stored procedure activity, and the only difference is the SQL command used in the parameters section.

You can also add dynamic content to build up the SQL command that is passed into the stored procedure; this allows you to use the same configuration for multiple Stored procedure activities but with different outcomes. For example, if you had pipeline variables, the above TRUNCATE TABLE command could be replaced with @concat('TRUNCATE TABLE ',variables('SchemaName'),'.',variables('TableName')), and this would truncate a different table per pipeline based on the variable values.

Note: there is a limit on the number of characters the parameter value can take, so if you have a large SQL command it is best to create a stored procedure specifically for that command and to reference that stored procedure in the activity.

Pretty simple in the end, I hope this helps!

Azure Storage Backup Retention

This blog is a follow-up to a previous blog I wrote about backing up Azure Analysis Services cubes in Azure; that blog can be found here. This blog shows how to implement a retention policy using PowerShell in Azure Runbooks to remove the backups after a set number of days. To create a new Runbook in the Azure portal, go to the relevant Automation account in the relevant resource group and then select Runbooks from the left-hand pane. Note that you will need to add the Az.Storage module to the Automation account to be able to use some of the commands listed in this blog.

The parameters required here are shown below, where ResourceGroupName is the name of the resource group that has the relevant storage account, StorageAccountName is the name of the storage account resource and ContainerName is the name of the container holding the backup files to be removed. BlobName is part of the name of the files and will be used later to obtain only the relevant files from the storage container. RetentionDays specifies the number of days to keep a backup before it is removed.
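The parameter declarations aren't reproduced here; a sketch of how they might be declared in the runbook:

    param
    (
        [Parameter(Mandatory = $true)] [string] $ResourceGroupName,
        [Parameter(Mandatory = $true)] [string] $StorageAccountName,
        [Parameter(Mandatory = $true)] [string] $ContainerName,
        [Parameter(Mandatory = $true)] [string] $BlobName,
        [Parameter(Mandatory = $true)] [int]    $RetentionDays
    )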

The Set-AzCurrentStorageAccount command sets the storage account to be used by the subsequent commands, which get the container using the ContainerName parameter and then list the relevant files in the blob container using the BlobName parameter.
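A sketch of this step; the exact parameter usage and the wildcard appended to BlobName are assumptions:

    # Set the current storage account so subsequent storage cmdlets use it
    Set-AzCurrentStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName

    # Get the container and list the backup files matching the BlobName prefix
    $container = Get-AzStorageContainer -Name $ContainerName
    $blobs     = Get-AzStorageBlob -Container $ContainerName -Blob "$BlobName*"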

Once the list of backup files has been obtained, each file can be checked in turn and removed if it meets the relevant criteria. For each file obtained, the LastModified property is used to identify when the file was created. If the resulting date time is older than the number of retention days from today, the Remove-AzStorageBlob command is used to remove that file from the storage container. This check is repeated for all files, removing the relevant files, and leaving only those that are not older than the RetentionDays parameter.
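A sketch of that loop, assuming the $blobs list from the previous step:

    # Remove any backup file older than the retention period
    $cutoff = (Get-Date).AddDays(-$RetentionDays)

    foreach ($blob in $blobs)
    {
        # LastModified indicates when the backup file was written
        if ($blob.LastModified.DateTime -lt $cutoff)
        {
            Remove-AzStorageBlob -Blob $blob.Name -Container $ContainerName
        }
    }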

This is all you need to start cleaning up the backup files in Azure, and although I have used Azure Analysis Services backups for my example here, this code can be used for any blob files in Azure.

Variables in DAX

Variables can simplify your DAX code, help with debugging and help improve performance. To use variables in your DAX measure you need to declare a variable using the VAR keyword, give the variable a name, and then assign an expression to the variable. You can use multiple variables in a measure but when using variables you must use a RETURN statement to specify the final output to be returned.

Variables syntax
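The syntax image isn't reproduced here; as a minimal illustration of the shape, using the same Adventure Works table:

    Example Measure =
    VAR TotalSales = SUM ( 'Internet Sales'[Sales Amount] )
    RETURN
        TotalSales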

To show how to make use of variables I'll go through a simple measure and rewrite it using variables. I'll be using the Adventure Works Internet Sales tabular cube for this.

A measure called Total Sales YoY Growth % (shown below) has been created which calculates sales for the current year and the previous year, then works out the percentage increase/decrease from the previous year to the current year.

The example measure to be rewritten
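The screenshot of the measure isn't reproduced here; based on the description in this post, it would look something like the following (the exact original formatting is an assumption):

    Total Sales YoY Growth % =
    DIVIDE (
        SUM ( 'Internet Sales'[Sales Amount] )
            - CALCULATE (
                SUM ( 'Internet Sales'[Sales Amount] ),
                PARALLELPERIOD ( 'Date'[Date], -12, MONTH )
              ),
        CALCULATE (
            SUM ( 'Internet Sales'[Sales Amount] ),
            PARALLELPERIOD ( 'Date'[Date], -12, MONTH )
        )
    )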

You can see there is repeated logic for the CALCULATE block, which means it is evaluated twice; notice that SUM('Internet Sales'[Sales Amount]) is referenced three times. Using variables for the repeated logic gives the following code:

The original measure rewritten using variables
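The screenshot isn't reproduced here; a sketch of the rewritten measure:

    Total Sales YoY Growth % =
    VAR TotalSales =
        SUM ( 'Internet Sales'[Sales Amount] )
    VAR TotalSalesPP =
        CALCULATE (
            SUM ( 'Internet Sales'[Sales Amount] ),
            PARALLELPERIOD ( 'Date'[Date], -12, MONTH )
        )
    RETURN
        DIVIDE ( TotalSales - TotalSalesPP, TotalSalesPP )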

In this version the result of SUM('Internet Sales'[Sales Amount]) has been assigned to the variable TotalSales, and the result of the CALCULATE section has been assigned to the variable TotalSalesPP. The CALCULATE section is now evaluated only once and assigned to a variable. This is a basic example so the performance gain is insignificant, but this method will help the performance of more complex measures.

To simplify the code further a new variable called TotalSalesVariance is created, and a final variable called Result is created to store the final calculation to be returned. The final code then becomes:

The final version with additional variables
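A sketch of the final version:

    Total Sales YoY Growth % =
    VAR TotalSales =
        SUM ( 'Internet Sales'[Sales Amount] )
    VAR TotalSalesPP =
        CALCULATE (
            SUM ( 'Internet Sales'[Sales Amount] ),
            PARALLELPERIOD ( 'Date'[Date], -12, MONTH )
        )
    VAR TotalSalesVariance = TotalSales - TotalSalesPP
    VAR Result =
        DIVIDE ( TotalSalesVariance, TotalSalesPP )
    RETURN
        Result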

This version then allows you to debug the measure by using any of the variables in the RETURN statement, making it possible to verify the values for each of the variables:

Variations of the same measure returning different variables for debugging
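For example, a debug variation might return one of the intermediate variables instead of the final result (sketch only):

    Total Sales YoY Growth % (debug) =
    VAR TotalSales =
        SUM ( 'Internet Sales'[Sales Amount] )
    VAR TotalSalesPP =
        CALCULATE (
            SUM ( 'Internet Sales'[Sales Amount] ),
            PARALLELPERIOD ( 'Date'[Date], -12, MONTH )
        )
    VAR TotalSalesVariance = TotalSales - TotalSalesPP
    RETURN
        TotalSalesPP    -- swap for TotalSales or TotalSalesVariance to inspect those values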

The one thing to look out for with variables is filter context: once a variable has been declared and a value assigned to it, you cannot use a CALCULATE statement to override the filter context and get a new value from the variable.

For example, TotalSalesPP = CALCULATE(TotalSales, PARALLELPERIOD('Date'[Date], -12, MONTH)) would return the same value as the variable TotalSales, hence TotalSalesPP = CALCULATE(SUM('Internet Sales'[Sales Amount]), PARALLELPERIOD('Date'[Date], -12, MONTH)) is the correct way to write this.

Hopefully this blog will help you introduce variables into your DAX code!

My DAX Training Evaluation

Being fairly new to DAX and having recently completed some DAX training, I thought it would be good to briefly share my experiences of the courses I have completed. The two courses I have recently done are:

  • Stairway to DAX
  • Mastering DAX

Stairway to DAX is basically a series of posts, each covering one or two DAX functions, and is free to access. The Mastering DAX course is a paid video course covering various DAX functions and concepts, priced at $349.

The Stairway series was the first structured training material I did on DAX. Each post is detailed and contains a step-by-step guide on how to achieve the goal for that post, and there are practice sections to put the learning into context. Although the content and examples are based around PowerPivot, it still covers the DAX concepts quite well, and new content is added regularly, with the latest post on ALLSELECTED() added in October 2020.

I felt there were some good examples in the SUMX() section (Level 6), particularly when combined with RELATED(). I had not used SUMX() a great deal prior to the training, so I found this section really useful. There were some sections that I felt were a bit long-winded, and I found the Microsoft documentation on DAX (https://docs.microsoft.com/en-us/dax/dax-function-reference) covered the same content in a simpler way.

The Mastering DAX video course is very well structured and has a good flow from simple concepts to more advanced techniques, with 21 modules at the time of writing. There is also an introductory video course on the same website called Introducing DAX, which is free to access; it is a good introduction to DAX before going into the more in-depth paid course, so it is definitely recommended.

What makes the video course even better is the exercises at the end of the modules; I found the exercises were a good way to get me thinking about what I had learnt and then apply that learning. There are some exercises where it is not obvious what the end goal is, and some exercises are a carbon copy of the video content, so if you do the exercise straight after the videos you are already likely to know the solution without much thought. More videos have been added recently where Alberto runs through the exercises and provides the solutions; this is really good if you don't understand the exercise or need some guidance or an explanation of the solution.

I found both courses to be quite useful, although the video course suited my learning style better, especially with the exercises, because I learn from doing rather than just reading. So, I feel I got more out of the video course than the Stairway series, and I still go back to it sometimes if I need some help with a DAX problem, along with the Microsoft documentation (link provided above), which is also a good reference point for DAX.

Depending on your learning style you may learn more from one course or the other, so I'd recommend either, or both if you want to mix things up. I'd also recommend supplementing these or any other courses by trying things yourself and just playing around with DAX; in my opinion the best way to learn is to just try it.

Next up for me is the video series on YouTube called Elements of DAX by Brian Grant!

Azure Analysis Services Cube Backup

This blog is a quick guide on how to back up an Azure Analysis Services cube using PowerShell. The PowerShell is used in a runbook inside an Automation account in Azure to automate the backup process. A prerequisite for this is to configure the backup storage settings, which specify the location of the backups.

Two parameters are passed into the runbook (shown below), where AnalysisServerName is the server name for the analysis services cube and AutomationCredentialName is the name of the credential for the Automation account in Azure.
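The parameter block isn't reproduced here; a sketch:

    param
    (
        [Parameter(Mandatory = $true)] [string] $AnalysisServerName,
        [Parameter(Mandatory = $true)] [string] $AutomationCredentialName
    )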

The following variable is used to store the current date, obtained using the Get-Date cmdlet. This will be used as part of the backup name.
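A sketch of this step; the exact date format used in the original is an assumption:

    # Current date, used as part of the backup file name
    $date = Get-Date -Format "yyyyMMdd"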

The command Get-AzureAutomationCredential is used to get the credentials for the Automation account provided, in this case it is one of the parameters provided for the runbook. The result is saved to a variable that will be used as a parameter for the backup command.

And finally, the backup command completes the backup of the analysis services cube to a file. Several parameters are used in this command: -BackupFile is the name given to the backup file, -Name is the name of the cube being backed up, -Server is the server name (passed in as a parameter), -Credential is the credentials obtained in the previous step, and -ApplyCompression enables the backup with compression.
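Putting the credential and backup steps together, a minimal sketch; the database (cube) name "AdventureWorks" is a placeholder, and the credential cmdlet shown here is the runbook-internal Get-AutomationPSCredential, so treat the exact cmdlet and names as assumptions rather than the original screenshot's content:

    # Retrieve the credential asset from the Automation account
    $credential = Get-AutomationPSCredential -Name $AutomationCredentialName

    # Back up the cube to the configured backup storage location, with compression
    Backup-ASDatabase -BackupFile "AdventureWorks_$date.abf" `
                      -Name "AdventureWorks" `
                      -Server $AnalysisServerName `
                      -Credential $credential `
                      -ApplyCompression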

I hope you have found this useful! The follow up to this blog about cleaning up the backup files can be found here.

My experience in obtaining an MCSE

Background

I have recently completed my MCSE in Data Management and Analytics, and I wanted to share my experience of working towards and passing the exams that have led to me getting this MCSE. In this post I will cover some of the challenges I faced and provide some tips and advice, hopefully others following the same certification path, or other paths, will find this useful.

I am a business intelligence developer at Purple Frog with several years of experience working with data. I started off as a data analyst, then moved into reporting, and more recently have been working on ETL, data warehousing and cubes. I have been working with SQL throughout my various roles, and therefore the best place for me to start my certification path was with exam 70-761: Querying Data with Transact-SQL. The path I've taken up to this point is:

MCSA: SQL 2016 Database Development
– 70-761: Querying Data with Transact-SQL
– 70-762: Developing SQL Databases

MCSE: Data Management & Analytics
– 70-767: Implementing a SQL Data Warehouse (from the Elective Exam Pool)

The learning material

Although the first exam (70-761) fitted in quite well with my SQL background (I probably knew about 75% of the material beforehand), there was still some work required to get to the stage where I felt confident in passing the exam. For me, the best and primary resource for learning the material has been the Exam Ref books, for example "Exam Ref 70-761 Querying Data with Transact-SQL". These books are structured so that the content is split into the sections covered by the exam; there is a chapter for every skill measured in the exam.

The one downside to the Exam Ref books is that they can feel quite wordy at times if you're relying on the book alone, so what I found really useful was to supplement them with videos and demos on the topics where I needed a greater understanding. In addition to this, practice and exercises helped me to further understand the different concepts, as I was able to try what I had learnt and see where I was going wrong.

The final resource that I found useful was Microsoft Docs (https://docs.microsoft.com/en-us/). This is a really good reference point for key facts; for example, I found the page on CDC really useful for my latest exam (70-767).

The exam

There are the obvious tips such as sleep early the night before, get to the exam centre with time to spare and so on, but I wanted to share some of the exam techniques I found useful while doing these exams.

My top tip is to check that you have covered, and are comfortable with, all the skills measured in the exam. The skills measured for each exam can be found in the "Browse Certifications and Exams" section on the Microsoft Learning website (example shown below for exam 70-761). The skills are also stated in the Exam Ref books, and as mentioned before, the chapters in the books are based on the skills measured in the exam.

This was taken from https://docs.microsoft.com/en-us/learn/certifications/exams/70-761

What's useful about the skills measured shown above is that it gives the weighting of questions per skill in the exam, so you can work out whether you need to focus on a weaker area if that area makes up a big part of the exam.

Time shouldn't be an issue in the exam if you've prepared well; however, some questions are not worded in the best way and can catch you out, so do take the time to read each question properly, and keep an eye on the time remaining every 5-10 questions.

You have the option to flag questions and review them again later (note that some questions cannot be flagged); make use of these flags for questions you are unsure of. This can be particularly useful if you've flagged a question and then a later question gives you a clue or reminds you of the answer for the question flagged earlier. Alternatively, you should be provided with a pen and wipeable board where you can make notes, so note down the question number and topic so that you can come back to it later.

Next steps

I am currently studying towards exam 70-768: Developing SQL Data Models; this will help develop my understanding and knowledge of data modelling and working with cubes, and will also complete the certification for MCSA: SQL 2016 BI Development. With these current certifications being retired in the near future, the next plan is to work towards the Microsoft Certified: Azure Data Engineer Associate certification.

I hope you have found this blog useful and that you can use some of the tips mentioned in your study plans, all the best!

SSAS Tabular Calculation Groups – avoid using SELECTEDMEASURE or ISSELECTEDMEASURE

Introduction:

There is a very serious limitation in the behaviour of calculation groups when using the SELECTEDMEASURE or ISSELECTEDMEASURE functions, and we recommend not using them. Why? If a user creates their own custom calculations within their Power BI report (or composite model) then the value of SELECTEDMEASURE changes, breaking your calculation group logic.

Let me explain with an example:

In a tabular cube we may look to use calculation groups to apply additional filters when calculating measures. In our example we will be using a retail scenario, with budgets as well as ‘Like for Like’ (LFL) budgets. For those not in retail, LFL represents stores that were also open this time last year. We start with the measure “Budget” with the following definition:
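The definition in the screenshot isn't reproduced here; as an illustration only, it might be a simple aggregation over a budget fact table (the table and column names below are hypothetical):

    Budget = SUM ( 'FactBudget'[Budget Amount] )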

We also have a hidden measure called “Budget LFL” with the following definition:
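Again the screenshot isn't shown; a hypothetical sketch, assuming the like-for-like filtering is applied at source (for example in the table's partition or source view):

    -- 'FactBudgetLFL' stands for a budget table that is already filtered
    -- to like-for-like stores at source
    Budget LFL = SUM ( 'FactBudgetLFL'[Budget Amount] )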

The measure “Budget LFL” is a variation of the “Budget” measure with additional filters applied at source. We also have a calculation group called “LFL Filter”, this is used to return the hidden “Budget LFL” measure based on the option selected in this filter and has the following definition:
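The calculation item in the screenshot isn't shown; conceptually it is along these lines, and this is exactly where the SELECTEDMEASURE/ISSELECTEDMEASURE dependency comes in (the exact expression is an assumption):

    -- Calculation item "LFL" in the "LFL Filter" calculation group
    IF (
        ISSELECTEDMEASURE ( [Budget] ),
        [Budget LFL],
        SELECTEDMEASURE ()
    )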

This functionality is demonstrated below where the two visuals from Power BI show the same measure with and without the LFL Filter applied:

No LFL filter applied
LFL filter applied

Problem:

A problem arises when you try to create a custom measure (in the cube or within a Power BI report) and filter by the calculation group, for example we create a measure called “CustomBudget” which is a copy of the “Budget” measure and has the definition:
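A sketch of the custom measure, simply a copy of the hypothetical Budget definition above:

    CustomBudget = SUM ( 'FactBudget'[Budget Amount] )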

Adding this custom measure to the visual shown earlier, we can see that the calculation group "LFL Filter" has no effect on the custom measure:

LFL filter applied

This is because SELECTEDMEASURE() is now [CustomBudget] and not [Budget], so the logic in the calculation group doesn't recognise the selected measure and therefore doesn't switch to the LFL measure.

Workaround:

To get around this we move the bulk of the logic from the calculation group to the measure itself. The measure now changes its behaviour by looking at the selected value of the LFL filter, instead of the calculation group managing it centrally:

This is the new definition for the “Budget” measure
This is the new definition for the “LFL Filter” calculation group
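Neither screenshot is reproduced here; a hedged sketch of the shape of this workaround, assuming the calculation group table and column are both named 'LFL Filter', the relevant calculation item is called "LFL", and the hypothetical table names from earlier are kept:

    -- New "Budget" measure: the measure itself checks which LFL Filter item is selected
    Budget =
    IF (
        SELECTEDVALUE ( 'LFL Filter'[LFL Filter] ) = "LFL",
        SUM ( 'FactBudgetLFL'[Budget Amount] ),
        SUM ( 'FactBudget'[Budget Amount] )
    )

    -- New "LFL Filter" calculation item: a simple pass-through
    SELECTEDMEASURE ()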

We refresh Power BI and find that the results are now as expected; the original and custom measures now match when using the calculation group as a filter:

This is the same visual but using the new logic, now with matching results

Thank you to Darren Gosbell (Twitter | Blog) for the suggestion of this workaround.

Cube deployment error: Cannot resolve all paths while de-serializing Database

I recently came across the following error while deploying a tabular cube using the Analysis Services Deployment Wizard:

My updates for this deployment included removing a table from the cube. As part of the cube deployment options I chose to "Retain roles and members". However, the issue here was that an existing role was referencing the table I had deleted; for example, the image below shows the Role Manager in Visual Studio, where the Cube_Finance role references the "Transaction Type" table.

To resolve this error I simply had to deploy the cube using the option “Deploy roles and retain members”:

This resulted in a successful deployment with the role updated to exclude any references to the deleted table:

Note that this error can also occur if a table has been renamed, for the same reason explained above, where a role references the "old" table name. The solution is the same: deploy roles and retain members.

SQL LAST_VALUE() wrong results

You may have come across the following scenario when using the LAST_VALUE() function.

You want to find the first or last value loaded in a result set, therefore you use the FIRST_VALUE() and LAST_VALUE() functions like below:
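The original query and sample data are only shown as screenshots; a self-contained sketch of the same idea, using hypothetical sample data (values A to E) and the FirstValue/LastValue column names referred to below:

    -- Hypothetical sample data
    CREATE TABLE #Letters (Id INT, Letter CHAR(1));
    INSERT INTO #Letters VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E');

    SELECT  Id,
            Letter,
            FIRST_VALUE(Letter) OVER (ORDER BY Id) AS FirstValue,
            LAST_VALUE(Letter)  OVER (ORDER BY Id) AS LastValue
    FROM    #Letters;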

You expect to get “A” for every record in the FirstValue column and “E” for every record in the LastValue column.
However, you see the following output:

By default, LAST_VALUE() operates over a window frame that runs from the first row up to the current row only, for example:

  • Row 1 – The only value in scope is A.
    Hence, FIRST_VALUE() & LAST_VALUE() both return A.
  • Row 2 – The values now in scope are A & B.
    Hence, FIRST_VALUE() returns A and LAST_VALUE() returns B.
  • Row 3 – The values now in scope are A, B, C.
    Hence, FIRST_VALUE() returns A and LAST_VALUE() returns C.
  • And so on…

To return the actual last value, add the additional clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This ensures that for each row the LAST_VALUE() function looks at all rows, from the very first row to the very last row. Now you have the following code (the original code with an extra column using the additional clause):
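Continuing the sketch from above, with the extra LastValue2 column:

    SELECT  Id,
            Letter,
            FIRST_VALUE(Letter) OVER (ORDER BY Id) AS FirstValue,
            LAST_VALUE(Letter)  OVER (ORDER BY Id) AS LastValue,
            LAST_VALUE(Letter)  OVER (ORDER BY Id
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastValue2
    FROM    #Letters;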

This gives you the following output, LastValue2 shows the true last value for the result set:
