
Analysis Services


Variables in DAX

Variables can simplify your DAX code, help with debugging, and improve performance. To use a variable in a DAX measure you declare it with the VAR keyword, give it a name, and assign an expression to it. You can use multiple variables in a measure, but whenever you use variables you must also use a RETURN statement to specify the final output.

Variables syntax

To show how to make use of variables I'll take a simple measure and rewrite it using variables; I'll be using the Adventure Works Internet Sales tabular cube for this.

A measure called Total Sales YoY Growth % (shown below) has been created which calculates sales for current year and the previous year, then works out the percentage increase/decrease from the previous year to the current year.

The example measure to be rewritten
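The original screenshot of the measure isn't reproduced here, but based on the description (three references to SUM('Internet Sales'[Sales Amount]) and a repeated CALCULATE block using PARALLELPERIOD, as mentioned later in the post) it would look something like this sketch:

Total Sales YoY Growth % =
DIVIDE (
    SUM ( 'Internet Sales'[Sales Amount] )
        - CALCULATE (
            SUM ( 'Internet Sales'[Sales Amount] ),
            PARALLELPERIOD ( 'Date'[Date], -12, MONTH )
          ),
    CALCULATE (
        SUM ( 'Internet Sales'[Sales Amount] ),
        PARALLELPERIOD ( 'Date'[Date], -12, MONTH )
    )
)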

You can see there is repeated logic: the CALCULATE block appears twice and so is evaluated twice, and SUM('Internet Sales'[Sales Amount]) is referenced three times. Using variables for the repeated logic gives the following code:

The original measure rewritten using variables
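Again as a sketch rather than the exact original:

Total Sales YoY Growth % =
VAR TotalSales = SUM ( 'Internet Sales'[Sales Amount] )
VAR TotalSalesPP =
    CALCULATE (
        SUM ( 'Internet Sales'[Sales Amount] ),
        PARALLELPERIOD ( 'Date'[Date], -12, MONTH )
    )
RETURN
    DIVIDE ( TotalSales - TotalSalesPP, TotalSalesPP )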

In this version the result of SUM('Internet Sales'[Sales Amount]) has been assigned to the variable TotalSales and the result of the CALCULATE section has been assigned to the variable TotalSalesPP. The CALCULATE section is now evaluated only once and assigned to a variable. This is a basic example, so the performance gain is insignificant, but the same approach will help the performance of more complex measures.

To simplify the code further a new variable called TotalSalesVariance is created, and a final variable called Result is created to store the final calculation to be returned. The final code then becomes:

The final version with additional variables
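A sketch of that final version:

Total Sales YoY Growth % =
VAR TotalSales = SUM ( 'Internet Sales'[Sales Amount] )
VAR TotalSalesPP =
    CALCULATE (
        SUM ( 'Internet Sales'[Sales Amount] ),
        PARALLELPERIOD ( 'Date'[Date], -12, MONTH )
    )
VAR TotalSalesVariance = TotalSales - TotalSalesPP
VAR Result = DIVIDE ( TotalSalesVariance, TotalSalesPP )
RETURN
    Result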

This version then allows you to debug the measure by using any of the variables in the RETURN statement, making it possible to verify the values for each of the variables:

Variations of the same measure returning different variables for debugging
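For example, keeping the variable declarations from the measure above and swapping only the final line lets you inspect any intermediate value:

-- Same variable declarations as above, then:
RETURN
    TotalSalesPP    -- or TotalSales, or TotalSalesVariance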

The one thing to look out for with variables is filter context: once a variable has been declared and a value assigned to it, you cannot use a CALCULATE statement to override the filter context and get a new value from the variable.

For example, TotalSalesPP = CALCULATE(TotalSales, PARALLELPERIOD('Date'[Date], -12, MONTH)) would return the same value as the variable TotalSales, hence TotalSalesPP = CALCULATE(SUM('Internet Sales'[Sales Amount]), PARALLELPERIOD('Date'[Date], -12, MONTH)) is the correct way to write this.

Hopefully this blog will help you introduce variables into your DAX code!

Azure Analysis Services Cube Backup

This blog is a quick guide on how to back up an Azure Analysis Services cube using PowerShell. The PowerShell is used in a runbook inside an Automation account in Azure to automate the backup process. A prerequisite for this is to configure the backup storage settings, which specify the location of the backups.

Two parameters are passed into the runbook (shown below), where AnalysisServerName is the server name for the analysis services cube and AutomationCredentialName is the name of the credential for the Automation account in Azure.
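The screenshot of the parameters isn't reproduced here; a minimal sketch of how they might be declared at the top of the runbook (only the parameter names come from the post):

param
(
    [Parameter(Mandatory = $true)]
    [string] $AnalysisServerName,

    [Parameter(Mandatory = $true)]
    [string] $AutomationCredentialName
)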

The following variable is used to store the current date, obtained using the Get-Date cmdlet. This will be used as part of the backup name.
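Something along these lines (the exact date format is an assumption):

# Current date, used later as part of the backup file name
$DateStamp = Get-Date -Format "yyyyMMdd"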

The command Get-AzureAutomationCredential is used to get the credentials for the Automation account; in this case the credential name is one of the parameters passed into the runbook. The result is saved to a variable that will be used as a parameter for the backup command.
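A sketch of that step. Note that inside a runbook the credential is often retrieved with the built-in Get-AutomationPSCredential activity, so treat the exact cmdlet and variable names here as illustrative:

# Retrieve the stored credential using the name passed into the runbook
$Credential = Get-AutomationPSCredential -Name $AutomationCredentialName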

And finally, the backup command completes the backup of the analysis services cube to a file. Several parameters are used in this command: -BackupFile is the name given to the backup file, -Name is the name of the cube being backed up, -Server is the server name (passed in as a parameter), -Credential is the credentials obtained in the previous step, and -ApplyCompression enables the backup with compression.
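Putting it together, the backup command might look something like this sketch – the parameters match those described above, but the cube name and backup file format are assumptions:

Backup-ASDatabase `
    -BackupFile ("AdventureWorks_" + $DateStamp + ".abf") `
    -Name "AdventureWorks" `
    -Server $AnalysisServerName `
    -Credential $Credential `
    -ApplyCompression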

I hope you have found this useful!

Tabular Cube Processing Report

I have created a Power BI report which provides detail on the state of processing in a Tabular Cube.
The report uses the cube’s dynamic management views to provide information about the cube’s partitions.
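The exact queries used by the report aren't listed here, but as an illustration, partition state and last processed times can be read from the TMSCHEMA_PARTITIONS DMV (the columns selected are an assumption; the report may use additional DMVs):

SELECT [Name], [TableID], [State], [RefreshedTime], [ModifiedTime]
FROM $SYSTEM.TMSCHEMA_PARTITIONS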

To use the tabular cube processing report, you need to insert the following information:

  • Server
  • Database (Cube Name)

Once these are entered, and assuming the connection is fine, you will need to accept some native queries. These are SELECT statements and will not alter the cube. That should give you a report similar to the one below; I have used an Adventure Works tabular model as an example.

Click here to download the report

This report is inspired by one I found for general tabular cube documentation by Data Savvy, which I highly recommend. That report includes information on tables, columns, relationships, measures and security:

Click here to view tabular cube documentation blog post

My experience in obtaining an MCSE

Background

I have recently completed my MCSE in Data Management and Analytics, and I wanted to share my experience of working towards and passing the exams that have led to me getting this MCSE. In this post I will cover some of the challenges I faced and provide some tips and advice, hopefully others following the same certification path, or other paths, will find this useful.

I am a business intelligence developer at Purple Frog with several years of experience working with data; I started off as a data analyst, then moved into reporting, and more recently have been working on ETL, data warehousing and cubes. I have been working with SQL throughout my various roles, and therefore the best place for me to start my certification path was with exam 70-761: Querying Data with Transact-SQL. The path I’ve taken up to this point is:

MCSA: SQL 2016 Database Development
– 70-761: Querying Data with Transact-SQL
– 70-762: Developing SQL Databases

MCSE: Data Management & Analytics
– 70-767: Implementing a SQL Data Warehouse (from the Elective Exam Pool)

The learning material

Although the first exam (70-761) fitted in quite well with my SQL background (I probably knew about 75% of the material beforehand), there was still some work required to get to the stage where I felt confident of passing the exam. For me, the best and primary resource for learning the material has been the Exam Ref books, for example “Exam Ref 70-761 Querying Data with Transact-SQL”. These books are structured around the exam itself, with a chapter for every skill measured in the exam.

The one downside to the Exam Ref books is that at times they can feel quite wordy if you’re relying on them alone, so what I found really useful was to supplement them with videos and demos on the topics where I needed a greater understanding. In addition, practice and exercises helped me to further understand the different concepts, as I was able to try what I had learnt and see where I was going wrong.

The final resource that I found useful was Microsoft Docs (https://docs.microsoft.com/en-us/). This is a really good reference point for key facts; for example, I found the page on CDC (change data capture) really useful for my latest exam (70-767).

The exam

There are the obvious tips such as sleep early the night before, get to the exam centre with time to spare and so on, but I wanted to share some of the exam techniques I found useful while doing these exams.

My top tip is to check that you have covered, and are comfortable with, all of the skills measured in the exam. The skills measured for each exam can be found in the “Browse Certifications and Exams” section on the Microsoft Learning website (example shown below for exam 70-761). The skills are also listed in the Exam Ref books, and as mentioned before the chapters in the books are based on the skills measured in the exam.

This was taken from https://docs.microsoft.com/en-us/learn/certifications/exams/70-761

What’s useful about the skills measured list shown above is that it gives the weighting of questions per skill in the exam, so you can work out whether a weaker area needs extra focus because it makes up a big part of the exam.

Time shouldn’t be an issue in the exam if you’ve prepared well; however, some questions are not worded in the best way and can catch you out, so do take the time to read each question properly, and keep an eye on the time remaining every 5-10 questions.

You have the option to flag questions and review them again later (note that some questions cannot be flagged), so make use of these flags for questions you are unsure of. This can be particularly useful if a later question gives you a clue or reminds you of the answer to a question flagged earlier. Alternatively, you should be provided with a pen and wipeable board for notes, so note down the question number and topic and come back to it later.

Next steps

I am currently studying towards exam 70-768: Developing SQL Data Models, which will help develop my understanding and knowledge of data modelling and working with cubes, and will also earn me the MCSA: SQL 2016 BI Development certification. With these current certifications being retired in the near future, the next plan is to work towards the Microsoft Certified: Azure Data Engineer Associate certification.

I hope you have found this blog useful and that you can use some of the tips mentioned in your study plans. All the best!

SSAS Tabular Calculation Groups – avoid using SELECTEDMEASURE or ISSELECTEDMEASURE

Introduction:

There is a very serious limitation in the behaviour of calculation groups when using the SELECTEDMEASURE or ISSELECTEDMEASURE functions, and we recommend not using them. Why? If a user creates their own custom calculations within their Power BI report (or composite model) then the value of SELECTEDMEASURE changes, breaking your calculation group logic.

Let me explain with an example:

In a tabular cube we may look to use calculation groups to apply additional filters when calculating measures. In our example we will be using a retail scenario, with budgets as well as ‘Like for Like’ (LFL) budgets. For those not in retail, LFL represents stores that were also open this time last year. We start with the measure “Budget” with the following definition:
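The definition was shown as a screenshot; a minimal sketch of what a measure like this might look like (the 'Budget' table and [Amount] column are assumptions):

Budget := SUM ( 'Budget'[Amount] )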

We also have a hidden measure called “Budget LFL” with the following definition:
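Again as a sketch only – the post says the LFL filters are applied at source, so the hidden measure might simply sum a pre-filtered table (table and column names are assumptions):

Budget LFL := SUM ( 'Budget LFL'[Amount] )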

The measure “Budget LFL” is a variation of the “Budget” measure with additional filters applied at source. We also have a calculation group called “LFL Filter”, which is used to return the hidden “Budget LFL” measure based on the option selected in this filter, and has the following definition:
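A sketch of what the calculation item might look like, branching on the selected measure (the calculation item name "LFL" is an assumption):

-- Calculation item "LFL" in the "LFL Filter" calculation group
IF (
    ISSELECTEDMEASURE ( [Budget] ),   -- only swap when the Budget measure is being evaluated
    [Budget LFL],
    SELECTEDMEASURE ()
)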

This functionality is demonstrated below where the two visuals from Power BI show the same measure with and without the LFL Filter applied:

No LFL filter applied
LFL filter applied

Problem:

A problem arises when you try to create a custom measure (in the cube or within a Power BI report) and filter by the calculation group. For example, we create a measure called “CustomBudget”, which is a copy of the “Budget” measure and has the definition:
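Continuing the sketch from above, the copy would simply be:

CustomBudget := SUM ( 'Budget'[Amount] )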

Adding this custom measure to the visual shown earlier, we can see that the calculation group “LFL Filter” has no effect on the custom measure:

LFL filter applied

This is because SELECTEDMEASURE() is now [CustomBudget] and not [Budget], so the logic in the calculation group doesn’t recognise the selected measure and doesn’t switch to the LFL measure.

Workaround:

To get around this we move the bulk of the logic from the calculation group to the measure itself. The measure now changes its behaviour by looking at the selected value of the LFL filter, instead of the calculation group managing it centrally:

This is the new definition for the “Budget” measure
This is the new definition for the “LFL Filter” calculation group
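Neither definition is reproduced here; a hedged sketch of the shape of the workaround, assuming the calculation group column is 'LFL Filter'[LFL Filter] and it contains an item named "LFL":

-- "Budget" measure: the measure itself now decides whether to switch to the LFL logic
Budget :=
IF (
    SELECTEDVALUE ( 'LFL Filter'[LFL Filter] ) = "LFL",
    [Budget LFL],
    SUM ( 'Budget'[Amount] )
)

-- "LFL Filter" calculation item: a simple pass-through, with no measure-specific logic
SELECTEDMEASURE ()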

We refresh Power BI and find that the results are now as expected; the original and custom measures now match when using the calculation group as a filter:

This is the same visual but using the new logic, now with matching results

Thank you to Darren Gosbell (Twitter | Blog) for the suggestion of this workaround.

Cube deployment error: Cannot resolve all paths while de-serializing Database

I recently came across the following error while deploying a tabular cube using the Analysis Services Deployment Wizard:

My updates for this deployment included removing a table from the cube, and as part of the cube deployment options I chose to “Retain roles and members”. However, the issue here was that an existing role was referencing the table I had deleted; for example, the image below shows the Role Manager in Visual Studio, where the Cube_Finance role references the “Transaction Type” table.

To resolve this error I simply had to deploy the cube using the option “Deploy roles and retain members”:

This resulted in a successful deployment with the role updated to exclude any references to the deleted table:

Note this error can also occur if a table has been renamed, for the same reason explained above where a role references the “old” table name. The solution is the same: deploy roles and retain members.

SSAS Tabular Deployment Wizard fails Newtonsoft.Json error

The Scenario

Deploying an Analysis Services Tabular model to SSAS Azure using the Analysis Services Deployment Wizard. Both Visual Studio 2017 & SQL Server 2017 installed on the client.

Trying to click on the ellipsis to change the data source connection string or impersonation information results in a Newtonsoft.Json error:

“Could not load file or assembly ‘Newtonsoft.Json, Version 6.0.0.0, Culture=neutral, ……”

 

The Solution

If you look at the folder containing Newtonsoft.Json.dll (“C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\Extensions\Application” – substitute 140 for your version of SQL Server) and then right click > Properties on the Newtonsoft.Json.dll file, you’ll probably find that it’s version 10.0.3 or later, but the error message shows that the deployment wizard is looking for version 6.0.x. We therefore have to tell the SSAS Deployment Wizard to use the newer version instead.

Use Notepad(++) to open the Microsoft.AnalysisServices.Deployment.exe.config file, which you should find in “C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio” (substitute 140 as above, for your correct version).

Find the section for Newtonsoft.Json

And make a couple of changes (backup the file first!!!) – see the sketch after this list:

  • Add in a bindingRedirect line, redirecting 6.0.0.0 to 10.0.0.0
  • Change the codeBase version number to match the actual file version, 10.0.0.0
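The exact contents of the file vary by installation, but the relevant dependentAssembly section should end up looking roughly like this (the publicKeyToken shown is the standard one for Newtonsoft.Json; keep whatever href value your file already contains):

<dependentAssembly>
  <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
  <!-- redirect the old 6.0.0.0 reference to the version actually installed -->
  <bindingRedirect oldVersion="6.0.0.0" newVersion="10.0.0.0" />
  <!-- change the version to 10.0.0.0, keeping the existing href -->
  <codeBase version="10.0.0.0" href="..." />
</dependentAssembly>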

Save and close the file, relaunch the deployment wizard, and you should find it all works ok.

Frog-Blog Out

 

Showing December to December to see opening values for a year in SSAS/MDX

I came across an interesting MDX challenge this week; within a cube’s Date dimension, how to show December twice, once where it should be and again as the opening value for the following year. i.e. for each year I need to show Dec (prev yr), Jan, …, Nov, Dec.

Why? Well if you consider the following pivot chart, you can clearly see growth from Jan to Feb, Feb to Mar, etc., but it’s very difficult to see the growth between Dec and Jan.

So to make things easier for the user we want to repeat the Dec value and insert it before Jan as an opening value for the year. Now we can’t duplicate December’s data in the DW or cube for obvious reasons, so we need to create a virtual copy of December using some MDX.

Step 1 – Create a new virtual month member called ‘Open’

CREATE MEMBER CURRENTCUBE.[Date].[Month Name].[Open] AS null;

Step 2 – Create a named set ‘Dec to Dec’ that includes Open as well as Jan-Dec

CREATE STATIC SET CURRENTCUBE.[Dec to Dec]
AS {[Date].[Month Name].[Open]
,[Date].[Month Name].[Month Name].[Jan]
:[Date].[Month Name].[Month Name].[Dec]};

Step 3 – Set the ‘Open’ member to show the value for ‘Dec’ in the previous year

SCOPE ([Date].[Month Name].[Open]);
THIS = ([Date].[Year].PREVMEMBER, [Date].[Month Name].[Dec]);
END SCOPE;

 

When you then select the ‘Dec to Dec’ named set in Excel you do indeed see 13 months per year, including our new ‘Open’ month. There are two problems however:

  1. Open appears at the end not the start. To fix this, open up the ‘Field Settings’ of the ‘Dec to Dec’ field in the pivot table, select the ‘Layout and Print’ tab and untick ‘Automatically order and remove duplicates from the set’.
  2. If you filter the pivot to only show a single year, Excel decides to hide the ‘Open’ member (it is Excel, as the MDX works fine!). To fix this, go to ‘PivotTable Options’, then on the ‘Totals & Filters’ tab untick ‘Include filtered items in set totals’.

Hey presto, your pivot table will now look something like this:

You can now clearly see the growth in Dec to Jan, and everyone’s happy.

<Frog-Blog Out>

MDX Compare or Rank Similar Members

Or should this be called: “Recreating DAX’s EARLIER function in MDX” – either way, a useful technique that solves a problem…

MDX makes it very easy for us to compare one member against others, using functions such as RANK() etc. But how do we dynamically compare a member against a subset of other members? I came across a customer requirement recently where we had to rank a member against other members that had similar properties, based on other measures. This will probably make more sense with an example…

Taking AdventureWorks, we can easily calculate the sales rank of each product using something like this:

WITH MEMBER [Measures].[Rank] AS
   RANK([Product].[Product].CURRENTMEMBER
       ,[Product].[Product].[Product].MEMBERS
       ,[Measures].[Internet Sales Amount])
SELECT {[Measures].[Internet Sales Amount], [Measures].[Rank]} ON 0
      ,NONEMPTY({[Product].[Product].[Product].MEMBERS}
                ,[Measures].[Internet Sales Amount]) ON 1
FROM [Adventure Works]
WHERE [Date].[Calendar Year].&[2008]


This ranks each product against all other products. But is that a fair comparison? Comparing sales of a water bottle against a top of the range racing bike. Not really. So how about we instead rank each product against all products within +/-20% of the same cost. So if the water bottle costs £20, then we would rank its sales against all products with a cost between £16 and £24. This gives a more accurate idea of how well each product is performing compared to its peers.

Although to keep the MDX simple here, let’s just say any product within £20.

In AdventureWorks we have [Measures].[Internet Average Unit Price], which can be used to determine comparable products. So how do we go about achieving this?

If we look at the RANK function, it takes three parameters; the member being ranked, the set over which to rank, and the measure to be used.

All we have to do is filter the second parameter, the set over which to rank, to include similar members. So maybe something like this:

WITH MEMBER [Measures].[Rank] AS
   RANK([Product].[Product].CURRENTMEMBER
       ,FILTER([Product].[Product].[Product].MEMBERS
              ,ABS([Measures].[Internet Average Unit Price]
               -([Measures].[Internet Average Unit Price]
                ,[Product].[Product].CURRENTMEMBER))
               <=20
               )
       ,[Measures].[Internet Sales Amount])

If we break this down, we’re just changing the 2nd parameter to be a filtered set, where the unit price is within £20 of the unit price of the current member. This should work, right?

Unfortunately, wrong. The results look exactly the same as the original rank – nothing has changed.


The problem here is that CURRENTMEMBER is within the filter function, so it changes context to refer to whatever row is being considered at the time by the filter function. So [Measures].[Internet Average Unit Price] and ([Measures].[Internet Average Unit Price],[Product].[Product].CURRENTMEMBER) are always the same product, and no rows are filtered out. CURRENTMEMBER does NOT refer to the current member being considered by the RANK function, but by the FILTER function.

In DAX we have the EARLIER and EARLIEST functions, which would be great here, and would allow us to step out of the current context into the previous calculation layer. But unfortunately we haven’t been blessed with an MDX EARLIER function. So how do we fix this in MDX?

The trick here is dynamic sets, using the STRTOSET function. This allows us to grab the member being ranked, and treat it as a fixed member within the FILTER function.

WITH MEMBER [Measures].[Rank] AS
   RANK([Product].[Product].CURRENTMEMBER
       ,STRTOSET('
          FILTER([Product].[Product].[Product].MEMBERS
                ,ABS([Measures].[Internet Average Unit Price]
                 -' + CSTR([Measures].[Internet Average Unit Price]) + ')
                 <=20
                )'
       )
       ,[Measures].[Internet Sales Amount])

We’re building up a string which fixes the value of [Measures].[Internet Average Unit Price] to that of the product being ranked, and then dynamically compares it to the value of [Measures].[Internet Average Unit Price] for all other products. Those within £20 will be included in the resulting set, and will be used to rank the original product.

You can see the result in the screenshot above, where the ranking is dependent on the average unit price.

Frog-Blog Out

Excel creates inefficient MDX

Whilst investigating a slow set of Excel pivot tables (connected to a multidimensional SSAS cube), I found some odd behaviour in how Excel generates its MDX, sometimes doing far more work than required and causing a significant performance reduction.

Take the following example, just querying Customer Count by Promotion, against the AdventureWorks cube.


The profile trace (just using “Query Begin”, “Query End” and “Query Subcube Verbose”) shows that the query runs two “Query Subcube Verbose” calls, which means two separate calls to the SSAS storage engine.

The first Query Subcube event requests “0” for Promotion, which is the [All] member, and so is used as the total in the pivot table. The second queries “*”, which returns every member in the Promotion attribute, which in this example returns values for each of the 4 different promotions.

This makes sense: for a distinct count measure you can’t simply add up the components, so the subtotal or grand total does need to be calculated separately. The query just takes twice as long as a result.

So let’s say that the total is irrelevant to the user, and they disable totals within Excel…

When we then refresh the pivot, we should only see a single storage engine call. Right? Wrong.

There are still two calls to the storage engine, and SSAS is still calculating the grand total for the query, which Excel is then just discarding.

In a small cube this may make very little noticeable difference to the pivot performance. But on a multi-terabyte cube with billions of records, distinct count operations can take minutes to run, and calculating the non-required totals can make a significant difference.

This becomes even worse when multiple attributes are stacked on the rows and/or columns. Each new attribute requires its own totals calculating, resulting in more and more calls to the storage engine. In the following pivot I’ve added an extra two attributes to the rows, with no subtotals or grand totals.

The following trace shows that 8 Subcube queries were performed.

These correspond to detail and totals for each combination of attributes, which calculates as 2^n storage queries, where n is the number of attributes selected on the rows/columns of a pivot.

Now it’s common practice for users to want to create pivots with 4, 5, 6+ attributes on rows/columns, and the workload on the cube increases exponentially with each one they add. This is not good!

There are two workarounds to this:

1) Forcibly disable totals for relevant dimensions in the cube.

2) Use Excel 2013 (or above), and hard code the set of rows to exclude totals.

 

Let’s look at these in more detail…

1) Forcibly disable totals for relevant dimensions in the cube.

Let’s say that for a particular cube, customer counts were only ever calculated on the monthly level, and it didn’t make sense to aggregate customer counts up to a total across all months/years. We could just add some MDX into the Calculation script to disable date totals for the relevant measure(s).

SCOPE([Measures].[Customer Count], [Date].[Calendar].[All]);
      THIS=null;
END SCOPE;

Excel will still produce an MDX query asking for the date totals, but SSAS will just ignore it and return null. Note that this obviously only works where there is no business sense in asking for the total.

 

2) Use Excel 2013 (or above), and hard code the set of rows to exclude totals.

Excel 2013 introduced the ability to customise the MDX of a pivot. On the Analyze ribbon, click ‘Fields, Items & Sets’, and then ‘Create Set Based on Row Items…’. You can then remove any totals or rows that you don’t want, including the subtotals and grand total.

Or if you’re an MDX guru, you can click on ‘Edit MDX’ and write your own sets for even more performance, and also to make it more dynamic.

When we run this, we get the same output in Excel, but the profile trace now shows:

Just a single call to the storage engine, instead of 16. 1/16 of the work for the same results.

Now beware that once you do this, you’re essentially hard coding that part of the pivot, so in this example any new promotions would not automatically show in the pivot unless they were added. That’s why learning some MDX and writing a more dynamic set is preferable.

But the obvious answer to this is, please, please Microsoft, fix the terrible MDX that Excel creates! If you agree, please upvote the Connect item requesting that it be improved.

Frog-Blog Out

 
