0845 643 64 63


Showing December to December to see opening values for a year in SSAS/MDX

I came across an interesting MDX challenge this week; within a cube’s Date dimension, how to show December twice, once where it should be and again as the opening value for the following year. i.e. for each year I need to show Dec (prev yr), Jan, …, Nov, Dec.

Why? Well if you consider the following pivot chart, you can clearly see growth from Jan to Feb, Feb to Mar, etc., but it’s very difficult to see the growth between Dec and Jan.

So to make things easier to a user we want to repeat the Dec value and insert it before Jan as an opening value for the year. Now we can’t duplicate December’s data in the DW or Cube for obvious reasons, so we need to create a virtual copy of December using some MDX.

Step 1 – Create a new virtual month member called ‘Open’

CREATE MEMBER CURRENTCUBE.[Date].[Month Name].[Open] AS null;

Step 2 – Create a named set ‘Dec to Dec’ that includes Open as well as Jan-Dec

AS {[Date].[Month Name].[Open]
,[Date].[Month Name].[Month Name].[Jan]
:[Date].[Month Name].[Month Name].[Dec]};

Step 3 – Set the ‘Open’ member to show the value for ‘Dec’ in the previous year

SCOPE ([Date].[Month Name].[Open]);
THIS = ([Date].[Year].PREVMEMBER, [Date].[Month Name].[Dec]);


When you then select the ‘Dec to Dec’ named set in Excel we do indeed see 13 months per year, including our new ‘Open’ month. There are two problems however:

  1. Open appears at the end not the start. To fix this open up the ‘Field Settings’ of the ‘Dec to Dec’ field in the pivot table, select the ‘Layout and Print’ tab and untick ‘Automatically order and remove duplicates from the set’
  2. If you filter the pivot to only show a single year, Excel decides to hide the ‘Open’ member (it is Excel, as MDX works fine!). To fix this go to ‘PivotTable Options’ then on the ‘totals and Filters’ tab, untick ‘Include filtered items in set totals’.

Hey presto, your pivot table will now look something like this:

You can now clearly see the growth in Dec to Jan, and everyone’s happy.

<Frog-Blog Out>

Excel creates inefficient MDX

Whilst investigating a slow set of Excel pivot tables (connected to a multidimensional SSAS cube), I found some odd behaviour in how Excel generates its MDX, sometimes doing far more work than if required and causing significant performance reduction.

Take the following example, just querying Customer Count by Promotion, against the AdventureWorks cube.


The profile trace (just using “Query Begin”, “Query End” and “Query Subcube Verbose”) shows that the query runs two “Query Subcube Verbose” calls, which means two separate calls to the SSAS storage engine.

The first Query Subcube event requests “0” for Promotion, which is the [All] member, and so is used as the total in the pivot table. The second queries “*”, which returns every member in the Promotion attribute, which in this example returns values for each of the 4 different promotions.

This makes sense, as for a distinct count measure, you can’t simply add up the components, you do need to calculate the subtotal or grand total separately. The query just takes twice as long as a result.

So lets say that the total is irrelevant to the user, so they disable totals within Excel…

Then we refresh the pivot, we should only see a single storage engine call. Right? Wrong.

ExcelMDX04There are still two calls to the storage engine, and SSAS is still calculating the grand total for the query, which Excel is then just discarding.

In a small cube this may make very little noticeable difference to the pivot performance. But on a multi-terrabyte cube with billions of records, distinct count operations can take minutes to run. And calculating the non-required totals can make a significant difference.

This becomes even worse when multiple attributes are stacked on the rows and/or columns. Each new attribute requires its own totals calculating, resulting in more and more calls to the storage engine. In the following pivot I’ve added an extra two attributes to the rows, with no subtotals or grand totals.

ExcelMDX05The following trace shows that 8 Subcube queries were performed.

ExcelMDX06These correspond to detail and totals for each combination of attributes, which calculates as 2^n storage queries, where n is the number of attributes selected on the rows/columns of a pivot.

Now it’s common practice for users to want to create pivots with 4, 5, 6+ attributes on rows/columns, and the workload on the cube increases exponentially with each one they add. This is not good!

There are two workarounds to this:

1) Forcibly disable totals for relevant dimensions in the cube.

2) Use Excel 2013 (or above), and hard code the set of rows to exclude totals.


Let’s look at these in more detail…

1) Forcibly disable totals for relevant dimensions in the cube.

Let’s say that for a particular cube, customer counts were only ever calculated on the monthly level, and it didn’t make sense to aggregate customer counts up to a total across all months/years. We could just add some MDX into the Calculation script to disable date totals for the relevant measure(s).

SCOPE([Measures].[Customer Count], [Date].[Calendar].[All]);

Excel will still produce an MDX query asking for the date totals, but SSAS will just ignore it and return null. Note that this obviously only works where there is no business sense in asking for the total.


2) Use Excel 2013 (or above), and hard code the set of rows to exclude totals.

Excel 2013 introduced an ability to customise the MDX of a pivot. On the Analyze ribbon, click ‘Fields, Items & Sets’, and then ‘Create Set Based on Row Items…’ExcelMDX07You can then remove any totals or rows that you don’t want, including the subtotals and grand total.

ExcelMDX08Or if you’re an MDX guru, you can click on ‘Edit MDX’ and write your own sets for even more performance, and also to make it more dynamic.

When we run this, we get the same output in Excel, but the profile trace now shows:

ExcelMDX09Just a single call to the storage engine, instead of 16. 1/16 of the work for the same results.

Now beware that once you do this, you’re essentially hard coding that part of the pivot, so in this example any new promotions would not automatically show in the pivot unless they were added. That;s why learning some MDX and writing a more dynamic set is preferable.

But the obvious answer to this is, please, please Microsoft, fix the terrible MDX that Excel creates! If you agree, please upvote the Connect item requesting that it be improved.

Frog-Blog Out


SSAS Tabular String Imported as Integer

Let me start by saying that I think the SSAS Tabular model is great. But…. there are a number of problems that Microsoft still need to get ironed out.

Not least of which is being able to import data properly directly from CSV/Text files. Yes you can import directly from csv, but you are given absolutely no control over the process, and this can lead to some serious problems.

One of these issues is the rather odd automated data type selection that is used by the Tabular import process. A column consisting of a combination of alpha and numerical text is more often than not treated as an integer, with all text information stripped out. And the Tabular model designer provides absolutely no way of changing this behaviour.

For example, take the following csv file, containing just three columns; ID, ItemCode and ItemValue.


The second column should clearly be interpreted as text, as the 5th row contains a non-numerical value. However when this is imported into the Tabular model, it treats the column as an integer. As you can see from the screenshot below, because the 5th row doesn’t contain a valid integer, the value is just ignored.


One would expect that we could simply go into the table in the designer, and update the Data Type property for the column. No. This just takes the numerical value and formats it as text. But any non-numerical values are still stripped out. The problem is that when the Tabular model reloads the file, it detects that the data type is an integer, and there’s nothing we can do to override it.

The only way of getting around this is by wrapping the strings in quotes within our csv.


This is irritating but we really don’t have a choice, just remember to be explicit in any csv definition that is to be imported directly into a Tabular model.

However, what if we find this out too late? If we’ve already built the model, added all of our DAX calculations in, set up the relationships etc., how do we change the format of an already created column?

Firstly we have to get the csv updated to wrap every string column in quotes.

If we try and just reimport this we’ll get an error “Unable to convert a value to the data type requested for table ‘xxx’ column ‘xxx’. The current operation was cancelled because another operation in the transaction failed.”


So first you have to change the column data type for the table in the designer. Click on the column, then in the column properties, change Data Type to Text.

ChangeTabularDataType05Once this is done, you can reprocess the table and import the actual text.

This is all well and good, and works most of the time. However, I recently encountered a Tabular model which had this problem, and the above process wouldn’t work. So the only solution I found was to go routing in the Tabular model’s xml source code, and force it to change. Hacking it manually worked a treat, so I thought I’d share the process here. Just be careful – always keep a backup of your files before you change anything!

To do this, open up your .bim file in a suitable text editor. I highly recommend Notepad++, as it works great for XML.

We need to change three things:

1) Change the data type for the table column key & name, within the dimensions

change <DataType>BigInt</DataType> to <DataType>WChar</DataType>     (Note this may be Int or BigInt)
change <DataSize>-1</DataSize> to <DataSize>32768</DataSize>

Then do the same for the <NameColumn>


2) Change the data type for the table column key & name in the corresponding cube

change <DataType>BigInt</DataType> to <DataType>WChar</DataType>     (Note this may be Int or BigInt)
change <DataSize>-1</DataSize> to <DataSize>32768</DataSize>

Then do the same for the <NameColumn>

3) Change the definition of the csv datasource

delete ‘type=”xs:int”‘ from the element, and replace it with a SimpleType and restriction defining the string:

<xs:simpleType><xs:restriction base=”xs:string”><xs:maxLength value=”32768″ /></xs:restriction></xs:simpleType>ChangeTabularDataType07


Then save the .bim file, reload your Tabular model, and reprocess the table. Problem solved.




SSAS Tabular performance – NUMA update

How does the SSAS 2012 Tabular model performance change when you add more CPU sockets / NUMA nodes?

In my last post (SSAS Tabular NUMA and CPU Cores Performance) I presented the results of some testing I’d been doing on the scalability of the SSAS 2012 Tabular model. Specifically with the performance of distinct count measures over large data volumes (50-200m distinct customers).

The conclusion was that moving from 1 NUMA node (CPU socket) to 2 had no impact on query performance, so the 2nd CPU is entirely wasted. This actually contradicted other advice and recommendations that indicated that adding a second node would actually make the performance worse.

After discussing the issue with a member of the SSAS development team, they advised that the method I was using to disable cores was flawed, and that we shouldn’t be using Windows System Resource Manager. So I re-ran the tests disabling cores (and associated memory) using MSConfig, simulating a physical core removal from the server.

The test results were immediately different…

TabularNUMACoresTestThe hardware setup was the same as before, but with a larger data set:

  • 30Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes, hyperthreaded) server with 144Gb RAM
  • SQL Server 2012 SP1 CU8 Enterprise (+ a further hotfix that resolves a problem with distinct counts >2m)
  • 900m rows of data in primary fact
  • 200m distinct CustomerKey values in primary fact
  • No cube partitioning
  • DefaultSegmentRowCount: 2097152
  • ProcessingTimeboxSecPerMRow: 0
  • CPU cores and associated memory disabled using MSConfig

The two test queries were

  • Query 1: Simple, single value result of the total distinct customer count
  • Query 2: More complex distinct count query, sliced by a number of different attributes to give approx 600 result cells

As soon as the cores are increased above 16 (i.e. the 2nd CPU is introduced), the queries take 1.45x and 2x the time to run. Query performance drops significantly. The simple query takes almost exactly double the time.

These results now support other theories floating around the blogosphere, that adding extra CPUs not only doesn’t help the tabular performance, it actually significantly hinders it.

As before, the default segment count setting gave the best performance at 2m and 4m. Raising it seemed to degrade performance.

Frog Blog out

Renaming an SSAS Tabular Model

I came across a frustrating problem today. I’d just finished processing a large tabular cube (SQL Server 2012), which had taken 11 hours in total.

On trying to connect to the cube to test it, I’d made a schoolboy error; The database was named correctly, but the model inside it was named MyCubeName_Test instead of MyCubeName. No problem, I’ll just right click the cube in SSMS and rename it. Well, no, there is no option to rename a model, just the database. I didn’t fancy doing a full reprocess, but luckily a little digging in the xml files presented a solution.

  1. Detach the cube
  2. Open up the cube’s data folder in explorer (x:\xx\OLAP\data\MyCubeName.0.db, or whatever it happens to be in your case)
  3. Find the Model.xx.cub.xml file, and open it in Notepad++ (other text editors are available…)
  4. Search for the <Name> tag, and just change the name inside it
  5. Save the file and close it
  6. Re-attach the cube



SSAS Tabular – NUMA and CPU Cores Performance

[UPDATE] After further investigation, I found that the tests in this post were inacurate and the results unreliable. Updated NUMA test results here

In my last post (SSAS Tabular Performance – DefaultSegmentRowCount) I presented some analysis of the query performance impact of changing the DefaultSegmentRowCount setting. This post describes the next tests that I ran on the same system, investigating the impact of restricting SSAS to just 1 NUMA node instead of the 2 avaiable on the server.

It’s well known that SSAS Tabular is not NUMA aware, so it’s common to see advice recommending affiliating SSAS to a single NUMA node to improve performance.

From what I’d read, I was expecting that by affiliating SSAS to a single NUMA node that the query performance would improve slightly, maybe 10-30%.

Recap of the setup:

  • 7.6Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes) server with 144Gb RAM
  • SQL Server 2012 SP1 CU7 Enterprise
  • 167m rows of data in primary fact
  • 80m distinct CustomerKey values in primary fact
  • No cube partitioning
  • DefaultSegmentRowCount: 2097152
  • ProcessingTimeboxSecPerMRow: 0
  • CPU core affinity configured using Windows System Resource Manager (see John Sirman’s great guide to using WSRM with SSAS)

I ran profiler, checking the ‘Query End’ duration on a simple distinct count of CustomerKey, with no other filters or attributes involved.


You can see that dropping from 32 cores across 2 NUMA nodes down to 16 cores on a single node had almost no impact at all.

Within a single NUMA node, the performance dramatically improved as the number of cores increased, but as soon as a second NUMA node is added, the performance flat lines, with no further significant improvement no matter how many cores are added.

As per my last post – I’m sure there are other things afoot with this server, so this behaviour may not be representative of other setups, however it again reinforces advice you will have already seen elsewhere, that with SSAS Tabular – avoid NUMA hardware…

Frog-Blog out

SSAS Tabular performance – DefaultSegmentRowCount

I’m currently investigating a poorly performing Tabular model, and came across some interesting test results which seem to contradict the advice in Microsoft’s Performance Tuning of Tabular Models white paper.

Some background:

  • 7.6Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes) server with 144Gb RAM
  • SQL Server 2012 SP1 CU7 Enterprise
  • 167m rows of data in primary fact
  • 80m distinct CustomerKey values in primary fact
  • No cube partitioning

A simple distinct count in DAX of the CustomerKey, with no filtering, is taking 42 seconds on a cold cache. Far too slow for a tabular model. Hence the investigation.

p88 of the Performance Tuning of Tabular Models white paper discusses the DefaultSegmentRowCount, explaining that it defaults to 8m, and that there should be a correlation between the number of cores and the number of segments. [The number of segments calculated as the number of rows divided by the segment size].

It also indicates that a higher segment size may increase compression, and consequently query performance.

Calculating the number of segments for our data set, gives us the following options:

Rows 167,000,000
Segment Size # Segments
1048576 169
2097152 80
4194304 40
[default] 8388608 20
16777216 10
33554432 5
67108864 3

So, with 32 cores to play with, we should be looking at the default segment size (8m) or maybe reduce it to 4m to get 40 segments. But the extra compression with 16m segment size may be of benefit. So I ran some timing tests on the distinct count measure, and the results are quite interesting.


It clearly shows that in this environment, reducing the DefaultSegmentRowSize property down to 2m improved the query performance (on a cold cache) from 42s down to 27s – 36% improvement. As well as this, processing time was reduced, as was compression.

This setting creates 80 segments, 2.5 times the number of cores available, but achieved the best performance. Note that the server’s ProcessingTimeboxSecPerMRow setting has been set to 0 to allow for maximum compression.

There’s more to this systems’s performance problems than just this, NUMA for a start, but thought I’d throw this out there in case anyone else is blindly following the performance tuning white paper without doing your own experimentation.

Each environment, data set and server spec is different, so if you need to eek out the last ounce of performance, run your own tests on the SSAS settings and see for yourself.

Frog-Blog Out

[Update: Follow up post exploring the performance impact of NUMA on this server]

Dimension ProcessAdd in SSAS

When you have very large dimensions in SQL Server Analysis Services (SSAS) cubes, the time taken to process them can cause a problem. This post builds upon an excellent walkthrough of the ProcessAdd option written by Daniel Calbimonte on MSSQLTips.com. and shows how to automate the identification of new data to be added.

What are the main options for processing a dimension?  (Technet details here)

  • Process Full – the whole dimension is dropped and rebuilt
  • Process Update – a new copy of the dimension is processed in full and the results compared, with changes being copied to the original dimension
  • Process Add – just add new rows to the dimension, don’t do anything with existig rows

There are lots of posts around the internet that discuss the pros and cons of each of these, so I won’t go into the details. But as the title suggests, we’ll focus on the ProcessAdd option.

How do you implement ProcessAdd?

Daniel provides a fantastic explanation and code walkthrough of how to use SSIS (Integration Services) to implement a ProcessAdd solution, please read through his post before continuing. He talks us through using an SSIS data flow to create a source query to retrieve the new rows, and then the Dimension Processing component as a destination to receive the new rows.


And this works really well. The problem then becomes how to identify which rows need to be added. This is a problem that Daniel overcomes by having a seperate table of records to be added. The source query just selects the entire table. This table could be populated by the ETL logic, a trigger, or any number of other processes. But can we improve upon this and make the proccess more seamless and transparent?

Getting all MDX

For me, the best way of enhancing this is to build a solution which self maintains. So, we need to look inside the cube dimension, find the last record that has already been added, and then use that to find any new records from the warehouse.

Step 1 – Find the maximum key for a dimension

 WITH MEMBER [Measures].[MaxKey] AS
   MAX([SOURCE Currency].[SOURCE Currency Code].ALLMEMBERS
   , StrToValue([SOURCE Currency].[SOURCE Currency Code].currentmember.MEMBER_KEY))
   {[Measures].[MaxKey]} ON 0
   [Adventure Works]

The MDX above creates a calculated member called MaxKey, which finds the maximum Key (the surrogate key identifier for the dimension) across all members of the dimension. Note that we have to use StrToValue() to convert the key from a string to an integer so that it sorts numerically not alphabetically. If you run this in SSMS against the AdventureWorks2012DW database the result should be 105. You can verify this by querying the DimCurrency table in the warehouse and you’ll find that 105 is the largest CurrencyKey value.


Step 2 – How do we use MDX within SSIS?

In the Control Flow of the SSIS package, add an ‘Execute SQL’ task. Connect it to a new OLE DB connection to Analysis Services, as per the following screenshot.



Set the SQL Statement of the Data Source to the MDX query, and set the ResultSet property to ‘Single Row’.


On the Result Set tab, set the [Measures].[MaxKey] result to be saved to a new variable; ‘User::MaxKey’ (An Int32).


The Execute SQL task should precede the existing Data Flow Task.


Step 3 – Dynamically create the SQL Query from the MDX results

Now we have the maximum Key value stored in a variable, we can create another variable, called ‘SQL’ (a string), with an expression which includes the Key value.


The expression should be

 "SELECT  CurrencyKey, CurrencyAlternateKey, CurrencyName
  FROM    DimCurrency
  WHERE   CurrencyKey&gt;" +  (DT_WSTR,20)@[USER::MaxKey]

This builds a SQL statement dynamically, based upon the maximum key identified from the MDX statement.

We can then set the source query in the Data Flow to use the resulting SQL Query, by setting its ‘Data Access Mode’ to ‘SQL Command From Variable’, and the ‘Variable Name’ to ‘User::SQL’.


And that’s it. This process will automatically find any new dimension members in the warehouse table, and add them into the cube dimension.

This process does rely on a number of things – so watch out…!

  • The dimension must only contain inserts, not updates or deletes (you’ll need to perform a ProcessUpdate for those)
  • The Key must always be incrementing – new records with a smaller key will not get picked up!
  • You’ll need to ignore duplicate key errors during processing. Plenty of discussion around why in other blog posts…
  • SSIS is not great at handling MDX connections – unless you want to go down the linked server route, only ever have a single MDX query per SSIS package.

And finally, another shout out to Daniel Calbimonte and MSSQLTips for providing the basis of this post.



Find first order date in MDX

I had an interesting question the other day; “how do I find the date of the first order/activity for a given customer/employee/product etc in MDX”?

Well you’ll be pleased to know that it’s pretty simple. The sample code bellow will work with the AdventureWorks DW 2008R2 cube.

First of all we need to filter the date dimension to contain only those with orders. Then take the first item of the resulting set, then find its name. Simples.

Filter the date hierarchy to only those dates with a Reseller Order Count:

   FILTER([Date].[Date].[Date].MEMBERS, [Measures].[Reseller Order count])

Note that you can specify a more advanced filter, such as [Measures].[Reseller Order Count]>10, which would find the first date which had more than 10 orders.

Then find the first item:


Then find the its name:


Put this together in a query and you get:

   WITH MEMBER [Measures].[First Activity] AS
            , [Measures].[Reseller Order count]).ITEM(0).NAME
   SELECT {[Measures].[Reseller Order count]
         , [Measures].[First Activity]
       } ON 0,
      [Employee].[Employees].MEMBERS ON 1
   FROM [Adventure Works]

This returns the total reseller order count for each employee, along with the date of their first reseller order. Note that the Employee hierarchy here is a parent child hierarchy. The calculated member [First Activity] will aggregate and calculate correctly with any dimension or member specified on the 1 (ROWS) axis, be it a parent-child, single attribute, normal hierarchy, etc. and will always find the first order date for the current member.
You should get the following results:

Frog-Blog Out.

Video: Automating SSAS OLAP Cube documentation

Automating OLAP cube documentation – SQLBits presentation

For anyone that missed my presentation at SQLBits 8 in April, the video is now available here.

In this 1 hour session I present a method of automating the creation of documentation for SSAS OLAP cubes by using DMVs (dynamic management views) and spatial data, querying the metadata of the cube in realtime.

The results include the BUS matrix, star schemas, attribute lists, hierarchies etc. and are all presented in SSRS.

The blog posts to go with this are here:

You can view the slide deck here

The Frog Blog

I'm Alex Whittles.

I specialise in designing and implementing SQL Server business intelligence solutions, and this is my blog! Just a collection of thoughts, techniques and ramblings on SQL Server, Cubes, Data Warehouses, MDX, DAX and whatever else comes to mind.

Data Platform MVP

Frog Blog Out