Analysis Services

Can SSAS Multidimensional be faster than SSAS Tabular for distinct counts on large datasets?

We’ve all seen how fast the SSAS Tabular model can be – it’s an in-memory, heavily compressed analytics database. However, you may have seen my previous posts on Tabular and NUMA, which show that at the moment Tabular is limited to a single CPU. This obviously limits its scalability and maximum performance.

The situation

A large project I’m working on, for a multinational telecoms provider, has a problem with the query performance of its current Tabular model. It’s fine for smaller datasets but does not scale well, specifically with distinct count operations. The distinct count is the single most critical factor for performance in this project, and with in the region of 50m distinct customers it’s a sizeable problem. After working with Microsoft to investigate the Tabular performance we identified a bug in SSAS, which has been resolved in SQL 2012 SP1 CU9, but queries still don’t run fast enough on the limited hardware.

The Multidimensional model is NUMA aware (although how well it is supported depends on your workload), so it can make better use of the hardware – but we all know how poorly MOLAP scales with distinct counts, due to the storage engine IO bottleneck. So what is the answer?

What about Solid State?

The reason the multidimensional model often suffers performance degradation with distinct count operations is the heavy disk IO required. Aggregations can’t be used effectively, so the SSAS storage engine has to work through the entire dataset on disk. At the speed of normal disks, especially when on a SAN, this is not great.

There are now a number of solid state storage options available that provide enterprise grade storage with some very impressive performance.

So the question is, can a multidimensional cube running on solid state storage outperform an equivalent in-memory Tabular cube?

The tests

I ran various query performance tests with (MOLAP) cubes from 0.5Tb to 2Tb. The results in this post focus entirely on the 0.5Tb tests, using an identical dataset and equivalent test queries on a Tabular cube.

A big thank you to the following:

  • Millicom – for the test dataset, and for allowing me to publish the test results here
  • Violin – for the use of a storage array for the test
  • Microsoft – for the loan of a server in the MTC (Microsoft Technology Center) in Reading.

The equipment for the multidimensional cube:

  • HP DL580
  • 4 x Intel Xeon E7-4850 2Ghz
  • 40 cores, hyperthreaded to 80
  • 256Gb RAM, 1067Mhz
  • Violin 6616 SLC, 1m IOPS, 4Gbps
  • SQL Server 2012 SP1 CU9
  • Distinct count measure groups matrix partitioned by month, and 60 non-overlapping buckets of CustomerKey
  • Basic usage-based optimisation applied to aggregations. There is a high likelihood this could be improved further

The equipment for the Tabular cube

Note that the different number of CPUs is a fair comparison – Tabular is not able to benefit from multiple CPUs, whereas Multidimensional is. The factors that are unequal are the threads per CPU (16 vs 20) and the memory speed (666Mhz vs 1067Mhz); please bear this in mind when looking at the performance comparison results.

The dataset is 6 monthly snapshots of 60m customers in two primary fact tables with 300m and 1.4bn rows in each.

The 7 query tests cover a range of workloads from very simple to very complex. Queries 1 & 2 do not include distinct count measures; all other queries include a distinct customer count from one of the primary fact tables, with varying levels of complexity, granularity and result set size. Tests 1–4 are manually created MDX queries; tests 5–7 are real-world Excel workbooks provided by users, each containing 1–6 large pivot tables.

Each test was run on a cold SSAS cache, with the Windows file cache also cleared before every test. Every test was run 3 times, and the average taken.

Test scenarios:

  • Tabular, using the fastest performance measured from all previous configuration tests
  • Multidimensional, using all CPUs and local RAID 5 disk storage
  • Multidimensional, using a single CPU (single NUMA node) and Violin storage
  • Multidimensional, using all CPUs and Violin storage

The Results

[Table: query durations for each test scenario]

And when plotted on a chart…

[Chart: query durations for each test scenario]

The first 3 tests are so quick that they don’t show up on the above chart, but you can see these results in the table above.

Conclusion

It’s clear from these tests that solid state storage can breathe new life into an IO constrained multidimensional cube. This really applies to any case where you’re making heavy use of distinct counts over a large, high cardinality dataset. In this case the average performance gain over the same disk based cube was 2.6x.

Multidimensional on solid state storage can be on average 1.9x faster than the Tabular model, when working with large high cardinality distinct count operations.

Although other research and evidence suggests that Multidimensional cubes don’t perform well with multiple NUMA nodes, these tests show that if your workload is very heavily constrained by the storage engine, you can gain significant performance from multiple CPUs/NUMA nodes.

And Finally…

As well as basic timing tests using Profiler, I also monitored the CPU and disk throughput in Perfmon. The following two screenshots show the difference between the multidimensional cube on disk and then the same cube on Violin. This is for test 4, which showed the most significant performance improvement from disk to solid state.

[Perfmon trace: multidimensional cube on local disk]

[Perfmon trace: multidimensional cube on the Violin array]

These traces show test 4, which has the following query structure:

SELECT {   [Measures].[Fully Additive Measure 1]
         , [Measures].[Fully Additive Measure 2]
         , [Measures].[Fully Additive Measure 3]
         , [Measures].[Distinct Measure 4]} ON COLUMNS ,
NON EMPTY {[Dimension 1].[Attribute 1].[Attribute 1].MEMBERS  --6 members
         * [Dimension 2].[Attribute 2].[Attribute 2].MEMBERS  --12 members
         * [Dimension 3].[Attribute 3].[Attribute 3].MEMBERS  --5 members
         * DESCENDANTS ([Date].[Calendar].[Year].&[2013], [Date].[Calendar].[Month])} ON ROWS  --6 members
FROM [Cube]

This returns 700 distinct tuples from the crossjoin, with an average of 400k distinct customers per tuple.

This shows the potential benefit of maximising the storage IO throughput. In the first trace, with the IO throughput constrained to under 100Mbps, only 3% of the CPU can be utilised. In the second trace, with up to 1.7Gb per second of throughput available, the CPU is able to work at an average of 60% for the duration of the query, with a clear impact on query performance.

These charts also show the reduced IO latency (IO response time). The average IO response time in the first trace is between 40-60ms for the disk IO, whereas for the Violin array it barely registers on the scale of the chart. I found the IO response for the Violin tests averaged 450μsec. The SLC unit runs at ~200μsec latency for a 4k chunk size, so the 8k chunks we’re asking for (roughly two 4k operations) completing fully within the 450μsec we saw is consistent with that.

There are plenty more details and tests that I’m running; I may post further details if I get time. Until then…

Frog-Blog-Out

Update #1 06/06/2014:
In the interests of completeness and fairness, I subsequently re-ran the Tabular model performance tests on the same (Microsoft supplied) server as the MOLAP model. The only difference being that to maximise the performance I restricted the server to a single CPU/NUMA node.
Somewhat surprisingly, the performance difference between Tabular and MOLAP/Violin was even greater. Multidimensional performed 3.7x faster, compared with 1.9x faster on the original Tabular server.

Update #2 06/06/2014:
If you’re planning on implementing a large scale Tabular model, do take the above tests into consideration; however, please be aware that there are reasons why the Tabular performance is poor here: the incredibly high cardinality of the distinct count field, and the fact that the data can’t be split up, remodelled or simplified. This will not apply to all data scenarios.
Please take a look at a recent white paper by Alberto Ferrari at SQLBI, in which he discusses a number of ways of optimising large distinct count data sets to suit the Tabular model. He’s achieved some very impressive results. Unfortunately these methods would not benefit the scenario described in this post, but it shows what can be done with Tabular if the data suits.
http://www.sqlbi.com/articles/using-tabular-models-in-a-large-scale-commercial-solution

© Alex Whittles, Purple Frog Systems Ltd

How does the SSAS 2012 Tabular model performance change when you add more CPU sockets / NUMA nodes?

In my last post (SSAS Tabular NUMA and CPU Cores Performance) I presented the results of some testing I’d been doing on the scalability of the SSAS 2012 Tabular model. Specifically with the performance of distinct count measures over large data volumes (50-200m distinct customers).

The conclusion was that moving from 1 NUMA node (CPU socket) to 2 had no impact on query performance, so the 2nd CPU was entirely wasted. This contradicted other advice and recommendations, which indicated that adding a second node would actually make the performance worse.

After discussing the issue with a member of the SSAS development team, they advised that the method I was using to disable cores was flawed, and that we shouldn’t be using Windows System Resource Manager. So I re-ran the tests disabling cores (and associated memory) using MSConfig, simulating a physical core removal from the server.

The test results were immediately different…

[Chart: Tabular query duration by number of active cores]

The hardware setup was the same as before, but with a larger data set:

  • 30Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes, hyperthreaded) server with 144Gb RAM
  • SQL Server 2012 SP1 CU8 Enterprise (+ a further hotfix that resolves a problem with distinct counts >2m)
  • 900m rows of data in primary fact
  • 200m distinct CustomerKey values in primary fact
  • No cube partitioning
  • DefaultSegmentRowCount: 2097152
  • ProcessingTimeboxSecPerMRow: 0
  • CPU cores and associated memory disabled using MSConfig

The two test queries were

  • Query 1: Simple, single value result of the total distinct customer count
  • Query 2: More complex distinct count query, sliced by a number of different attributes to give approx 600 result cells

As soon as the number of cores is increased above 16 (i.e. the 2nd CPU is introduced), the queries take 1.45x and 2x as long to run. Query performance drops significantly; the simple query takes almost exactly double the time.

These results now support other theories floating around the blogosphere: adding extra CPUs not only doesn’t help Tabular performance, it actually significantly hinders it.

As before, the DefaultSegmentRowCount setting gave the best performance at 2m and 4m; raising it seemed to degrade performance.

Frog Blog out

© Alex Whittles, Purple Frog Systems Ltd

I came across a frustrating problem today. I’d just finished processing a large tabular cube (SQL Server 2012), which had taken 11 hours in total.

On trying to connect to the cube to test it, I realised I’d made a schoolboy error: the database was named correctly, but the model inside it was named MyCubeName_Test instead of MyCubeName. No problem, I’ll just right click the cube in SSMS and rename it. Well, no – there is no option to rename a model, just the database. I didn’t fancy doing a full reprocess, but luckily a little digging in the XML files presented a solution.

  1. Detach the cube
  2. Open up the cube’s data folder in explorer (x:\xx\OLAP\data\MyCubeName.0.db, or whatever it happens to be in your case)
  3. Find the Model.xx.cub.xml file, and open it in Notepad++ (other text editors are available…)
  4. Search for the <Name> tag, and change the name inside it (e.g. <Name>MyCubeName_Test</Name> becomes <Name>MyCubeName</Name>)
  5. Save the file and close it
  6. Re-attach the cube

Simples

Frog-Blog-Out

© Alex Whittles, Purple Frog Systems Ltd

[UPDATE] After further investigation, I found that the tests in this post were inaccurate and the results unreliable. Updated NUMA test results here

In my last post (SSAS Tabular Performance – DefaultSegmentRowCount) I presented some analysis of the query performance impact of changing the DefaultSegmentRowCount setting. This post describes the next tests that I ran on the same system, investigating the impact of restricting SSAS to just 1 NUMA node instead of the 2 available on the server.

It’s well known that SSAS Tabular is not NUMA aware, so it’s common to see advice recommending affiliating SSAS to a single NUMA node to improve performance.

From what I’d read, I was expecting that by affiliating SSAS to a single NUMA node the query performance would improve slightly, maybe by 10-30%.

Recap of the setup:

  • 7.6Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes) server with 144Gb RAM
  • SQL Server 2012 SP1 CU7 Enterprise
  • 167m rows of data in primary fact
  • 80m distinct CustomerKey values in primary fact
  • No cube partitioning
  • DefaultSegmentRowCount: 2097152
  • ProcessingTimeboxSecPerMRow: 0
  • CPU core affinity configured using Windows System Resource Manager (see John Sirman’s great guide to using WSRM with SSAS)

I ran profiler, checking the ‘Query End’ duration on a simple distinct count of CustomerKey, with no other filters or attributes involved.
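For reference, the shape of that test query is just the bare distinct count measure on columns – something along these lines (the measure and model names here are illustrative placeholders, not the real ones):

   SELECT {[Measures].[Customer Distinct Count]} ON COLUMNS
   FROM [Model]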

[Chart: Tabular query time by number of cores]

You can see that dropping from 32 cores across 2 NUMA nodes down to 16 cores on a single node had almost no impact at all.

Within a single NUMA node the performance improved dramatically as the number of cores increased, but as soon as a second NUMA node was added the performance flat-lined, with no further significant improvement no matter how many cores were added.

As per my last post – I’m sure there are other things afoot with this server, so this behaviour may not be representative of other setups. However it again reinforces advice you will have already seen elsewhere: with SSAS Tabular, avoid NUMA hardware…

Frog-Blog out

© Alex Whittles, Purple Frog Systems Ltd

I’m currently investigating a poorly performing Tabular model, and came across some interesting test results which seem to contradict the advice in Microsoft’s Performance Tuning of Tabular Models white paper.

Some background:

  • 7.6Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes) server with 144Gb RAM
  • SQL Server 2012 SP1 CU7 Enterprise
  • 167m rows of data in primary fact
  • 80m distinct CustomerKey values in primary fact
  • No cube partitioning

A simple distinct count in DAX of the CustomerKey, with no filtering, takes 42 seconds on a cold cache. Far too slow for a Tabular model. Hence the investigation.

p88 of the Performance Tuning of Tabular Models white paper discusses DefaultSegmentRowCount, explaining that it defaults to 8m, and that there should be a correlation between the number of cores and the number of segments (the number of segments being the number of rows divided by the segment size, rounded up).

It also indicates that a higher segment size may increase compression, and consequently query performance.

Calculating the number of segments for our 167,000,000 rows gives us the following options:

Segment Size          # Segments
1,048,576             160
2,097,152             80
4,194,304             40
8,388,608 [default]   20
16,777,216            10
33,554,432            5
67,108,864            3

So, with 32 cores to play with, we should be looking at the default segment size (8m) or maybe reduce it to 4m to get 40 segments. But the extra compression with 16m segment size may be of benefit. So I ran some timing tests on the distinct count measure, and the results are quite interesting.

[Chart: cold-cache query time by DefaultSegmentRowCount]

It clearly shows that in this environment, reducing the DefaultSegmentRowCount property down to 2m improved the query performance (on a cold cache) from 42s down to 27s – a 36% improvement. As well as this, processing time was reduced, as was compression.

This setting creates 80 segments, 2.5 times the number of cores available, but achieved the best performance. Note that the server’s ProcessingTimeboxSecPerMRow setting has been set to 0 to allow for maximum compression.

There’s more to this system’s performance problems than just this – NUMA for a start – but I thought I’d throw this out there in case anyone else is blindly following the performance tuning white paper without doing their own experimentation.

Each environment, data set and server spec is different, so if you need to eke out the last ounce of performance, run your own tests on the SSAS settings and see for yourself.

Frog-Blog Out

[Update: Follow up post exploring the performance impact of NUMA on this server]

© Alex Whittles, Purple Frog Systems Ltd

When you have very large dimensions in SQL Server Analysis Services (SSAS) cubes, the time taken to process them can cause a problem. This post builds upon an excellent walkthrough of the ProcessAdd option written by Daniel Calbimonte on MSSQLTips.com, and shows how to automate the identification of the new data to be added.

What are the main options for processing a dimension? (TechNet details here)

  • Process Full – the whole dimension is dropped and rebuilt
  • Process Update – a new copy of the dimension is processed in full and the results compared, with changes being copied to the original dimension
  • Process Add – just add new rows to the dimension, don’t do anything with existing rows

There are lots of posts around the internet that discuss the pros and cons of each of these, so I won’t go into the details. But as the title suggests, we’ll focus on the ProcessAdd option.

How do you implement ProcessAdd?

Daniel provides a fantastic explanation and code walkthrough of how to use SSIS (Integration Services) to implement a ProcessAdd solution, please read through his post before continuing. He talks us through using an SSIS data flow to create a source query to retrieve the new rows, and then the Dimension Processing component as a destination to receive the new rows.

[Screenshot: SSIS data flow – source query feeding the Dimension Processing destination]

And this works really well. The problem then becomes how to identify which rows need to be added. Daniel overcomes this by having a separate table of records to be added, with the source query simply selecting the entire table. This table could be populated by the ETL logic, a trigger, or any number of other processes. But can we improve upon this and make the process more seamless and transparent?

Getting all MDX

For me, the best way of enhancing this is to build a solution which maintains itself. So we need to look inside the cube dimension, find the last record that has already been added, and then use that to find any new records in the warehouse.

Step 1 – Find the maximum key for a dimension

 WITH MEMBER [Measures].[MaxKey] AS
   MAX([SOURCE Currency].[SOURCE Currency Code].ALLMEMBERS
   , StrToValue([SOURCE Currency].[SOURCE Currency Code].CURRENTMEMBER.MEMBER_KEY))
 SELECT
   {[Measures].[MaxKey]} ON 0
 FROM
   [Adventure Works]

The MDX above creates a calculated member called MaxKey, which finds the maximum key (the surrogate key identifier for the dimension) across all members of the dimension. Note that we have to use StrToValue() to convert the key from a string to an integer, so that it sorts numerically rather than alphabetically. If you run this in SSMS against the AdventureWorks2012DW database the result should be 105. You can verify this by querying the DimCurrency table in the warehouse, where you’ll find that 105 is the largest CurrencyKey value.

[Screenshot: the MaxKey query result in SSMS]
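The same pattern can be applied to any dimension whose surrogate key is the attribute key. For example, for a large Customer dimension (where ProcessAdd really pays off) it would look something like this – a sketch assuming the standard [Customer].[Customer] attribute hierarchy, so substitute your own dimension and attribute names:

 WITH MEMBER [Measures].[MaxKey] AS
   MAX([Customer].[Customer].ALLMEMBERS
   , StrToValue([Customer].[Customer].CURRENTMEMBER.MEMBER_KEY))
 SELECT
   {[Measures].[MaxKey]} ON 0
 FROM
   [Adventure Works]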

Step 2 – How do we use MDX within SSIS?

In the Control Flow of the SSIS package, add an ‘Execute SQL’ task. Connect it to a new OLE DB connection to Analysis Services, as per the following screenshot.

[Screenshot: Execute SQL task with an OLE DB connection to Analysis Services]

 

Set the SQLStatement property of the task to the MDX query, and set the ResultSet property to ‘Single Row’.

[Screenshot: Execute SQL task – SQLStatement and ResultSet properties]

On the Result Set tab, map the [Measures].[MaxKey] result to a new variable, ‘User::MaxKey’ (an Int32).

[Screenshot: Result Set tab mapping [Measures].[MaxKey] to User::MaxKey]

The Execute SQL task should precede the existing Data Flow task.

[Screenshot: control flow – the Execute SQL task before the Data Flow task]

Step 3 – Dynamically create the SQL Query from the MDX results

Now that we have the maximum key value stored in a variable, we can create another variable, called ‘SQL’ (a string), with an expression which includes the key value.

[Screenshot: the SQL variable and its expression]

The expression should be

 "SELECT  CurrencyKey, CurrencyAlternateKey, CurrencyName
  FROM    DimCurrency
  WHERE   CurrencyKey>" +  (DT_WSTR,20)@[User::MaxKey]

This builds a SQL statement dynamically, based upon the maximum key identified by the MDX query. With the MaxKey value of 105 found above, the variable would evaluate to: SELECT CurrencyKey, CurrencyAlternateKey, CurrencyName FROM DimCurrency WHERE CurrencyKey>105

We can then set the source query in the Data Flow to use the resulting SQL Query, by setting its ‘Data Access Mode’ to ‘SQL Command From Variable’, and the ‘Variable Name’ to ‘User::SQL’.

[Screenshot: Data Flow source set to ‘SQL Command From Variable’]

And that’s it. This process will automatically find any new dimension members in the warehouse table, and add them into the cube dimension.

This process does rely on a number of things – so watch out…!

  • The dimension must only contain inserts, not updates or deletes (you’ll need to perform a ProcessUpdate for those)
  • The Key must always be incrementing – new records with a smaller key will not get picked up!
  • You’ll need to ignore duplicate key errors during processing. Plenty of discussion around why in other blog posts…
  • SSIS is not great at handling MDX connections – unless you want to go down the linked server route, only ever have a single MDX query per SSIS package.

And finally, another shout out to Daniel Calbimonte and MSSQLTips for providing the basis of this post.

Frog-Blog-Out

 

© Alex Whittles, Purple Frog Systems Ltd

How do you use MDX to find the number of items between two dates, when the start date and end date are different role playing dimensions?

I had an interesting question from Ricardo Santos on another frog-blog post this week (MDX Calculated Member spanning multiple date dimensions). I answered Ricardo in the post, but it was a good question which I thought deserved a full blog post on its own. So thanks to Ricardo for prompting this.

The Scenario

If you have a ‘Project’ fact table which has two dates (start date and end date), how do you find out how many projects are active on any given date? In this scenario we would have three role playing date dimensions: Date, Start Date & End Date.

We want the user to be able to select a date (from the Date dimension) and then use that to filter the projects using the other two date dimensions.

This also applies to many other scenarios: accommodation occupancy, staffing levels, contracts – really anything with a start and end date.

The MDX answer

If there was just a single date dimension (i.e. a typical sales fact) then you could just filter on it:

  ([Measures].[Internet Sales Amount], [DATE].[Calendar].&[20080110])

But what we need to do is combine two different date dimensions. In SQL, to find the number of active projects on 1st April 2013 we’d just use:

   SELECT COUNT(*)
   FROM Projects
   WHERE StartDate <= 20130401
     AND EndDate   >= 20130401

The equivalent in MDX is more complex as we have to filter a fact by two different dimensions. Starting with the [Start Date] dimension, we need to sum the project count for the projects with a start date on or before the required date:

    AGGREGATE({NULL:[START DATE].[Calendar].[20130401]}, [Measures].[Project COUNT])

We also need to find the project count for all projects with an end date on or after the required date:

    AGGREGATE({[END DATE].[Calendar].[20130401]:NULL}, [Measures].[Project COUNT])

We can then use a crossjoin (*) to combine them together:

    AGGREGATE( {NULL:[START DATE].[Calendar].[20130401]}
             * {[END DATE].[Calendar].[20130401]:NULL}
             , [Measures].[Project COUNT])

Note that you could also use FILTER to achieve the same goal.
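For example, a minimal sketch of the FILTER approach – this assumes the [Calendar] hierarchies have a [Date] level with a date-typed ValueColumn, so that MEMBERVALUE can be compared with CDate; adjust to your own attribute setup:

    AGGREGATE( FILTER([START DATE].[Calendar].[Date].MEMBERS
                , [START DATE].[Calendar].CURRENTMEMBER.MEMBERVALUE <= CDate("2013-04-01"))
             * FILTER([END DATE].[Calendar].[Date].MEMBERS
                , [END DATE].[Calendar].CURRENTMEMBER.MEMBERVALUE >= CDate("2013-04-01"))
             , [Measures].[Project COUNT])

The range syntax above is generally preferable though, as FILTER has to evaluate every member of the set.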

This will find the project count for a specific date, but we now need to make the date dynamic. We want the calculation to use a date selected by the user from a third [Date] dimension, applying this date to the other two dimensions (start date & end date).

Enter the LINKMEMBER function. This is a great tool in your MDX arsenal, allowing you to identify a member of one dimension from a selection in a different role playing dimension. e.g.

     LINKMEMBER([DATE].[Calendar].[20130414], [START DATE].[Calendar])

The example above will return the following member of the Start Date dimension:

     [START DATE].[Calendar].[20130414]

When we integrate this into the previous MDX, we get:

    AGGREGATE( 
          {NULL:LINKMEMBER([DATE].[Calendar].CURRENTMEMBER
                     ,[START DATE].[Calendar])}
        * {LINKMEMBER([DATE].[Calendar].CURRENTMEMBER
                    , [END DATE].[Calendar]):NULL}
        , [Measures].[Project COUNT])

This will take the CURRENTMEMBER of the [Date].[Calendar] hierarchy, and map it to the equivalent members in the [Start Date].[Calendar] and [End Date].[Calendar] hierarchies, before then using the resulting members to filter the Project Count measure.
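To see it in action, here’s the calculated member wrapped in a complete query – a minimal sketch reusing the placeholder cube and dimension names from the snippets above, and assuming a [Month] level on the [Calendar] hierarchy:

    WITH MEMBER [Measures].[Active Projects] AS
        AGGREGATE( 
              {NULL:LINKMEMBER([DATE].[Calendar].CURRENTMEMBER
                         ,[START DATE].[Calendar])}
            * {LINKMEMBER([DATE].[Calendar].CURRENTMEMBER
                        , [END DATE].[Calendar]):NULL}
            , [Measures].[Project COUNT])
    SELECT {[Measures].[Active Projects]} ON COLUMNS,
           [DATE].[Calendar].[Month].MEMBERS ON ROWS
    FROM [Cube]

Each row then shows the number of projects active in that month, driven entirely by the [Date] dimension the user slices by.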

And there you have it.

But, that’s not strictly the right answer…

Data modelling

Whenever we have to write anything complicated in MDX, we need to ask ourselves why. Complicated MDX is all too often a symptom of poor data modelling, usually caused when a data warehouse is designed with SQL in mind rather than cubes/MDX. If you’re planning to use cubes then you can reap huge benefits from remodelling your data accordingly. This could be in your ETL and warehouse, or you can just remodel the data in the cube’s data source view. Let’s have a look at an example.

The DSV query for the project fact we were looking at earlier would be something like this (simplified):

    SELECT ProjectManagerKey
       , CustomerKey
       , StartDateKey
       , EndDateKey
       , 1 AS ProjectCount
    FROM FactProject

So the project fact represents a project. A cube fact table should, however, represent an event, not an item (items should be dimensions, not facts). So we need to think of a project as two separate events: a start and an end. Take a look at the query below.

    SELECT ProjectManagerKey
       , CustomerKey
       , StartDateKey AS DateKey
       , 1 AS ProjectCount
    FROM FactProject
    UNION ALL
    SELECT ProjectManagerKey
       , CustomerKey
       , EndDateKey AS DateKey
       , -1 AS ProjectCount
    FROM FactProject

Each project will therefore exist twice: once with a +1 on the project start date, and once with a -1 on the project end date. And we only have a single date dimension to worry about.

So how do we use this in MDX?

You can find out the number of active projects on any given date using

    AGGREGATE({NULL:[DATE].[Calendar].CURRENTMEMBER}
            , [Measures].[Project COUNT])

i.e. the running total of all project starts (+1) and ends (-1) up to and including the selected date.
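Putting this to work in a full query – a minimal sketch, again using the placeholder names from above and assuming [Year] and [Month] levels on the [Calendar] hierarchy:

    WITH MEMBER [Measures].[Active Projects] AS
        AGGREGATE({NULL:[DATE].[Calendar].CURRENTMEMBER}
                , [Measures].[Project COUNT])
    SELECT {[Measures].[Active Projects]} ON COLUMNS,
           DESCENDANTS([DATE].[Calendar].[Year].&[2013]
                     , [DATE].[Calendar].[Month]) ON ROWS
    FROM [Cube]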

To find out the change in the number of projects in any given time period:

    ([DATE].[Calendar].CURRENTMEMBER
    ,[Measures].[Project COUNT])

In the SQL world this is an inefficient approach, but it’s perfectly suited to the way cubes work.

So by spending a little time thinking about the cube’s data model, we can dramatically simplify our MDX, and improve our performance.

Frog-blog out

© Alex Whittles, Purple Frog Systems Ltd

Today we’re back looking at MDX with a quick how-to. How do you extract a list of dimension members using MDX?

This could be used for example to populate a user interface/parameter of a report/dashboard.

Let’s look at two scenarios: normal dimension attributes, and then hierarchies. All the queries here use the “Adventure Works” cube from the “Adventure Works DW 2008 R2” sample database.

Standard dimension attributes

So how do we get a list of dimension members? Well, let’s divide the query up into rows and columns. We want the members listed on the rows.

SELECT
  xxx ON COLUMNS,
  [Geography].[Country].ALLMEMBERS ON ROWS
FROM [Adventure Works]

Why use .ALLMEMBERS instead of .MEMBERS? .ALLMEMBERS will include calculated members, whereas .MEMBERS won’t.

So what do we put on the columns? We want the display name for showing to the user, and the unique name so we can refer back to each exact member. We can get these by using properties of the current member, namely MEMBER_CAPTION and UNIQUENAME (you can get a full list of dimension member properties here). To make this work, we just define a number of query-scoped calculated members using WITH, and then include them in the columns set…

WITH 
   MEMBER [Measures].[Label] AS [Geography].[Country].CURRENTMEMBER.MEMBER_CAPTION 
   MEMBER [Measures].[UniqueName] AS [Geography].[Country].CURRENTMEMBER.UNIQUENAME 
SELECT {[Measures].[Label], 
        [Measures].[UniqueName]
        } ON COLUMNS , 
      [Geography].[Country].ALLMEMBERS ON ROWS 
FROM [Adventure Works]

If you run this, you get the following output:

This may be what you want, but note that the first row of the result set contains the ‘All’ member, which you probably don’t want. You can remove it by altering the ROWS clause to exclude the All level. This is simply a case of repeating the attribute name, e.g. [Geography].[Country].[Country].ALLMEMBERS

WITH 
   MEMBER [Measures].[Label] AS [Geography].[Country].CURRENTMEMBER.MEMBER_CAPTION 
   MEMBER [Measures].[UniqueName] AS [Geography].[Country].CURRENTMEMBER.UNIQUENAME 
SELECT {[Measures].[Label], 
        [Measures].[UniqueName]
        } ON COLUMNS , 
      [Geography].[Country].[Country].ALLMEMBERS ON ROWS 
FROM [Adventure Works]

Hierarchies

With simple dimension attributes you get a flat list of values. With hierarchies – whether standard user hierarchies, ragged or parent-child – you need to return a multi-level list. To do this we need to know what level each member is at, which we can query using the LEVEL.ORDINAL property. Adding this into the query, and replacing Geography with Employee, we get the following query:

WITH 
   MEMBER [Measures].[Label] AS [Employee].[Employees].CURRENTMEMBER.MEMBER_CAPTION 
   MEMBER [Measures].[UniqueName] AS [Employee].[Employees].CURRENTMEMBER.UNIQUENAME 
   MEMBER [Measures].[Ordinal] AS [Employee].[Employees].CURRENTMEMBER.LEVEL.ORDINAL 
SELECT {[Measures].[Label], 
        [Measures].[UniqueName], 
        [Measures].[Ordinal]
        } ON COLUMNS , 
      [Employee].[Employees].ALLMEMBERS ON ROWS 
FROM [Adventure Works]

With the following results:

You can then do all sorts of funky things using basic MDX navigation functions, e.g. returning the parent of each member by simply adding …PARENT.UNIQUENAME as another column:

WITH 
   MEMBER [Measures].[Label] AS [Employee].[Employees].CURRENTMEMBER.MEMBER_CAPTION 
   MEMBER [Measures].[UniqueName] AS [Employee].[Employees].CURRENTMEMBER.UNIQUENAME 
   MEMBER [Measures].[Ordinal] AS [Employee].[Employees].CURRENTMEMBER.LEVEL.ORDINAL 
   MEMBER [Measures].[Parent] AS [Employee].[Employees].CURRENTMEMBER.PARENT.UNIQUENAME
SELECT {[Measures].[Label], 
        [Measures].[UniqueName], 
        [Measures].[Ordinal],
        [Measures].[Parent]
        } ON COLUMNS , 
      [Employee].[Employees].ALLMEMBERS ON ROWS 
FROM [Adventure Works]

Now go forth and query those dimension members…
FrogBlog-Out

© Alex Whittles, Purple Frog Systems Ltd

I had an interesting question the other day; “how do I find the date of the first order/activity for a given customer/employee/product etc in MDX”?

Well, you’ll be pleased to know that it’s pretty simple. The sample code below will work with the AdventureWorks DW 2008R2 cube.

First of all we need to filter the date dimension to contain only those dates with orders, then take the first item of the resulting set, then find its name. Simples.

Filter the date hierarchy to only those dates with a Reseller Order Count:

   FILTER([Date].[Date].[Date].MEMBERS, [Measures].[Reseller Order Count])

Note that you can specify a more advanced filter, such as [Measures].[Reseller Order Count]>10, which would find the first date which had more than 10 orders.

Then find the first item:

   .ITEM(0)

Then find its name:

   .NAME

Put this together in a query and you get:

   WITH MEMBER [Measures].[First Activity] AS
       FILTER([Date].[Date].[Date].MEMBERS
            , [Measures].[Reseller Order Count]).ITEM(0).NAME
   SELECT {[Measures].[Reseller Order Count]
         , [Measures].[First Activity]
       } ON 0,
      [Employee].[Employees].MEMBERS ON 1
   FROM [Adventure Works]

This returns the total reseller order count for each employee, along with the date of their first reseller order. Note that the Employee hierarchy here is a parent-child hierarchy. The calculated member [First Activity] will aggregate and calculate correctly with any dimension or member specified on the 1 (ROWS) axis – be it a parent-child hierarchy, a single attribute, a normal hierarchy, etc. – and will always find the first order date for the current member.
You should get the following results:
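As an illustration of that flexibility, the same calculation can be combined with the more advanced filter mentioned earlier and a different hierarchy on the rows – a minimal sketch, assuming the standard [Product].[Category] attribute hierarchy:

   WITH MEMBER [Measures].[First Busy Day] AS
       FILTER([Date].[Date].[Date].MEMBERS
            , [Measures].[Reseller Order Count] > 10).ITEM(0).NAME
   SELECT {[Measures].[Reseller Order Count]
         , [Measures].[First Busy Day]
       } ON 0,
      [Product].[Category].[Category].MEMBERS ON 1
   FROM [Adventure Works]

This returns, for each product category, the first date on which more than 10 reseller orders were placed.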

Frog-Blog Out.

© Alex Whittles, Purple Frog Systems Ltd

PowerPivot Vs QlikView 101 – SQLBits Video

The video is now available from my PowerPivot and QlikView talk at SQLBits 9 in Liverpool in September 2011. You can download or watch the video here.


In this 1 hour session I create interactive dashboards from scratch in both PowerPivot and QlikView, showing how to set up the data model, overcome common pitfalls and build the dashboards. I create almost identical dashboards in both systems and highlight the pros and cons of each.


[Screenshots: the PowerPivot and QlikView dashboards]


You can find the scripts and code to go with this video in this blog post


SQLBits 10 is coming!!

Today the SQLBits organisers have announced that SQLBits 10 will be held at the Novotel London West in London, from 29th to 31st March 2012. It’s going to be even bigger and better, so keep an eye on the website and get your tickets early.

© Alex Whittles, Purple Frog Systems Ltd