Analysis Services

This post explains how to create an MDX calculated member that will take a value from the cube and project it forward to the end of the year. This provides a simple mechanism for calculating what your expected total will be at year end, based upon current performance.

To do this more accurately you should use time series data mining models in SSAS and use DMX expressions to query the results, but this method is very simple and requires little effort, and will be pretty accurate so long as the data you’re modelling is fairly linear. Please note though that the more cyclical and seasonal your data is the less effective this will be.

The basic idea is that we take what we have done so far (i.e. year to date sales), look at how far through the year we are, and extrapolate the value of future months (or days/weeks/etc.) based upon values so far.

For example, if we’re at March month end and we’ve sold 100 widgets so far this year, we’re 1/4 of the way through the year, so we multiply 100 by 4 and get a projected yearly total of 400.


This chart shows the concept of what we’re doing, and shows the full year projections calculated in March (with 3 months of available data) and June (6 months of data). The projections obviously get more accurate the further you are through the year.

One of the points to note is that when creating a calculation like this, based upon a time dimension, the calculation should always work with any level of the dimension hierarchy selected. i.e. The user shouldn’t care whether they’re looking at a month, week, quarter or a day, the calculation should always work the same. To achieve this we simply use the .currentmember of the time hierarchy.

The following examples are based upon projecting the Internet Sales Amount measure found within the SQL Server 2008 Adventure Works DW sample cube.

Step 1 – What are our total sales so far this year?

MDX helpfully provides us with the YTD function which takes care of this for us.


  MEMBER [Measures].[YTD Sales] AS
    AGGREGATE(
      YTD([Date].[Calendar].CurrentMember)
      ,[Measures].[Internet Sales Amount])

This takes the current member of the Calendar hierarchy, and uses YTD to create a set of all dates in this year up to and including it. It then aggregates (in this case sums) the Internet Sales Amount for all of these dates to calculate YTD Sales.
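If it helps to see the same idea outside of the cube, here is a minimal Python sketch of a year to date total. The function name `ytd_total` and the sample figures are made up for illustration; it simply sums everything dated from 1 January up to and including the chosen date, which is the behaviour described above.

```python
from datetime import date


def ytd_total(daily_sales: dict[date, float], as_of: date) -> float:
    """Sum every value dated from 1 January of as_of's year up to and including as_of."""
    start = date(as_of.year, 1, 1)
    return sum(value for day, value in daily_sales.items() if start <= day <= as_of)


# Hypothetical sales figures, not taken from Adventure Works
sales = {
    date(2002, 12, 31): 5.0,   # previous year, ignored
    date(2003, 1, 10): 10.0,
    date(2003, 2, 1): 20.0,
    date(2003, 3, 5): 30.0,    # after the as_of date, ignored
}
print(ytd_total(sales, date(2003, 2, 28)))
```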

Step 2 – Which period are we in?

Here we’ll use the same YTD function to create a set of all dates so far this year, but in this case we’ll count the number of resulting members. Note that because we’re using the .CurrentMember of the hierarchy, it doesn’t matter if we’re looking at a date, week or month, the MDX will work. i.e. If we’re looking at 21 Jan it will return 21. If we’re looking at Q3 it will return 3, August will return 8 etc.


  MEMBER [Measures].[CurPeriod] AS
    COUNT(
      YTD([Date].[Calendar].CurrentMember)
      ,INCLUDEEMPTY)

Step 3 – How many periods are in the year?

If we coded this to only work with months then we could hard code this to 12; however, we need to keep it generic to all levels of the hierarchy. So, we have to count all the cousins of the current time member [within this year]. Unfortunately there isn’t a Cousins function in MDX, and Siblings will only return other members within the same parent, i.e. siblings of May 4th would include May 1 through to May 31. To get around this we find the year of the current member by using the Ancestor function.


  ANCESTOR([Date].[Calendar].CurrentMember
  , [Date].[Calendar].[Calendar Year])

Then we find all of the descendants of the year, at the same level of the hierarchy (week/day/etc.) as the current member. We can then take a count as before.


  MEMBER [Measures].[TotalPeriods] AS
    COUNT(
      DESCENDANTS(
        ANCESTOR([Date].[Calendar].CurrentMember
          ,[Date].[Calendar].[Calendar Year])
        ,[Date].[Calendar].CurrentMember.level)
      ,INCLUDEEMPTY)
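To make steps 2 and 3 a little more concrete, here is a rough Python sketch of the two counts for a plain calendar year. The function `period_counts` and its `level` argument are hypothetical names for illustration only; the cube does the equivalent work with YTD, Ancestor and Descendants.

```python
from datetime import date


def period_counts(d: date, level: str) -> tuple[int, int]:
    """Return (periods elapsed so far this year, total periods in the year)."""
    if level == "month":
        return d.month, 12
    if level == "quarter":
        return (d.month - 1) // 3 + 1, 4
    if level == "day":
        # Count the days in this particular year so that leap years are handled
        days_in_year = (date(d.year + 1, 1, 1) - date(d.year, 1, 1)).days
        return d.timetuple().tm_yday, days_in_year
    raise ValueError(f"unknown level: {level}")


print(period_counts(date(2003, 1, 21), "day"))
print(period_counts(date(2003, 8, 15), "month"))
print(period_counts(date(2003, 9, 30), "quarter"))
```

The point is the same as in the MDX: the caller only says which level it is looking at, and both counts follow from that.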

Step 4 – Calculate the Run Rate

Calculating the projected yearly total (run rate) is then a simple calculation:


  MEMBER [Measures].[Full Year Run Rate] AS
    [Measures].[YTD Sales]
    * ([Measures].[TotalPeriods]
       /[Measures].[CurPeriod])
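As a quick sanity check of the arithmetic, here is the same formula as a small Python sketch. The function name `full_year_run_rate` is just an illustrative choice, and the guard against a zero period count is my own assumption rather than something the MDX above does.

```python
def full_year_run_rate(ytd_total: float, cur_period: int, total_periods: int) -> float:
    """Scale the year to date total up to a full year, assuming linear performance."""
    if cur_period <= 0:
        raise ValueError("cur_period must be at least 1")
    return ytd_total * total_periods / cur_period


# The widget example: 100 sold by the end of March (period 3 of 12)
print(full_year_run_rate(100, 3, 12))
```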

You can then put the whole lot together and see the results…


WITH

  MEMBER [Measures].[YTD Sales] AS
    AGGREGATE(
      YTD([Date].[Calendar].CurrentMember)
      ,[Measures].[Internet Sales Amount])

  MEMBER [Measures].[CurPeriod] AS
    COUNT(
      YTD([Date].[Calendar].CurrentMember)
      ,INCLUDEEMPTY)

  MEMBER [Measures].[TotalPeriods] AS
    COUNT(
      DESCENDANTS(
        ANCESTOR([Date].[Calendar].CurrentMember
          ,[Date].[Calendar].[Calendar Year])
        ,[Date].[Calendar].CurrentMember.level)
      ,INCLUDEEMPTY)

  MEMBER [Measures].[Full Year Run Rate] AS
    [Measures].[YTD Sales]
    * ([Measures].[TotalPeriods]
       /[Measures].[CurPeriod])

SELECT
{
     [Measures].[Internet Sales Amount]
    ,[Measures].[YTD Sales]
    ,[Measures].[Full Year Run Rate]
    ,[Measures].[CurPeriod]
    ,[Measures].[TotalPeriods]
} ON 0,
{
    DESCENDANTS([Date].[Calendar].[CY 2003])
} ON 1
FROM [Direct Sales]

In my next blog I’ll be doing the same calculation in DAX for use with PowerPivot, stay tuned…

Frog-Blog Out

It’s common in most cubes to have a number of different date dimensions, whether role playing, distinct, or a combination of both. Say for example, Entry Date, Posting Date and Accounting Period. There may also be numerous hierarchies in each date dimension, such as calendar and fiscal calendar, leading to a relatively complicated array of dates to worry about when calculating semi-additive measures.

If we create a date related calculation (e.g. total to date), how do we ensure that this calculation works across all date dimensions?

Let’s assume we have a stock movement measure, where each record in the fact table is the change in stock (plus or minus). The current stock level is found by using a calculation totaling every record to date.

CREATE MEMBER CURRENTCUBE.[Measures].[Stock Level]
AS
  SUM({NULL:[Date].[Calendar].CurrentMember}
     , [Measures].[Stock Movement]
  );

[Note that {NULL:xxx} just creates a set of everything before the xxx member, i.e. everything to date]
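If the idea of a "total to date" over movements is new, this small Python sketch shows the same running total outside of the cube. The function name `stock_levels` and the movement figures are made up for illustration.

```python
from itertools import accumulate


def stock_levels(movements: list[int]) -> list[int]:
    """Running total of stock movements, oldest first: the stock level after each one."""
    return list(accumulate(movements))


# Hypothetical movements: +10 delivered, -3 sold, +5 delivered, -2 sold
print(stock_levels([10, -3, 5, -2]))
```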

This works just fine, if the user selects the [Date].[Calendar] hierarchy. What if the user selects the [Date].[Fiscal] hierarchy, or the [Period] dimension? Basically the calculation won’t work, as the MDX expression is only aware of the [Date].[Calendar] hierarchy.

The simple solution is to use the Aggregate function over all of the dimensions that the calculation needs to be aware of:

CREATE MEMBER CURRENTCUBE.[Measures].[Stock Level]
AS
  AGGREGATE(
      {NULL:[Date].[Fiscal].CurrentMember}
       * {NULL:[Date].[Calendar].CurrentMember}
       * {NULL:[Period].[Period].CurrentMember}
    , [Measures].[Stock Movement]
  );

The calculation will then use whichever date or time hierarchy is selected. It will even cope if multiple dimensions are selected, say the calendar on 0 and the periods on 1; both axes will honor the aggregation as expected.

Frog-Blog out.

I’m excited to be presenting another session to the South Wales SQL Server User Group.

On Thursday 24th June 2010, Eversheds in Cardiff are kindly hosting the event, to run from 18:45 to 21:00.
The event is free, and you’ll even get pizza thrown in – what more can you ask for? Oh yes, some BI content…

My session will cover data warehouse modelling, with a number of hands-on business case studies covering transactional data, account balances and duration based data.

Please feel free to bring your own data modelling problems along and I’ll try and cover as many as I can.

Register for free here: http://www.sqlserverfaq.com/events/235/Data-warehouse-design-case-studies-Other-BI-related-session-TBC.aspx

Hope to see you there!

Purple Frog spent a very interesting day at Microsoft last week, at one of their many events promoting the launch of SQL Server 2008 R2. Rafal Lukawiecki presented an entertaining (as always!) and informative series of talks covering the release, focusing on the enhanced Business Intelligence tools available.

The primary changes to note are

  • PowerPivot – An in memory, client side add-in to Excel, that allows users to create virtual cubes on their desktop and analyse over 100m records of data virtually instantly
  • DAX – A new expression language, designed for non-technical (probably more realistically, semi-technical) users to extend pivot tables and PowerPivot tables without having to learn MDX
  • Report Components – In a report consisting of a couple of tables, a chart and a few gauges (gauges, sparklines & maps are all new features of SSRS), you can save each element as a component and re-use it in different reports. This should result in much less duplication of work.
  • Report Builder 3 – A thin-client tool allowing end users to create Reporting Services reports. This is a big enhancement over its predecessor as it is finally fully compatible with reports created in the Business Intelligence Development Studio (BIDS), including report components.
  • Master Data Services – A centralised tool and database intended to provide governance of your organisation’s master data (centralised list of products, fiscal calendar, regions etc.).

The enhancements to Reporting Services (SSRS) are very welcome, and should be of huge benefit to anyone either currently using SSRS or considering using it. I firmly believe that there are no comparable web based reporting engines that even come close for SME organisations when looking at the whole picture including cost of implementation, ease of use, flexibility and capability.

Master Data Services as a concept has been around for a long time, but there has never been a tool available to organisations to effectively implement it. This is Microsoft’s first proper stab at delivering a workable solution, and although I’m a big fan of the concept, and have no doubt of its benefit to an SME, I’m yet to be convinced that the tool is ready for a large scale corporate environment. Time will tell how scalable and manageable the system is, and credit has to go to Microsoft for starting the ball rolling.

The most impressive addition is without a doubt PowerPivot. In a nutshell, it’s a user defined OLAP cube wrapped up within Excel 2010, running entirely in memory on a user’s workstation. If you’ve not yet played with it or seen a demo, I’ll try and elaborate for you… Think about loading Excel with 1 million rows, and then imagine sorting and filtering a number of those columns [cue going out to lunch whilst waiting for Excel to catch up]. With PowerPivot, you can sort and filter over 100 million rows of data almost in an instant – it’s very impressive indeed!

That’s the snazzy demo bit, but to reduce it to a glorified spreadsheet is very harsh indeed. It allows a user to import multiple data sources and combine them together into a single dimensional data model. PowerPivot will create your own personal cube without you having to build a warehouse, and without knowing anything about MDX, dimension hierarchies, attribute relationships, granularity etc.

Microsoft’s vision and reason for creating this tool is self-service BI, allowing users to create their own cubes, data analysis environments and reporting systems. And this is where I start to have a problem…

I can’t remember the last time I designed a data warehouse where I did not find significant data quality problems, conflicting data, missing data, duplicated data etc. I also find it hard to think of a situation where an end user (even a power user) is sufficiently clued up about the intricacies of a source OLTP database to be able to extract the right data and know what to do with it. Or if they are, a dozen other people in different departments have a different idea about how things work, resulting in many different versions of the truth.

I’m therefore (for now!) sticking with the opinion that it is still absolutely vital for an organisation to provide a clean, consistent, dimensionally modelled data warehouse as the basis for their BI/MI infrastructure. Tools like PowerPivot then sit very nicely on top to provide an incredibly powerful and beneficial user experience, but to try and use the emergence of self-service BI tools to usher in a new ‘non-data warehouse’ era is a very dangerous route which I hope people will avoid.

In summary – this release brings with it a fantastic host of new tools, but with great power comes great responsibility…

When browsing a cube using Excel 2007, you can drill through the measures to display up to 1000 rows of the transaction level source data.

I often get asked whether this limit of 1000 rows is configurable – well the good news is yes it is.

There is an option in the actions tab of the BIDS cube designer which allows you to specify the maximum rows, but helpfully this is ignored by Excel. Instead, you have to set it in Excel when you create a pivot.

Just click “Options” on the “PivotTable Tools” ribbon, then in the “Change Data Source” dropdown click on “Connection Properties”. In this screen, just change the “Maximum number of records to retrieve” property.

Excel 2007 Pivot Options

I’ve just read an interesting thread on the SQL Server Developer Center forum, regarding how to filter results. Specifically the difference in MDX between using a subselect

SELECT x on COLUMNS, y on ROWS FROM (SELECT z on COLUMNS FROM cube)

or using a where clause

SELECT x on COLUMNS, y on ROWS FROM cube WHERE z

In a simple query they produce the same results, but what is the actual difference? You can read the full thread here, but to summarise Darren Gosbell’s response…

Using the WHERE clause sets the query context and consequently the CurrentMember. This then enables functions such as YTD and PeriodsToDate to work.

Using a subselect can provide improved performance, but does not set the context.

Simples..!

We were recently investigating a problem for a client regarding the use of Scope within MDX calculated members. The code in question was similar to this:

CREATE MEMBER
   CURRENTCUBE.[Measures].[Test Measure To Date]
   AS "NA", VISIBLE = 1;
Scope([Date].[Calendar].MEMBERS);
    [Measures].[Test Measure To Date] =
      SUM(NULL:[Date].[Calendar].CurrentMember,
        [Measures].[Test Measure]);
End Scope;
Scope([Date].[Fiscal].MEMBERS);
    [Measures].[Test Measure To Date] =
      SUM(NULL:[Date].[Fiscal].CurrentMember,
        [Measures].[Test Measure]);
End Scope;

Essentially the warehouse was providing a transaction table with credits and debits, and this calculated measure was supposed to provide the current balance by summing all transactions to date (not just the current year/period etc., but the entire history). Scope is used to enable the calculation to work across two different date hierarchies, calendar and fiscal.

The problem was that even when the [Date].[Calendar] hierarchy was selected, the code still used the fiscal hierarchy to calculate the value.

This is caused by the fact that [Date].[Fiscal].MEMBERS includes the member [Date].[Fiscal].[All]. Consequently, even when the Fiscal hierarchy was not included in the query, its [All] member was effectively still within the scope. Thus the fiscal calculation was overriding the calendar calculation no matter what was selected.

The solution to this is to exclude [All] from the scope, which can be done by changing the code to the following:

CREATE MEMBER
   CURRENTCUBE.[Measures].[Test Measure To Date]
   AS "NA", VISIBLE = 1;
Scope(DESCENDANTS([Date].[Calendar],,AFTER));
    [Measures].[Test Measure To Date] =
      SUM(NULL:[Date].[Calendar].CurrentMember,
        [Measures].[Test Measure]);
End Scope;
Scope(DESCENDANTS([Date].[Fiscal],,AFTER));
    [Measures].[Test Measure To Date] =
      SUM(NULL:[Date].[Fiscal].CurrentMember,
        [Measures].[Test Measure]);
End Scope;

DESCENDANTS(xxx,,AFTER) is a simple way of identifying every descendant of the hierarchy AFTER the current member, which is [All] when not specified.

Problem solved, Frog-blog out.

I was working on a new SSAS 2008 cube today, and came across an error which Google was unable to help with. I thought I’d post the solution here to help anyone else who may encounter it.

The cube in question will primarily be accessed using Excel 2007, so I’d been dutifully testing it along the way to ensure all was well. And then, after a number of changes, the following error appeared when connecting to the cube from Excel to create a pivot table.

Excel was unable to get necessary information about this cube. The cube might have been reorganized or changed on the server.

Contact the OLAP cube administrator and, if necessary, set up a new data source to connect to the cube

Connecting and querying the cube via SSMS or BIDS worked without error (hence I didn’t spot the error sooner!).

A quick Google revealed a number of posts regarding this error, but they all related to attributes containing invalid characters when accessed from Excel 2000, or problems with translations and locale settings in the .oqy file. Neither of these was the cause here, so I had to go back and recreate every change I had made step by step to track the problem.

Well, I’m pleased to report that in the end it was nothing more than a simple spelling mistake in a named set. One of the dynamic named sets in the cube calculations referred to a specific member of a dimension, which was spelled slightly incorrectly. (Simplified example..)

CREATE DYNAMIC SET CURRENTCUBE.[Set1]
 AS {[Dimension].[Attribute].[Value1],
     [Dimension].[Attribute].[Value2WithTypo]
    };

When querying calculated measures through MDX in SSMS, the MDX parser just ignores the problem and only uses the valid members; however, it appears as though Excel 2007 is slightly more picky with its cubes.

Useful to know, and even more useful when used as a tool to double check for any errors in the MDX calculations.

After a couple of days off work last week with SQL Bits III and SQL 2008 Unleashed, it’s back to the grindstone this week; however before I get onto the serious stuff I’d like to say thank you to the organisers of both events. Watching you on stage headbanging to Rockstar – legendary! (You can see for yourself here and here…).

Anyway, back to MDX…

This post explains how you can build a dynamic MDX query in Reporting Services, customised to the user’s requirements. This can often bring some quite major performance benefits.

Let’s say for example that you want to have a sales report grouped dynamically by either product, sales person, department or customer. Normally you would use a single static MDX query, and then add a dynamic grouping to the table in the report. This is fine, until you try it on a large dataset. Even if you only have 50 products, 2 salesmen, 5 departments and 100 customers, your MDX needs to return 50,000 records, and the report then has to summarise all of this into the level of grouping you want. This renders the pre-calculated aggregations in OLAP pretty much worthless.

To get around this, you can generate your MDX dynamically, so that the query returns the data already grouped into the correct level. You can also use this to add extra filters to the query, but only when they are required.
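To put some rough numbers on the difference, here is a small Python sketch of the worst case row counts (i.e. assuming every combination has data). The function `worst_case_rows` and the dictionary of sizes are hypothetical; the cardinalities are the ones from the example above.

```python
from math import prod


def worst_case_rows(cardinalities: dict[str, int], group_by: list[str]) -> int:
    """Upper bound on rows returned when the query groups by the given attributes."""
    return prod(cardinalities[name] for name in group_by)


sizes = {"Product": 50, "SalesPerson": 2, "Department": 5, "Customer": 100}

# Static query: every attribute is returned, and the report does the grouping
print(worst_case_rows(sizes, list(sizes)))

# Dynamic query: only the attribute the user asked for is returned
print(worst_case_rows(sizes, ["Product"]))
```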

To start with, let’s see how we would do this normally with SQL. Assuming we’re working from a denormalised table such as this

Dynamic MDX Table

Dynamic SQL is pretty simple, instead of having your dataset query as

  SELECT SalesPerson,
      Sum(Sales) AS Sales
   FROM tblData
   GROUP BY SalesPerson

you can add a report parameter called GroupBy,

Dynamic MDX SQL

and then use an expression as your dataset

  ="SELECT "
      + Parameters!GroupBy.Value + " AS GroupName,
      Sum(Sales) AS Sales
   FROM tblData
   GROUP BY " + Parameters!GroupBy.Value

However, MDX queries don’t let you use an expression in the dataset, so we have to work around that quite major limitation. To do this we make use of the OpenRowset command. You need to enable it in the surface area config tool, but once it’s enabled you can fire off an OpenRowset command to SQL Server, which will then pass it on to the cube. As the datasource connection is to SQL Server not Analysis Services, it allows you to use an expression in the dataset.

  ="SELECT * FROM OpenRowset('MSOLAP',
    'DATA SOURCE=localhost; Initial Catalog=SalesTest;',
    'SELECT
      {[Measures].[Sales]} ON 0,
      NON EMPTY {[Product].[Product].[Product].Members} ON 1
      FROM Sales') "

You can then expand this to make it dynamic depending on the value of a parameter. Before we do this though, there are a couple of items I should point out.
1) As the expression can get quite large, I find it much easier to create the query from a custom code function
2) As SSRS can’t interpret the expression at runtime, you need to define the fields in your dataset manually (more on this later)

To use a custom code function, just change the dataset expression to

  =Code.CreateMDX(Parameters)

We pass in the parameters collection so that we can use the parameters to determine the query. Create a function called CreateMDX() in the code block

Dynamic MDX Code

You can then construct your MDX query within the code block.

  Public Function CreateMDX(ByVal params as Parameters) as string

   Dim mdx as string

   mdx = "SELECT * FROM OpenRowset("
   mdx += " 'MSOLAP', "
   mdx += " 'DATA SOURCE=localhost; Initial Catalog=SalesTest;', "
   mdx += " ' SELECT {[Measures].[Sales]} ON 0, "
   mdx += "    NON EMPTY {[Product].[Product].[Product].Members} ON 1 "
   mdx += "   FROM Sales ' "
   mdx += ")"

   return mdx

End Function

We’re almost there…
The next problem is that the field names returned by the query are less than helpful. To fix this we just need to alias the fields in the query. I usually take the opportunity of casting the numerical fields so that the report treats them as such, rather than as a string.

  Public Function CreateMDX(ByVal params as Parameters)
                  as string

   Dim mdx as string

   mdx = "SELECT "
   mdx += "  ""[Product].[Product].[Product].[MEMBER_CAPTION]"" AS GroupName, "
   mdx += "   Cast(""[Measures].[Sales]"" AS int) AS Sales "
   mdx += " FROM OpenRowset("
   mdx += " 'MSOLAP', "
   mdx += " 'DATA SOURCE=localhost; Initial Catalog=SalesTest;', "
   mdx += " ' SELECT {[Measures].[Sales]} ON 0, "
   mdx += "    NON EMPTY {[Product].[Product].[Product].Members} ON 1 "
   mdx += "   FROM Sales ' "
   mdx += ")"

   return mdx

End Function

(please do watch out for the quotes, double quotes and double double quotes, it can get a little confusing!)
We then need to tell the dataset which fields to expect from the query.

Dynamic MDX Fields

You can now use the dataset in your report.
However, the original point of this was to make the query dynamic… All we need to do to achieve this is expand the VB.Net code accordingly.

  Public Function CreateMDX(ByVal params as Parameters) as string

   Dim mdx as string

   mdx = "SELECT "

   If params("GroupBy").Value.ToString() = "Product" Then
      mdx += "  ""[Product].[Product].[Product]"
   ElseIf params("GroupBy").Value.ToString() = "SalesPerson" Then
      mdx += "  ""[Sales Person].[Sales Person].[Sales Person]"
   ElseIf params("GroupBy").Value.ToString() = "Customer" Then
      mdx += "  ""[Customer].[Customer].[Customer]"
   End If

   mdx += ".[MEMBER_CAPTION]"" AS GroupName, "

   mdx += "   Cast(""[Measures].[Sales]"" AS int) AS Sales "
   mdx += " FROM OpenRowset("
   mdx += " 'MSOLAP', "
   mdx += " 'DATA SOURCE=localhost; Initial Catalog=SalesTest;', "
   mdx += " ' SELECT {[Measures].[Sales]} ON 0, "

   If params("GroupBy").Value.ToString() = "Product" Then
      mdx += "  NON EMPTY {[Product].[Product].[Product]"
   ElseIf params("GroupBy").Value.ToString() = "SalesPerson" Then
      mdx += "  NON EMPTY {[Sales Person].[Sales Person].[Sales Person]"
   ElseIf params("GroupBy").Value.ToString() = "Customer" Then
      mdx += "  NON EMPTY {[Customer].[Customer].[Customer]"
   End If

   mdx += ".Members} ON 1 "

   mdx += "   FROM Sales ' "
   mdx += ")"

   return mdx

End Function
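If the quoting in the VB.Net version is hard to follow, the same string building can be sketched in Python, purely as an illustration (SSRS custom code itself has to be VB.Net). The `LEVELS` lookup and the function name `create_mdx` are my own hypothetical choices; using a lookup rather than chained IFs also means an unexpected GroupBy value fails loudly instead of producing a broken query.

```python
# Hypothetical mapping from the GroupBy parameter value to the cube level to query
LEVELS = {
    "Product": "[Product].[Product].[Product]",
    "SalesPerson": "[Sales Person].[Sales Person].[Sales Person]",
    "Customer": "[Customer].[Customer].[Customer]",
}


def create_mdx(group_by: str) -> str:
    """Build the OpenRowset query text for the requested grouping."""
    try:
        level = LEVELS[group_by]
    except KeyError:
        raise ValueError(f"unsupported GroupBy value: {group_by!r}") from None
    return (
        "SELECT "
        f'"{level}.[MEMBER_CAPTION]" AS GroupName, '
        'Cast("[Measures].[Sales]" AS int) AS Sales '
        "FROM OpenRowset("
        "'MSOLAP', "
        "'DATA SOURCE=localhost; Initial Catalog=SalesTest;', "
        "'SELECT {[Measures].[Sales]} ON 0, "
        f"NON EMPTY {{{level}.Members}} ON 1 "
        "FROM Sales')"
    )


print(create_mdx("Customer"))
```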

It’s certainly not that simple, and debugging can cause a few headaches, but you can benefit from a massive performance improvement in complex reports if you’re prepared to put the work in.

Dynamic MDX Results

You can download the project files here

As always, please let me know how you get on with it, and shout if you have any queries…

Alex

I almost feel embarrassed… I’ve been writing this blog for over 9 months now, and I have yet to mention Mosha, although in my defence, there is a link to his blog in the links section to the right.

As many/most of you may know, Mosha Pasumansky is one of the key brains behind designing the MDX language and Analysis Services – nuff said?

Over the last year he has been working on a pet project, MDX Studio. It’s an MDX query tool which any self respecting OLAP developer should now be using on a regular basis. He has just released v0.4.6, which adds some really nifty features such as the dependency view.

If you’re just starting out with MDX, then the intellisense will be of massive benefit to you; even if you’re a seasoned pro, the performance monitoring is an essential tool on its own.

If you haven’t already tried it, have a look at Mosha’s blog, and get a copy – you won’t regret it.

And thanks for all your hard work Mosha – It’s much appreciated.

Alex

The Frog Blog

I'm Alex Whittles.

I specialise in designing and implementing SQL Server business intelligence solutions, and this is my blog! Just a collection of thoughts, techniques and ramblings on SQL Server, Cubes, Data Warehouses, MDX, DAX and whatever else comes to mind.
