How do you use MDX to find the number of items between two dates, when the start date and end date are different role playing dimensions?

I had an interesting question from Ricardo Santos on another frog-blog post this week (MDX Calculated Member spanning multiple date dimensions). I answered Ricardo in the post, but it was a good question which I thought deserved a full blog post on its own. So thanks to Ricardo for prompting this.

The Scenario

If you have a ‘Project’ fact table, which has two dates (start date and end date), how do you find out how many projects are active on any given date. In this scenario we would have three role playing date dimensions; Date, Start Date & End Date.

We want the user to be able to select a date (from the Date dimension) and then use that to filter the projects using the other two date dimensions.

This also applies to many other scenarios; accomodation occupancy, staffing levels, contracts, really anything with a start and end date.

The MDX answer

If there was just a single date dimension (i.e. a typical sales fact) then you could just filter it

  ([Measures].[Internet Sales Amount], [Date].[Calendar].&[20080110])

But what we need to do is combine two different date dimensions. In SQL, to find the number of active projects on 1st April 2013 we’d just use

   SELECT Count(*)
   FROM Projects
   WHERE StartDate >= 20130401
     AND EndDate   <= 20130401

The equivalent in MDX is more complex as we have to filter a fact by two different dimensions. Starting with the [Start Date] dimension, we need to sum the project count for the projects with a start date on or before the required date:

    AGGREGATE({NULL:[Start Date].[Calendar].[20130401]}, [Measures].[Project Count])

We also need to then find the project count for all projects with an end date on or after the required date

    AGGREGATE({[End Date].[Calendar].[20130401]:NULL}, [Measures].[Project Count])

We can then use crossjoin (*) to combine them together.

    AGGREGATE( {NULL:[Start Date].[Calendar].[20130301]}
             * {[End Date].[Calendar].[20130301]:NULL}
             , [Measures].[Project Count])

Note that you could also use FILTER to achieve the same goal.

This will find the project count for a specific date, but we now need to make the date dynamic. We want the calculation to use a date selected by the user from a 3rd [Date] dimension, applying this date to the other two dimensions (start date & end date).

Enter the LINKMEMBER function. This is a great tool in your MDX arsenal, allowing you to identify a member of one dimension from a selection in a different role playing dimension. e.g.

     LINKMEMBER([Date].[Calendar].[20130414], [Start Date].[Calendar])

The example above will return the following member of the Start Date dimension.

     [Start Date].[Calendar].[20130414]

When we integrate this into the previous MDX, we get:

    AGGREGATE( 
          {NULL:LINKMEMBER([Date].[Calendar].CURRENTMEMBER
                     ,[Start Date].[Calendar])}
        * {LINKMEMBER([Date].[Calendar].CURRENTMEMBER
                    , [End Date].[Calendar]):NULL}
        , [Measures].[Project Count])

This will take the CURRENTMEMBER of the [Date].[Calendar] hierarchy, and map it to the equivalent members in the [Start Date].[Calendar] and [End Date].[Calendar] hierarchies, before then using the resulting members to filter the Project Count measure.

And there you have it.

But, that's not strictly the right answer...

Data modelling

Whenever we have to write anything complicated in MDX, we need to ask ourselves the question of why?... Complicated MDX is all too often a symptom of poor data modelling. Usually caused when a data warehouse is designed with SQL in mind, not cubes/MDX. If you're planning to use cubes then you can reap huge benefits from remodelling your data accordingly. This could be in your ETL and warehouse, or you can just remodel your data in the cube data source view. Lets have a look at an example.

The DSV for the project fact we were looking at earlier would be something like this (simplified)

    SELECT 
       , ProjectManagerKey
       , CustomerKey
       , StartDateKey
       , EndDateKey
       , 1 AS ProjectCount
    FROM FactProject

So the project fact represents a project. A cube fact table should however represent an event, not an item (items should be dimensions, not facts). So we need to think of a project as two seperate events; a start and an end. Take a look at the query below

    SELECT 
       , ProjectManagerKey
       , CustomerKey
       , StartDateKey AS DateKey
       , 1 AS ProjectCount
    FROM FactProject
	UNION ALL
    SELECT 
       , ProjectManagerKey
       , CustomerKey
       , EndDateKey AS DateKey
       , -1 AS ProjectCount
    FROM FactProject

Each project will therefore exist twice, once with a +1 on the project start date, and once with a -1 on the project end date. And we only have a single date dimension to worry about.

So how do we use this in MDX?

You can find out the number of active projects on any given date using

    AGGREGATE({NULL:[Date].[Calendar].CURRENTMEMBER}
            , [Measures].[Project Count])

i.e. the sum of all projects started/finished prior to the selected date.

To find out the change in the number of projects in any given time period

    ([Date].[Calendar].CURRENTMEMBER
    ,[Measures].[Project Count])

In the SQL world this is an inefficient approach, but it's perfectly suited to the way cubes work.

So by spending a little time thinking about the cube's data model, we can dramatically simplify our MDX, and improve our performance.

Frog-blog out

Tags: