How do you use MDX to find the number of items between two dates, when the start date and end date are different role playing dimensions?

I had an interesting question from Ricardo Santos on another frog-blog post this week (MDX Calculated Member spanning multiple date dimensions). I answered Ricardo in the post, but it was a good question which I thought deserved a full blog post on its own. So thanks to Ricardo for prompting this.

The Scenario

If you have a ‘Project’ fact table which has two dates (start date and end date), how do you find out how many projects are active on any given date? In this scenario we would have three role playing date dimensions: Date, Start Date & End Date.

We want the user to be able to select a date (from the Date dimension) and then use that to filter the projects using the other two date dimensions.

This also applies to many other scenarios: accommodation occupancy, staffing levels, contracts, really anything with a start and end date.

The MDX answer

If there were just a single date dimension (i.e. a typical sales fact) then you could just filter it:

  ([Measures].[Internet Sales Amount], [DATE].[Calendar].&[20080110])

But what we need to do is combine two different date dimensions. In SQL, to find the number of active projects on 1st April 2013 we’d just use:

   SELECT COUNT(*)
   FROM Projects
   WHERE StartDate <= 20130401
     AND EndDate   >= 20130401
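As a quick sanity check of the date logic (a project is active when it started on or before the date and ends on or after it), here is a sketch against a hypothetical in-memory SQLite table; the table name and data are made up purely for illustration:

```python
import sqlite3

# Hypothetical in-memory Projects table to sanity-check the date comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Projects (Name TEXT, StartDate INT, EndDate INT)")
conn.executemany(
    "INSERT INTO Projects VALUES (?, ?, ?)",
    [
        ("A", 20130101, 20130501),  # active on 2013-04-01
        ("B", 20130315, 20130405),  # active on 2013-04-01
        ("C", 20130402, 20130601),  # starts after the date
        ("D", 20130101, 20130331),  # ends before the date
    ],
)

# Active on 1st April 2013: started on/before AND ending on/after that date.
(count,) = conn.execute(
    """SELECT COUNT(*) FROM Projects
       WHERE StartDate <= 20130401 AND EndDate >= 20130401"""
).fetchone()
print(count)  # 2
```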

The equivalent in MDX is more complex as we have to filter a fact by two different dimensions. Starting with the [Start Date] dimension, we need to sum the project count for the projects with a start date on or before the required date:

    AGGREGATE({NULL:[START DATE].[Calendar].[20130401]}, [Measures].[Project COUNT])

We also need to find the project count for all projects with an end date on or after the required date:

    AGGREGATE({[END DATE].[Calendar].[20130401]:NULL}, [Measures].[Project COUNT])

We can then use a crossjoin (*) to combine them.

    AGGREGATE( {NULL:[START DATE].[Calendar].[20130401]}
             * {[END DATE].[Calendar].[20130401]:NULL}
             , [Measures].[Project COUNT])

Note that you could also use FILTER to achieve the same goal.

This will find the project count for a specific date, but we now need to make the date dynamic. We want the calculation to use a date selected by the user from a third [Date] dimension, applying this date to the other two dimensions (start date & end date).

Enter the LINKMEMBER function. This is a great tool in your MDX arsenal, allowing you to identify a member of one dimension from a selection in a different role playing dimension. e.g.

     LINKMEMBER([DATE].[Calendar].[20130414], [START DATE].[Calendar])

The example above will return the following member of the Start Date dimension.

     [START DATE].[Calendar].[20130414]

When we integrate this into the previous MDX, we get:

    AGGREGATE( 
          {NULL:LINKMEMBER([DATE].[Calendar].CURRENTMEMBER
                     ,[START DATE].[Calendar])}
        * {LINKMEMBER([DATE].[Calendar].CURRENTMEMBER
                    , [END DATE].[Calendar]):NULL}
        , [Measures].[Project COUNT])

This will take the CURRENTMEMBER of the [Date].[Calendar] hierarchy, and map it to the equivalent members in the [Start Date].[Calendar] and [End Date].[Calendar] hierarchies, before then using the resulting members to filter the Project Count measure.

And there you have it.

But, that’s not strictly the right answer…

Data modelling

Whenever we have to write anything complicated in MDX, we need to ask ourselves why. Complicated MDX is all too often a symptom of poor data modelling, usually caused when a data warehouse is designed with SQL in mind, not cubes/MDX. If you’re planning to use cubes then you can reap huge benefits from remodelling your data accordingly. This could be in your ETL and warehouse, or you can just remodel your data in the cube data source view. Let’s have a look at an example.

The DSV for the project fact we were looking at earlier would be something like this (simplified):

    SELECT ProjectManagerKey
         , CustomerKey
         , StartDateKey
         , EndDateKey
         , 1 AS ProjectCount
    FROM FactProject

So the project fact represents a project. A cube fact table should, however, represent an event, not an item (items should be dimensions, not facts). So we need to think of a project as two separate events: a start and an end. Take a look at the query below.

    SELECT ProjectManagerKey
         , CustomerKey
         , StartDateKey AS DateKey
         , 1 AS ProjectCount
    FROM FactProject
    UNION ALL
    SELECT ProjectManagerKey
         , CustomerKey
         , EndDateKey AS DateKey
         , -1 AS ProjectCount
    FROM FactProject

Each project will therefore exist twice, once with a +1 on the project start date, and once with a -1 on the project end date. And we only have a single date dimension to worry about.
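To illustrate, here is a minimal Python sketch of the remodelled fact (the project dates are hypothetical, purely to show the +1/-1 running total):

```python
# Hypothetical projects as (start_date_key, end_date_key) pairs.
projects = [(20130101, 20130501), (20130315, 20130405),
            (20130402, 20130601), (20130101, 20130331)]

# Remodelled fact: each project becomes two event rows,
# +1 on its start date and -1 on its end date.
events = ([(s, +1) for s, e in projects]
          + [(e, -1) for s, e in projects])

# Projects active on a date = running total of ProjectCount up to that date.
def active_on(date_key):
    return sum(delta for d, delta in events if d <= date_key)

print(active_on(20130401))  # 2
```

Note that with the -1 posted on the end date itself, a project stops counting as active on its end date; if the end date should be inclusive, post the -1 on the following day instead.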

So how do we use this in MDX?

You can find out the number of active projects on any given date using

    AGGREGATE({NULL:[DATE].[Calendar].CURRENTMEMBER}
            , [Measures].[Project COUNT])

i.e. the sum of all projects started/finished up to and including the selected date.

To find out the change in the number of projects in any given time period:

    ([DATE].[Calendar].CURRENTMEMBER
    ,[Measures].[Project COUNT])

In the SQL world this is an inefficient approach, but it’s perfectly suited to the way cubes work.

So by spending a little time thinking about the cube’s data model, we can dramatically simplify our MDX, and improve our performance.

Frog-blog out

© Alex Whittles, Purple Frog Systems Ltd

18 Responses to “MDX Between Start Date and End Date”

  • james:

    Interesting approach. What if there were say 3 project states, e.g. started, pending and end, would you recommend UNIONing all 3 states?

    • Alex:

      I did a talk on exactly this at the last SQLBits – you can view the recording at http://sqlbits.com/Sessions/Event11/Data_Modeling_for_Analysis_Services_Cubes.

      The approach I usually use (obviously depending on the reporting requirements) is to turn the state changes into transactions in and out of a state. So you have a project state dimension, containing Start, Pending and End. When a project starts, +1 in the Start state. When it moves into pending you create a -1 in the Start state and a +1 in the Pending state, etc. The cube can then provide very detailed analysis on the flow, bottlenecks, volume and change in each state over time. See the video for a more detailed explanation.

  • james:

    Thanks, I have managed to go through the recording and it is very informative. I particularly like the idea of thinking as events.

    Based on states, let’s assume you have scenarios where a client can have say 4 states: Active, Under Review, Suspended, Withdrawn. And the client statuses are not always sequential, e.g.
    Day 1: Active
    Day 3: Withdrawn
    Day 7: Active
    Day 9: Suspended
    Day 13: Under Review
    Day 23: Active

    How can we deal with this when we need to get the status on, say, Day 8? I want to assume the approach above works as well, i.e. “backing out” the transactions?

    I am new to MDX and trying to “unlearn” my t-SQL thinking.

    • Alex:

      Hi James
      Yes this approach will cope with that, as entering any state will create a +1 against that state. You can move in and out of any statuses in any order and it will work.
      Because we use a Sum to Date, all of the +1/-1 will cancel each other out until the latest record prior to the day being reported.
      Alex
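To make the cancelling behaviour concrete, here is a small Python sketch of James’s scenario (state names and days are taken from the comment above; the fact layout is an assumption for illustration):

```python
from collections import Counter

# James's scenario: status changes as (day, state). Each change posts a -1
# against the previous state and a +1 against the new one.
changes = [(1, "Active"), (3, "Withdrawn"), (7, "Active"),
           (9, "Suspended"), (13, "Under Review"), (23, "Active")]

facts = []
prev = None
for day, state in changes:
    if prev is not None:
        facts.append((day, prev, -1))   # back out of the old state
    facts.append((day, state, +1))      # enter the new state
    prev = state

# Sum-to-date per state: all the +1/-1 pairs cancel out,
# leaving only the state in force on the requested day.
def status_on(day):
    totals = Counter()
    for d, state, delta in facts:
        if d <= day:
            totals[state] += delta
    return [s for s, n in totals.items() if n == 1]

print(status_on(8))  # ['Active']
```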

  • james:

    You are the man! I went through your video and it makes a lot of sense and I have managed to apply that. One question though, using your example for loan application, assuming on
    Day 1 Status = Application
    Day 1 Status = Review

    Using the model of “Backing out” the transaction, that means a SUM:
    Day 1 – Applications – 0
    Day 1 – Review – 1

    I am tempted to do a DATEADD to negate the Status to Day 2, but then that will not give the true picture.

    How would you go about this?

    Again thanks for all the help so far. Been a lifesaver!

    • Alex:

      This is correct behaviour – if the status moves from application to review on the same day, the loan can only be counted once or your totals will be incorrect. So it must be reported either as Application or Review, not both.

      This approach will show the end of day status against each date. So if an application changes numerous times in a day, you can still track that, but only the final state of each day will be shown in the aggregation. You can enhance this by adding a time dimension, which would then allow you to break down a day into hours/minutes/etc.

      Or if you want to track the number of loans that entered a status for each day, just filter out all the -1 records (add a dimension for this). Then you’ll see both statuses together on the same day.

  • Paul:

    Hi Alex,
    Thanks so much for sharing your knowledge and experience.
    I would like to ask you for a suggestion regarding my current challenge.
    Background: We have a highly normalized enterprise Data Warehouse. The sales fact table uses row versioning, similar to what Kimball proposed (Compliance-Enabled), with ValidFrom and ValidTo timestamps and a Current flag to easily identify the latest record. The only difference from the Kimball version is that we generate a new surrogate key for new versions of a fact (Kimball proposed the same surrogate key in combination with the ValidTo timestamp).
    The granularity of the table is the order line number. We have different statuses for every single article in the order, let’s say “open”, “cancelled” and “shipped”. We also store the net price, quantity, customer, etc.
    My challenge is to build a cube with a measure group with measures like the sum of the net price per article of the open orders for a given date, or the sum of the quantities for a given article on a given date in the cancelled orders.
    I tried to use your ideas to design a data mart thinking in MDX and not in SQL, but in the end I’m not sure if it is better to transfer these ValidFrom and ValidTo columns to the fact table in the data mart and fulfil the analysis requirements using MDX instead.
    Have you faced a similar situation?
    Any comments or guidelines would be extremely appreciated.
    Thanks in advance,
    Paul

    • Alex:

      Hi Paul
      There are a lot of ‘it depends’ here I’m afraid.
      My preferred data model here is a transactional backout fact table. It creates a large fact table, but the cube becomes incredibly flexible.
      1) Store an order line in the fact.
      2) When a replacement comes in, create a negative ‘backout’ transaction for that order, with the original order details (but negative values) and the date of the replacement record.
      3) Insert the new replacement record.

      So two versions of a record result in 3 records in the fact.
      I did a talk on this at the last SQLBits – have a look at the video here
      http://sqlbits.com/Sessions/Event11/Data_Modeling_for_Analysis_Services_Cubes

      Hope this helps.
      Alex
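The transactional backout pattern can be sketched in Python (the row layout and numbers here are hypothetical, just to show that a sum-to-date returns whichever version was in force on the requested date):

```python
# Hypothetical order-line fact rows: (date_key, order_id, quantity).
# Version 1 of order 101 lands on day 1; a replacement (qty 5 -> 8)
# arrives on day 10, producing a backout row plus the new version.
fact = [
    (1, 101, 5),      # original order line
    (10, 101, -5),    # backout of the original, dated at the replacement
    (10, 101, 8),     # replacement record
]

# The quantity as-at a date is just a sum to date, as in the cube.
def qty_as_at(order_id, date_key):
    return sum(q for d, o, q in fact if o == order_id and d <= date_key)

print(qty_as_at(101, 5))   # 5  (original version still in force)
print(qty_as_at(101, 15))  # 8  (replacement applied)
```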

  • Josh:

    Was this not the same technique discussed by Mosha Pasumansky back in 2007?
    http://sqlblog.com/blogs/mosha/archive/2007/06/01/counting-in-flight-events-in-mdx.aspx

    • Alex:

      Yes you’re right, and thanks for the link. Mosha goes into more detail of the MDX solution including adding Root() to fix a potential problem with the Date dimension, useful to know. What Mosha fails to point out though is the [in my opinion, better] alternative of changing the underlying data model. Not always an option, but it’s my preferred technique if it’s possible. Although as the founding guru of MDX, one would expect him to focus on the MDX solution :)

  • Brad:

    Hi Alex, I’m trying to apply your example MDX to return a count between start and end dates. It’s working as expected and with good performance for the Calendar Year and Month in my hierarchy, however the query just hangs if I try to expand the hierarchy to the date level. My date dimension has attribute relationships for the date – month – year.
    Am I missing anything here? I assume the 3rd role playing date dim, in my case ‘Date Between’, is just a floating dimension, i.e. it has no relationship to a measure group?

    Thanks

    AGGREGATE(
    {NULL:LINKMEMBER([Date Between].[Calendar Year - Month - Date].CURRENTMEMBER
    ,[Date Start].[Calendar Year - Month - Date])}
    * {LINKMEMBER([Date Between].[Calendar Year - Month - Date].CURRENTMEMBER
    , [Date End].[Calendar Year - Month - Date]):NULL}
    ,[Measures].[Count])

    • Alex:

      Hi Brad
      Paul’s right. The problem with any MDX solution is that it can perform poorly when calculating on a leaf level. The best option is to alter the data model to simplify the query.
      Although you don’t always have to rely on the ETL layer, you can often recreate the backout transaction approach just using a view under the cube. Worth investigating if you need to do a lot of these queries.
      Regards
      Alex

  • Paul Hernandez:

    Hi Brad,

    I am also implementing a similar solution. It also works slowly for me when I drill down. I suppose the reason is that the function is evaluated for each member. So if you have 5 years, the function is evaluated 5 times at the year level, but 365 times per year at the date level.

    We should consider an aggregation table per day, or the “backout” solution proposed by Alex, which in my opinion is the best option if you have full control of the ETL process.

    Kind Regards

    Paul

  • stephen Billett:

    Hi Alex

    I have a different problem, in that I am trying to get a measure that will look at YTD LY Sales for a period, considering only orders placed before a certain date. I tried the query below but it keeps bringing back the YTD value. Any insight as to where I am going wrong?

    WITH
    MEMBER [TEST] AS
    SUM({NULL:[Sales Order Date].[Purchasing Model Calendar].[Date].&[2013-06-22T00:00:00]}*
    (EXISTS([Dates].[Purchasing Model Calendar].[Date].MEMBERS,[Dates].[Current Day Europe].&[Y]),[Time Calculations].[YTD LY]),[Measures].[Extended Price Local - S])

    SELECT NON EMPTY
    {[TEST]}
    ON COLUMNS,
    NON EMPTY
    {[Customers].[Sales Continent].CHILDREN*
    [Customers].[Sales Country Designation].CHILDREN} ON ROWS FROM
    (SELECT({[Customers].[Sales Continent].&[Europe]}) ON COLUMNS
    FROM [Buyer])

    • Alex:

      I’d change the SUM() set to use currentmember of the Sales Order Date dimension, and move the actual date selection to the WHERE clause.
      If that doesn’t work then you may have a problem in the MDX scope script behind the [Time Calculations].[YTD LY] attribute.
      Regards
      Alex

  • heykens:

    Hi Alex, I am new to MDX and trying to get customer sales for a given date range (for example 2001-07-24 to 2007-08-01) for a specific sales type. A new customer means they have had no transactions in the previous two months. I wrote something but it didn’t work. My query is below:

    SELECT
    {[Measures].[TranCount],[Measures].[Amount]} on columns
    ,
    EXCEPT(
    NONEMPTY
    (
    [Customer].[CustomerNumber].[CustomerNumber].ALLMEMBERS ,
    ([Measures].[Fact Islem Count],[Date].[Dt].&[2001-07-24T00:00:00] : [Date].[Dt].&[2008-08-01T00:00:00])
    )
    ,
    NONEMPTY
    (
    [Customer].[CustomerNumber].[CustomerNumber].ALLMEMBERS ,
    OpeningPeriod ([Date].[Hierarchy].[Dt], [Date].[Hierarchy].[Dt].&[2007-07-24T00:00:00].Parent.Lag(2)) :
    ClosingPeriod ([Date].[Hierarchy].[Dt], [Date].[Hierarchy].[Dt].&[2007-07-24T00:00:00].Parent.Lag(1))
    )
    )

    ON rows

    FROM [cubeSales]
    where ([Tran].[TranType].&[C])

  • RollTime:

    Great information. Thanks for posting. This saved me hours.
