How do you use MDX to find the number of items between two dates, when the start date and end date are different role playing dimensions?

I had an interesting question from Ricardo Santos on another frog-blog post this week (MDX Calculated Member spanning multiple date dimensions). I answered Ricardo in the post, but it was a good question which I thought deserved a full blog post on its own. So thanks to Ricardo for prompting this.

The Scenario

If you have a ‘Project’ fact table, which has two dates (start date and end date), how do you find out how many projects are active on any given date. In this scenario we would have three role playing date dimensions; Date, Start Date & End Date.

We want the user to be able to select a date (from the Date dimension) and then use that to filter the projects using the other two date dimensions.

This also applies to many other scenarios; accomodation occupancy, staffing levels, contracts, really anything with a start and end date.

The MDX answer

If there was just a single date dimension (i.e. a typical sales fact) then you could just filter it

1
  ([Measures].[Internet Sales Amount], [DATE].[Calendar].&[20080110])

But what we need to do is combine two different date dimensions. In SQL, to find the number of active projects on 1st April 2013 we’d just use

1
2
3
4
   SELECT COUNT(*)
   FROM Projects
   WHERE StartDate >= 20130401
     AND EndDate   <= 20130401

The equivalent in MDX is more complex as we have to filter a fact by two different dimensions. Starting with the [Start Date] dimension, we need to sum the project count for the projects with a start date on or before the required date:

1
    AGGREGATE({NULL:[START DATE].[Calendar].[20130401]}, [Measures].[Project COUNT])

We also need to then find the project count for all projects with an end date on or after the required date

1
    AGGREGATE({[END DATE].[Calendar].[20130401]:NULL}, [Measures].[Project COUNT])

We can then use crossjoin (*) to combine them together.

1
2
3
    AGGREGATE( {NULL:[START DATE].[Calendar].[20130301]}
             * {[END DATE].[Calendar].[20130301]:NULL}
             , [Measures].[Project COUNT])

Note that you could also use FILTER to achieve the same goal.

This will find the project count for a specific date, but we now need to make the date dynamic. We want the calculation to use a date selected by the user from a 3rd [Date] dimension, applying this date to the other two dimensions (start date & end date).

Enter the LINKMEMBER function. This is a great tool in your MDX arsenal, allowing you to identify a member of one dimension from a selection in a different role playing dimension. e.g.

1
     LINKMEMBER([DATE].[Calendar].[20130414], [START DATE].[Calendar])

The example above will return the following member of the Start Date dimension.

1
     [START DATE].[Calendar].[20130414]

When we integrate this into the previous MDX, we get:

1
2
3
4
5
6
    AGGREGATE( 
          {NULL:LINKMEMBER([DATE].[Calendar].CURRENTMEMBER
                     ,[START DATE].[Calendar])}
        * {LINKMEMBER([DATE].[Calendar].CURRENTMEMBER
                    , [END DATE].[Calendar]):NULL}
        , [Measures].[Project COUNT])

This will take the CURRENTMEMBER of the [Date].[Calendar] hierarchy, and map it to the equivalent members in the [Start Date].[Calendar] and [End Date].[Calendar] hierarchies, before then using the resulting members to filter the Project Count measure.

And there you have it.

But, that’s not strictly the right answer…

Data modelling

Whenever we have to write anything complicated in MDX, we need to ask ourselves the question of why?… Complicated MDX is all too often a symptom of poor data modelling. Usually caused when a data warehouse is designed with SQL in mind, not cubes/MDX. If you’re planning to use cubes then you can reap huge benefits from remodelling your data accordingly. This could be in your ETL and warehouse, or you can just remodel your data in the cube data source view. Lets have a look at an example.

The DSV for the project fact we were looking at earlier would be something like this (simplified)

1
2
3
4
5
6
7
    SELECT 
       , ProjectManagerKey
       , CustomerKey
       , StartDateKey
       , EndDateKey
       , 1 AS ProjectCount
    FROM FactProject

So the project fact represents a project. A cube fact table should however represent an event, not an item (items should be dimensions, not facts). So we need to think of a project as two seperate events; a start and an end. Take a look at the query below

1
2
3
4
5
6
7
8
9
10
11
12
13
    SELECT 
       , ProjectManagerKey
       , CustomerKey
       , StartDateKey AS DateKey
       , 1 AS ProjectCount
    FROM FactProject
	UNION ALL
    SELECT 
       , ProjectManagerKey
       , CustomerKey
       , EndDateKey AS DateKey
       , -1 AS ProjectCount
    FROM FactProject

Each project will therefore exist twice, once with a +1 on the project start date, and once with a -1 on the project end date. And we only have a single date dimension to worry about.

So how do we use this in MDX?

You can find out the number of active projects on any given date using

1
2
    AGGREGATE({NULL:[DATE].[Calendar].CURRENTMEMBER}
            , [Measures].[Project COUNT])

i.e. the sum of all projects started/finished prior to the selected date.

To find out the change in the number of projects in any given time period

1
2
    ([DATE].[Calendar].CURRENTMEMBER
    ,[Measures].[Project COUNT])

In the SQL world this is an inefficient approach, but it’s perfectly suited to the way cubes work.

So by spending a little time thinking about the cube’s data model, we can dramatically simplify our MDX, and improve our performance.

Frog-blog out

© Alex Whittles, Purple Frog Systems Ltd

29 Responses to “MDX Between Start Date and End Date”

  • james:

    interesting approach,what if there were say 3 project states e.g. started, pending, end, would you recommend UNIONing all 3 states?

    • Alex:

      I did a talk on exactly this at the last SQLBits – you can view the recording at http://sqlbits.com/Sessions/Event11/Data_Modeling_for_Analysis_Services_Cubes.

      The approach I usually use (obviously depending on the reporting requirements) is to turn the state changes into transactions in and out of a state. So you have a project state dimension, containing Start, Pending and End. When a project starts, +1 in the Start state. When is moves into pending you create a -1 in the Start state and a +1 in the pending state, etc. The cube can then provide very deailed analysis on the flow, bottlenecks, volume and change in each state over time. See the video for a more detailed explanation.

  • james:

    Thanks, I have managed to go through the recording and it is very informative. I particularly like the idea of thinking as events.

    Based on state, lets assume you have scenarios where a client can have say 4 states, Active, Under Review, Suspended, Withdrawn. And the client statuses are not always sequential e.g.
    Day 1: Active
    Day 3: Withdrawn
    Day 7: Active
    Day 9: Suspended
    Day 13: Under Review
    Day 23: Active

    How can we deal with this say when we need to get the status on say Day 8? I want to assume the approach above works as well? i.e. “backing out” the transactions?

    I am new to MDX and trying to “unlearn” my t-SQL thinking.

    • Alex:

      Hi James
      Yes this approach will cope with that, as entering any state will create a +1 against that state. You can move in and out of any statuses in any order and it will work.
      Because we use a Sum to Date, all of the +1/-1 will cancel each other out until the latest record prior to the day being reported.
      Alex

  • james:

    You are the man! I went through your video and it makes a lot of sense and I have managed to apply that. One question though, using your example for loan application, assuming on
    Day 1 Status = Application
    Day 1 Status = Review

    Using the model of “Backing out” the transaction, that means a SUM:
    Day 1 – Applications – 0
    Day 1 – Review – 1

    I am tempted to do a DATEADD to negate the Status to Day 2, but then that will not give the true picture.

    How would you go about this?

    Again thanks for all the help so far. Been a lifesaver!

    • Alex:

      This is correct behaviour – if the status moves from application to review on the same day, the loan can only be counted once or your totals will be incorrect. So it must be reported either as Application or Review, not both.

      This approach will show the end of day status against each date. So if an application changes numerous times in a day, you can still track that, but only the final state of each day will be shown in the aggregation. You can enhance this by adding a time dimension, which would then allow you to break down a day into hours/minutes/etc.

      Or if you want to track the number of loans that entered a status for each day, just filter out all the -1 records (add a dimension for this). Then you’ll see both statuses together on the same day.

  • Paul:

    Hi Alex,
    Thanks so much for sharing your knowledge and experience.
    I would like to ask you for a suggestions regarding my current challenge.
    Background: We have a highly normalized enterprise Data Warehouse. The sales fact table uses row versioning, similar as Kimball proposed (Compliance-Enabled) with a ValidFrom, ValidTo timestamps and a Current flag to easily identify the latest record. The only different with the Kimball version is we generate a new surrogate key for new versions of a fact (Kimball proposed the same surrogate key in combination with the validTo timestamp).
    The granularity of the table is the line order number. We have different status of every single article in the order, let´s say “open”, “cancelled” and “shipped”. We also store the net price, quantity, customer, etc.
    My challenge is to build a cube with a measure group with measures like the sum of the net price per article of the open orders for a given date or the sum of the quantities for a given article in a given date in the cancelled orders.
    I tried to use your ideas in order to design a data mart thinking in MDX and not in SQL but at the end I’m not sure if is better to transfer this VelidFrom and ValidTo columns to the fact table in the data mart and fill the analysis requirements using MDX instead.
    Have you faced a similar situation?
    Any comments or guidelines would be extremely appreciated.
    Thanks in advance,
    Paul

  • Paul:

    Hi Alex,
    Thanks so much for sharing your knowledge and experience.
    I would like to ask you for a suggestions regarding my current challenge.
    Background: We have a highly normalized enterprise Data Warehouse. The sales fact table use row versioning, similar as Kimball proposed (Compliance-Enabled) with a ValidFrom, ValidTo timestamps and a Current flag to easily identify the latest record. The only different with the Kimball version is we generate a new surrogate key for new versions of a fact (Kimball proposed the same surrogate key in combination with the validTo timestamp).
    The granularity of the table is the line order number. We have different status of every single article in the order, let´s say “open”, “cancelled” and “shipped”. We also store the net price, quantity, customer, etc.
    My challenge is to build a cube with a measure group with measures like the sum of the net price per article of the open orders for a given date or the sum of the quantities for a given article in a given date in the cancelled orders.
    I tried to use your ideas in order to design a data mart thinking in MDX and not in SQL but at the end I’m not sure if is better to transfer this VelidFrom and ValidTo columns to the fact table in the data mart and fill the analysis requirements using MDX instead.
    Have you faced a similar situation?
    Any comments or guidelines would be extremely appreciated.
    Thanks in advance,
    Paul

    • Alex:

      Hi Paul
      There are a lot of ‘it depends’ here I’m afraid.
      My preferred data model here is a transactional backout fact table. It creates a large fact table, but the cube becomes incredibly flexible.
      1) Store an order line in the fact.
      2) When a replacement comes in, create a negative ‘backout’ transaction for that order, with the original order details (but negative values) and the date of the replacement record.
      3) Insert the new replacement record.

      So two versions of a record result in 3 records in the fact.
      I did a talk on this at the last SQLBits – have a look at the video here
      http://sqlbits.com/Sessions/Event11/Data_Modeling_for_Analysis_Services_Cubes

      Hope this helps.
      Alex

  • Josh:

    Was this not the same technique discussed by Mosha Pasumansky back in 2007?
    http://sqlblog.com/blogs/mosha/archive/2007/06/01/counting-in-flight-events-in-mdx.aspx

    • Alex:

      Yes you’re right, and thanks for the link. Mosha goes into more detail of the MDX solution including adding Root() to fix a potential problem with the Date dimension, useful to know. What Mosha fails to point out though is the [in my opinion, better] alternative of changing the underlying data model. Not always an option, but it’s my preferred technique if it’s possible. Although as the founding guru of MDX, one would expect him to focus on the MDX solution :)

  • Brad:

    Hi Alex, I’m trying to apply your example MDX to return a count between start and end dates. It’s working as expected and with good performance for the Calendar Year and Month in my hierarchy, however the query just hangs if I try to expand the hierarchy to the date level. My date dimension has attribute relationships for the date – month – year.
    Am I missing anything here? I assume the 3rd role playing date dim, in my case ‘Date Between’ is just a floating dimension i.e it has no relationship to a measure group?

    Thanks

    AGGREGATE(
    {NULL:LINKMEMBER([Date Between].[Calendar Year – Month – Date].CURRENTMEMBER
    ,[Date Start].[Calendar Year – Month – Date])}
    * {LINKMEMBER([Date Between].[Calendar Year – Month – Date].CURRENTMEMBER
    , [Date End].[Calendar Year – Month – Date]):NULL}
    ,[Measures].[Count])

    • Alex:

      Hi Brad
      Paul’s right. The problem with any MDX solution is that it can perform poorly when calculating on a leaf level. The best option is to alter the data model to simplify the query.
      Although you don’t always have to rely on the ETL layer, you can often recreate the backout transaction approach just using a view under the cube. Worth investigating if you need to do a lot of these queries.
      Regards
      Alex

  • Paul Hernandez:

    Hi Brad,

    I also implementing a similar solution. It also works slowly for me when I drill down. I suppose the reason is, the funcition is evaluated for each member. Then if you have 5 year, the function is evaluated 5 times at this level, but 365 times for every year.

    We must think in a aggregation table per day or the “backout” solution proposed by Alex, which in my opinion is the best option if you have full control of the ETL process.

    Kind Regards

    Paul

  • stephen Billett:

    Hi Alex

    I have a different problem in that I am trying to get a measure that will look at YTD LY Sales for a period that looks at orders placed before a certain date, I tried below but it keeps bringing back the YTD value, any insight as to where I am going wrong?

    WITH
    MEMBER [TEST] AS
    SUM({NULL:[Sales Order Date].[Purchasing Model Calendar].[Date].&[2013-06-22T00:00:00]}*
    (EXISTS([Dates].[Purchasing Model Calendar].[Date].MEMBERS,[Dates].[Current Day Europe].&[Y]),[Time Calculations].[YTD LY]),[Measures].[Extended Price Local – S])

    SELECT NON EMPTY
    {[TEST]}
    ON COLUMNS,
    NON EMPTY
    {[Customers].[Sales Continent].CHILDREN*
    [Customers].[Sales Country Designation].CHILDREN} ON ROWS FROM
    (SELECT({[Customers].[Sales Continent].&[Europe]}) ON COLUMNS
    FROM [Buyer])

    • Alex:

      I’d change the SUM() set to use currentmember of the Sales Order Date dimension, and move the actual date selection to the WHERE clause.
      If that doesn’t work then you may have a problem in the MDX scope script behind the [Time Calculations].[YTD LY] attribute.
      Regards
      Alex

  • heykens:

    Hi Alex, I am new to MDX and trying to get customer sales for given date range (for example 2001.07.24 – 2007-08-01) for specific sales type.New customer means they have no transaction for previous two months.İ wrote something but didnt work.My query is below,

    SELECT
    {[Measures].[TranCount],[Measures].[Amount]} on columns
    ,
    EXCEPT(
    NONEMPTY
    (
    [Customer].[CustomerNumber].[CustomerNumber].ALLMEMBERS ,
    ([Measures].[Fact Islem Count],[Date].[Dt].&[2001-07-24T00:00:00] : [Date].[Dt].&[2008-08-01T00:00:00])
    )
    ,
    NONEMPTY
    (
    [Customer].[CustomerNumber].[CustomerNumber].ALLMEMBERS ,
    OpeningPeriod ([Date].[Hierarchy].[Dt], [Date].[Hierarchy].[Dt].&[2007-07-24T00:00:00].Parent.Lag(2)) :
    ClosingPeriod ([Date].[Hierarchy].[Dt], [Date].[Hierarchy].[Dt].&[2007-07-24T00:00:00].Parent.Lag(1))
    )
    )

    ON rows

    FROM [cubeSales]
    where ([Tran].[TranType].&[C])

  • RollTime:

    Great information. Thanks for posting. This saved me hours.

  • ktata:

    I used the mdx query to search between start date and end date but it is giving incorrect result
    I build test fact with
    startdate end date charges

  • Tata:

    Thanks for the information.
    I just started learning BI, I have scenario where
    I have a weekly run for all patient accounts and to see if hospital billed it or not
    and goal is to build a cube and see week over week how we are progressing.

    So I have fact table with
    Patinetaccountkey
    IsBilled
    Charges
    StarDateKey
    EndDateKey

    I used the mdx query to search between start date and end date but it is giving incorrect result
    I build test fact :
    startdate end date charges
    1/5/2014 1/11/2014 $10
    1/12/2014 1/14/2014 $20
    1/15/2014 1/18/014 $30

    MDX in my cube Calculated Member(CALC_charges)
    SUM(
    {NULL:LINKMEMBER([dim Date].[WK].currentmember, [startdatekey].[WK])}
    * {LINKMEMBER([dim Date].[WK].currentmember, [enddatekey].[WK]):NULL}
    , [Measures].[charges]
    )

    Result:
    Week Calc_Charges
    ———————————-
    1 $10
    2 $50
    3 $10
    4 $10

    why I am seeing Week 3 and 4 with values even though there is no associated fact.

    Please help

    Thanks,
    Tata

    • Alex:

      Hi Tata
      I’ve just replicated your data and code, and it works correctly. Therefore it’s something in your implementation that is wrong.

      What I’ve got is:
      CREATE TABLE [dbo].[test](
      [TestFactKey] [int] IDENTITY(1,1) NOT NULL,
      [StartDateKey] [int] NOT NULL,
      [EndDateKey] [int] NOT NULL,
      [Charges] [int] NOT NULL
      )
      GO
      INSERT INTO dbo.test
      VALUES (20140105, 20140111, 10)
      ,(20140112, 20140114, 20)
      ,(20140115, 20140118, 30)
      GO

      Then a standard role playing date dimension (integer DateKey in the form YYYYMMDD), added three times (Date, End Date & Start Date). Date has no relationship to the test fact. The other two dates are linked to the test fact with a regular relationship.

      Then the calculation is
      CREATE MEMBER CURRENTCUBE.[Measures].[CALC_charges]
      AS
      AGGREGATE(
      {NULL:LINKMEMBER([Date].[Week].CURRENTMEMBER
      ,[Start Date].[Week])}
      * {LINKMEMBER([Date].[Week].CURRENTMEMBER
      , [End Date].[Week]):NULL}
      , [Measures].[Charges]);

      Then, the following query
      SELECT [Measures].[CALC_charges] ON 0,
      {[Date].[Week].[1]:[Date].[Week].[3]} ON 1
      FROM [Test Cube]

      Gives the results:
      1 10
      2 30
      3 50

      As expected. Can you double check your implementation to make sure it follows the above structure?

      Also, please also take note of the blog post which talks about remodelling your data. You may want to consider this in order to simplify your MDX…

      Regards
      Alex

  • Tata:

    Alex,
    Thanks for your reply.
    I created everything from scratch and it worked.
    And regarding remodeling data:
    As i said before I am trying to build a cube on history to track the changes week by week.
    In my scenario
    1. every patient will be included in each week run
    2. A patient Billed status may or may not change from week to week
    for example:
    for date 8/2/14 run for patient 1 billed is 0 and charges $100
    for date 8/2/14 for patient 2 billed is 1 and charges $200

    for date 8/9/14 run for patient 1 billed is 1 and charges same $100
    for date 8/9/14 run for patient 2 billed is 1 and charges same $200

    for date 8/16/14 run for patient 1 billed is 1 and charges same $100
    for date 8/16/14 run for patient 2 billed is 0 and charges same $200

    for each week i need metrics total patients,total billed and total billed ratio and total charges ratio
    may be i need to look by week,Month, Qtr and year

    fro week 8/2/14:
    Total patients : 2
    Billed: 1
    Billed Charges: $200

    for month August:
    Total patients: 2
    Billed: 1
    Billed Charges: $100

    I am trying with start date and end date with complicated MDX,
    do you have any suggestion.

    I appreciate all your help.

    Thanks,
    Tata

  • Sivleam:

    Thank you very much for this article. I also watched the video. Thank you very much.
    I remodeled my cube and it’s very easy to calculate active transactions or project.

    Please, I have a question.
    In you MDX statement, after remodeling, you specify a date hiearchy from which the user must analyze the KPI. Calendar hierachy below.
    AGGREGATE({NULL:[DATE].[Calendar].CURRENTMEMBER} , [Measures].[Project COUNT])

    This oblige us to create one KPI for Week, another one for Month, Quarter, Semester and Year. I think it’s very very complicate.

    Is it possible to write, only one MDX statement that won’t depend on the hierachy that the user has selected ? I mean, if the user select the [Week] hierachy or the [Month] hierachy then the MDX statement always return the good result.

    Thanks.
    Sivleam.

    • Alex:

      Hi Sivleam
      I wrote another blog on this some time ago. Have a look at http://www.purplefrogsystems.com/blog/2010/08/mdx-calculated-member-spanning-multiple-date-dimensions/ and see if it helps.
      Regards
      Alex

  • Sivleam:

    Thank you …

  • KARI:

    Hi Alex ,

    I need to filter two different dates by start date and end date.

    SQL statement looks like
    select a, b, c , d
    from table1
    where Rdate >= @startdate and Ddate <= @enddate

    Can you let me know how I can do the above statement in MDX.

    I tried to put it under where clause in my MDX but it didn't work..

    It throws an error
    "Greater Than function expects a member expression for the 1 argument"

    Thanks in Advance !!
    KARI

    • Alex:

      Hi Kari
      Like always with MDX, there are many ways of approaching this. However we shouldn’t think of this in terms of how we’d write a SQL statement, it’s important to get into an MDX mindset.
      One option would be to define the subcube using sub-selects in the FROM statement:
      SELECT … FROM (SELECT {[RDate].[Date].[20141029]:NULL} ON 0 FROM (SELECT {NULL:[DDate].[Date].[20141029]} ON 0 FROM [MyCube]))

      So each {} set is the set of all required dates in that particular dimension, null being the start or end of the dimension (i.e. the earliest or latest date, depending on whether it’s null: or :null).
      Hope this makes sense!
      Alex

  • KARI:

    Thank Alex!
    As I mentioned I am using parameters here. So would you be able to tell me how I put a parameter values to the same instead of the direct dates.
    I am still working on the same issue.. Seems like nothing is working I hope your solution works :)

    Thanks a bunch!
    Kari

    • Alex:

      Hi Kari
      There are two separate things here, the MDX functionality and then the parameterisation.
      First I’d recommend getting the MDX working how you want with a hard coded date. Only when that is working should you then introduce the parameters.
      When you get that far, just use the strtomember MDX function (and string concatenation if required) to convert the parameter value into a dimension member.
      Regards
      Alex

Leave a Reply

The Frog Blog

I'm Alex Whittles.

I specialise in designing and implementing SQL Server business intelligence solutions, and this is my blog! Just a collection of thoughts, techniques and ramblings on SQL Server, Cubes, Data Warehouses, MDX, DAX and whatever else comes to mind.

I'm Organising SQL Relay 2014

Submit a session for SQLBits

Frog Blog Out
twitter
rssicon