0845 643 64 63

Monthly Archives: August 2010

Calculating Run Rate in DAX for PowerPivot

In my previous post I explained how to create a calculated MDX member that projects full year data (sales etc.) based on existing year to date data.

In this post I’ll be doing exactly the same but in DAX, the new expression language used to enhance PowerPivot data. As it’s the same desired outcome, I’m not going to repeat the background, you’ll have to look at my previous post for that.

The expressions below assume that you have a single table ‘Sales’ with a [Date] column and an [Internet Sales Amount] column.

Step 1 – What are our total sales so far this year?

We use the TOTALYTD function to work this out for us


  =TOTALYTD(SUM(Sales[Internet Sales Amount])
     ,Sales[Date], all(Sales))

The first parameter is the expression we want to calculate, i.e. the sum of internet sales.
The second parameter specifies the date we’re calculating up to.
The third parameter is one that catches a lot of people out. We have to tell DAX the context of the date. As it stands the expression can only see the data in the selected row, by specifing all(Sales) in the filter we expand the expression to be able to look at all of the data.

Step 2 – How far are we though the year?

It’s here where DAX really shows an improvement in the functions available over and above what’s available in MDX. There’s a YEARFRAC function which calculates how far we are through the year.


  =YEARFRAC(
     CONCATENATE("01/01/"
         ,year(Sales[Date]))
     ,Sales[Date]))

The first parameter is the start date, i.e. the 1st January. We have to build this using the year of the selected row to ensure we get the right year.
The second parameter is the date of the record we’re looking at, Sales[Date].

Step 3 – Project the value to the end of the year

We combine the two values by simply dividing the YTD figure by how far we are through the year


  =TOTALYTD(SUM(Sales[Internet Sales Amount])
     ,Sales[Date], all(Sales))
  / YEARFRAC(
      CONCATENATE("01/01/"
          , year(Sales[Date]))
     ,Sales[Date]))

CORRECTION
Thanks to Sandeep Nallabelli in the comments below or pointing out a bug in this, YearFrac should actually be calculated from 31st Dec instead of 1st Jan. Updated calculation should be:


 =TOTALYTD(SUM(Sales[Internet Sales Amount]) 
    ,Sales[Date], all(Sales)) 
  / YEARFRAC( 
      CONCATENATE("31/12/" 
          , year(Sales[Date])-1) 
      ,Sales[Date])) 

Also, it is worth adding the YearFrac calculation as a calculated column in your date table so that you can use it in measures.


This chart shows how the full year run rate is adjusted throughout the year as the cumulative sales to date grows. At the start of the year it’s quite volatile, but from February it settles down with an acurate projection.

And it really is as easy as that.

Frog-Blog Out

Calculate Run Rate (Full Year Projection) in MDX

This post explains how to create an MDX calculated member that will take a value from the cube and project it forward to the end of the year. This provides a simple mechanism for calculating what your expected total will be at year end, based upon current performance.

To do this more accurately you should use time series data mining models in SSAS and use DMX expressions to query the results, but this method is very simple and requires little effort, and will be pretty accurate so long as the data you’re modelling is fairly linear. Please note though that the more cyclical and seasonal your data is the less effective this will be.

The basic idea is that we take what we have done so far (i.e. year to date sales), look at how far through the year we are, and extrapolate the value of future months (or days/weeks/etc.) based upon values so far.

i.e. If we’re at March month end and we’ve sold 100 widgets so far this year, we’re 1/4 of the way through the year so we multiply 100 by 4 and get a prejected yearly total of 400.


This chart shows the concept of what we’re doing, and shows the full year prejections calculated in March (with 3 months of available data) and June (6 months of data). The projections obviously get more accurate the further you are through the year.

One of the points to note is that when creating a calculation like this, based upon a time dimension, the calculation should always work with any level of the dimension hierarchy selected. i.e. The user shouldn’t care whether they’re looking at a month, week, quarter or a day, the calculation should always work the same. To achieve this we simply use the .currentmember of the time hierarchy.

The following examples are based upon projecting the Internet Sales Amount measure found within the SQL Server 2008 Adventure Works DW sample cube.

Step 1 – What are our total sales so far this year?

MDX helpfully provides us with the YTD function which takes care of this for us.


  MEMBER [Measures].[YTD Sales] AS
    AGGREGATE(
      YTD([Date].[Calendar].CurrentMember)
      ,[Measures].[Internet Sales Amount])

This takes the current member of the Calendar hierarchy, and creates a set of all dates before it (this year) using YTD. It then aggregates (in this case sums) the Internet Sales Amount for all of these dates to calculate YTD Sales.

Step 2 – Which period are we in?

Here we’ll use the same YTD function to create a set of all dates so far this year, but in this case we’ll count the number of resulting members. Note that because we’re using the .CurrentMember of the hierarchy, it doesn’t matter if we’re looking at a date, week or month, the MDX will work. i.e. If we’re looking at 21 Jan it will return 21. If we’re looking at Q3 it will return 3, August will return 8 etc.


  MEMBER [Measures].[CurPeriod] AS
    COUNT(
      YTD([Date].[Calendar].CurrentMember)
      ,INCLUDEEMPTY)

Step 3 – How many periods are in the year?

If we coded this to only work with months then we could hard code this to 12 however we need to keep it generic to all levels of the hierarchy. So, we have to count all the cousins of the current time member [within this year]. Unfortunately there isn’t a Cousins function in MDX, and Siblings will only return other members within the same parent. i.e. siblings of May 4th would include May 1 through to May 31. To get around this we find the year of the current member by using the Ancestor function.


  ANCESTOR([Date].[Calendar].CurrentMember
  , [Date].[Calendar].[Calendar Year])

Then we find all of the descendants of the year, at the same level of the hierarchy (week/day/etc.) as the current member. We can then take a count as before.


  MEMBER [Measures].[TotalPeriods] AS
    COUNT(
      DESCENDANTS(
        ANCESTOR([Date].[Calendar].CurrentMember
          ,[Date].[Calendar].[Calendar Year])
        ,[Date].[Calendar].CurrentMember.level)
      ,INCLUDEEMPTY)

Step 4 – Calculate the Run Rate

Calculating the prejected yearly total (run rate) is then a simple calculation


  MEMBER [Measures].[Full Year Run Rate] AS
    [Measures].[YTD Sales]
    * ([Measures].[TotalPeriods]
       /[Measures].[CurPeriod])

You can then put the whole lot together and see the results…


WITH

  MEMBER [Measures].[YTD Sales] AS
    AGGREGATE(
      YTD([Date].[Calendar].CurrentMember)
      ,[Measures].[Internet Sales Amount])

  MEMBER [Measures].[CurPeriod] AS
    COUNT(
      YTD([Date].[Calendar].CurrentMember)
      ,INCLUDEEMPTY)

  MEMBER [Measures].[TotalPeriods] AS
    COUNT(
      DESCENDANTS(
        ANCESTOR([Date].[Calendar].CurrentMember
          ,[Date].[Calendar].[Calendar Year])
        ,[Date].[Calendar].CurrentMember.level)
      ,INCLUDEEMPTY)

  MEMBER [Measures].[Full Year Run Rate] AS
    [Measures].[YTD Sales]
    * ([Measures].[TotalPeriods]
       /[Measures].[CurPeriod])

SELECT
{
     [Measures].[Internet Sales Amount]
    ,[Measures].[YTD Sales]
    ,[Measures].[Full Year Run Rate]
    ,[Measures].[CurPeriod]
    ,[Measures].[TotalPeriods]
} ON 0,
{
    DESCENDANTS([Date].[Calendar].[CY 2003])
} ON 1
FROM [Direct Sales]

In my next blog I’ll be diong the same calculation in DAX for use with PowerPivot, stay tuned…

Frog-Blog Out

Seagate Momentus XT Hybrid Drive

I thought I’d take a break from writing posts about Business Intelligence and SQL Server, and instead share with you my elation at finding a laptop hard disk that quite simply makes the world a better place, the Seagate Momentus XT hybrid drive.

When I purchased my curent laptop (Dell XPS M1530 if you’re interested, with 4Gb RAM) I was presented with a choice between a fast 7200rpm 200Gb drive or a slower 5400rpm 320Gb drive. Due to the size of the databases I tend to work with I had to opt for the larger of the two, a Western Digital Caviar, taking the hit on performance.

I’ve been tempted for a while to upgrade the disk to a 7200rpm but have been secretly holding out (in vain) for solid state disks to increase in size and performance whilst decrease in price. £600 for a 256Gb SSD still renders them too expensive and too small to be an effective option for my needs. 512Gb drives are expected soon, but with a price tag of over £1000. No thanks.

Enter Seagate, with their Momentus XT hybrid drive which is now available in the UK. The 500Gb version (also available in 250Gb and 320Gb) is a standard laptop sized 2.5″ drive which combines 4Gb of super fast SLC NAND solid state storage alongside a 500Gb traditional 7200rpm drive. It also has 32Mb of drive-level cache. The drive monitors disk usage and automatically uses the SSD for the most commonly used files, without any help or drivers on the operating system. Thus you get the size/cost benefit of a standard drive but the performance boost of an SSD for your most accessed files. And all this for less than £100… How could I resist?!

After a weekend of reinstalling Windows 7 Ultimate (x64), Office 2010, SQL 2008 R2 and the usual plethora of other software, the results are quite simply staggering. My previous setup would let me login to Windows after 60 seconds, but I had to wait a total of 7.5 minutes until Outlook was open and usable. In the new setup I can login to Windows after 35 seconds, and Outlook is open and usable in under 1.5 minutes. 6 minutes saved per day just on bootup. That’s a whole 24 hours per year.

I have to place a caveat here, that there are a number of software differences between the two systems so it’s not by any means a scientific test. My old system was XP Pro x86 and the new one is Windows 7 Ultimate x64, I’ve changed SQL Server 2008 to 2008R2, and all the drivers/software are 64 bit instead of 32 bit. This will certainly make a difference on its own so the performance is not entirely down to the drive, however I have to assume that it takes the majority of the credit. Every detailed review that I’ve seen reports average performance as pretty much mid-way between a 7200rpm and a SSD disk.

The only downside is that I’ve now got to spend a few more weekends upgrading the other company laptops!

MDX Calculated Member Spanning Multiple Date Dimensions

It’s common in most cubes to have a number of different date dimensions, whether role playing, distinct, or a combination of both. Say for example, Entry Date, Posting Date and Accounting Period. There may also be numerous hierarchies in each date dimension, such as calendar and fiscal calendar, leading to a relatively complicated array of dates to worry about when calculating semi-additive measures.

If we create a date related calculation (i.e. total to date) how do we ensure that this calculation works across all date dimensions?

Lets assume we have a stock movement measure, where each record in the fact table is the change in stock (plus or minus). The current stock level is found by using a calculation totaling every record to date.

CREATE MEMBER CURRENTCUBE.[Measures].[Stock Level]
AS
  SUM({NULL:[Date].[Calendar].CurrentMember}
     , [Measures].[Stock Movement]
  );

[Note that {NULL:xxx} just creates a set of everything before the xxx member, i.e. everything to date]

This works just fine, if the user selects the [Date].[Calendar] hierarchy. What if the user selects the [Date].[Fiscal] hierarchy, or the [Period] dimension? Basically the calculation wont work, as the MDX expression is only aware of the [Date].[Calendar] hierarchy.

The simple solution is to use the Aggregate function over all of the dimensions that the calculation needs to be aware of:

CREATE MEMBER CURRENTCUBE.[Measures].[Stock Level]
AS
  AGGREGATE(
      {NULL:[Date].[Fiscal].CurrentMember}
       * {NULL:[Date].[Calendar].CurrentMember}
       * {NULL:[Period].[Period].CurrentMember}
    , [Measures].[Stock Movement]
  );

The calculation will then use whichever date or time hierarchy is selected. It will even cope if multiple dimensions are selected, say the calendar on 0 and the periods on 1, both axis will honor the aggregation as expected.

Frog-Blog out.

Data Warehousing: Kimball vs Inmon

Bill Inmon
Anyone involved in the Business Intelligence space has had their head in the sand if they are not aware of the long running, and more often than not misunderstood, debate between the two conceptual models of data warehouse design.

Bill Inmon has recently posted an article on www.b-eye-network.com discussing the matter, and to his credit, has tried to put forward a number of balanced pros and cons of each methodology.

I’ll state now that I’m a big advocate of a hybrid approach, taking elements from both Imnon and Kimball camps and selecting the right approach for each unique project depending on the requirements and purpose of the warehouse. I therefore appreciate both sides of the debate, and am not going to jump to the defence of either side. Having said that, most projects more often than not have a weighting towards Kimball due to the time pressures imposed by clients.

Bill nicely simmarises the key elements of each approach as:

The Kimball approach to database design and development is typified by the star schema design of databases. There are fact tables and dimension tables. In a complex environment, there are snowflake structures, which are merely extended versions of the star schema. In order to resolve differences of granularity between fact tables, conformed dimensions are used. Staging areas are occasionally used to capture raw data before the placement of the data into a Kimball style data mart.

The Inmon approach to data warehousing centers around a relational, non redundant, granular, integrated design of detailed data. From this base of data, data marts are spun off to different departments according to their individual analytical needs. In recent vintage, with DW 2.0, the Inmon approach calls for the recognition of the life cycle of data within the data warehouse, the inclusion and integration of unstructured data within the data warehouse, and the close integration of metadata into the data warehouse infrastructure.

Ralph Kimball

He then proceeds to present a brief comparitive assessment of the pros and cons of each. I don’t entirely agree with the black and white nature of the comparisons as most items are a shade of grey in both camps, but it certainly provides a good starting point for those that are starting out in the BI field and want to know more about whhat this debate is alll about.

Possibly the most interesting item (certainly from a business intelligence consultant’s perspective) is Bill’s renewed call for an open, public debate between Bill and Ralph – It gets my vote!

Thanks to Graham Bradfield at Computer People for pointing me towards this article in his BI newsletter.

View the full article here: Data Warehousing: Kimball vs Inmon

The Frog Blog

I'm Alex Whittles.

I specialise in designing and implementing SQL Server business intelligence solutions, and this is my blog! Just a collection of thoughts, techniques and ramblings on SQL Server, Cubes, Data Warehouses, MDX, DAX and whatever else comes to mind.

Data Platform MVP

Frog Blog Out
twitter
rssicon