
Analysis Services

OLAP Cube Documentation in SSRS part 1

Being a business intelligence consultant, I like to spend my time designing data warehouses, ETL scripts and OLAP cubes. An unfortunate consequence of this is having to write the documentation that goes with the fun techy work. So it got me thinking, is there a slightly more fun techy way of automating the documentation of OLAP cubes…

There are some good tools out there such as BI Documenter, but I wanted a way of having more control over the output, and also automating it further so that you don’t have to run an overnight build of the documentation.

I found a great article by Vincent Rainardi describing some DMVs (Dynamic Management Views) available in SQL 2008, which got me thinking: why not just build a number of SSRS reports calling these DMVs, which would then dynamically create the cube structure documentation in real time whenever the report is rendered?

This post is the first in a three-part series which will demonstrate how you can use these DMVs to automate the SSAS cube documentation and user guide.

UPDATE: I presented a 1 hour session at SQLBits 8 covering all of this work, you can watch the video here.

There’s a full list of DMVs available in SQL 2008 R2 on the MSDN site.

The primary DMVs that are of interest are:

DMV                                Description
MDSCHEMA_CUBES                     Lists the cubes in an SSAS database
MDSCHEMA_MEASUREGROUPS             Lists measure groups
MDSCHEMA_DIMENSIONS                Lists dimensions
MDSCHEMA_LEVELS                    Lists dimension attributes (levels)
MDSCHEMA_MEASUREGROUP_DIMENSIONS   Enumerates the dimensions of each measure group
MDSCHEMA_MEASURES                  Lists measures

When querying DMVs we can use SQL-style SELECT statements, but they are executed against the cube in a DMX query window.

SELECT *
FROM $SYSTEM.MDSCHEMA_CUBES

This returns a dataset like any other SQL query.

We can even enhance it with DISTINCT and WHERE clauses, although these are more restricted than in basic SQL. One of the main limitations is the lack of a JOIN operator. A number of the queries that I’ll perform below need a JOIN, so to get around this I wrap each query in a SQL OPENQUERY call, executed against a SQL Server database with a linked server pointing at the cube. This enables me to perform JOINs using queries such as

SELECT *
FROM OPENQUERY(CubeLinkedServer,
   'SELECT *
    FROM $SYSTEM.MDSCHEMA_MEASUREGROUP_DIMENSIONS') mgd
INNER JOIN OPENQUERY(CubeLinkedServer,
   'SELECT *
    FROM $SYSTEM.MDSCHEMA_MEASUREGROUPS') mg
ON mgd.XXX = mg.XXX


etc.

I’m therefore going to create a number of stored procedures to wrap up this functionality; the SSRS reports can then just call the procs.

Within BIDS, every item (cube, measure group, measure, dimension, attribute, hierarchy, KPI, etc.) has a description in the properties pane which is a multi-line free text property. These are exposed by the DMVs, so I’m going to make use of them and bring them out in the reports. This allows you to create the descriptions within BIDS as you’re developing the cube, meaning they’re version controlled and always in sync with the code.

I should also point out that I’m using SQL Server 2008 R2. All of the queries below will work with SQL 2008, but I want to use the spatial report functionality of SSRS 2008 R2 to generate dynamic star schema visualisations, which is only supported in R2.

In this post I’ll script out the stored procedures used as the basis of the documentation. In my next post I’ll put these into SSRS reports.

Let’s get started.

Firstly we need to create our linked server. This script will create a linked server called CubeLinkedServer pointing to the Adventure Works DW 2008R2 OLAP database on the local server.

EXEC master.dbo.sp_addlinkedserver
   @server = N'CubeLinkedServer',
   @srvproduct=N'MSOLAP',
   @provider=N'MSOLAP',
   @datasrc=N'(local)',
   @catalog=N'Adventure Works DW 2008R2'


You’ll have to set up the security according to your requirements; one option is sketched below.
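
For example, on a development box you might let every local login use a single mapping with no explicit remote credentials, in which case the connection to SSAS is typically made under the SQL Server service account (which will need read access to the OLAP database). This is just a minimal sketch using sp_addlinkedsrvlogin; adjust it to your own security policy.

EXEC master.dbo.sp_addlinkedsrvlogin
   @rmtsrvname  = N'CubeLinkedServer',
   @useself     = N'False',
   @locallogin  = NULL,
   @rmtuser     = NULL,
   @rmtpassword = NULL

With the linked server and security in place, let’s start creating the stored procs.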

The first proc lists all of the cubes. The MDSCHEMA_CUBES DMV returns not only cubes but also dimensions, so I’m filtering it to return only cubes by specifying CUBE_SOURCE = 1.

CREATE PROCEDURE [dbo].[upCubeDocCubes]
  (@Catalog       VARCHAR(255) = NULL
  )
AS
  SELECT *
  FROM OPENQUERY(CubeLinkedServer,
    'SELECT *
     FROM $SYSTEM.MDSCHEMA_CUBES
     WHERE CUBE_SOURCE = 1')
  WHERE CAST([CATALOG_NAME] AS VARCHAR(255)) = @Catalog
    OR @Catalog IS NULL
GO
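
As a quick sanity check (assuming the Adventure Works sample database referenced by the linked server above), the proc can be called with or without the catalog filter:

EXEC [dbo].[upCubeDocCubes] @Catalog = 'Adventure Works DW 2008R2'  --cubes in one SSAS database
EXEC [dbo].[upCubeDocCubes]                                         --cubes in all SSAS databases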


The next proc returns all measure groups found within a specified cube.

CREATE PROCEDURE [dbo].[upCubeDocMeasureGroupsInCube]
  (@Catalog       VARCHAR(255)
  ,@Cube          VARCHAR(255)
  )
AS
  SELECT *
  FROM OPENQUERY(CubeLinkedServer,
    'SELECT *
     FROM $SYSTEM.MDSCHEMA_MEASUREGROUPS ')
  WHERE CAST([CATALOG_NAME] AS VARCHAR(255)) = @Catalog
    AND CAST([CUBE_NAME] AS VARCHAR(255))    = @Cube
GO


This next proc returns a list of measures within a specified measure group.

CREATE PROCEDURE [dbo].[upCubeDocMeasuresInMeasureGroup]
  (@Catalog       VARCHAR(255)
  ,@Cube          VARCHAR(255)
  ,@MeasureGroup  VARCHAR(255)
  )
AS
SELECT * FROM OPENQUERY(CubeLinkedServer,
  'SELECT *
   FROM $SYSTEM.MDSCHEMA_MEASURES
     WHERE [MEASURE_IS_VISIBLE]')
   WHERE CAST([CATALOG_NAME] AS VARCHAR(255))      = @Catalog
     AND CAST([CUBE_NAME] AS VARCHAR(255))         = @Cube
     AND CAST([MEASUREGROUP_NAME] AS VARCHAR(255)) = @MeasureGroup
GO


The following proc queries all dimensions available within a specified cube. I’m filtering using the DIMENSION_IS_VISIBLE column to only show visible dimensions.

CREATE PROCEDURE [dbo].[upCubeDocDimensionsInCube]
  (@Catalog       VARCHAR(255)
  ,@Cube          VARCHAR(255)
  )
AS
SELECT * FROM OPENQUERY(CubeLinkedServer,
  'SELECT *
   FROM $SYSTEM.MDSCHEMA_DIMENSIONS
     WHERE [DIMENSION_IS_VISIBLE]')
   WHERE CAST([CATALOG_NAME] AS VARCHAR(255)) = @Catalog
     AND CAST([CUBE_NAME] AS VARCHAR(255))    = @Cube
GO


Then we can query all available attributes within a dimension. This DMV returns a bitmask field (LEVEL_ORIGIN) which defines whether the attribute is a key, attribute or hierarchy. I’m using bitwise AND (&) to split this into three separate fields for ease of use. I’m also filtering out invisible attributes, as well as those with a level of 0. Level 0 is the [All] member of any attribute, which we can ignore for this purpose.

CREATE PROCEDURE [dbo].[upCubeDocAttributesInDimension]
  (@Catalog       VARCHAR(255)
  ,@Cube          VARCHAR(255)
  ,@Dimension  VARCHAR(255)
  )
AS
  SELECT *
    , CASE WHEN CAST([LEVEL_ORIGIN] AS INT) & 1 = 1
        THEN 1 ELSE 0 END AS IsHierarchy
    , CASE WHEN CAST([LEVEL_ORIGIN] AS INT) & 2 = 2
        THEN 1 ELSE 0 END AS IsAttribute
    , CASE WHEN CAST([LEVEL_ORIGIN] AS INT) & 4 = 4
        THEN 1 ELSE 0 END AS IsKey
  FROM OPENQUERY(CubeLinkedServer,
    'SELECT *
     FROM $SYSTEM.MDSCHEMA_LEVELS
     WHERE [LEVEL_NUMBER]>0
       AND [LEVEL_IS_VISIBLE]')
  WHERE CAST([CATALOG_NAME] AS VARCHAR(255))          = @Catalog
    AND CAST([CUBE_NAME] AS VARCHAR(255))             = @Cube
    AND CAST([DIMENSION_UNIQUE_NAME] AS VARCHAR(255)) = @Dimension
GO
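
Note that this proc takes the dimension’s unique name rather than its friendly name. A usage sketch (assuming the Adventure Works sample cube, where dimension unique names are the bracketed dimension name, e.g. [Customer]):

EXEC [dbo].[upCubeDocAttributesInDimension]
   @Catalog   = 'Adventure Works DW 2008R2'
  ,@Cube      = 'Adventure Works'
  ,@Dimension = '[Customer]'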


The next proc returns measure groups with their associated dimensions. We have to join two DMVs together because MDSCHEMA_MEASUREGROUP_DIMENSIONS doesn’t expose the measure group’s description, so we pick that up from MDSCHEMA_MEASUREGROUPS.

CREATE PROCEDURE [dbo].[upCubeDocMeasureGroupsForDimension]
    (@Catalog       VARCHAR(255)
    ,@Cube          VARCHAR(255)
    ,@Dimension     VARCHAR(255)
    )
AS
  SELECT
    mgd.*
    , mg.[DESCRIPTION]
  FROM OPENQUERY(CubeLinkedServer,
    'SELECT
       [CATALOG_NAME]
       , [CUBE_NAME]
       , [MEASUREGROUP_NAME]
       , [MEASUREGROUP_CARDINALITY]
       , [DIMENSION_UNIQUE_NAME]
     FROM $SYSTEM.MDSCHEMA_MEASUREGROUP_DIMENSIONS
       WHERE [DIMENSION_IS_VISIBLE]') mgd
   INNER JOIN OPENQUERY(CubeLinkedServer,
     'SELECT
       [CATALOG_NAME]
       ,[CUBE_NAME]
       ,[MEASUREGROUP_NAME]
       ,[DESCRIPTION]
     FROM $SYSTEM.MDSCHEMA_MEASUREGROUPS') mg
        ON  CAST(mgd.[CATALOG_NAME] AS VARCHAR(255))
           = CAST(mg.[CATALOG_NAME] AS VARCHAR(255))
        AND CAST(mgd.[CUBE_NAME] AS VARCHAR(255))
           = CAST(mg.[CUBE_NAME] AS VARCHAR(255))
        AND CAST(mgd.[MEASUREGROUP_NAME] AS VARCHAR(255))
           = CAST(mg.[MEASUREGROUP_NAME] AS VARCHAR(255))
  WHERE CAST(mgd.[CATALOG_NAME] AS VARCHAR(255))            = @Catalog
    AND CAST(mgd.[CUBE_NAME] AS VARCHAR(255))               = @Cube
    AND CAST(mgd.[DIMENSION_UNIQUE_NAME] AS VARCHAR(255))   = @Dimension
GO


The next proc is similar to the above, but the opposite way around. It returns all dimensions that are related to a measure group.

CREATE PROCEDURE [dbo].[upCubeDocDimensionsForMeasureGroup]
  (@Catalog       VARCHAR(255)
  ,@Cube          VARCHAR(255)
  ,@MeasureGroup  VARCHAR(255)
  )
AS
  SELECT
    mgd.*
    , d.[DESCRIPTION]
  FROM OPENQUERY(CubeLinkedServer,
    'SELECT
        [CATALOG_NAME]
       ,[CUBE_NAME]
       ,[MEASUREGROUP_NAME]
       ,[MEASUREGROUP_CARDINALITY]
       ,[DIMENSION_UNIQUE_NAME]
       ,[DIMENSION_CARDINALITY]
       ,[DIMENSION_IS_VISIBLE]
       ,[DIMENSION_IS_FACT_DIMENSION]
       ,[DIMENSION_GRANULARITY]
     FROM $SYSTEM.MDSCHEMA_MEASUREGROUP_DIMENSIONS
       WHERE [DIMENSION_IS_VISIBLE]') mgd
  INNER JOIN OPENQUERY(CubeLinkedServer,
    'SELECT
       [CATALOG_NAME]
       ,[CUBE_NAME]
       ,[DIMENSION_UNIQUE_NAME]
       ,[DESCRIPTION]
     FROM $SYSTEM.MDSCHEMA_DIMENSIONS
       WHERE [DIMENSION_IS_VISIBLE]') d
   ON  CAST(mgd.[CATALOG_NAME] AS VARCHAR(255))
       = CAST(d.[CATALOG_NAME] AS VARCHAR(255))
   AND CAST(mgd.[CUBE_NAME] AS VARCHAR(255))
       = CAST(d.[CUBE_NAME] AS VARCHAR(255))
   AND CAST(mgd.[DIMENSION_UNIQUE_NAME] AS VARCHAR(255))
       = CAST(d.[DIMENSION_UNIQUE_NAME] AS VARCHAR(255))
  WHERE  CAST(mgd.[CATALOG_NAME] AS VARCHAR(255))        = @Catalog
     AND CAST(mgd.[CUBE_NAME] AS VARCHAR(255))           = @Cube
     AND CAST(mgd.[MEASUREGROUP_NAME] AS VARCHAR(255))   = @MeasureGroup
GO


The next proc builds a BUS matrix, joining every dimension to its related measure groups. Later we’ll use the SSRS tablix control to pivot this into matrix form.

CREATE PROCEDURE [dbo].[upCubeDocBUSMatrix]
    (@Catalog       VARCHAR(255),
     @Cube          VARCHAR(255)
    )
AS
  SELECT
     bus.[CATALOG_NAME]
    ,bus.[CUBE_NAME]
    ,bus.[MEASUREGROUP_NAME]
    ,bus.[MEASUREGROUP_CARDINALITY]
    ,bus.[DIMENSION_UNIQUE_NAME]
    ,bus.[DIMENSION_CARDINALITY]
    ,bus.[DIMENSION_IS_FACT_DIMENSION]
    ,bus.[DIMENSION_GRANULARITY]
    ,dim.[DIMENSION_MASTER_NAME]
    ,1 AS Relationship
  FROM
    OPENQUERY(CubeLinkedServer,
      'SELECT
        [CATALOG_NAME]
        ,[CUBE_NAME]
        ,[MEASUREGROUP_NAME]
        ,[MEASUREGROUP_CARDINALITY]
        ,[DIMENSION_UNIQUE_NAME]
        ,[DIMENSION_CARDINALITY]
        ,[DIMENSION_IS_FACT_DIMENSION]
        ,[DIMENSION_GRANULARITY]
       FROM $SYSTEM.MDSCHEMA_MEASUREGROUP_DIMENSIONS
        WHERE [DIMENSION_IS_VISIBLE]') bus
    INNER JOIN OPENQUERY(CubeLinkedServer,
      'SELECT
        [CATALOG_NAME]
        ,[CUBE_NAME]
        ,[DIMENSION_UNIQUE_NAME]
        ,[DIMENSION_MASTER_NAME]
       FROM $SYSTEM.MDSCHEMA_DIMENSIONS') dim
    ON CAST(bus.[CATALOG_NAME] AS VARCHAR(255))
     = CAST(dim.[CATALOG_NAME] AS VARCHAR(255))
    AND CAST(bus.[CUBE_NAME] AS VARCHAR(255))
     = CAST(dim.[CUBE_NAME] AS VARCHAR(255))
    AND CAST(bus.[DIMENSION_UNIQUE_NAME] AS VARCHAR(255))
     = CAST(dim.[DIMENSION_UNIQUE_NAME] AS VARCHAR(255))
  WHERE  CAST(bus.[CATALOG_NAME] AS VARCHAR(255)) = @Catalog
     AND CAST(bus.[CUBE_NAME] AS VARCHAR(255)) = @Cube
GO


Next, in order to make it easier for users to find items within the cube, I’ve created a searching proc which will scour a number of the DMVs for anything containing the search term.

CREATE PROCEDURE [dbo].[upCubeDocSearch]
    (@Search        VARCHAR(255)
    ,@Catalog       VARCHAR(255)=NULL
    ,@Cube          VARCHAR(255)=NULL
    )
AS
  WITH MetaData AS
  (
   --Cubes
    SELECT CAST('Cube' AS VARCHAR(20))            AS [Type]
      , CAST(CATALOG_NAME AS VARCHAR(255))     AS [Catalog]
      , CAST(CUBE_NAME AS VARCHAR(255))           AS [Cube]
      , CAST(CUBE_NAME AS VARCHAR(255))           AS [Name]
      , CAST(DESCRIPTION AS VARCHAR(4000)) AS [Description]
      , CAST(CUBE_NAME AS VARCHAR(255))           AS [Link]
    FROM OPENQUERY(CubeLinkedServer,
      'SELECT [CATALOG_NAME], [CUBE_NAME], [DESCRIPTION]
       FROM $SYSTEM.MDSCHEMA_CUBES
       WHERE CUBE_SOURCE = 1')
    WHERE  (CAST([CATALOG_NAME] AS VARCHAR(255))
       = @Catalog OR @Catalog IS NULL)

    UNION ALL

   --Dimensions
    SELECT CAST('Dimension' AS VARCHAR(20))         AS [Type]
      , CAST(CATALOG_NAME AS VARCHAR(255))       AS [Catalog]
      , CAST(CUBE_NAME AS VARCHAR(255))             AS [Cube]
      , CAST(DIMENSION_NAME AS VARCHAR(255))        AS [Name]
      , CAST(DESCRIPTION AS VARCHAR(4000))   AS [Description]
      , CAST(DIMENSION_UNIQUE_NAME AS VARCHAR(255)) AS [Link]
    FROM OPENQUERY(CubeLinkedServer,
      'SELECT [CATALOG_NAME], [CUBE_NAME]
          , [DIMENSION_NAME], [DESCRIPTION]
          , [DIMENSION_UNIQUE_NAME]
       FROM $SYSTEM.MDSCHEMA_DIMENSIONS
         WHERE [DIMENSION_IS_VISIBLE]')
    WHERE  (CAST([CATALOG_NAME] AS VARCHAR(255))
        = @Catalog OR @Catalog IS NULL)
      AND (CAST([CUBE_NAME] AS VARCHAR(255))
        = @Cube OR @Cube IS NULL)
      AND LEFT(CAST(CUBE_NAME AS VARCHAR(255)),1)
        <>'$' --Filter out dimensions not in a cube

    UNION ALL

   --Attributes
    SELECT CAST('Attribute' AS VARCHAR(20))         AS [Type]
      , CAST(CATALOG_NAME AS VARCHAR(255))       AS [Catalog]
      , CAST(CUBE_NAME AS VARCHAR(255))             AS [Cube]
      , CAST(LEVEL_CAPTION AS VARCHAR(255))         AS [Name]
      , CAST(DESCRIPTION AS VARCHAR(4000))   AS [Description]
      , CAST(DIMENSION_UNIQUE_NAME AS VARCHAR(255)) AS [Link]
    FROM OPENQUERY(CubeLinkedServer,
      'SELECT [CATALOG_NAME], [CUBE_NAME]
         , [LEVEL_CAPTION], [DESCRIPTION]
         , [DIMENSION_UNIQUE_NAME]
       FROM $SYSTEM.MDSCHEMA_LEVELS
       WHERE [LEVEL_NUMBER]>0
         AND [LEVEL_IS_VISIBLE]')
    WHERE  (CAST([CATALOG_NAME] AS VARCHAR(255))
         = @Catalog OR @Catalog IS NULL)
      AND (CAST([CUBE_NAME] AS VARCHAR(255))
         = @Cube OR @Cube IS NULL)
      AND LEFT(CAST(CUBE_NAME AS VARCHAR(255)),1)
         <>'$' --Filter out dimensions not in a cube

    UNION ALL

   --Measure Groups
    SELECT CAST('Measure Group' AS VARCHAR(20))   AS [Type]
      , CAST(CATALOG_NAME AS VARCHAR(255))     AS [Catalog]
      , CAST(CUBE_NAME AS VARCHAR(255))           AS [Cube]
      , CAST(MEASUREGROUP_NAME AS VARCHAR(255))   AS [Name]
      , CAST(DESCRIPTION AS VARCHAR(4000)) AS [Description]
      , CAST(MEASUREGROUP_NAME AS VARCHAR(255))   AS [Link]
    FROM OPENQUERY(CubeLinkedServer,
       'SELECT [CATALOG_NAME], [CUBE_NAME]
          , [MEASUREGROUP_NAME]
          , [DESCRIPTION]
        FROM $SYSTEM.MDSCHEMA_MEASUREGROUPS')
    WHERE  (CAST([CATALOG_NAME] AS VARCHAR(255))
       = @Catalog OR @Catalog IS NULL)
     AND (CAST([CUBE_NAME] AS VARCHAR(255))
       = @Cube OR @Cube IS NULL)
     AND LEFT(CAST(CUBE_NAME AS VARCHAR(255)),1)
       <>'$' --Filter out dimensions not in a cube

    UNION ALL

   --Measures
    SELECT CAST('Measure' AS VARCHAR(20))         AS [Type]
      , CAST(CATALOG_NAME AS VARCHAR(255))     AS [Catalog]
      , CAST(CUBE_NAME AS VARCHAR(255))           AS [Cube]
      , CAST(MEASURE_NAME AS VARCHAR(255))        AS [Name]
      , CAST(DESCRIPTION AS VARCHAR(4000)) AS [Description]
      , CAST(MEASUREGROUP_NAME AS VARCHAR(255))   AS [Link]
    FROM OPENQUERY(CubeLinkedServer,
      'SELECT [CATALOG_NAME], [CUBE_NAME]
         , [MEASURE_NAME], [DESCRIPTION]
         , [MEASUREGROUP_NAME]
       FROM $SYSTEM.MDSCHEMA_MEASURES
          WHERE [MEASURE_IS_VISIBLE]')
    WHERE  (CAST([CATALOG_NAME] AS VARCHAR(255))
          = @Catalog OR @Catalog IS NULL)
      AND (CAST([CUBE_NAME] AS VARCHAR(255))
          = @Cube OR @Cube IS NULL)
      AND LEFT(CAST(CUBE_NAME AS VARCHAR(255)),1)
          <>'$' --Filter out dimensions not in a cube

    )
    SELECT *
    FROM MetaData
    WHERE @Search<>''
        AND ([Name] LIKE '%' + @Search + '%'
          OR [Description] LIKE '%' + @Search + '%'
        )
GO
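
For example (again assuming the Adventure Works sample), a search across a whole catalog might look like this:

EXEC [dbo].[upCubeDocSearch]
   @Search  = 'Internet Sales'
  ,@Catalog = 'Adventure Works DW 2008R2'
  ,@Cube    = 'Adventure Works'

Anything whose name or description contains the search term is returned, tagged with its type and a link value, so the SSRS report can point each row at the relevant cube, dimension, attribute, measure group or measure.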


We can now use these procs to form the basis of a number of SSRS reports which will dynamically query the DMVs to generate the SSAS cube documentation. I’ll be covering this stage in my next post.

 

Will DAX replace MDX

For those that haven’t yet heard of DAX, it’s an expression language developed by Microsoft to perform calculations against PowerPivot. Taking a step further back, PowerPivot is essentially a local Analysis Services cube that runs within Excel 2010.

I’ve heard plenty of comments from various sources about how DAX is the [multi-dimensional] query language of the future and how it’s going to kill off MDX. Ok…. well no, it’s not ok.

For starters, they exist for two different purposes. DAX is not a query language but an expression language. You can use MDX to query information from a cube and generate a pivot; you can’t do that with DAX.

The best way to think of DAX is as an extension to Excel formulas: you can use it to perform a calculation against an existing set of cells, in this case a PowerPivot dataset.

It is also similar to an MDX calculated member, and in fact supports a number of MDX functions (TotalYTD, ParallelPeriod etc.).

If your data is in a database: You would use SQL to query data from a database and import the results into Excel. You would then use Excel expressions/calculations to enhance the data.

If your data is in a cube: You would use an Excel pivot (or MDX query) to query the data and import the results into Excel. You then have to use a third party tool such as OLAP PivotTable Extensions to add expressions/calculations to enhance the data.

If your data is in PowerPivot: You would use PowerPivot to query the data and import the results into Excel. You would then use DAX to add calculations to enhance the data.

DAX is a fantastic expression tool, and one that provides significant power to PowerPivot, but no, it won’t replace MDX. My hope is that Microsoft will provide DAX capability for MDX queries as well, and not restrict it to PowerPivot queries. As I’ve shown in my previous blog post it’s a great expression language that would provide significant benefit to cube users.

Calculate Run Rate (Full Year Projection) in MDX

This post explains how to create an MDX calculated member that will take a value from the cube and project it forward to the end of the year. This provides a simple mechanism for calculating what your expected total will be at year end, based upon current performance.

To do this more accurately you should use time series data mining models in SSAS and query the results with DMX expressions, but the method here is very simple, requires little effort, and will be pretty accurate so long as the data you’re modelling is fairly linear. Note, though, that the more cyclical and seasonal your data is, the less effective this approach will be.

The basic idea is that we take what we have done so far (i.e. year to date sales), look at how far through the year we are, and extrapolate the value of future months (or days/weeks/etc.) based upon values so far.

i.e. If we’re at March month end and we’ve sold 100 widgets so far this year, we’re 1/4 of the way through the year, so we multiply 100 by 4 and get a projected yearly total of 400.


This chart shows the concept of what we’re doing, with the full year projections calculated in March (with 3 months of available data) and June (6 months of data). The projections obviously get more accurate the further you are through the year.

One of the points to note is that a calculation like this, based upon a time dimension, should always work with any level of the dimension hierarchy selected: the user shouldn’t care whether they’re looking at a month, week, quarter or day; the calculation should always behave the same. To achieve this we simply use the .CurrentMember of the time hierarchy.

The following examples are based upon projecting the Internet Sales Amount measure found within the SQL Server 2008 Adventure Works DW sample cube.

Step 1 – What are our total sales so far this year?

MDX helpfully provides us with the YTD function which takes care of this for us.


  MEMBER [Measures].[YTD Sales] AS
    AGGREGATE(
      YTD([Date].[Calendar].CurrentMember)
      ,[Measures].[Internet Sales Amount])

This takes the current member of the Calendar hierarchy and uses YTD to create a set of all periods from the start of the year up to that member. It then aggregates (in this case sums) the Internet Sales Amount over that set to calculate YTD Sales.

Step 2 – Which period are we in?

Here we’ll use the same YTD function to create a set of all dates so far this year, but in this case we’ll count the number of resulting members. Note that because we’re using the .CurrentMember of the hierarchy, it doesn’t matter whether we’re looking at a date, week or month; the MDX will still work. If we’re looking at 21 Jan it will return 21; if we’re looking at Q3 it will return 3; August will return 8, and so on.


  MEMBER [Measures].[CurPeriod] AS
    COUNT(
      YTD([Date].[Calendar].CurrentMember)
      ,INCLUDEEMPTY)

Step 3 – How many periods are in the year?

If we coded this to work only with months then we could hard-code it to 12; however, we need to keep it generic to all levels of the hierarchy. So we have to count all the cousins of the current time member within this year. Unfortunately there isn’t a function in MDX that returns all of a member’s cousins, and Siblings will only return other members under the same parent, i.e. the siblings of May 4th would be May 1st through to May 31st. To get around this we find the year of the current member by using the Ancestor function.


  ANCESTOR([Date].[Calendar].CurrentMember
  , [Date].[Calendar].[Calendar Year])

Then we find all of the descendants of the year, at the same level of the hierarchy (week/day/etc.) as the current member. We can then take a count as before.


  MEMBER [Measures].[TotalPeriods] AS
    COUNT(
      DESCENDANTS(
        ANCESTOR([Date].[Calendar].CurrentMember
          ,[Date].[Calendar].[Calendar Year])
        ,[Date].[Calendar].CurrentMember.level)
      ,INCLUDEEMPTY)

Step 4 – Calculate the Run Rate

Calculating the projected yearly total (run rate) is then a simple calculation:


  MEMBER [Measures].[Full Year Run Rate] AS
    [Measures].[YTD Sales]
    * ([Measures].[TotalPeriods]
       /[Measures].[CurPeriod])

You can then put the whole lot together and see the results…


WITH

  MEMBER [Measures].[YTD Sales] AS
    AGGREGATE(
      YTD([Date].[Calendar].CurrentMember)
      ,[Measures].[Internet Sales Amount])

  MEMBER [Measures].[CurPeriod] AS
    COUNT(
      YTD([Date].[Calendar].CurrentMember)
      ,INCLUDEEMPTY)

  MEMBER [Measures].[TotalPeriods] AS
    COUNT(
      DESCENDANTS(
        ANCESTOR([Date].[Calendar].CurrentMember
          ,[Date].[Calendar].[Calendar Year])
        ,[Date].[Calendar].CurrentMember.level)
      ,INCLUDEEMPTY)

  MEMBER [Measures].[Full Year Run Rate] AS
    [Measures].[YTD Sales]
    * ([Measures].[TotalPeriods]
       /[Measures].[CurPeriod])

SELECT
{
     [Measures].[Internet Sales Amount]
    ,[Measures].[YTD Sales]
    ,[Measures].[Full Year Run Rate]
    ,[Measures].[CurPeriod]
    ,[Measures].[TotalPeriods]
} ON 0,
{
    DESCENDANTS([Date].[Calendar].[CY 2003])
} ON 1
FROM [Direct Sales]

In my next blog I’ll be doing the same calculation in DAX for use with PowerPivot. Stay tuned…

Frog-Blog Out

MDX Calculated Member Spanning Multiple Date Dimensions

It’s common in most cubes to have a number of different date dimensions, whether role playing, distinct, or a combination of both. Say for example, Entry Date, Posting Date and Accounting Period. There may also be numerous hierarchies in each date dimension, such as calendar and fiscal calendar, leading to a relatively complicated array of dates to worry about when calculating semi-additive measures.

If we create a date related calculation (i.e. total to date) how do we ensure that this calculation works across all date dimensions?

Let’s assume we have a stock movement measure, where each record in the fact table is the change in stock (plus or minus). The current stock level is found by using a calculation totalling every record to date.

CREATE MEMBER CURRENTCUBE.[Measures].[Stock Level]
AS
  SUM({NULL:[Date].[Calendar].CurrentMember}
     , [Measures].[Stock Movement]
  );

[Note that {NULL:xxx} just creates a set of everything before the xxx member, i.e. everything to date]

This works just fine if the user selects the [Date].[Calendar] hierarchy. But what if the user selects the [Date].[Fiscal] hierarchy, or the [Period] dimension? The calculation won’t work, as the MDX expression is only aware of the [Date].[Calendar] hierarchy.

The simple solution is to use the Aggregate function over all of the dimensions that the calculation needs to be aware of:

CREATE MEMBER CURRENTCUBE.[Measures].[Stock Level]
AS
  AGGREGATE(
      {NULL:[Date].[Fiscal].CurrentMember}
       * {NULL:[Date].[Calendar].CurrentMember}
       * {NULL:[Period].[Period].CurrentMember}
    , [Measures].[Stock Movement]
  );

The calculation will then use whichever date or time hierarchy is selected. It will even cope if multiple date hierarchies are selected, say the calendar on 0 and the periods on 1; both axes will honour the aggregation as expected.

Frog-Blog out.

SQL User Group Session 24 June 2010

I’m excited to be presenting another session to the South Wales SQL Server User Group.

On Thursday 24th June 2010, Eversheds in Cardiff are kindly hosting the event, to run from 18:45 to 21:00.
The event is free, and you’ll even get pizza thrown in – what more can you ask for? Oh yes, some BI content…

My session will cover data warehouse modelling, including a number of hands-on business case studies covering transactional data, account balances and duration-based data.

Please feel free to bring your own data modelling problems along and I’ll try and cover as many as I can.

Register for free here: http://www.sqlserverfaq.com/events/235/Data-warehouse-design-case-studies-Other-BI-related-session-TBC.aspx

Hope to see you there!

SQL Server 2008 R2 – PowerPivot and Master Data Services

Purple Frog spent a very interesting day at Microsoft last week, at one of their many events promoting the launch of SQL Server 2008 R2. Rafal Lukawiecki presented an entertaining (as always!) and informative series of talks covering the release, focusing on the enhanced Business Intelligence tools available.

The primary changes to note are:

  • PowerPivot – An in-memory, client-side add-in to Excel that allows users to create virtual cubes on their desktop and analyse over 100m records of data virtually instantly
  • DAX – A new expression language, designed for non-technical (probably more realistically, semi-technical) users to extend pivot tables and power pivot tables without having to learn MDX
  • Report Components – In a report consisting of a couple of tables, a chart and a few gauges (gauges, sparklines & maps are all new features of SSRS), you can save each element as a component and re-use it in different reports. This should result in much less duplication of work.
  • Report Builder 3 – A thin-client tool allowing end users to create Reporting Services reports. This is a big enhancement over its predecessor as it is finally fully compatible with reports created in the Business Intelligence Development Studio (BIDS), including report components.
  • Master Data Services – A centralised tool and database intended to provide governance of your organisation’s master data (centralised list of products, fiscal calendar, regions etc.).

The enhancements to Reporting Services (SSRS) are very welcome, and should be of huge benefit to anyone either currently using SSRS or considering using it. I firmly believe that there are no comparable web based reporting engines that even come close for SME organisations when looking at the whole picture including cost of implementation, ease of use, flexibility and capability.

Master Data Services as a concept has been around for a long time, but there has never been a tool available to organisations to effectively implement it. This is Microsoft’s first proper stab at delivering a workable solution, and although I’m a big fan of the concept, and have no doubt of its benefit to a SME, I’m yet to be convinced that the tool is ready for a large scale corporate environment. Time will tell how scalable and manageable the system is, and credit has to go to Microsoft for starting the ball rolling.

The most impressive addition is without a doubt PowerPivot. In a nutshell, it’s a user defined OLAP cube wrapped up within Excel 2010, running entirely in memory on a user’s workstation. If you’ve not yet played with it or seen a demo, I’ll try and elaborate for you… Think about loading Excel with 1 million rows, and then imagine sorting and filtering a number of those columns [cue going out to lunch whilst waiting for Excel to catch up]. With PowerPivot, you can sort and filter over 100 million rows of data almost in an instant – it’s very impressive indeed!

That’s the snazzy demo bit, but to reduce it to a glorified spreadsheet would be very harsh indeed. It allows a user to import multiple data sources and combine them into a single dimensional data model; PowerPivot effectively creates your own personal cube, without you having to build a warehouse, and without knowing anything about MDX, dimension hierarchies, attribute relationships, granularity etc.

Microsoft’s vision and reason for creating this tool is self-service BI, allowing users to create their own cubes, data analysis environments and reporting systems. And this is where I start to have a problem…

I can’t remember the last time I designed a data warehouse where I did not find significant data quality problems: conflicting data, missing data, duplicated data and so on. I also find it hard to think of a situation where an end user (even a power user) is sufficiently clued up about the intricacies of a source OLTP database to be able to extract the right data and know what to do with it. Or, if they are, a dozen other people in different departments have a different idea about how things work, resulting in many different versions of the truth.

I’m therefore (for now!) sticking with the opinion that it is still absolutely vital for an organisation to provide a clean, consistent, dimensionally modelled data warehouse as the basis for their BI/MI infrastructure. Tools like PowerPivot then sit very nicely on top to provide an incredibly powerful and beneficial user experience, but to try and use the emergence of self-service BI tools to usher in a new ‘non-data warehouse’ era is a very dangerous route which I hope people will avoid.

In summary – this release brings with it a fantastic host of new tools, but with great power comes great responsibility…

Excel Cube Pivot drillthrough limited to 1000 rows

When browsing a cube using Excel 2007, you can drillthrough the measures to display up to 1000 rows of the transaction level source data.

I often get asked whether this limit of 1000 rows is configurable – well, the good news is that yes, it is.

There is an option in the actions tab of the BIDS cube designer which allows you to specify the maximum rows, but helpfully this is ignored by Excel. Instead, you have to set it in Excel when you create a pivot.

Just click “Options” on the “PivotTable Tools” ribbon, then in the “Change Data Source” dropdown click on “Connection Properties”. In this screen, just change the “Maximum number of records to retrieve” property.

Excel 2007 Pivot Options

MDX Sub select Vs WHERE clause

I’ve just read an interesting thread on the SQL Server Developer Center forum regarding how to filter results; specifically, the difference in MDX between using a subselect

SELECT x ON COLUMNS, y ON ROWS FROM (SELECT z ON COLUMNS FROM cube)

or using a where clause

SELECT x ON COLUMNS, y ON ROWS FROM cube WHERE z

In a simple query they produce the same results, but what is the actual difference? You can read the full thread here, but to summarise Darren Gosbell’s response…

Using the WHERE clause sets the query context and consequently the CurrentMember. This then enables functions such as YTD and PeriodsToDate to work.

Using a subselect can provide improved performance, but does not set the context.
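
To make that concrete, here’s a sketch against the Adventure Works sample cube (member names assumed). In a simple grid both queries return the same values, but only the first one makes Bikes the CurrentMember of the Category hierarchy, which is what context-sensitive calculations rely on:

--WHERE clause: Bikes becomes part of the query context
SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
       [Date].[Calendar].[Calendar Year].MEMBERS ON ROWS
FROM [Adventure Works]
WHERE ([Product].[Category].[Bikes])

--Subselect: the cube space is filtered to Bikes, but the context is not changed
SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
       [Date].[Calendar].[Calendar Year].MEMBERS ON ROWS
FROM (SELECT {[Product].[Category].[Bikes]} ON COLUMNS
      FROM [Adventure Works])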

Simples..!

Scope Problems with MDX Calculated Members

We were recently investigating a problem for a client regarding the use of Scope within MDX calculated members. The code in question was similar to this:

CREATE MEMBER
   CURRENTCUBE.[Measures].[Test Measure To Date]
   AS "NA", VISIBLE = 1;
Scope([Date].[Calendar].MEMBERS);
    [Measures].[Test Measure To Date] =
      SUM(NULL:[Date].[Calendar].CurrentMember,
        [Measures].[Test Measure]);
End Scope;
Scope([Date].[Fiscal].MEMBERS);
    [Measures].[Test Measure To Date] =
      SUM(NULL:[Date].[Fiscal].CurrentMember,
        [Measures].[Test Measure]);
End Scope;

Essentially the warehouse was providing a transaction table with credits and debits; this calculated measure was supposed to provide the current balance, summing all transactions to date (not just the current year/period, but the entire history). Scope is used to enable the calculation to work across two different date hierarchies, calendar and fiscal.

The problem was that even when the [Date].[Calendar] hierarchy was selected, the code still used the fiscal hierarchy to calculate the value.

This is caused by the fact that [Date].[Fiscal].MEMBERS includes the member [Date].[Fiscal].[All]. Consequently, even when the Fiscal hierarchy was not included in the query, its [All] member was effectively still within the scope. Thus the fiscal calculation was overriding the calendar calculation no matter what was selected.

The solution to this is to exclude [All] from the scope, which can be done by changing the code to the following:

CREATE MEMBER
   CURRENTCUBE.[Measures].[Test Measure To Date]
   AS "NA", VISIBLE = 1;
Scope(DESCENDANTS([Date].[Calendar],,AFTER));
    [Measures].[Test Measure To Date] =
      SUM(NULL:[Date].[Calendar].CurrentMember,
        [Measures].[Test Measure]);
End Scope;
Scope(DESCENDANTS([Date].[Fiscal],,AFTER));
    [Measures].[Test Measure To Date] =
      SUM(NULL:[Date].[Fiscal].CurrentMember,
        [Measures].[Test Measure]);
End Scope;

DESCENDANTS(xxx,,AFTER) is a simple way of identifying every descendant of the specified member at the levels below it. When no member is given, the hierarchy’s default member is used, which is [All], so the [All] member itself is excluded from the scope.
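
If you want to convince yourself of what the AFTER flag covers, a quick check (a sketch against the Adventure Works sample cube) is to ask which members of the hierarchy it leaves out; only the [All] member should come back:

--Members of [Date].[Calendar] not covered by DESCENDANTS(...,,AFTER)
--This should return just the [All] member, which is exactly what we want out of scope
SELECT {} ON COLUMNS,
  EXCEPT(
     [Date].[Calendar].MEMBERS,
     DESCENDANTS([Date].[Calendar],,AFTER)
  ) ON ROWS
FROM [Adventure Works]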

Problem solved, Frog-blog out.

Excel 2007 and SSAS 2008 Error

I was working on a new SSAS 2008 cube today, and came across an error which Google was unable to help with. I thought I’d post the solution here to help anyone else who may encounter it.

The cube in question will primarily be accessed using Excel 2007, so I’d been dutifully testing it along the way to ensure all was well. And then, after a number of changes, the following error appeared when connecting to the cube from Excel to create a pivot table.

Excel was unable to get necessary information about this cube. The cube might have been reorganized or changed on the server.

Contact the OLAP cube administrator and, if necessary, set up a new data source to connect to the cube

Connecting to and querying the cube via SSMS or BIDS worked without error (hence I didn’t spot the problem sooner!).

A quick Google revealed a number of posts regarding this error, but they all related to attributes containing invalid characters when accessed from Excel 2000, or problems with translations and locale settings in the .oqy file. Neither of these was the cause here, so I had to go back and recreate every change I had made, step by step, to track down the problem.

Well, I’m pleased to report that in the end it was nothing more than a simple spelling mistake in a named set. One of the dynamic named sets in the cube calculations referred to a specific member of a dimension, which was spelled slightly incorrectly. (Simplified example below.)

CREATE DYNAMIC SET CURRENTCUBE.[Set1]
 AS {[Dimension].[Attribute].[Value1],
     [Dimension].[Attribute].[Value2WithTypo]
    };

When querying calculated measures through MDX in SSMS, the MDX parser just ignored the problem and only used the valid members; however, it appears that Excel 2007 is slightly more picky about its cubes.

Useful to know, and even more useful when used as a tool to double check for any errors in the MDX calculations.
