Posts Tagged ‘Analysis Services’

Today we’re back looking at MDX with a quick how-to. How do you extract a list of dimension members using MDX?

This could be used, for example, to populate a parameter list in a report or dashboard user interface.

Let’s look at two scenarios: standard dimension attributes, and then hierarchies. All the queries here use the “Adventure Works” cube from the “Adventure Works DW 2008 R2” sample database.

Standard dimension attributes

So how do we get a list of dimension members? Well, let’s divide the query up into rows and columns. We want the members listed on the rows.

SELECT
  xxx ON COLUMNS,
  [Geography].[Country].ALLMEMBERS ON ROWS
FROM [Adventure Works]

Why use .ALLMEMBERS instead of .MEMBERS? .ALLMEMBERS will include calculated members, whereas .MEMBERS won’t.

So what do we put on the columns? We want the display name for showing to the user, and the unique name so we can refer back to each exact member. We can get these by using properties of the current member, namely MEMBER_CAPTION and UNIQUENAME. Get a full list of dimension member properties here. To make this work, we just define a couple of query-scoped calculated members using WITH, and then include them in the columns set…

WITH 
   MEMBER [Measures].[Label] AS [Geography].[Country].CURRENTMEMBER.MEMBER_CAPTION 
   MEMBER [Measures].[UniqueName] AS [Geography].[Country].CURRENTMEMBER.UNIQUENAME 
SELECT {[Measures].[Label], 
        [Measures].[UniqueName]
        } ON COLUMNS , 
      [Geography].[Country].ALLMEMBERS ON ROWS 
FROM [Adventure Works]

If you run this, you get the following output:

This may be what you want, but note that the first row of the result set contains the ‘All’ member, which you probably don’t want. You can remove it by altering the ROWS clause to exclude it; this is simply a case of repeating the attribute name to reference the level explicitly, e.g. [Geography].[Country].[Country].ALLMEMBERS

WITH 
   MEMBER [Measures].[Label] AS [Geography].[Country].CURRENTMEMBER.MEMBER_CAPTION 
   MEMBER [Measures].[UniqueName] AS [Geography].[Country].CURRENTMEMBER.UNIQUENAME 
SELECT {[Measures].[Label], 
        [Measures].[UniqueName]
        } ON COLUMNS , 
      [Geography].[Country].[Country].ALLMEMBERS ON ROWS 
FROM [Adventure Works]

Hierarchies

With simple dimension attributes, you get a flat list of values. With hierarchies, whether standard user hierarchies, ragged or parent-child, you need to return a multi-level list. To do this we need to know which level each member is at, which we can query using the LEVEL.ORDINAL property. Adding this into the query, and replacing Geography with Employee, we get the following query:

WITH 
   MEMBER [Measures].[Label] AS [Employee].[Employees].CURRENTMEMBER.MEMBER_CAPTION 
   MEMBER [Measures].[UniqueName] AS [Employee].[Employees].CURRENTMEMBER.UNIQUENAME 
   MEMBER [Measures].[Ordinal] AS [Employee].[Employees].CURRENTMEMBER.LEVEL.ORDINAL 
SELECT {[Measures].[Label], 
        [Measures].[UniqueName], 
        [Measures].[Ordinal]
        } ON COLUMNS , 
      [Employee].[Employees].ALLMEMBERS ON ROWS 
FROM [Adventure Works]

With the following results:

You can then do all sorts of funky things using basic MDX navigation functions, e.g. returning the parent of each member by simply adding …PARENT.UNIQUENAME as another column:

WITH 
   MEMBER [Measures].[Label] AS [Employee].[Employees].CURRENTMEMBER.MEMBER_CAPTION 
   MEMBER [Measures].[UniqueName] AS [Employee].[Employees].CURRENTMEMBER.UNIQUENAME 
   MEMBER [Measures].[Ordinal] AS [Employee].[Employees].CURRENTMEMBER.LEVEL.ORDINAL 
   MEMBER [Measures].[Parent] AS [Employee].[Employees].CURRENTMEMBER.PARENT.UNIQUENAME
SELECT {[Measures].[Label], 
        [Measures].[UniqueName], 
        [Measures].[Ordinal],
        [Measures].[Parent]
        } ON COLUMNS , 
      [Employee].[Employees].ALLMEMBERS ON ROWS 
FROM [Adventure Works]

Now go forth and query those dimension members…
FrogBlog-Out

© Alex Whittles, Purple Frog Systems Ltd

I had an interesting question the other day: “How do I find the date of the first order/activity for a given customer/employee/product etc. in MDX?”

Well, you’ll be pleased to know that it’s pretty simple. The sample code below will work with the Adventure Works DW 2008 R2 cube.

First of all we need to filter the date dimension to contain only those dates with orders, then take the first item of the resulting set, then find its name. Simples.

Filter the date hierarchy to only those dates with a Reseller Order Count:

   FILTER([Date].[Date].[Date].MEMBERS, [Measures].[Reseller Order Count])

Note that you can specify a more advanced filter, such as [Measures].[Reseller Order Count]>10, which would find the first date which had more than 10 orders.
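
For instance, the stricter version of that filtered set would be:

   FILTER([Date].[Date].[Date].MEMBERS, [Measures].[Reseller Order Count] > 10)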

Then find the first item:

   .ITEM(0)

Then find its name:

   .NAME

Put this together in a query and you get:

   WITH MEMBER [Measures].[First Activity] AS
       FILTER([Date].[Date].[Date].MEMBERS
            , [Measures].[Reseller Order Count]).ITEM(0).NAME
   SELECT {[Measures].[Reseller Order Count]
         , [Measures].[First Activity]
       } ON 0,
      [Employee].[Employees].MEMBERS ON 1
   FROM [Adventure Works]

This returns the total reseller order count for each employee, along with the date of their first reseller order. Note that the Employee hierarchy here is a parent child hierarchy. The calculated member [First Activity] will aggregate and calculate correctly with any dimension or member specified on the 1 (ROWS) axis, be it a parent-child, single attribute, normal hierarchy, etc. and will always find the first order date for the current member.
You should get the following results:
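
The same pattern works with any other dimension on the rows. For example, to find the first reseller order date per product category instead, just swap the rows axis (a sketch assuming the standard Adventure Works [Product].[Product Categories] hierarchy):

   WITH MEMBER [Measures].[First Activity] AS
       FILTER([Date].[Date].[Date].MEMBERS
            , [Measures].[Reseller Order Count]).ITEM(0).NAME
   SELECT {[Measures].[Reseller Order Count]
         , [Measures].[First Activity]
       } ON 0,
      [Product].[Product Categories].MEMBERS ON 1
   FROM [Adventure Works]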

Frog-Blog Out.

© Alex Whittles, Purple Frog Systems Ltd

Automating OLAP cube documentation – SQLBits presentation

For anyone who missed my presentation at SQLBits 8 in April, the video is now available here.


In this one-hour session I present a method of automating the creation of documentation for SSAS OLAP cubes, using DMVs (dynamic management views) and spatial data to query the metadata of the cube in real time.

The results include the bus matrix, star schemas, attribute lists, hierarchies etc., all presented in SSRS.
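
To give a flavour of the DMV approach, the cube’s metadata can be queried with SQL-like syntax in an MDX window in SSMS. A minimal example against the standard MDSCHEMA_DIMENSIONS rowset (which columns you pull out is up to you):

SELECT [CATALOG_NAME]
     , [CUBE_NAME]
     , [DIMENSION_NAME]
     , [DIMENSION_CARDINALITY]
FROM $SYSTEM.MDSCHEMA_DIMENSIONS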


The blog posts to go with this are here:


You can view the slide deck here

© Alex Whittles, Purple Frog Systems Ltd

I’m a happy chap. Why? Because I read a blog post yesterday by T.K. Anand (SSAS Principal Group Program Manager) about the vision and roadmap of Analysis Services.

There were slightly concerning questions last November (following the PASS conference) surrounding the future of Analysis Services, or more specifically the UDM, the dimensional model that we all know and love in SSAS 2005 & 2008. The arrival of PowerPivot into the Microsoft BI arsenal has without a doubt moved the goalposts and added significant power, flexibility and usability to the BI stack. My concern, along with others (most notably Chris Webb, who sparked somewhat of a stampede on the issue), was for the future of the UDM and the multitude of existing dimensional systems out in the field. Is the dimensional approach being phased out? Will it be supported in future editions? Will it be enhanced? Will the future BISM support the complexity and power we currently have with the UDM?

There’s no doubt that the overall approach to business intelligence is evolving. And this isn’t just in the cube space, it obviously has a direct effect on all other aspects of the BI strategy; the data warehouse, reporting layer, ETL etc.


From a BI consultant’s point of view, I don’t want to be recommending tools to my clients which have a restricted lifespan and don’t provide a future-proof upgrade path.

From a technology perspective, I’m a hardened supporter of the dimensional model. I recently designed a complex cube system for a banking client which had over 150 dimensions and facts, with thousands of lines of MDX to create a very sophisticated calculation framework for their liquidity modelling and loan profiling. I wouldn’t dream of doing that with a tabular approach like PowerPivot (in its current form).

From a personal point of view, where do I focus my attention in terms of training, research, blogging, user groups, conference sessions etc.?


I should point out that I’m very excited by, and fully committed to the tabular/PowerPivot route (along with VertiPaq, Crescent, DAX, etc.) for systems that it is suited to. In fact I’m using it right now to prototype a global BI solution for a very large client. There are however some solutions that do not fit well with the tabular approach and are best suited to a dimensional approach. I’m in favour of a hybrid framework which allows the right tool to be used for the right system. And it looks like that’s what we’re going to get.

The guys at Microsoft have now evolved and clarified the roadmap, and have confirmed that the BISM (business intelligence semantic model, i.e. the core of Analysis Services in SQL Server Denali) will contain two parallel approaches that can both be used for whichever situations they are best suited to. More importantly, they are both here to stay, will both be developed further, and there will be a cross-availability of functionality and tools between them.

Multidimensional Model

Essentially the same as the existing UDM, the multidimensional data model will support MDX and ROLAP/MOLAP data access. Existing OLAP cubes in SQL 2008 will easily upgrade to this.

Tabular Model

Think of this as hosted PowerPivot: a tabular approach with a column-based data store, DAX as the expression language, and either VertiPaq or Direct Query for data access.


The two will co-exist side by side within a single BISM, albeit initially with a degree of separation. In the upcoming CTP2 release (July 2011?) there will not be any cross-availability of functionality, i.e. VertiPaq, Crescent and DAX will not be available to the dimensional model. However TK makes it clear that this is a short-term restriction in the CTP, and that Microsoft are committed to getting this cross-availability in place in the finished product.

If you’re involved in BI in any way, I really do encourage you to go and read TK’s post in detail. The Business Intelligence world is changing. I now have total confidence that it’s for the better.

Frog-blog-out

© Alex Whittles, Purple Frog Systems Ltd

One of the great features of using Excel to browse an SSAS OLAP cube is the drillthrough ability. If you double click on any cell of an OLAP pivot table, Excel will create a new worksheet containing the top 1000 fact records that went to make up the figure in the selected cell.

N.B. The limit of 1000 rows can be altered, as per one of my previous blog posts here.


This feature is pretty well known, but not many folk realise how easy it is to reproduce this in SQL Server Management Studio (SSMS). All you need to do is prefix your query with DRILLTHROUGH.

i.e. assuming an MDX query of

SELECT [Measures].[Internet Sales Amount] ON 0
FROM [Adventure Works]
WHERE [Date].[January 1, 2004]

Which returns the following results…

A query of

DRILLTHROUGH
SELECT [Measures].[Internet Sales Amount] ON 0
FROM [Adventure Works]
WHERE [Date].[January 1, 2004]

Returns the records contributing to the total figure. Great for diagnosing problems with an MDX query.

By default, only the first 10,000 rows are returned, but you can override this using MAXROWS

DRILLTHROUGH MAXROWS 500
SELECT [Measures].[Internet Sales Amount] ON 0
FROM [Adventure Works]
WHERE [Date].[January 1, 2004]

The columns that are returned are those defined in the Actions tab of the Cube Designer in BIDS (The Business Intelligence Development Studio).

If no action is defined, then the fact measures will be returned along with the keys that link to each relevant dimension, which tend not to be that helpful.


You can override the returned columns by using the RETURN clause

DRILLTHROUGH
SELECT [Measures].[Internet Sales Amount] ON 0
FROM [Adventure Works]
WHERE [Date].[January 1, 2004]
RETURN
    [$Internet Sales Order Details].[Internet Sales Order]
   ,[$Sales Territory].[Sales Territory Region]
   ,NAME([$Product].[Product])
   ,KEY([$Product].[Product])
   ,UNIQUENAME([$Product].[Product])
   ,[Internet Sales].[Internet Sales Amount]
   ,[Internet Sales].[Internet Order Quantity]



Note that there are some restrictions on what you can drill through

  • You can’t drill through an expression/calculation, only a raw measure
  • The MDX query needs to return a single cell (otherwise the cube does not know which one to drill through)
  • The data returned will be at the lowest granularity of the cube’s fact table

To explain the last point further, the cube does not return the raw data from the underlying data warehouse, but a summary of the facts grouped by unique combination of the relevant dimensions. i.e. if a warehouse table containing individual sales (by date, product, customer & store) is brought into a cube as a fact table that only has relationships with the date and product dimensions, then the cube drill through will return unique combinations of date and product, summarising sales for each combination. Extra granularity which the warehouse may contain (customer and store) will not be available.

Note that if you specify the RETURN columns, the rows are still returned at the lowest level of the fact table granularity, even if not all of the dimensions are brought out as columns. This may result in returning multiple identical records. Don’t worry, these will be distinct facts, just differentiated by a dimension/attribute that isn’t being returned.

You can find out more on TechNet here


Frog-Blog Out

© Alex Whittles, Purple Frog Systems Ltd

One of the most useful aspects of a Business Intelligence system is the ability to add calculations to create new measures. This centralises the logic of the calculation into a single place, ensuring consistency and standardisation across the user base.

By way of example, a simple calculation for profit (Income – Expenditure) wouldn’t be provided by the source database and historically would be implemented in each and every report. In a data warehouse and/or cube we can create the calculation in a single place for everyone to use.

This post highlights some of the methods of doing this, each with their respective pros and cons.

Calculated Members in SSAS Cube


SSAS provides a ‘Calculations’ tab in the cube designer which allows you to create new measures using MDX. You can use any combination of existing measures and dimension attributes, along with the plethora of MDX functions available to create highly complex calculations.
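
By way of illustration, the profit example from earlier would look something like this in the cube’s calculation script (just a sketch, assuming [Income] and [Expenditure] already exist as measures in the cube):

CREATE MEMBER CURRENTCUBE.[Measures].[Profit]
 AS [Measures].[Income] - [Measures].[Expenditure],
FORMAT_STRING = "Currency",  -- display formatting only
VISIBLE = 1;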
Pros:

  • Very complex calculations can be created using all available MDX functions
  • No changes are required to the structure of the data warehouse
  • Changes to the calculation will apply to every record, historic and new
  • The results are not stored in the warehouse or cube, so no extra space is required
  • New calculations can be added without having to reprocess the cube (only the calculation script needs redeploying)
  • Calculations can be scoped to any level of aggregation and granularity. Different calculations can even be used for different scopes
  • Calculations can easily combine measures from different measure groups

Cons:

  • The calculation will not make use of SSAS cube aggregations, reducing performance
  • SSAS drill through actions will not work
  • The calculation results are not available in the data warehouse, only the cube

SQL Calculations in the Data Source View


There’s a layer in-between the data warehouse and the cube called the data source view (DSV). This presents the relevant tables in the warehouse to the cube, and can be used to enhance the underlying data with calculations. This can either be done in the DSV layer within the cube project, or (my preference) in SQL Server views which encapsulate the logic.
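
As a sketch of the view approach, assuming a hypothetical FactFinance table with Income and Expenditure columns, the view simply exposes the calculation as an extra column for the DSV to pick up:

CREATE VIEW dbo.vwFactFinance
AS
SELECT  f.DateKey
      , f.AccountKey
      , f.Income
      , f.Expenditure
      , f.Income - f.Expenditure AS Profit  -- calculated per fact row, not stored
FROM dbo.FactFinance f;

The cube’s DSV then references vwFactFinance in place of the underlying table.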
Pros:

  • No changes are required to the table structure of the data warehouse
  • Calculations use SQL not MDX, reducing the complexity
  • Changes to the calculation will apply to every record, historic and new
  • The calculation will make full use of SSAS cube aggregations
  • SSAS drill through actions will work
  • The results are not stored in the warehouse, so the size of the database does not increase

Cons:

  • The cube must be redeployed and reprocessed before the new measure is available
  • The results of the calculation must be valid at the granularity of the fact table
  • The calculation results are not available in the data warehouse, only the cube

Calculate in the ETL process


Whilst bringing in data from the source data systems, it sometimes makes sense to perform calculations on the data at that point, and store the results in the warehouse.
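
As a sketch of this approach (the staging and fact tables here are hypothetical), in a SQL-based ETL the calculation simply becomes part of the load statement; in SSIS the equivalent would be a Derived Column transformation in the data flow:

INSERT INTO dbo.FactFinance (DateKey, AccountKey, Income, Expenditure, Profit)
SELECT  s.DateKey
      , s.AccountKey
      , s.Income
      , s.Expenditure
      , s.Income - s.Expenditure   -- calculated once at load time, and stored
FROM staging.Finance s;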
Pros:

  • The results of the calculation will be available when querying the warehouse as well as the cube
  • In the ETL pipeline you can import other data sources (using lookups etc.) to utilise other data in the calculation
  • If the calculation uses time-based data, or data valid at a specific time (e.g. a share price), then performing the calculation in the ETL ensures the correct time-based data is used, without having to store the full history of the underlying source data
  • The calculation will make full use of SSAS cube aggregations
  • SSAS drill through actions will work

Cons:

  • You have to be able to alter the structure of the data warehouse, which isn’t always an option.
  • The results are stored in the warehouse, increasing the size of the database
  • The results of the calculation must be valid at the granularity of the fact table
  • If the calculation logic changes, all existing records must be updated


In Conclusion

If the calculation is valid for a single record, and it would be of benefit to have access to the results in the warehouse, then perform the calculation in the ETL pipeline and store the results in the warehouse.

If the calculation is valid for a single record, and it would not be of benefit to have the results in the warehouse, then calculate it in the data source view.

If the calculation is too complex for SQL, requiring MDX functions, then create an MDX calculated measure in the cube.

© Alex Whittles, Purple Frog Systems Ltd

It’s common in most cubes to have a number of different date dimensions, whether role playing, distinct, or a combination of both. Say for example, Entry Date, Posting Date and Accounting Period. There may also be numerous hierarchies in each date dimension, such as calendar and fiscal calendar, leading to a relatively complicated array of dates to worry about when calculating semi-additive measures.

If we create a date-related calculation (e.g. a total to date), how do we ensure that this calculation works across all date dimensions?

Let’s assume we have a stock movement measure, where each record in the fact table is the change in stock (plus or minus). The current stock level is found by using a calculation totalling every record to date.

CREATE MEMBER CURRENTCUBE.[Measures].[Stock Level]
AS
  SUM({NULL:[Date].[Calendar].CurrentMember}
     , [Measures].[Stock Movement]
  );

[Note that {NULL:xxx} just creates a set of everything up to and including the xxx member, i.e. everything to date]

This works just fine if the user selects the [Date].[Calendar] hierarchy. But what if the user selects the [Date].[Fiscal] hierarchy, or the [Period] dimension? The calculation won’t work, as the MDX expression is only aware of the [Date].[Calendar] hierarchy.

The simple solution is to use the Aggregate function over all of the dimensions that the calculation needs to be aware of:

CREATE MEMBER CURRENTCUBE.[Measures].[Stock Level]
AS
  AGGREGATE(
      {NULL:[Date].[Fiscal].CurrentMember}
       * {NULL:[Date].[Calendar].CurrentMember}
       * {NULL:[Period].[Period].CurrentMember}
    , [Measures].[Stock Movement]
  );

The calculation will then use whichever date or time hierarchy is selected. It will even cope if multiple hierarchies are selected, say the calendar on 0 and the periods on 1; both axes will honour the aggregation as expected.
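
For example, a query along these lines will show correct stock levels on both axes at once (a sketch; the cube name is hypothetical, and the [Period] dimension is the one from the example above):

SELECT [Date].[Calendar].MEMBERS ON 0,
       [Period].[Period].MEMBERS ON 1
FROM [Stock]  -- hypothetical cube name
WHERE [Measures].[Stock Level]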

Frog-Blog out.

© Alex Whittles, Purple Frog Systems Ltd

I’ve just read an interesting thread on the SQL Server Developer Center forum, regarding how to filter results. Specifically the difference in MDX between using a subselect

SELECT x ON COLUMNS, y ON ROWS FROM (SELECT z ON COLUMNS FROM cube)

or using a where clause

SELECT x ON COLUMNS, y ON ROWS FROM cube WHERE z

In a simple query they produce the same results, but what is the actual difference? You can read the full thread here, but to summarise Darren Gosbell’s response…

Using the WHERE clause sets the query context and consequently the CurrentMember. This then enables functions such as YTD and PeriodsToDate to work.

Using a subselect can provide improved performance, but does not set the context.
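
To see the difference in practice, here’s a sketch against Adventure Works. With the month in the WHERE clause, YTD() picks up the slicer as the CurrentMember and sums January through June; move that same member into a subselect instead and the CurrentMember stays at the default member, so the YTD set is no longer what you’d expect.

WITH MEMBER [Measures].[Sales YTD] AS
   SUM(YTD([Date].[Calendar].CURRENTMEMBER)
      , [Measures].[Internet Sales Amount])
SELECT [Measures].[Sales YTD] ON 0
FROM [Adventure Works]
WHERE [Date].[Calendar].[Month].[June 2004]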

Simples..!

© Alex Whittles, Purple Frog Systems Ltd

I was working on a new SSAS 2008 cube today, and came across an error which Google was unable to help with. I thought I’d post the solution here to help anyone else who may encounter it.

The cube in question will primarily be accessed using Excel 2007, so I’d been dutifully testing it along the way to ensure all was well. And then, after a number of changes, the following error appeared when connecting to the cube from Excel to create a pivot table:

Excel was unable to get necessary information about this cube. The cube might have been reorganized or changed on the server.

Contact the OLAP cube administrator and, if necessary, set up a new data source to connect to the cube

Connecting to and querying the cube via SSMS or BIDS worked without error (hence I didn’t spot the error sooner!).

A quick Google revealed a number of posts regarding this error, but they all related to attributes containing invalid characters when accessed from Excel 2000, or problems with translations and locale settings in the .oqy file. Neither of these was the cause here, so I had to go back and recreate every change I had made, step by step, to track down the problem.

Well, I’m pleased to report that in the end it was nothing more than a simple spelling mistake in a named set. One of the dynamic named sets in the cube calculations referred to a specific member of a dimension, which was spelled slightly incorrectly. (Simplified example…)

CREATE DYNAMIC SET CURRENTCUBE.[Set1]
 AS {[Dimension].[Attribute].[Value1],
     [Dimension].[Attribute].[Value2WithTypo]
    };

When querying calculated measures through MDX in SSMS, the MDX parser just ignores the problem and only uses the valid members; however, it appears that Excel 2007 is slightly more picky with its cubes.

Useful to know, and even more useful when used as a tool to double check for any errors in the MDX calculations.

© Alex Whittles, Purple Frog Systems Ltd

I almost feel embarrassed… I’ve been writing this blog for over 9 months now, and I have yet to mention Mosha, although in my defence, there is a link to his blog in the links section to the right.

As many/most of you may know, Mosha Pasumansky is one of the key brains behind the design of the MDX language and Analysis Services – nuff said?

Over the last year he has been working on a pet project, MDX Studio. It’s an MDX query tool which any self-respecting OLAP developer should now be using on a regular basis. He has just released v0.4.6, which adds some really nifty features such as the dependency view.

If you’re just starting out with MDX, then the intellisense will be of massive benefit to you; even if you’re a seasoned pro, the performance monitoring is an essential tool on its own.

If you haven’t already tried it, have a look at Mosha’s blog, and get a copy – you won’t regret it.

And thanks for all your hard work Mosha – It’s much appreciated.

Alex

© Alex Whittles, Purple Frog Systems Ltd