When you have very large dimensions in SQL Server Analysis Services (SSAS) cubes, the time taken to process them can cause a problem. This post builds upon an excellent walkthrough of the ProcessAdd option written by Daniel Calbimonte on MSSQLTips.com. and shows how to automate the identification of new data to be added.

What are the main options for processing a dimension?  (Technet details here)

  • Process Full – the whole dimension is dropped and rebuilt
  • Process Update – a new copy of the dimension is processed in full and the results compared, with changes being copied to the original dimension
  • Process Add – just add new rows to the dimension, don’t do anything with existig rows

There are lots of posts around the internet that discuss the pros and cons of each of these, so I won’t go into the details. But as the title suggests, we’ll focus on the ProcessAdd option.

How do you implement ProcessAdd?

Daniel provides a fantastic explanation and code walkthrough of how to use SSIS (Integration Services) to implement a ProcessAdd solution, please read through his post before continuing. He talks us through using an SSIS data flow to create a source query to retrieve the new rows, and then the Dimension Processing component as a destination to receive the new rows.

ProcessAdd01

And this works really well. The problem then becomes how to identify which rows need to be added. This is a problem that Daniel overcomes by having a seperate table of records to be added. The source query just selects the entire table. This table could be populated by the ETL logic, a trigger, or any number of other processes. But can we improve upon this and make the proccess more seamless and transparent?

Getting all MDX

For me, the best way of enhancing this is to build a solution which self maintains. So, we need to look inside the cube dimension, find the last record that has already been added, and then use that to find any new records from the warehouse.

Step 1 – Find the maximum key for a dimension

 WITH MEMBER [Measures].[MaxKey] AS
   MAX([Source Currency].[Source Currency Code].ALLMEMBERS
   , StrToValue([Source Currency].[Source Currency Code].currentmember.MEMBER_KEY))
 SELECT
   {[Measures].[MaxKey]} ON 0
 FROM
   [Adventure Works]

The MDX above creates a calculated member called MaxKey, which finds the maximum Key (the surrogate key identifier for the dimension) across all members of the dimension. Note that we have to use StrToValue() to convert the key from a string to an integer so that it sorts numerically not alphabetically. If you run this in SSMS against the AdventureWorks2012DW database the result should be 105. You can verify this by querying the DimCurrency table in the warehouse and you’ll find that 105 is the largest CurrencyKey value.

ProcessAdd02

Step 2 – How do we use MDX within SSIS?

In the Control Flow of the SSIS package, add an ‘Execute SQL’ task. Connect it to a new OLE DB connection to Analysis Services, as per the following screenshot.

ProcessAdd03

 

Set the SQL Statement of the Data Source to the MDX query, and set the ResultSet property to ‘Single Row’.

ProcessAdd04

On the Result Set tab, set the [Measures].[MaxKey] result to be saved to a new variable; ‘User::MaxKey’ (An Int32).

ProcessAdd05

The Execute SQL task should precede the existing Data Flow Task.

ProcessAdd08

Step 3 – Dynamically create the SQL Query from the MDX results

Now we have the maximum Key value stored in a variable, we can create another variable, called ‘SQL’ (a string), with an expression which includes the Key value.

ProcessAdd06

The expression should be

 "SELECT  CurrencyKey, CurrencyAlternateKey, CurrencyName
  FROM    DimCurrency
  WHERE   CurrencyKey>" +  (DT_WSTR,20)@[User::MaxKey]

This builds a SQL statement dynamically, based upon the maximum key identified from the MDX statement.

We can then set the source query in the Data Flow to use the resulting SQL Query, by setting its ‘Data Access Mode’ to ‘SQL Command From Variable’, and the ‘Variable Name’ to ‘User::SQL’.

ProcessAdd07

And that’s it. This process will automatically find any new dimension members in the warehouse table, and add them into the cube dimension.

This process does rely on a number of things – so watch out…!

  • The dimension must only contain inserts, not updates or deletes (you’ll need to perform a ProcessUpdate for those)
  • The Key must always be incrementing – new records with a smaller key will not get picked up!
  • You’ll need to ignore duplicate key errors during processing. Plenty of discussion around why in other blog posts…
  • SSIS is not great at handling MDX connections – unless you want to go down the linked server route, only ever have a single MDX query per SSIS package.

And finally, another shout out to Daniel Calbimonte and MSSQLTips for providing the basis of this post.

Frog-Blog-Out

 

Tags: ,