I’ve noticed a growing trend over the last year – the ever growing presence of BIML (Business Intelligence Markup Language). So what is it? What does it do? And do you need to learn it?
What is BIML?
Simply, it’s a way of defining the functionality of an SSIS (Integration Services) package. If you’ve ever opened an SSIS .dtsx file in notepad you’ll see a daunting mess of GUIDs that you really don’t want to play around with. BIML is a simple XML format that allows you to write an SSIS package in notepad. When you run a BIML script it creates SSIS packages for you. These can then be opened and edited in BIDS exactly the same as an SSIS package that you’d created manually.
To show the difference, first of all this is a sample BIML script:
Then, when this is compiled into an SSIS package it looks like this in the front end:
But this when you open the .dtsx package in notepad:
The BIML script is a little easier to digest!
But why on earth would you want to do that, when you can just use the BIDS/Visual Studio GUI? The answer is C# and automation. You can mix C# code in with the BIML XML (in a similar way to PHP or old school ASP scripts). This allows you to have a single BIML script, which can apply itself to every item in a list, or every table in a database, and automatically generate all of your SSIS packages from a single template.
Yes, this is very cool stuff.
The following screenshot is the same script as above, but configured to loop through every table in the ‘dim’ schema of a data warehouse, creating a package that truncates the relevant dim table.
The C# script is highlighted in yellow for clarity.
With this, just running the script will create multiple SSIS packages at the click of a button.
How do you create and run a script?
Firstly you need BIDS Helper. But you should have that anyway.
Create a new Integration Services project, then right click on the project and click ‘Add New Biml File’
This will add a BIML script file into the Miscellaneous folder of the project.
Once you’ve written a script you can test it (right click on the script and select ‘Check Biml for Errors’, or you can run the script, generating the SSIS packages, by clicking ‘Generate SSIS Packages’.
So, do you need to learn BIML?
I have no doubt that BIML is the future of SSIS. Once you see the full power of if then you’ll never want to go back to manually coding packages again.
If you’re an SSIS pro then there’s a good chance that your next job will require BIML. Or if a potential employer doesn’t ask for it, you can certainly improve your chances of getting the job by selling it (and your skills) to them.
At Purple Frog, all of our SSIS development is now 90% automated using BIML, leaving us more time to focus on the 10% of work that need some custom tweaking or more enhanced logic.
What if you don’t like coding?
Well in that case, check out MIST from Varigence. It’s a GUI for BIML, and a lot more besides. If you’re going to be using BIML a lot then it may well be worth the investment.
Well after 3.5 years, I’ve finally completed my MSc Business Intelligence – hoorah! And to reward the time, effort and increased grey hair, they saw fit to give me a merit as well.
During the last year I’ve been writing a thesis investigating the performance characteristics of loading data into data warehouse dimensions. Specifically loading Type 2 SCDs using SSIS.
For those who have followed the previous posts and my conference talks on using T-SQL Merge for the purpose, you won’t be surprised at the direction of the dissertation, but it provides a useful performance comparison between T-SQL Merge, SSIS Merge Join, SSIS Lookup and the SSIS SCD Wizard.
I won’t go into the full details here of the project or results, but will show a couple of the summary charts which are of most interest. You can download the full project here:
- PDF: Performance comparison of techniques to load Type 2 slowly changing dimensions in a Kimball style data warehouse using SSIS
The charts below shows the duration taken for the Lookup, Merge and Merge-Join methods (SCD Wizard excluded for obvious reasons!).
The top chart shows the performance on a Raid 10 array of traditional hard disks.
The second chart shows the same tests run on a Fusion IO NAND flash card.
The charts clearly show that the Lookup method is the least favoured. Of the other two, Merge is [just] preferred when using solid state, although statistically they are equivalent. On HDDs, Merge and Merge-Join are equivalent until you’re loading 2-3m rows per batch, at which point Merge-Join becomes the preferred option.
Full test results and analysis in the PDF download above.
My previous few posts show how using a T-SQL approach like Merge can provide huge development benefits by automating the code. This research now shows that unless you’re loading very large data volumes the performance is equivalent to more traditional approaches.
Hope this is of use. If you want to know a bit more without reading the full 99 pages & 23k words (who could blame you?!), then my SQLBits talk video is now on-line here. This talk is slightly out of date as it was presented before I’d finished the research and analysis, but it’s largely accurate. I presented a more up to date version on a webinar for the PASS Virtual BI chapter. The recording isn’t currently available [When this post was written] but should be up soon. Keep checking on the BI PASS Chapter website.
SQLBits X Video Now available
The video of my talk at SQLBits X is now available on the SQLBits website here. The talk was focused on presenting the results of my MSc Business Intelligence dissertation, comparing the performance of different methods of using SSIS to load data warehouse dimensions, specifically type 2 SCDs.
The talk also covers a comparison of the performance between traditional hard disks and solid state storage systems such as Fusion IO.
I then present a method of using the T-SQL Merge statement to automate a significant part of the ETL process.
You can find the code behind the demos on various recent Frog-Blog posts, and there is more to come, so look back soon!
- Introduction to T-SQL Merge Basics
- Using T-SQL Merge to Load SCD Dimensions
- Automating T-SQL Merge to Load SCD Dimensions
PASS BI Virtual Chapter Talk
For those that would rather hear the talk presented live, or want to ask any questions, please join me at a repeat of this talk over Live Meeting for the PASS BI Virtual Chapter on Thursday 28th June, at 5pm UK time, 12pm EDT (US). You can find the details on the PASS BI chapter website here
This is the 3rd post in the Frog-Blog series on the awesomeness of T-SQL Merge.
- Post 1: Introduction to T-SQL merge basics
- Post 2: Using T-SQL merge to load data warehouse dimensions
In this post we’ll be looking at how we can automate the creation of the merge statement to reduce development time and improve reliability and flexibility of the ETL process. I discussed this in the 2nd half of a talk I gave at the UK technical launch of SQL Server 2012 at SQLBits X. Thank you to the great audience who came to that talk, this post is for your benefit and is a result of the feedback and requests from you guys.
Why automate merge?
As we saw in the previous post, merge is an incredibly powerful tool when loading data into data warehouse dimensions (specifically SCDs – slowly changing dimensions). The whole process can be wrapped up into a very neat stored proc which can save a considerable mount of time writing the equivalent functionality in SSIS. In the next installment of this series I’ll be discussing the performance of it compared to other methods of loading SCDs in SSIS (take a look at the SQLBits talk video [when it’s released] for a preview!). Suffice to say for now that in my [pretty comprehensive] tests it’s one of the fastest methods of loading SCDs.
If you missed the talk, you can download the slide deck here whilst you’re waiting for the video.
The problem that stops a lot of people using merge is the perceived complexity of the statement. It can be very easy to get things wrong, with pretty bad consequences on your dimension data.
The easiest way to avoid this complexity and simplify the process is to not write merge statements, but let an automated procedure to it for you – Simples!.
The other huge benefit is that, as we’ll see during this post, you can base the automation procedure on metadata, meaning that you can change the SCD functionality of your dimensions just by changing metadata, and not rewriting your code.
Note that in this post we’ll just be looking at Type 0 and 1 SCDs, not 2, 3 or 6. This is to keep things simple. Once you’ve mastered type 0 and 1, it’s a logical next step to expand things to deal with type 2s.
OK, so how do we do this?
First of all we need to set up two tables to use. Let’s create a simple Customer dimension. Alongside this we also need a staging table. I’m a big fan of using schemas to differentiate tables, so we’ll create dim.Customer and etl.Customer as our two tables.
CREATE SCHEMA [dim] AUTHORIZATION [dbo] GO CREATE SCHEMA [etl] AUTHORIZATION [dbo] GO CREATE TABLE [dim].[Customer]( [CustomerKey] [int] IDENTITY(1,1) NOT NULL, [Email] [varchar](255) NOT NULL, [FirstName] [varchar](50) NOT NULL, [LastName] [varchar](50) NOT NULL, [DoB] [date] NOT NULL, [Sex] [char](1) NOT NULL, [MaritalStatus] [varchar](10) NOT NULL, [FirstCreated] [date] NOT NULL, [IsRowCurrent] [bit] NOT NULL, [ValidFrom] [datetime] NOT NULL, [ValidTo] [datetime] NOT NULL, [LastUpdated] [datetime] NOT NULL CONSTRAINT [PK_DimCustomer] PRIMARY KEY CLUSTERED ( [CustomerKey] ASC )) GO CREATE TABLE [etl].[Customer]( [Email] [varchar](255) NOT NULL, [FirstName] [varchar](50) NOT NULL, [LastName] [varchar](50) NOT NULL, [DoB] [date] NOT NULL, [Sex] [char](1) NOT NULL, [MaritalStatus] [varchar](10) NOT NULL, [FirstCreated] [date] NOT NULL )
So the dim table contains our primary surrogate key, business key (email address in this case), customer details and a series of audit fields (IsRowCurrent, ValidFrom, etc.). The etl staging table only contains the business key and customer details.
We then need to store the details of each field. i.e. how should each field be interpreted – is it a primary key, business, key, type 0 or 1, or an audit field. We need this so that we can put the correct fields into the correct place in the merge statement. You could create a table to store this information, however I prefer to use the extended properties of the fields.
EXEC sys.sp_addextendedproperty @level2name=N'CustomerKey', @value=N'PK' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'Email', @value=N'BK' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'FirstName', @value=N'1' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'LastName', @value=N'1' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'DoB', @value=N'1' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'Sex', @value=N'1' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'MaritalStatus',@value=N'1' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'FirstCreated', @value=N'1' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'ValidFrom', @value=N'Audit' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'ValidTo', @value=N'Audit' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'IsRowCurrent', @value=N'Audit' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN' EXEC sys.sp_addextendedproperty @level2name=N'LastUpdated', @value=N'Audit' , @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim', @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
Or you can obviously just enter the extended property manually using SSMS
The SSIS package should output all customer records into the etl table, with no regard for whether they are new customers, old customers, changed or not. The merge statement will take care of that.
The first stage is to examine the structure of merge.
MERGE [DIMENSION TABLE] as Target USING [STAGING TABLE] as Source ON [LIST OF BUSINESS KEY FIELDS] WHEN MATCHED AND Target.[LIST OF TYPE 1 FIELDS] <> Source.[LIST OF TYPE 1 FIELDS] THEN UPDATE SET [LIST OF TYPE 1 FIELDS] = Source.[LIST OF TYPE 1 FIELDS] WHEN NOT MATCHED THEN INSERT [LIST OF ALL FIELDS] VALUES Source.[LIST OF ALL FIELDS]
The text in black is the skeleton of the statement, with the text in red being the details specific to the dimension. It’s these red items which we need to retrieve from the metadata of the dimension in order to create the full merge statement.
We can retrieve the extended properties using the sys.extended_properties DMV. This allows us to pull out a list of all fields which have a specific extended property set, e.g. all PK fields, all BK fields, all type 2 fields etc. etc. If we then put a few of these queries into cursors, we can loop through them and build up a dynamic SQL query. Yes I know, dynamic SQL should be avoided and is evil etc., however… this use is an exception and does truly make the World a better place.
I’m not going to explain the resulting proc in minute detail, so instead please just download it here and work through it yourself. I will however explain a couple of items which are pretty important:
It’s important to keep the naming convention of your dimensions consistent. This doesn’t mean that every dimension must be identical, some may need inferred member support, some may need type 2 tracking fields (e.g. IsRowCurrent) and some may not; the critical thing is that all of your fields, if they do exist, should be named consistently. The automation proc can then look for specific field names and include them in the merge statement if necessary.
There is a parameter in the proc called @Execute. This offers the possibility of either executing the resulting merge statement directly, or just printing out the statement. If you only want to use this to automate the development process then this allows you to do just that, you can then just copy and paste the resulting statement into SSIS or into a stored proc.
The automated generation of T-SQL merge statement to handle type 0 & 1 SCDs!
Hopefully you can see how you can expand this to also cope with Type 2 SCDs, following the structure in my earlier posts.
Download the SQL scripts here
In my last blog post I showed the basic concepts of using the T-SQL Merge statement, available in SQL Server 2008 onwards.
In this post we’ll take it a step further and show how we can use it for loading data warehouse dimensions, and managing the SCD (slowly changing dimension) process. Before we start, let’s have a quick catch up on what an SCD is…
What is a Slowly Changing Dimension (SCD)?
If you want a full explanation of slowly changing dimensions then you’ve come to the wrong place, I’m assuming a moderate level of experience of SCDs here, check out Wikipedia for some background, but in short, they manage the tracking of attribute history in dimensional data warehouses.
Most data warehouses contain type 0, 1 and 2 SCDs, so we’ll cope with those for now.
- Type 0 – Ignore updates
- Type 1 – Only keep latest version
- Type 2 – Track history by creating a new row
Type 2 is commonly stored in a fashion similar to this.
Both records show the same customer but in this case Jane got married and changed her name & title. We terminate the old record by setting IsRowCurrent=0 and create a new record with the new details. Each row also contains ValidFrom and ValidTo dates which allow us to identify the correct record for a particular point in time.
That’s enough of that, let’s get on with doing this using Merge
Using Merge to load SCD
The first stage is to save the output rows from the ETL process to a staging table. We can then use Merge to process these into the live dimension.
We saw in the previous post how to either insert or update a record depending on whether it already exists. We can start with this and enhance as we go. First lets figure out what logic we want to perform
- If the record doesn’t exist, create it
- If the record does exist
- Type 0 fields – ignore
- Type 1 fields – update fields
- Type 2 fields – terminate existing record, insert a new record
- If the record exists in the dimension, but not in the updated source file – terminate record
The last option is rarely used in my experience, as it only works when you perform a full load of the dimension every time. It’s more common to process an incremental load, but I’ve included it here for completeness.
The main difference here, over a basic upsert, is the handling of type 2s; we need to perform two separate operations on the dimension for every incoming record. Firstly we terminate the existing row then we have to insert a new row.
The T-SQL Merge statement can only update a single row per incoming row, but there’s a trick that we can take advantage of by making use of the OUTPUT clause. Merge can output the results of what it has done, which in turn can be consumed by a separate INSERT statement.
We’ll therefore use the MERGE statement to update the existing record, terminating it, and then pass the relevant source rows out to the INSERT statement to create the new row.
Let’s look at an example. Download the code here which will create the necessary tables and data to work on.
Main Merge Statement
We’ll start with a statement very similar to the previous post, with only a couple of minor amendments:
- We include IsRowCurrent into the joining clause. We only ever want to update the current records, not the history.
- DoB is removed from the WHEN MATCHED clause. We’re going to treat DoB as a type 1 change, if it’s updated then we assume it’s a correction rather than a new date of birth which should be tracked. We’ll deal with this Type 1 later on
- The UPDATE statement in the WHEN MATCHED clause doesn’t change the fields, only terminates the row by setting the IsRowCurrent and ValidTo fields (as well as LastUpdated)
MERGE Customer AS [Target] USING StagingCustomer AS [Source] ON Target.Email = Source.Email AND Target.IsRowCurrent = 1 WHEN MATCHED AND ( Target.FirstName <> Source.FirstName OR Target.LastName <> Source.LastName OR Target.Title <> Source.Title ) THEN UPDATE SET IsRowCurrent = 0 ,LastUpdated = GETDATE() ,ValidTo = GETDATE() WHEN NOT MATCHED BY TARGET THEN INSERT ( FirstName ,LastName ,Title ,DoB ,Email ,LastUpdated ,IsRowCurrent ,ValidFrom ,ValidTo ) VALUES ( Source.FirstName ,Source.LastName ,Source.Title ,Source.DoB ,Source.Email ,GETDATE() --LastUpdated ,1 --IsRowCurrent ,GETDATE() --ValidFrom ,'9999-12-31' --ValidTo ) WHEN NOT MATCHED BY SOURCE AND Target.IsRowCurrent = 1 THEN UPDATE SET IsRowCurrent = 0 ,LastUpdated = GETDATE() ,ValidTo = GETDATE()
The ‘When Matched’ section includes extra clauses which define which fields should be treated as Type 2.
The ‘When Not Matched By Target’ section deals with inserting the new records which didn’t previously exist.
The ‘When Not Matched By Source’ section deals with terminating records which are no longer received from the source. Usually this section can be deleted, especially if the data is received incrementally.
*** UPDATE *** Thank you to Sergey (in the comments below) for pointing out an error in this code. I’ve now corrected the ‘WHEN NOT MATCHED BY SOURCE’ line to include ‘AND Target.IsRowCurrent=1′. If this is omitted then all historic (IsRowCurrent=0) records are always updated with today’s date. We only want to terminate current records, not already terminated records.
We then add an OUTPUT clause to the end of the statement
OUTPUT $action AS Action ,Source.*
The OUTPUT clause tells MERGE to generate an output dataset. This can consist of any of the Source table’s fields or the Target table’s fields. We can also specify $Action as an extra field which will identify, for each row, whether
it was dealt with via an INSERT, UPDATE or DELETE. For this purpose we only care about the UPDATES, so we’ll use this to filter the records later on. We also only need the Source data, not the Target, so we’ll return Source.*
We wrap this up within an INSERT statement which will insert the new record for the changed dimension member.
INSERT INTO Customer ( FirstName ,LastName ,Title ,DoB ,Email ,LastUpdated ,IsRowCurrent ,ValidFrom ,ValidTo ) SELECT FirstName ,LastName ,Title ,DoB ,Email ,GETDATE() --LastUpdated ,1 --IsRowCurrent ,GETDATE() --ValidFrom ,'9999-12-31' --ValidTo FROM ( MERGE Customer AS [Target] USING StagingCustomer AS [Source] ON Target.Email = Source.Email AND Target.IsRowCurrent = 1 WHEN MATCHED AND ( Target.FirstName <> Source.FirstName OR Target.LastName <> Source.LastName OR Target.Title <> Source.Title ) THEN UPDATE SET IsRowCurrent = 0 ,LastUpdated = GETDATE() ,ValidTo = GETDATE() WHEN NOT MATCHED BY TARGET THEN INSERT ( FirstName ,LastName ,Title ,DoB ,Email ,LastUpdated ,IsRowCurrent ,ValidFrom ,ValidTo ) VALUES ( Source.FirstName ,Source.LastName ,Source.Title ,Source.DoB ,Source.Email ,GETDATE() --LastUpdated ,1 --IsRowCurrent ,GETDATE() --ValidFrom ,'9999-12-31' --ValidTo ) WHEN NOT MATCHED BY SOURCE AND Target.IsRowCurrent = 1 THEN UPDATE SET IsRowCurrent = 0 ,LastUpdated = GETDATE() ,ValidTo = GETDATE() OUTPUT $action AS Action ,[Source].* ) AS MergeOutput WHERE MergeOutput.Action = 'UPDATE' AND Email IS NOT NULL ;
Note that the output clause is restricted so we only return the ‘UPDATE’ rows. As we’re using the email field as the business key, we should also ensure that we only insert records which have a valid email address.
So Type 2 changes have now been dealt with, by terminating the old version of the record and inserting the new version. Type 0 fields are just left out of the entire process, so are taken care of by just ignoring them. Therefore the only thing left is to manage the Type 1 fields.
We have two options here;
- Update all historical records to the new value
- Update only the current record to the new value
These are obviously only valid when there is a mix of type 1 and 2 attributes. If we’re just looking at Type 1 then there will be no historical records. In a true Type 1 scenario the first option is correct. All history (of Type 1 fields) is lost. The second option can be a valid option when it would be beneficial to keep a limited history of Type 1 fields.
This would mean that historical records created by Type 2 changes also keep a record of the Type 1 attribute values that were valid at the time the record was terminated. It doesn’t keep a full history of Type 1 attributes but sometimes this can be useful.
UPDATE C SET DoB = SC.DoB ,LastUpdated = GETDATE() FROM Customer C INNER JOIN StagingCustomer SC ON C.Email = SC.Email --AND C.IsRowCurrent = 1 --Optional AND C.DoB <> SC.DoB
This block of code updates the Type 1 attributes (in this case, DoB). The line 7 (the IsRowCurrent) check is optional depending on whether you only want to update current or all records.
So in one SQL statement we’ve managed the entire load process of all Type 2 SCDs, and with one more we’ve also managed all Type 1 fields.
I’ve been performing a large number of performance tests on loading Type 2s using various methods (another blog post to follow, as well as a talk that I’ll be presenting at SQL Bits X), and the performance of this method is very fast. In fact there’s very little difference in performance between using this method and using the SSIS Merge Join component.
This is now my preferred approach to loading Type 2 SCDs, slightly faster methods may be available, but as we’ll see in later blog posts, this is such a quick method to implement, as well as being incredibly flexible as it can be controlled entirely from metadata.
Long live the Merge statement!
PowerPivot Vs QlikView 101 – SQLBits Video
The video is now available from my PowerPivot and QlikView talk at SQLBits 9 in Liverpool in September 2011. You can download or watch the video here.
In this 1 hour session I create interactive dashboards from scratch in both PowerPivot and QlikView, showing how to set up the data model, overcome common pitfalls and build the dashboards. I create almost identical dashboards in both systems and highllight the pros and cons of each system.
You can find the scripts and code to go with this video in this blog post
SQLBits 10 is coming!!
Today the SQLBits organisers have announced that SQLBits 10 will be held in London between 29th – 31st March 2012 in the Novotel London West. It’s going to be even bigger and better, so keep an eye on the website and get your tickets early.
Thank you to the SQLBits committee, speakers, helpers and attendees, for making SQLBits 9 one of the best yet. What a great example SQLBits is of the power of the SQL Server community coming together to help each other, and enjoy a few beers in the process!
I was delighted to have my “PowerPivot & QlikView 101″ session chosen by the committee, which I presented on Saturday afternoon. Thank you to all those who attended, great to see such a good number there. If you haven’t yet submitted your feedback for my session, and the event as a whole, please do take the time to do it. It helps everyone improve the quality of the event for next time.
For those who atttended my session, here are the slides, with a summary of the results and code etc.
QlikView Script: Download the QlikView Script here
PowerPivot Workboook: Download the PowerPivot workbook here
Session Video: The session recording isn’t yet available, I’ll post a link here when it’s ready.
Sales Amount Visual Cue for the green shading
Upper: (SUM(TOTAL SalesAmount) / COUNT(TOTAL SalesTerritoryRegion))*1.2
Lower: (SUM(TOTAL SalesAmount) / COUNT(TOTAL SalesTerritoryRegion))*0.8
Sales Amount blue bar Guage maximum
Sales Amount Year to Date
Sales Amount Previous Year
PowerPivot DAX Expressions
Sales Amount Year to Date
=TOTALYTD(SUM(FactInternetSales[SalesAmount]), DimOrderDate[FullDateAlternateKey], ALL(DimOrderDate))
Sales Amount Previous Year
=CALCULATE(SUM(FactInternetSales[SalesAmount]), SAMEPERIODLASTYEAR(DimOrderDate[FullDateAlternateKey]), ALL(DimOrderDate))
If you have any questions about any of this, please get in touch.
It only seems like yesterday since the SQLBits crew put on their awesome 8th conference in Brighton. The next conference has just been announced, SQLBits will move to Liverpool for SQLBits 9 – “Query Across The Mersey” (…Really?!) between September 29th and October 1st.
Registration’s not yet open, but you can find out more at www.SQLBits.com
I’ve already submitted two sessions to present, “PowerPivot & QlikView 101″ and “Advanced data modelling for data warehousing and cubes”. Make sure you vote for them when voting opens! You can read full summaries of all sessions submitted so far here.
SQL User Groups
We’ve also got a busy time coming up with the UK SQL Server user groups. I’ll be running the next Birmingham user group the week after SQLBits, on Monday 3rd October. The agenda and speakers have yet to be set, but it should be a goodun!
I’ll also be speaking at the Southampton user group in 2 weeks time, on Wednesday 6th July. I’ll be presenting my 3 part blog post on automating SSAS cube documentation. Matt Whitfield will also be presenting, so it should be a good night. Now stop reading this and go and register!!
Automating OLAP cube documentation – SQLBits presentation
For anyone that missed my presentation at SQLBits 8 in April, the video is now available here.
In this 1 hour session I present a method of automating the creation of documentation for SSAS OLAP cubes by using DMVs (dynamic management views) and spatial data, querying the metadata of the cube in realtime.
The results include the BUS matrix, star schemas, attribute lists, hierarchies etc. and are all presented in SSRS.
You can view the slide deck here
I’m a happy chap. Why? Because I read a blog post yesterday by T.K. Anand (SSAS Principal Group Program Manager) about the vision and roadmap of Analysis Services.
There were slightly concerning questions last November (following the PASS conference) surounding the future of Analysis Services, or more specifically the UDM, the dimensional model that we all know and love in SSAS 2005 & 2008. The arrival of PowerPivot into the Microsoft BI arsenal has without a doubt moved the goalposts and added significant power, flexibility and usability to the BI stack. My concern, along with others (most notibly Chris Webb, who sparked somewhat of a stampede on the issue), was for the future of the UDM and the multitude of existing dimensional systems out in the field. Is the dimensional approach being phased out? Will it be supported in future editions? Will it be enhanced? Will the future BISM support the complexity and power we currently have with the UDM?
There’s no doubt that the overall approach to business intelligence is evolving. And this isn’t just in the cube space, it obviously has a direct effect on all other aspects of the BI strategy; the data warehouse, reporting layer, ETL etc.
From a BI consultant’s point of view, I don’t want to be recommending tools to my clients which have a restricted life span and don’t provide them a future proof upgrade path.
From a technology perspective, I’m a hardened supporter of the dimensional model. I recently designed a complex cube system for a banking client which had over 150 dimensions and facts, with thousands of lines of MDX to create a very sophisticated calculation framework for their liquidity modelling and loan profiling. I wouldn’t dream of doing that in a tabular approach like PowerPivot (in their current form).
From a personal point of view, where do I focus my attention in terms of training, research, blogging, user groups, conference sessions etc. etc.
I should point out that I’m very excited by, and fully committed to the tabular/PowerPivot route (along with VertiPaq, Crescent, DAX, etc.) for systems that it is suited to. In fact I’m using it right now to prototype a global BI solution for a very large client. There are however some solutions that do not fit well with the tabular approach and are best suited to a dimensional approach. I’m in favour of a hybrid framework which allows the right tool to be used for the right system. And it looks like that’s what we’re going to get.
The guys at Microsoft have now evolved and clarified the roadmap, and have confirmed that the BISM (business intelligence semantic model, i.e. The core of Analysis Services in SQL Server Denali) will contain two parallel approaches that can both be used for whichever situations they are best suited to. More importantly, they are both here to stay, will both be developed further, and there will be a cross-availability of functionality and tools between them.
Multi Dimensional Model
Essentially the same as the existing UDM, the multi-dimensional data model will support MDX and ROLAP/MOLAP data access. Existing OLAP cubes in SQL 2008 will easily upgrade to this.
Think of this as hosted PowerPivot. A tabular approach with a column based data store, DAX as the expression language and either VertiPaq or Direct Query for data access.
The two will co-exist side by side within a singluar BISM, albeit initially with a degree of seperation. In the upcoming CTP2 release (July 2011?) there will not be any cross-availability of functionality, i.e. VertiPaq, Crescent and DAX will not be available to the dimensional model. However TK makes it clear that this is a short term restriction in the CTP, and that Microsoft are commited to getting this cross availability in place in the finished product.
If you’re involed in BI in any way, I really do encourage you to go and read TK’s post in detail. The Business Intelligence world is changing. I now have total confidence that it’s for the better.