I’m a happy chap. Why? Because I read a blog post yesterday by T.K. Anand (SSAS Principal Group Program Manager) about the vision and roadmap of Analysis Services.

There were slightly concerning questions last November (following the PASS conference) surrounding the future of Analysis Services, or more specifically the UDM, the dimensional model that we all know and love in SSAS 2005 & 2008. The arrival of PowerPivot into the Microsoft BI arsenal has without a doubt moved the goalposts and added significant power, flexibility and usability to the BI stack. My concern, along with others (most notably Chris Webb, who sparked something of a stampede on the issue), was for the future of the UDM and the multitude of existing dimensional systems out in the field. Is the dimensional approach being phased out? Will it be supported in future editions? Will it be enhanced? Will the future BISM support the complexity and power we currently have with the UDM?

There’s no doubt that the overall approach to business intelligence is evolving. This isn’t just in the cube space; it has a direct effect on all other aspects of the BI strategy: the data warehouse, reporting layer, ETL etc.


From a BI consultant’s point of view, I don’t want to be recommending tools to my clients which have a restricted life span and don’t provide them with a future-proof upgrade path.

From a technology perspective, I’m a hardened supporter of the dimensional model. I recently designed a complex cube system for a banking client which had over 150 dimensions and facts, with thousands of lines of MDX to create a very sophisticated calculation framework for their liquidity modelling and loan profiling. I wouldn’t dream of doing that in a tabular approach like PowerPivot (in its current form).

From a personal point of view, where do I focus my attention in terms of training, research, blogging, user groups, conference sessions etc.?


I should point out that I’m very excited by, and fully committed to the tabular/PowerPivot route (along with VertiPaq, Crescent, DAX, etc.) for systems that it is suited to. In fact I’m using it right now to prototype a global BI solution for a very large client. There are however some solutions that do not fit well with the tabular approach and are best suited to a dimensional approach. I’m in favour of a hybrid framework which allows the right tool to be used for the right system. And it looks like that’s what we’re going to get.

The guys at Microsoft have now evolved and clarified the roadmap, and have confirmed that the BISM (business intelligence semantic model, i.e. the core of Analysis Services in SQL Server Denali) will contain two parallel approaches that can both be used for whichever situations they are best suited to. More importantly, they are both here to stay, will both be developed further, and there will be a cross-availability of functionality and tools between them.

Multi Dimensional Model

Essentially the same as the existing UDM, the multi-dimensional data model will support MDX and ROLAP/MOLAP data access. Existing OLAP cubes in SQL 2008 will easily upgrade to this.

Tabular Model

Think of this as hosted PowerPivot. A tabular approach with a column based data store, DAX as the expression language and either VertiPaq or Direct Query for data access.


The two will co-exist side by side within a singular BISM, albeit initially with a degree of separation. In the upcoming CTP2 release (July 2011?) there will not be any cross-availability of functionality, i.e. VertiPaq, Crescent and DAX will not be available to the dimensional model. However TK makes it clear that this is a short term restriction in the CTP, and that Microsoft are committed to getting this cross-availability in place in the finished product.

If you’re involved in BI in any way, I really do encourage you to go and read TK’s post in detail. The Business Intelligence world is changing. I now have total confidence that it’s for the better.

Frog-blog-out

This post follows on from an earlier post on drawing with SQL Server, and explains how to create much more complex drawings using a couple of neat tricks and SQL Server spatial data.

Firstly, apologies to those at my session at SQLBits to whom I promised this blog post. I did say I’d try and get it posted in a week, and it’s been a month, but it’s here now!


So, what are we trying to do? In my earlier post I demonstrated how to recreate a block drawing by tracing around the points on the edges and converting the results to SQL spatial data coordinates. This is ok if the image is a simple logo, but what if it’s too complex like a photo or sketch? It would take an age to trace so we need a more automated approach.

At this point I’ll make my second apology, to Simon Sabin, who must by now think that Alastair Aitchison and I are stalking him. This post (and my associated lightning talk at SQLBits) derived from finding that Alastair had drawn the SQLBits logo over a month before I did mine. Feeling a little dejected I needed a new project. Simon set us a challenge of improving on it. One thing led to another and both Alastair and I started drawing portraits of Simon, and here we are.


So, let’s start with the picture. I chose Simon’s twitter profile pic.

The first step is to convert the bitmap image into a vector image. A bitmap image is a collection of dots whereas a vector image is a collection of lines, better suited to drawing in SQL Server. There’s a great website that takes care of this for you, vectormagic.com and you get two free conversions to try it out. Upload the image, and then download the converted file in EPS format. EPS is ideal for our purposes as it’s a simple text file containing one instruction per line. We can then convert each line into a SQL spatial line.

I found the easiest way of converting the EPS lines into SQL spatial queries is with an Excel spreadsheet (download here). Paste the full contents of the EPS file into column A of the first sheet; the expressions in columns B to N then strip out the coordinates and build them into SQL “geometry::STGeomFromText” commands.

You may notice that we’re converting the ‘curveto’ commands in the EPS file into ‘linestring’ commands in SQL. This does result in an approximation of the curves, but this is barely noticeable in drawings like this. Worth noting that SQL Denali is planned to support curves, so we will be able to make it more accurate.
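If you would rather script this step than use a spreadsheet, the same conversion can be sketched in a few lines of Python. This is only an illustration under assumptions: it presumes each EPS instruction sits on its own line with the operator spelled out at the end (moveto, lineto, curveto), which may not match the exact output of your converter, and the function names are made up for this example. As with the spreadsheet, each curveto is approximated by a straight line to the end point of the curve.

```python
def eps_to_linestrings(eps_lines):
    """Yield one WKT LINESTRING per lineto/curveto instruction.

    Assumed (hypothetical) input format: "x y moveto", "x y lineto",
    "x1 y1 x2 y2 x3 y3 curveto", one instruction per line.
    """
    current = None  # last pen position
    for raw in eps_lines:
        parts = raw.split()
        if not parts:
            continue
        op, args = parts[-1], parts[:-1]
        try:
            nums = [float(a) for a in args]
        except ValueError:
            continue  # not a simple drawing instruction, skip it
        if op == "moveto" and len(nums) == 2:
            current = (nums[0], nums[1])
        elif op in ("lineto", "curveto") and current is not None:
            expected = 2 if op == "lineto" else 6
            if len(nums) != expected:
                continue
            # For curveto the last coordinate pair is the end of the curve;
            # the two control points are ignored (straight-line approximation).
            end = (nums[-2], nums[-1])
            yield "LINESTRING ({:.2f} {:.2f}, {:.2f} {:.2f})".format(*current, *end)
            current = end


def to_sql_row(wkt):
    """Wrap a WKT string in the STGeomFromText call used in this post."""
    return "(geometry::STGeomFromText('{}',0))".format(wkt)


if __name__ == "__main__":
    sample = ["0 960 moveto", "640 960 lineto", "640 900 640 850 640 800 curveto"]
    for wkt in eps_to_linestrings(sample):
        print(to_sql_row(wkt))
```

Each printed row can then be pasted into the VALUES list of the insert statement in the same way as the spreadsheet output.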


Filter column O to remove the blanks (using the filter in cell O1), then copy the whole column to the clipboard. Then paste it into SQL Server Management Studio.

Then we have to do a little tweaking to make it work.


First we need to define a table variable at the very top of the query window. The first few lines of the query should look like this.

DECLARE @drawing TABLE (geom geometry);

INSERT INTO @drawing (geom) VALUES

(geometry::STGeomFromText('LINESTRING (0.00 960.00, 640.00 960.00)',0))
,(geometry::STGeomFromText('LINESTRING (640.00 960.00, 640.00 800.00)',0))

A single INSERT ... VALUES statement accepts at most 1000 rows, so we need to break the insert up. Every 900 rows or so, start a new INSERT statement, such as

,(geometry::STGeomFromText('LINESTRING (199.39 416.98, 199.07 422.49)',0))
,(geometry::STGeomFromText('LINESTRING (252.58 421.98, 252.87 424.99)',0))

INSERT INTO @drawing (geom) VALUES

(geometry::STGeomFromText('LINESTRING (256.22 430.24, 257.03 426.03)',0))
,(geometry::STGeomFromText('LINESTRING (256.19 417.52, 258.07 416.85)',0))


Then at the end, we just need to select all rows from the table variable

SELECT * FROM @drawing
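Splitting the rows by hand gets tedious for a big drawing, so here is a rough Python sketch that assembles the whole script, starting a new INSERT every 900 rows to stay under the 1000 row limit. The function name and the batch size parameter are my own choices for illustration, and it assumes you already have the rows as a list of strings in the format shown above.

```python
def build_drawing_script(rows, batch_size=900):
    """Build the full T-SQL script from a list of "(geometry::...)" row strings."""
    if not 0 < batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    lines = ["DECLARE @drawing TABLE (geom geometry);", ""]
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        lines.append("INSERT INTO @drawing (geom) VALUES")
        # Rows after the first are prefixed with a comma, as in the post.
        lines.append("\n,".join(batch) + ";")
        lines.append("")
    lines.append("SELECT * FROM @drawing;")
    return "\n".join(lines)


if __name__ == "__main__":
    demo_rows = [
        "(geometry::STGeomFromText('LINESTRING (0.00 960.00, 640.00 960.00)',0))",
        "(geometry::STGeomFromText('LINESTRING (640.00 960.00, 640.00 800.00)',0))",
    ]
    print(build_drawing_script(demo_rows))
```

The output can be pasted straight into a query window in place of the hand-edited script.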


If all goes well, you should end up with something like this



You can download the finished SQL script here

SQL Server art at work!

Well what a few days. Everyone I spoke to agreed that SQLBits 8 has been the best SQLBits conference yet. The organisers did a fantastic job, the whole event seemed to run incredibly smoothly and was very well received. I’m sure there’s a certain element of the swan effect, with frantic paddling behind the scenes, but they pulled it off very well indeed. A massive thanks for all of your effort.

For me, this was the first SQLBits conference that I’ve presented at, and I have to say it made it even better for me. I’ve got so much out of SQLBits over the years it’s really nice to be able to give something back and contribute to the event. I thoroughly enjoyed presenting my sessions, and will definitely get a session or two lined up for the next conference.

I took a few photos over the 3 days, and thought I’d share them to give you a feel of the event if you couldn’t make it.


The venue: Brighton

And we couldn’t have asked for better weather. Just fabulous.

The hotel, The Brighton Grand.

The general consensus seems to have been that the best SQLBits conferences so far have been Celtic Manor and Brighton. To me, this is because both were held in hotels, which brings the social and work elements of the conference together.

The first day: SQLBits Insight

This is the first time the ‘Insight’ day has been held, and is aimed at more of a non-technical audience; managers, decision makers, architects etc.

I thought it was a great success, and was far better attended than I expected, with every seat full and standing room only. Great to see.

The panel discussion.

Some really interesting topics covered, with a very experienced panel. Mark Souza (SQL Server General Manager) and Guy Lucchi (CTO for the NHS project at CSC) were especially interesting. If anyone tries telling you that SQL Server doesn’t scale, tell them to get in touch with Guy – the sheer scale of the NHS project defies belief, and is all SQL Server…

The Woz!

One of the highlights for me was Steve Wozniak, co-founder of Apple. He gave a fascinating talk about his history with electronics, and how it has led him to be involved with the Fusion IO team. If you’ve not heard of Fusion IO, go and look them up, they have some awesome kit for insanely fast storage systems. Very impressive.

Not just fast storage space…

Love this photo, tweeted by Aaron Bertrand. Is this the world’s most expensive iPhone charger?!

The Games.

On Thursday night, the Fusion IO guys put on the best SQLBits party yet. Not only can they give you really impressive IO stats, they can also give you a really impressive hangover… One of the highlights was the bucking bronco.

This is Nigel from Barclays Loans getting thrown off with an element of style! He won an Xbox for his troubles though!


Update!

Nigel has just been in touch to clarify a point… He did of course give his Xbox to Kath (from innocent drinks) straight away as an act of gallantry. As he wonderfully put it: “All I got out of it is nasty bruising, a faulty thumb, and a good story”


And who says chivalry is dead?! Most admirable Nigel, you put the rest of us to shame as there’s no way I’m giving up my iPod!
 

Woz signed iPod!

I couldn’t resist having a go myself, and won an iPod signed by Steve Wozniak! Result.

The mandatory late night greasy spoon. Well, you need something to soak up the free bar!

One of my favourite parts of the conference is the people. It’s great to meet up with old colleagues, friends and clients, as well as putting faces to the twitterati and of course meeting some great new people.

The SQLCAT team.

Great to have Thomas Kejser and co from the SQL Customer Advisory Team. These guys know their stuff!

I have to confess, I did manage to sneak off for a few minutes and enjoy an ice cream on the beach… It would have been rude not to, I’m sure you’ll agree…

Friday night fun and games began with a game of giant Connect 4. I’m proud to say I managed a 4 game unbeaten run before having to quit on a high and head off to the speakers and sponsors meal.

The speakers and sponsors meal on Friday night

My conference room!

I gave two talks on the Saturday, one ‘lightning’ talk and one full hour session, both in this room. The lightning talks were a great success, with lots of speakers, both experienced and new, each presenting a brief 5-15 minute session on something interesting. I was pleasantly surprised by the great turnout to these sessions; I hope they do more at the next SQLBits.

It’s not all about learning SQL stuff…

Jamie Thompson did a great job of recording some great impromptu videos which he’s uploaded to YouTube. One of his more amusing and random videos is of Allan Mitchell explaining the finer details of the espresso macchiato.

He also recorded a couple of me summarising my two sessions: here and here.

In summary, great venue, great conference, great sessions, great people.

See you all at SQLBits 9!

 

I have to admit that I’m really excited about presenting a session at SQLBits 8 in Brighton next week. I’ve been an avid supporter of SQLBits since the first conference that I attended (SQLBits 2), and am thoroughly looking forward to finally getting a chance to be a part of the event and presenting my own session. If you’re going, I hope to see you there!

 

My session is about using SSRS, SQL spatial data and DMVs to visualise SSAS OLAP cube structures and generate real-time automated cube documentation (blog post here if you want to know more…).

 

This shows an unusual use for spatial data, drawing diagrams instead of the usual demonstrations which are pretty much always displaying sales by region on a map etc. Whilst writing my demos, it got me thinking – why not use spatial data to draw even more complex pictures, diagrams or logos…

 

So, I set to work trying to write a query to draw the SQLBits logo…

 

The first step is to define the coordinates of the image edges. For this I found a great website designed to help you create HTML image maps (www.image-maps.com). You can upload the image then just click on every corner. It turns this into an HTML image map, which without too much work can be converted into a SQL spatial query.

 

I’ve made it so that each object (or letter) is one query so all 8 queries (s, q, l, b, i, t, s, and the database in the middle) are union’d together to create the entire image.

 

Simple letters (s, l, t & s) are a single polygon, so we can use

SELECT geometry::STPolyFromText('POLYGON ((x y, x y, x y))',0) AS Drawing

Where each xy pairing is a point around the image, and the last point must be the same as the first, to create a closed polygon.
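As a rough sketch of the conversion step, the Python below turns the comma separated coords string from an HTML image map area into the POLYGON text used above. Two things here are assumptions on my part rather than anything the image map tool does for you: the y values are negated (image coordinates run downwards, whereas the drawing is plotted upwards, which is why the coordinates in the queries below are negative), and the function name is hypothetical.

```python
def image_map_to_polygon(coords):
    """Convert an image map coords string "x1,y1,x2,y2,..." to POLYGON WKT."""
    values = [int(v) for v in coords.split(",")]
    if len(values) % 2 != 0 or len(values) < 6:
        raise ValueError("expected at least three x,y pairs")
    # Flip the y axis so the picture is not drawn upside down.
    points = [(x, -y) for x, y in zip(values[0::2], values[1::2])]
    # The last point must equal the first to give a closed polygon.
    if points[0] != points[-1]:
        points.append(points[0])
    body = ", ".join("{} {}".format(x, y) for x, y in points)
    return "POLYGON (({}))".format(body)


if __name__ == "__main__":
    wkt = image_map_to_polygon("447,62,532,65,532,450,447,450")
    print("SELECT geometry::STPolyFromText('{}',0) AS Drawing".format(wkt))
```

The sample coordinates are the four corners of the letter l from the logo, so the printed statement should match the corresponding query further down.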

 

Complex letters such as q however need a multi polygon. These allow us to create one polygon for the outline, and then another to remove the hole in the middle. i.e.

SELECT geometry::STMPolyFromText('MULTIPOLYGON (((x y, x y, x y)),((x y, x y, x y)))',0) AS Drawing

With each coordinate group following the same rules as above.

 

We end up with this

SELECT geometry::STPolyFromText('POLYGON ((104 -222, 173 -222, 174 -174, 171 -160, 163 -147, 150 -137, 136 -128, 123 -123, 110 -117, 82 -116, 61 -122, 41 -134, 17 -150, 6 -173, 1 -194, 0 -232, 9 -259, 21 -276, 32 -289, 52 -302, 69 -312, 88 -320, 105 -335, 110 -375, 102 -390, 84 -395, 75 -385, 76 -330, 5 -333, 7 -390, 11 -411, 25 -428, 42 -442, 67 -451, 105 -453, 126 -446, 144 -439, 162 -424, 173 -404, 180 -382, 182 -337, 178 -311, 167 -296, 153 -279, 138 -268, 89 -234, 75 -222, 71 -208, 73 -188, 88 -178, 100 -190, 105 -220, 104 -222))',0) AS Drawing

UNION ALL

SELECT geometry::STMPolyFromText('MULTIPOLYGON (((324 -127, 404 -127, 405 -488, 322 -490, 322 -421, 311 -432, 291 -446, 277 -452, 259 -453, 248 -446, 239 -440, 228 -429, 221 -419, 215 -402, 215 -386, 213 -188, 216 -174, 219 -159, 226 -148, 235 -140, 245 -132, 261 -127, 278 -127, 294 -134, 306 -143, 322 -158, 324 -127)),((296 -191, 300 -186, 308 -182, 319 -188, 324 -196, 322 -384, 317 -391, 311 -395, 305 -395, 300 -395, 293 -388, 296 -191)))',0) AS Drawing

UNION ALL

SELECT geometry::STPolyFromText('POLYGON ((447 -62, 532 -65, 532 -450, 447 -450, 447 -62))',0) AS Drawing

UNION ALL

SELECT geometry::STMPolyFromText('MULTIPOLYGON (((991 -170, 1053 -146, 1055 -209, 1065 -201, 1072 -190, 1089 -183, 1108 -181, 1122 -191, 1134 -199, 1139 -217, 1140 -386, 1133 -399, 1129 -408, 1116 -418, 1104 -422, 1090 -419, 1078 -413, 1073 -405, 1066 -397, 1055 -386, 1054 -405, 991 -381, 991 -170)),((1053 -233, 1057 -226, 1067 -224, 1078 -235, 1078 -366, 1074 -373, 1063 -375, 1054 -367, 1053 -233)))',0) AS Drawing

UNION ALL

SELECT geometry::STMPolyFromText('MULTIPOLYGON (((1159 -199, 1226 -198, 1227 -431, 1160 -428, 1159 -199)),((1161 -121, 1227 -111, 1228 -162, 1162 -169, 1161 -121)))',0) AS Drawing

UNION ALL

SELECT geometry::STPolyFromText('POLYGON ((1260 -132, 1322 -133, 1324 -183, 1348 -184, 1350 -227, 1323 -227, 1323 -378, 1354 -377, 1354 -421, 1297 -433, 1283 -432, 1274 -426, 1267 -420, 1260 -407, 1261 -224, 1243 -225, 1241 -179, 1260 -181, 1260 -132))',0) AS Drawing

UNION ALL

SELECT geometry::STPolyFromText('POLYGON ((1445 -259, 1447 -233, 1445 -228, 1438 -224, 1427 -225, 1424 -236, 1426 -252, 1435 -266, 1451 -275, 1465 -286, 1479 -294, 1491 -307, 1499 -319, 1498 -341, 1493 -354, 1485 -369, 1476 -382, 1459 -393, 1440 -401, 1421 -404, 1404 -404, 1393 -398, 1379 -386, 1376 -370, 1373 -364, 1373 -334, 1423 -330, 1424 -359, 1432 -366, 1440 -364, 1448 -358, 1449 -340, 1447 -328, 1440 -319, 1426 -314, 1416 -307, 1406 -300, 1393 -294, 1385 -283, 1379 -270, 1376 -258, 1371 -245, 1371 -232, 1375 -219, 1382 -204, 1390 -189, 1405 -182, 1428 -182, 1442 -192, 1458 -201, 1473 -214, 1489 -231, 1494 -260, 1445 -259))',0) AS Drawing

UNION ALL

SELECT geometry::STMPolyFromText('MULTIPOLYGON (((579 -40, 589 -29, 602 -22, 621 -15, 639 -13, 656 -9, 676 -7, 698 -4, 722 -2, 749 -1, 853 -0, 886 -4, 915 -7, 937 -12, 967 -16, 984 -25, 1000 -32, 1006 -59, 999 -61, 986 -65, 976 -75, 970 -88, 968 -102, 971 -121, 956 -127, 945 -135, 931 -149, 921 -166, 921 -183, 928 -199, 939 -209, 945 -216, 937 -224, 927 -234, 918 -246, 915 -260, 915 -278, 923 -293, 928 -308, 944 -317, 936 -328, 927 -341, 924 -354, 923 -374, 933 -389, 943 -400, 957 -404, 968 -407, 967 -420, 967 -437, 976 -449, 988 -459, 1008 -467, 1000 -476, 991 -483, 971 -492, 957 -494, 943 -500, 926 -503, 906 -507, 888 -507, 709 -508, 692 -506, 674 -505, 656 -501, 642 -498, 624 -496, 606 -491, 591 -485, 577 -473, 579 -40)), '

+ '((579 -136, 591 -144, 606 -150, 623 -154, 641 -159, 664 -163, 684 -165, 702 -169, 732 -170, 758 -171, 845 -173, 873 -170, 925 -162, 922 -172, 901 -177, 862 -183, 818 -186, 759 -185, 714 -183, 681 -182, 647 -174, 613 -168, 588 -161, 580 -151, 579 -136)),'

+ '((578 -246, 593 -257, 613 -265, 636 -271, 664 -276, 694 -277, 724 -281, 789 -283, 833 -283, 873 -281, 916 -273, 919 -285, 884 -293, 840 -295, 809 -299, 768 -299, 731 -298, 703 -295, 672 -293, 647 -289, 624 -281, 605 -276, 593 -271, 580 -262, 579 -262, 578 -246)),'

+ '((578 -360, 593 -369, 615 -377, 635 -382, 664 -388, 689 -390, 716 -394, 751 -395, 857 -394, 881 -391, 905 -389, 932 -383, 939 -392, 917 -399, 880 -405, 839 -409, 786 -411, 739 -411, 701 -409, 667 -405, 635 -399, 611 -392, 591 -383, 580 -377, 578 -360)))',0) AS Drawing

 

Which, when we run it in SQL 2008 Management Studio, returns the results as

When you run a query which includes a spatial data type as a column, SSMS gives us a new tab, ‘Spatial results’. Clicking on this gives us a visual representation of the spatial results.

 

Note that I’ve had some trouble viewing multi polygons in SQL 2008 Management Studio, and can only get them to work in R2. Basic polygons seem to be fine in SQL 2008 though.

 

We can put this directly in a SQL Server Reporting Services map component (set to planar coordinates) and see it in a report.

 

Frog-Blog out

 

It’s only a week to go until the first Midlands SQL Server User Group, being held on March 10th 2011 at the Old Joint Stock pub in Birmingham.

We’ve got two of the best speakers in the UK lined up, Allan Mitchell and Neil Hambly, we’re putting food on for you (pork pies and chip butties!) and it’s being held in a pub so beer will also be involved.

SQL Server, pork pies and beer – all in one place?! Go on, how can you resist?!

Register for FREE here

This isn’t a technical blog post of my own, but a shout out to Rob Farley and an excellent blog post explaining how to use SQL’s OPTION (FAST x) hint. He explains how you can speed up an SSIS data flow by slowing down the source query. It may seem illogical at first, but you’ll understand after you go and read Rob’s post!

Read Rob’s post here: Speeding up SSIS using OPTION (FAST)

One of the great features of using Excel to browse an SSAS OLAP cube is the drillthrough ability. If you double click on any cell of an OLAP pivot table, Excel will create a new worksheet containing the top 1000 fact records that went to make up the figure in the selected cell.

N.B. The limit of 1000 rows can be altered, as per one of my previous blog posts here.


This feature is pretty well known, but not many folk realise how easy it is to reproduce this in SQL Server Management Studio (SSMS). All you need to do is prefix your query with DRILLTHROUGH.

i.e. Assuming an MDX query of

SELECT [Measures].[Internet Sales Amount] ON 0
FROM [Adventure Works]
WHERE [Date].[January 1, 2004]

Which returns the following results…

A query of

DRILLTHROUGH
SELECT [Measures].[Internet Sales Amount] ON 0
FROM [Adventure Works]
WHERE [Date].[January 1, 2004]

Returns the records contributing to the total figure. Great for diagnosing problems with an MDX query.

By default, only the first 10,000 rows are returned, but you can override this using MAXROWS

DRILLTHROUGH MAXROWS 500
SELECT [Measures].[Internet Sales Amount] ON 0
FROM [Adventure Works]
WHERE [Date].[January 1, 2004]

The columns that are returned are those defined in the Actions tab of the Cube Designer in BIDS (The Business Intelligence Development Studio).

If no action is defined, then the fact measures will be returned along with the keys that link to each relevant dimension, which tend not to be that helpful.


You can override the returned columns by using the RETURN clause

DRILLTHROUGH
SELECT [Measures].[Internet Sales Amount] ON 0
FROM [Adventure Works]
WHERE [Date].[January 1, 2004]
RETURN
 [$Internet Sales Order Details].[Internet Sales Order]
,[$Sales Territory].[Sales Territory Region]
,NAME([$Product].[Product])
,KEY([$Product].[Product])
,UNIQUENAME([$Product].[Product])
,[Internet Sales].[Internet Sales Amount]
,[Internet Sales].[Internet Order Quantity]



Note that there are some restrictions on what you can drill through

  • You can’t drill through an expression/calculation, only a raw measure
  • The MDX query needs to return a single cell (otherwise the cube does not know which one to drill through)
  • The data returned will be at the lowest granularity of the cube’s fact table

To explain the last point further, the cube does not return the raw data from the underlying data warehouse, but a summary of the facts grouped by each unique combination of the relevant dimensions. For example, if a warehouse table containing individual sales (by date, product, customer & store) is brought into a cube as a fact table that only has relationships with the date and product dimensions, then the cube drill through will return unique combinations of date and product, summarising sales for each combination. Extra granularity which the warehouse may contain (customer and store) will not be available.

Note that if you specify the RETURN columns, the rows are still returned at the lowest level of the fact table granularity, even if not all of the dimensions are brought out as columns. This may result in returning multiple identical records. Don’t worry, these will be distinct facts, just differentiated by a dimension/attribute that isn’t being returned.
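To make the granularity point concrete, here is a small Python illustration. It is not how SSAS works internally, just a toy model of the behaviour described above, with made-up rows and a hypothetical function name: warehouse sales held by date, product, customer and store are rolled up to the date and product grain that the cube knows about.

```python
from collections import defaultdict

# Hypothetical warehouse rows: (date, product, customer, store, sales_amount)
warehouse_rows = [
    ("2004-01-01", "Bike", "Alice", "Leeds", 100.0),
    ("2004-01-01", "Bike", "Bob", "York", 150.0),
    ("2004-01-01", "Helmet", "Alice", "Leeds", 20.0),
]


def drillthrough_grain(rows):
    """Summarise sales by (date, product), the only dimensions the cube relates to."""
    totals = defaultdict(float)
    for sale_date, product, _customer, _store, amount in rows:
        totals[(sale_date, product)] += amount
    return [(d, p, total) for (d, p), total in sorted(totals.items())]


if __name__ == "__main__":
    for row in drillthrough_grain(warehouse_rows):
        print(row)
```

The two Bike sales collapse into a single row, and there is no way to get back to the individual customers or stores from the result.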

You can find out more on TechNet here


Frog-Blog Out

One of the most useful aspects of a Business Intelligence system is the ability to add calculations to create new measures. This centralises the logic of the calculation into a single place, ensuring consistency and standardisation across the user base.

By way of example, a simple calculation for profit (Income – Expenditure) wouldn’t be provided by the source database and historically would be implemented in each and every report. In a data warehouse and/or cube we can create the calculation in a single place for everyone to use.

This post highlights some of the methods of doing this, each with their respective pros and cons.

Calculated Members in SSAS Cube


SSAS provides a ‘Calculations’ tab in the cube designer which allows you to create new measures using MDX. You can use any combination of existing measures and dimension attributes, along with the plethora of MDX functions available to create highly complex calculations.
Pros:

  • Very complex calculations can be created using all available MDX functions
  • No changes are required to the structure of the data warehouse
  • Changes to the calculation will apply to every record, historic and new
  • The results are not stored in the warehouse or cube, so no extra space is required
  • New calculations can be added without a full redeploy or reprocess of the cube
  • Calculations can be scoped to any level of aggregation and granularity. Different calculations can even be used for different scopes
  • Calculations can easily combine measures from different measure groups

Cons:

  • The calculation is evaluated at query time and its results are not pre-aggregated, which can reduce performance
  • SSAS drill through actions will not work
  • The calculation results are not available in the data warehouse, only the cube

SQL Calculations in the Data Source View


There’s a layer in between the data warehouse and the cube called the data source view (DSV). This presents the relevant tables in the warehouse to the cube, and can be used to enhance the underlying data with calculations. The calculation can live either in the DSV layer within the cube project or, as I prefer, in SQL Server views that encapsulate the logic.
Pros:

  • No changes are required to the table structure of the data warehouse
  • Calculations use SQL not MDX, reducing the complexity
  • Changes to the calculation will apply to every record, historic and new
  • The calculation will make full use of SSAS cube aggregations
  • SSAS drill through actions will work
  • The results are not stored in the warehouse, so the size of the database does not increase

Cons:

  • The cube must be redeployed and reprocessed before the new measure is available
  • The results of the calculation must be valid at the granularity of the fact table
  • The calculation results are not available in the data warehouse, only the cube

Calculate in the ETL process


Whilst bringing in data from the source data systems, it sometimes makes sense to perform calculations on the data at that point, and store the results in the warehouse.
Pros:

  • The results of the calculation will be available when querying the warehouse as well as the cube
  • In the ETL pipeline you can import other data sources (using lookups etc.) to utilise other data in the calculation
  • If the calculation uses time based data, or data valid at a specific time (e.g. share price) then by performing the calculation in the ETL, the correct time based data is used, without having to store the full history of the underlying source data
  • The calculation will make full use of SSAS cube aggregations
  • SSAS drill through actions will work

Cons:

  • You have to be able to alter the structure of the data warehouse, which isn’t always an option.
  • The results are stored in the warehouse, increasing the size of the database
  • The results of the calculation must be valid at the granularity of the fact table
  • If the calculation logic changes, all existing records must be updated


In Conclusion

If the calculation is valid for a single record, and it would be of benefit to have access to the results in the warehouse, then perform the calculation in the ETL pipeline and store the results in the warehouse.

If the calculation is valid for a single record, and it would not be of benefit to have the results in the warehouse, then calculate it in the data source view.

If the calculation is too complex for SQL, requiring MDX functions, then create an MDX calculated measure in the cube.
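The three rules above can be written down as a tiny decision helper. This is just a sketch of the reasoning in Python; the function and parameter names are hypothetical, and sending a calculation that is not valid at the fact table granularity to the cube is my reading of the cons listed for the other two options rather than a rule stated explicitly above.

```python
def choose_calculation_layer(valid_per_record, wanted_in_warehouse, needs_mdx):
    """Return "etl", "dsv" or "cube" following the conclusions of this post."""
    if needs_mdx or not valid_per_record:
        # Too complex for SQL, or not meaningful at fact table granularity.
        return "cube"
    if wanted_in_warehouse:
        return "etl"
    return "dsv"


if __name__ == "__main__":
    # A simple profit calculation (Income - Expenditure) that report writers
    # also want to query directly in the warehouse.
    print(choose_calculation_layer(True, True, False))
```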

SQL Server Integration Services (SSIS) packages are used in numerous scenarios for moving data from A to B. They are often developed and tested against a cut-down, usually static, subset of data. The problem is that although you’re testing the functionality of the package as it’s being developed, there’s no way to determine whether the performance will scale up to a full size production environment. This level of testing is more often than not ignored, resulting in packages being deployed to live which just can’t cope with the data volume, bringing down the load process.

We can divide performance checking into two:

  1. Load testing pre deployment
  2. Continual monitoring and projections

It’s vital to undertake performance load testing of packages before they’re deployed, or at least review the source queries and SSIS components and structure to ensure there’s nothing that’s likely to cause an exponentially increasing runtime. There are loads of blog posts about SSIS performance tuning so I won’t go into that here.

What I did want to talk about here was the importance of continual monitoring. A package that runs fine today may grind to a halt in a year’s time if the live data volume continues to increase. How do you check this, and how do you project data growth into the future to predict performance problems that haven’t happened yet?

The first step is to start tracking the time taken to run each package, and store this to a table. As a rule I always build this level of logging into my template packages when I’m defining the SSIS ETL framework. Again, there are heaps of articles on different ways to do this, check out one of Jamie’s gems as a starting point. The key outcome is that you end up with a start time and end time (and hence a duration) of each package every time it runs. If you don’t have any custom logging, you can always hack together the data from the sysssislog table if you’ve enabled it (and I hope you have..!).
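As an illustration of the last point, the Python sketch below pairs up start and end events to get a duration per run. It assumes you have already queried the log into a list of dictionaries with event, source, executionid and starttime keys, and that package runs are marked by PackageStart and PackageEnd events; treat those names, and the function itself, as assumptions to check against your own logging setup.

```python
from datetime import datetime


def package_durations(log_rows):
    """Return (package, start time, minutes) for each completed package run."""
    starts = {}
    durations = []
    for row in sorted(log_rows, key=lambda r: r["starttime"]):
        key = (row["executionid"], row["source"])
        if row["event"] == "PackageStart":
            starts[key] = row["starttime"]
        elif row["event"] == "PackageEnd" and key in starts:
            began = starts.pop(key)
            minutes = (row["starttime"] - began).total_seconds() / 60
            durations.append((row["source"], began, minutes))
    return durations


if __name__ == "__main__":
    # Hypothetical log rows for a single run of a package called LoadSales.
    rows = [
        {"event": "PackageStart", "source": "LoadSales", "executionid": "run-1",
         "starttime": datetime(2010, 8, 18, 2, 0)},
        {"event": "PackageEnd", "source": "LoadSales", "executionid": "run-1",
         "starttime": datetime(2010, 8, 18, 2, 17)},
    ]
    print(package_durations(rows))
```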

Once you have the raw data available, leave the package to run for a month or two and then analyse the results in Excel to perform a simple projection. Just copy the data into Excel in a format similar to this. It doesn’t matter if you have duplicate dates.

Date Duration (minutes)
18/08/2010 17
18/08/2010 16
19/08/2010 17
20/08/2010 18
21/08/2010 17

And then create a scatter chart

Format the X axis and make sure it’s set to be a date. You should end up with a chart similar to this.

Add a trendline to the chart by right clicking on one of the data points and clicking ‘add trendline’. Hopefully the trend will be linear, so choose that. If your data looks exponential then you really need to re-assess your package urgently!

There’s a nifty feature of Excel trendlines that allows you to forecast the trendline forward by x periods. If you set this to 365 it will project the package duration forward for a year. The reliability of this trendline will increase as the volume of sample data increases. i.e. if you run your packages for 3 months, you’ll be able to make better predictions than if you only run them for 2 weeks.

This clearly shows that although the package is currently taking 24 minutes to run, with the current data growth it will be taking approximately an hour in a year’s time.

When you do this for each package, you can quickly build up a picture of when you’re likely to run into trouble, and use this as justification for development resource to prevent the problems before they happen.
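If you want to run the same projection outside Excel, a linear trendline is just a least squares fit, which can be sketched in plain Python as below. The function names are my own, the sample rows are the five from the table above, and the result is only as good as the assumption that growth really is linear.

```python
from datetime import date, timedelta


def fit_line(points):
    """Least squares fit of duration against date; returns (slope per day, intercept)."""
    xs = [d.toordinal() for d, _ in points]
    ys = [float(minutes) for _, minutes in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need runs on at least two different dates")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def project_duration(points, days_ahead=365):
    """Project the fitted line days_ahead days past the latest run."""
    slope, intercept = fit_line(points)
    target = max(d for d, _ in points) + timedelta(days=days_ahead)
    return slope * target.toordinal() + intercept


if __name__ == "__main__":
    sample = [
        (date(2010, 8, 18), 17), (date(2010, 8, 18), 16), (date(2010, 8, 19), 17),
        (date(2010, 8, 20), 18), (date(2010, 8, 21), 17),
    ]
    print("Projected duration in a year: %.1f minutes" % project_duration(sample))
```

Five data points are far too few to trust, of course; as noted above, the projection improves as the volume of sample data increases.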

Frog-Blog Out

If you’re a Business Intelligence developer I assume you have BIDS Helper installed. If not then stop reading this post and go and install it. Now. It adds a number of very useful features to the Business Intelligence Development Studio which provide help with many aspects of SSIS, SSRS and SSAS development.

One of my favourite utilities is the Deploy MDX Script function. This takes the calculation script for an SSAS cube (named sets, calculated measures, scope logic, etc.) and deploys it in isolation without having to redeploy and rebuild the entire cube. This is a life saver when trying to write and test complex MDX calculations, and has saved me days if not weeks of waiting around.

The Deploy MDX Script button works perfectly when deploying updated script to the development environment, but what if you want to deploy the same script changes to a testing or live environment? Is there a way of scripting the change without redeploying the entire cube?

Yes there is, by using the following XMLA script. Just change the DatabaseID and CubeID elements of the Object element to point to your Analysis Services database and cube, and paste your MDX calculation script in between the <Text> and </Text> tags. Run the script in SQL Server Management Studio and it should update the cube with the new script.

This script works for SQL Server 2008 and SQL Server 2008 R2.

<Alter AllowCreate="true" ObjectExpansion="ExpandFull" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine" xmlns:as="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>YourDatabaseName</DatabaseID>
    <CubeID>YourCubeName</CubeID>
    <MdxScriptID>MdxScript</MdxScriptID>
  </Object>
  <ObjectDefinition>
    <MdxScript>
      <ID>MdxScript</ID>
      <Name>MdxScript</Name>
      <Commands xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:dwd="http://schemas.microsoft.com/DataWarehouse/Designer/1.0">
        <Command>
          <Text>
/*
The CALCULATE command controls the aggregation of leaf cells in the cube.
If the CALCULATE command is deleted or modified, the data within the cube is affected.
You should edit this command only if you manually specify how the cube is aggregated.
*/
CALCULATE;

------------------------------------------------
--Paste your MDX Calculations here
------------------------------------------------
          </Text>
        </Command>
      </Commands>
    </MdxScript>
  </ObjectDefinition>
</Alter>

And there you have it, you can update your MDX calculated members outside of BIDS without doing a full deploy.
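One thing to watch when pasting MDX into the script by hand is that characters such as < and & must be escaped to keep the XML valid. If you deploy scripts regularly it may be worth generating the XMLA instead; the Python below is a minimal sketch of that idea. The function name, the example measure names and the trimmed-down template are mine (the template mirrors the script above), so treat it as a starting point rather than a tested deployment tool.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ENGINE_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"

# Same structure and namespace declarations as the XMLA script in this post.
TEMPLATE = """<Alter AllowCreate="true" ObjectExpansion="ExpandFull" xmlns="{ns}" xmlns:as="{ns}">
  <Object>
    <DatabaseID>{database_id}</DatabaseID>
    <CubeID>{cube_id}</CubeID>
    <MdxScriptID>MdxScript</MdxScriptID>
  </Object>
  <ObjectDefinition>
    <MdxScript>
      <ID>MdxScript</ID>
      <Name>MdxScript</Name>
      <Commands xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:dwd="http://schemas.microsoft.com/DataWarehouse/Designer/1.0">
        <Command>
          <Text>
{mdx}
          </Text>
        </Command>
      </Commands>
    </MdxScript>
  </ObjectDefinition>
</Alter>"""


def build_alter_script(database_id, cube_id, mdx_script):
    """Return the XMLA Alter command with the MDX script XML-escaped."""
    return TEMPLATE.format(
        ns=ENGINE_NS,
        database_id=escape(database_id),
        cube_id=escape(cube_id),
        mdx=escape(mdx_script),
    )


if __name__ == "__main__":
    # Hypothetical calculation script; the measure names are placeholders.
    example_mdx = (
        "CALCULATE;\n"
        "CREATE MEMBER CURRENTCUBE.[Measures].[Is Loss] AS "
        "IIF([Measures].[Profit] < 0, 1, 0);"
    )
    xmla = build_alter_script("YourDatabaseName", "YourCubeName", example_mdx)
    ET.fromstring(xmla)  # raises ParseError if the result is not well-formed XML
    print(xmla)
```

The output can then be run in a Management Studio XMLA query window in exactly the same way as the hand-edited script.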

Frog-Blog Out

The Frog Blog

I'm Alex Whittles.

I specialise in designing and implementing SQL Server business intelligence solutions, and this is my blog! Just a collection of thoughts, techniques and ramblings on SQL Server, Cubes, Data Warehouses, MDX, DAX and whatever else comes to mind.
