Monthly Archives: October 2010
SQL Server Integration Services (SSIS) packages are used in numerous scenarios for moving data from A to B. Often they are developed and tested against a cutdown, often static, subset of data. One of the problems with this is that yes you’re testing the functionality of the package as it’s being developed, but there’s no way to determine whether the performance will scale up to a full size production environment. This level of testing is more often than not ignored, resulting in packages being deployed to live which just can’t cope with the data volume, bringing down the load process.
We can divide performance checking into two:
- Load testing pre deployment
- Continual monitoring and projections
It’s vital to undertake performance load testing of packages before they’re deployed, or at least review the source queries and SSIS components and structure to ensure there’s nothing that’s likely to cause an exponentially increasing runtime. There are loads of blog posts about SSIS performance tuning so I won’t go into that here.
What I did want to talk about here was the importance of continual monitoring. A package that runs fine today may grind to a halt in a year’s time if the live data volume continues to increase. How do you check this, and how do you project data growth into the future to predict performance problems that haven’t happened yet?
The first step is to start tracking the time taken to run each package, and store this to a table. As a rule I always build this level of logging into my template packages when I’m defining the SSIS ETL framework. Again, there are heaps of articles on different ways to do this, check out one of Jamie’s gems as a starting point. The key outcome is that you end up with a start time and end time (and hence a duration) of each package every time it runs. If you don’t have any custom logging, you can always hack together the data from the sysssislog table if you’ve enabled it (and I hope you have..!).
Once you have the raw data available, leave the package to run for a month or two and then analyse the results in Excel to perform a simple projection. Just copy the data into Excel in a format similar to this. It doesn’t matter if you have duplicate dates
And then create a scatter chart
Format the X axis and make sure it’s set to be a date. You should end up with a chart similar to this.
Add a trend line to the chart by right clicking on one of the data points and click ‘add trendline’. Hopefully the trendline will be linear so choose that. If your data looks exponential then you really need to re-assess your package urgently!
There’s a nifty feature of Excel trendlines that allows you to forecast the trendline forward by x periods. If you set this to 365 it will project the package duration forward for a year. The reliability of this trendline will increase as the volume of sample data increases. i.e. if you run your packages for 3 months, you’ll be able to make better predictions than if you only run them for 2 weeks.
This clearly shows that although the package is currently taking 24 minutes to run, with the current data growth it will be taking approximately an hour in a year’s time.
When you do this for each package, you can quickly build up a picture of when you’re likely to run into trouble, and use this as justification for development resource to prevent the problems before they happen.
If you’re a Business Intelligence developer I assume you have BIDS Helper installed. If not then stop reading this post and go and install it. Now. It adds a number of very useful features to the Business Intelligence Development Studio which provide help with many aspects of SSIS, SSRS and SSAS development.
One of my favourite utilities is the Deploy MDX Script function. This takes the calculation script for an SSAS cube (named sets, calculated measures, scope logic, etc.) and deploys it in isolation without having to redeploy and rebuild the entire cube. This is a life saver when trying to write and test complex MDX calculations, and has saved me days if not weeks of waiting around.
The Deploy MDX Script button works perfectly when deploying updated script to the development environment, but what if you want to deploy the same script changes to a testing or live environment? Is there a way of scripting the change without redeploying the entire cube?
Yes there is, by using the following xmla script. Just change the DatabaseID and CubeID elements of the Object element to point to your Analysis Services database and cube, and paste your MDX calculation script in between the <Text> and </Text> tags. Run the script in SQL Server Management Studio and it should update the cube with the new script.
This script works for SQL Server 2008 and SQL Server 2008 R2.
<Alter AllowCreate="true" ObjectExpansion="ExpandFull" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine" xmlns:as="http://schemas.microsoft.com/analysisservices/2003/engine"> <Object> <DatabaseID>YourDatabaseName</DatabaseID> <CubeID>YourCubeName</CubeID> <MdxScriptID>MdxScript</MdxScriptID> </Object> <ObjectDefinition> <MdxScript> <ID>MdxScript</ID> <Name>MdxScript</Name> <Commands xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:dwd="http://schemas.microsoft.com/DataWarehouse/Designer/1.0"> <Command> <Text> /* The CALCULATE command controls the aggregation of leaf cells in the cube. If the CALCULATE command is deleted or modified, the data within the cube is affected. You should edit this command only if you manually specify how the cube is aggregated. */ CALCULATE; ------------------------------------------------ --Paste your MDX Calculations here ------------------------------------------------ </Text> </Command> </Commands> </MdxScript> </ObjectDefinition> </Alter>
And there you have it, you can update your MDX calculated members outside of BIDS without doing a full deploy.
The Cloud is pretty much top of the list of the corporate buzzword bingo phrases at the moment. It’s clear what the benefits are from a SAAS perspective, but it’s been less clear how the Business Intelligence world will start to make use of it.
Data Warehousing, Management Information, Cubes, Data Mining etc. all involve a lot of data, usually gigabytes or terrabytes even for medium sized organisations. Internet connectivity (speed and reliability) is obviously going to be a major concern and limiting factor however we can assume that connectivity will continue to improve dramatically over the next few years.
Security is a major concern; when you combine all of your company’s information into a single place (accounts, sales, customers, etc.) it raises a large question of how the security is managed when you let this information out of your firewalled LAN and host it on the internet.
It will of course bring a number of benefits incuding reduced infrastructure/capital costs, improved reliability and redundancy of hardware, increased accessibility for the mobile workforce etc. etc., and these should not be overlooked as they will no doubt provide a large incentive for some companies.
I’ve no doubt that the cloud is coming to BI, but I’d think carefully about what actual benefits it brings and how they trade off against any problems it may also bring.
There’s an interesting webcast panel discussion about BI and Cloud Computing below, including (among others) Donald Farmer from Microsoft discussing some of the issues above.
Sourced From: BizIntelligence.tv on blogs.msdn.com
The cloud is coming…