Posts Tagged ‘Performance’
This isn’t a technical blog post of my own, but a shout out to Rob Farley and an excellent blog post explaining how to use SQL’s OPTION (FAST x) hint. He explains how you can speed up an SSIS data flow by slowing down the source query. It may seem illogical at first, but you’ll understand after you go and read Rob’s post!
Read Rob’s post here: Speeding up SSIS using OPTION (FAST)
SQL Server Integration Services (SSIS) packages are used in numerous scenarios for moving data from A to B. Often they are developed and tested against a cutdown, often static, subset of data. One of the problems with this is that yes you’re testing the functionality of the package as it’s being developed, but there’s no way to determine whether the performance will scale up to a full size production environment. This level of testing is more often than not ignored, resulting in packages being deployed to live which just can’t cope with the data volume, bringing down the load process.
We can divide performance checking into two:
- Load testing pre deployment
- Continual monitoring and projections
It’s vital to undertake performance load testing of packages before they’re deployed, or at least review the source queries and SSIS components and structure to ensure there’s nothing that’s likely to cause an exponentially increasing runtime. There are loads of blog posts about SSIS performance tuning so I won’t go into that here.
What I did want to talk about here was the importance of continual monitoring. A package that runs fine today may grind to a halt in a year’s time if the live data volume continues to increase. How do you check this, and how do you project data growth into the future to predict performance problems that haven’t happened yet?
The first step is to start tracking the time taken to run each package, and store this to a table. As a rule I always build this level of logging into my template packages when I’m defining the SSIS ETL framework. Again, there are heaps of articles on different ways to do this, check out one of Jamie’s gems as a starting point. The key outcome is that you end up with a start time and end time (and hence a duration) of each package every time it runs. If you don’t have any custom logging, you can always hack together the data from the sysssislog table if you’ve enabled it (and I hope you have..!).
Once you have the raw data available, leave the package to run for a month or two and then analyse the results in Excel to perform a simple projection. Just copy the data into Excel in a format similar to this. It doesn’t matter if you have duplicate dates
| Date | Duration |
| 18/08/2010 | 17 |
| 18/08/2010 | 16 |
| 19/08/2010 | 17 |
| 20/08/2010 | 18 |
| 21/08/2010 | 17 |
| … | |
And then create a scatter chart

Format the X axis and make sure it’s set to be a date. You should end up with a chart similar to this.

Add a trend line to the chart by right clicking on one of the data points and click ‘add trendline’. Hopefully the trendline will be linear so choose that. If your data looks exponential then you really need to re-assess your package urgently!

There’s a nifty feature of Excel trendlines that allows you to forecast the trendline forward by x periods. If you set this to 365 it will project the package duration forward for a year. The reliability of this trendline will increase as the volume of sample data increases. i.e. if you run your packages for 3 months, you’ll be able to make better predictions than if you only run them for 2 weeks.

This clearly shows that although the package is currently taking 24 minutes to run, with the current data growth it will be taking approximately an hour in a year’s time.
When you do this for each package, you can quickly build up a picture of when you’re likely to run into trouble, and use this as justification for development resource to prevent the problems before they happen.
Frog-Blog Out
Today’s Frog-Blog top-tips are quite simple ones, but ones that I always find very useful. SSRS Report Performance monitoring.
Once you start to build up a few Reporting Services reports, you need to find out how they’re performing, whether any particular reports are causing the server problems and even which users are running which reports. The following set of queries should point you in the right direction. They should all be run against the ReportServer database.
The 20 most recently run reports
SELECT TOP 20 C.Path, C.Name, EL.UserName, EL.Status, EL.TimeStart,
EL.[RowCount], EL.ByteCount,
(EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)/1000 AS TotalSeconds,
EL.TimeDataRetrieval, EL.TimeProcessing, EL.TimeRendering
FROM ExecutionLog EL
INNER JOIN Catalog C ON EL.ReportID = C.ItemID
ORDER BY TimeStart DESC
The slowest reports (in the last 28 days)
SELECT TOP 10 C.Path, C.Name, Count(*) AS ReportsRun,
AVG((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering))
AS AverageProcessingTime,
Max((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering))
AS MaximumProcessingTime,
Min((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering))
AS MinimumProcessingTime
FROM ExecutionLog EL
INNER JOIN Catalog C ON EL.ReportID = C.ItemID
WHERE EL.TimeStart>Datediff(d,GetDate(),-28)
GROUP BY C.Path,C.Name
ORDER BY AVG((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)) DESC
The most active users
SELECT TOP 10 EL.UserName, Count(*) AS ReportsRun,
Count(DISTINCT C.[Path]) AS DistinctReportsRun
FROM ExecutionLog EL
INNER JOIN Catalog C ON EL.ReportID = C.ItemID
WHERE EL.TimeStart>Datediff(d,GetDate(),-28)
GROUP BY EL.UserName
ORDER BY Count(*) DESC
The most popular reports
SELECT TOP 10 C.Path, C.Name, Count(*) AS ReportsRun,
AVG((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering))
AS AverageProcessingTime,
Max((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering))
AS MaximumProcessingTime,
Min((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering))
AS MinimumProcessingTime
FROM ExecutionLog EL
INNER JOIN Catalog C ON EL.ReportID = C.ItemID
WHERE EL.TimeStart>Datediff(d,GetDate(),-28)
GROUP BY C.Path, C.Name
ORDER BY Count(*) DESC
Failed Reports
SELECT TOP 20 C.Path, C.Name, EL.UserName, EL.Status, EL.TimeStart,
EL.[RowCount], EL.ByteCount,
(EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)/1000 AS TotalSeconds,
EL.TimeDataRetrieval, EL.TimeProcessing, EL.TimeRendering
FROM ExecutionLog EL
INNER JOIN Catalog C ON EL.ReportID = C.ItemID
WHERE EL.Status <> 'rsSuccess'
ORDER BY TimeStart DESC
There are countless other variations, but this should be enough to get you going.
[Update: 05/09/2011 - Thanks to Jonathan [Twitter|Website] for pointing out a typo!]

I specialise in designing and implementing SQL Server business intelligence solutions,
and this is my blog! Just a collection of thoughts, techniques and ramblings on SQL Server, Cubes, Data Warehouses, MDX, DAX and whatever else comes to mind.
