Posts Tagged ‘Performance’

How does the SSAS 2012 Tabular model performance change when you add more CPU sockets / NUMA nodes?

In my last post (SSAS Tabular NUMA and CPU Cores Performance) I presented the results of some testing I’d been doing on the scalability of the SSAS 2012 Tabular model, specifically the performance of distinct count measures over large data volumes (50-200m distinct customers).

The conclusion was that moving from 1 NUMA node (CPU socket) to 2 had no impact on query performance, so the 2nd CPU was entirely wasted. This contradicted other advice and recommendations, which indicated that adding a second node would actually make performance worse.

After I discussed the issue with a member of the SSAS development team, they advised that the method I was using to disable cores was flawed and that we shouldn’t be using Windows System Resource Manager. So I re-ran the tests, disabling cores (and their associated memory) using MSConfig to simulate physically removing them from the server.

The test results were immediately different…

[Chart: TabularNUMACoresTest]

The hardware setup was the same as before, but with a larger data set:

  • 30Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes, hyperthreaded) server with 144Gb RAM
  • SQL Server 2012 SP1 CU8 Enterprise (+ a further hotfix that resolves a problem with distinct counts >2m)
  • 900m rows of data in primary fact
  • 200m distinct CustomerKey values in primary fact
  • No cube partitioning
  • DefaultSegmentRowCount: 2097152
  • ProcessingTimeboxSecPerMRow: 0
  • CPU cores and associated memory disabled using MSConfig

The two test queries were:

  • Query 1: Simple, single value result of the total distinct customer count
  • Query 2: More complex distinct count query, sliced by a number of different attributes to give approx 600 result cells

As soon as the core count is increased above 16 (i.e. the 2nd CPU is introduced), query performance drops significantly: the two queries take 1.45x and 2x as long to run, with the simple query taking almost exactly double the time.

These results now support other theories floating around the blogosphere: adding extra CPUs not only fails to help Tabular performance, it actually hinders it significantly.

As before, the DefaultSegmentRowCount setting gave the best performance at 2m and 4m; raising it further seemed to degrade performance.

Frog Blog out

© Alex Whittles, Purple Frog Systems Ltd

[UPDATE] After further investigation, I found that the tests in this post were inaccurate and the results unreliable. Updated NUMA test results here.

In my last post (SSAS Tabular Performance – DefaultSegmentRowCount) I presented some analysis of the query performance impact of changing the DefaultSegmentRowCount setting. This post describes the next tests that I ran on the same system, investigating the impact of restricting SSAS to just 1 NUMA node instead of the 2 available on the server.

It’s well known that SSAS Tabular is not NUMA aware, so it’s common to see advice recommending restricting SSAS to a single NUMA node (via CPU affinity) to improve performance.

From what I’d read, I was expecting that by restricting SSAS to a single NUMA node, query performance would improve slightly, maybe by 10-30%.

Recap of the setup:

  • 7.6Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes) server with 144Gb RAM
  • SQL Server 2012 SP1 CU7 Enterprise
  • 167m rows of data in primary fact
  • 80m distinct CustomerKey values in primary fact
  • No cube partitioning
  • DefaultSegmentRowCount: 2097152
  • ProcessingTimeboxSecPerMRow: 0
  • CPU core affinity configured using Windows System Resource Manager (see John Sirman’s great guide to using WSRM with SSAS)

I ran profiler, checking the ‘Query End’ duration on a simple distinct count of CustomerKey, with no other filters or attributes involved.

[Chart: TabularQueryTimeByCores]

You can see that dropping from 32 cores across 2 NUMA nodes down to 16 cores on a single node had almost no impact at all.

Within a single NUMA node, performance improved dramatically as the number of cores increased, but as soon as a second NUMA node was added, performance flatlined, with no further significant improvement no matter how many cores were added.

As per my last post, I’m sure there are other things afoot with this server, so this behaviour may not be representative of other setups. However, it again reinforces advice you will have already seen elsewhere: with SSAS Tabular, avoid NUMA hardware…

Frog-Blog out

© Alex Whittles, Purple Frog Systems Ltd

I’m currently investigating a poorly performing Tabular model, and came across some interesting test results which seem to contradict the advice in Microsoft’s Performance Tuning of Tabular Models white paper.

Some background:

  • 7.6Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes) server with 144Gb RAM
  • SQL Server 2012 SP1 CU7 Enterprise
  • 167m rows of data in primary fact
  • 80m distinct CustomerKey values in primary fact
  • No cube partitioning

A simple distinct count in DAX of the CustomerKey, with no filtering, is taking 42 seconds on a cold cache. Far too slow for a tabular model. Hence the investigation.

p88 of the Performance Tuning of Tabular Models white paper discusses the DefaultSegmentRowCount setting, explaining that it defaults to 8m and that there should be a correlation between the number of cores and the number of segments (the number of segments being calculated as the number of rows divided by the segment size).

It also indicates that a higher segment size may increase compression, and consequently query performance.

Calculating the number of segments for our data set gives us the following options:

Rows: 167,000,000

  Segment Size        # Segments
  1048576             160
  2097152             80
  4194304             40
  8388608 (default)   20
  16777216            10
  33554432            5
  67108864            3
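
For reference, these segment counts are simply the row count divided by the segment size, rounded up. If you want to reproduce the calculation for your own fact table, a quick T-SQL sketch along these lines will do it (the row count here is hardcoded purely for illustration):

 -- Estimate how many segments each candidate DefaultSegmentRowCount
 -- setting would produce for a given number of fact rows
 DECLARE @Rows bigint = 167000000;

 SELECT s.SegmentSize,
        CAST(CEILING(@Rows * 1.0 / s.SegmentSize) AS int) AS NumberOfSegments
   FROM (VALUES (1048576), (2097152), (4194304), (8388608),
                (16777216), (33554432), (67108864)) AS s(SegmentSize)
  ORDER BY s.SegmentSize;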

So, with 32 cores to play with, we should be looking at the default segment size (8m), or maybe reducing it to 4m to get 40 segments. But the extra compression with a 16m segment size may be of benefit, so I ran some timing tests on the distinct count measure, and the results are quite interesting.

[Chart: DefaultSegmentRowSize]

It clearly shows that in this environment, reducing the DefaultSegmentRowCount property down to 2m improved the query performance (on a cold cache) from 42s down to 27s, a 36% improvement. As well as this, processing time was reduced, as was compression.

This setting creates 80 segments, 2.5 times the number of cores available, yet it achieved the best performance. Note that the server’s ProcessingTimeboxSecPerMRow setting has been set to 0 to allow for maximum compression.

There’s more to this system’s performance problems than just this (NUMA, for a start), but I thought I’d throw this out there in case anyone else is blindly following the performance tuning white paper without doing their own experimentation.

Each environment, data set and server spec is different, so if you need to eke out the last ounce of performance, run your own tests on the SSAS settings and see for yourself.

Frog-Blog Out

[Update: Follow up post exploring the performance impact of NUMA on this server]

© Alex Whittles, Purple Frog Systems Ltd

This isn’t a technical blog post of my own, but a shout out to Rob Farley and an excellent blog post explaining how to use SQL’s OPTION (FAST x) hint. He explains how you can speed up an SSIS data flow by slowing down the source query. It may seem illogical at first, but you’ll understand after you go and read Rob’s post!

Read Rob’s post here: Speeding up SSIS using OPTION (FAST)
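
For a flavour of what the hint looks like, it’s simply an OPTION clause appended to the end of the source query. A minimal sketch (the table and column names below are made up purely for illustration):

 -- Ask the optimiser for a plan that returns the first 10,000 rows
 -- as quickly as possible, rather than optimising for the whole set
 SELECT OrderID, OrderDate, CustomerID
   FROM dbo.FactOrders
 OPTION (FAST 10000);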

© Alex Whittles, Purple Frog Systems Ltd

SQL Server Integration Services (SSIS) packages are used in numerous scenarios for moving data from A to B. Often they are developed and tested against a cut-down, often static, subset of data. One of the problems with this is that although you’re testing the functionality of the package as it’s being developed, there’s no way to determine whether the performance will scale up to a full-size production environment. This level of testing is more often than not ignored, resulting in packages being deployed to live which just can’t cope with the data volume, bringing down the load process.

We can divide performance checking into two areas:

  1. Load testing pre deployment
  2. Continual monitoring and projections

It’s vital to undertake performance load testing of packages before they’re deployed, or at least review the source queries and SSIS components and structure to ensure there’s nothing that’s likely to cause an exponentially increasing runtime. There are loads of blog posts about SSIS performance tuning so I won’t go into that here.

What I did want to talk about here was the importance of continual monitoring. A package that runs fine today may grind to a halt in a year’s time if the live data volume continues to increase. How do you check this, and how do you project data growth into the future to predict performance problems that haven’t happened yet?

The first step is to start tracking the time taken to run each package, and store this to a table. As a rule I always build this level of logging into my template packages when I’m defining the SSIS ETL framework. Again, there are heaps of articles on different ways to do this, check out one of Jamie’s gems as a starting point. The key outcome is that you end up with a start time and end time (and hence a duration) of each package every time it runs. If you don’t have any custom logging, you can always hack together the data from the sysssislog table if you’ve enabled it (and I hope you have..!).
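
If you’re relying on the built-in SQL Server log provider, a rough sketch of that hack might look like the query below, which pairs up the PackageStart and PackageEnd events in dbo.sysssislog (SQL Server 2008 onwards) to give one row per package execution with its date and duration:

 -- One row per package execution: run date and duration in minutes,
 -- derived from the built-in SSIS log table
 SELECT s.source                                   AS PackageName,
        CAST(s.starttime AS date)                  AS RunDate,
        DATEDIFF(minute, s.starttime, e.starttime) AS DurationMinutes
   FROM dbo.sysssislog s
        INNER JOIN dbo.sysssislog e
            ON  e.executionid = s.executionid
            AND e.source      = s.source
            AND e.event       = 'PackageEnd'
  WHERE s.event = 'PackageStart'
  ORDER BY s.starttime;

The RunDate and DurationMinutes columns drop straight into the Excel layout described below.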

Once you have the raw data available, leave the package to run for a month or two and then analyse the results in Excel to perform a simple projection. Just copy the data into Excel in a format similar to this (it doesn’t matter if you have duplicate dates):

Date         Duration
18/08/2010   17
18/08/2010   16
19/08/2010   17
20/08/2010   18
21/08/2010   17

Then create a scatter chart from this data.

Format the X axis and make sure it’s set to be a date. You should end up with a chart similar to this.

Add a trend line to the chart by right clicking on one of the data points and click ‘add trendline’. Hopefully the trendline will be linear so choose that. If your data looks exponential then you really need to re-assess your package urgently!

There’s a nifty feature of Excel trendlines that allows you to forecast the trendline forward by x periods. If you set this to 365 it will project the package duration forward for a year. The reliability of this trendline increases as the volume of sample data increases: if you run your packages for 3 months, you’ll be able to make better predictions than if you only run them for 2 weeks.

The trendline clearly shows that although the package is currently taking 24 minutes to run, with the current data growth it will be taking approximately an hour in a year’s time.

When you do this for each package, you can quickly build up a picture of when you’re likely to run into trouble, and use this as justification for development resource to prevent the problems before they happen.

Frog-Blog Out

© Alex Whittles, Purple Frog Systems Ltd

Today’s Frog-Blog top tips are quite simple ones, but ones that I always find very useful: SSRS report performance monitoring.

Once you start to build up a few Reporting Services reports, you need to find out how they’re performing, whether any particular reports are causing the server problems and even which users are running which reports. The following set of queries should point you in the right direction. They should all be run against the ReportServer database.

The 20 most recently run reports

 SELECT TOP 20 C.Path, C.Name, EL.UserName, EL.Status, EL.TimeStart, 
     EL.[RowCount], EL.ByteCount, 
    (EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)/1000 AS TotalSeconds, 
     EL.TimeDataRetrieval, EL.TimeProcessing, EL.TimeRendering 
   FROM ExecutionLog EL 
    INNER JOIN Catalog C ON EL.ReportID = C.ItemID 
   ORDER BY TimeStart DESC 

The slowest reports (in the last 28 days)

 SELECT TOP 10 C.Path, C.Name, Count(*) AS ReportsRun, 
   AVG((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)) 
                                            AS AverageProcessingTime, 
   Max((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)) 
                                            AS MaximumProcessingTime, 
   Min((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)) 
                                            AS MinimumProcessingTime 
 FROM ExecutionLog EL 
   INNER JOIN Catalog C ON EL.ReportID = C.ItemID 
 WHERE EL.TimeStart > DateAdd(d, -28, GetDate()) 
 GROUP BY C.Path,C.Name 
 ORDER BY AVG((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)) DESC 

The most active users

 SELECT TOP 10 EL.UserName, Count(*) AS ReportsRun, 
   Count(DISTINCT C.[Path]) AS DistinctReportsRun 
  FROM ExecutionLog EL 
    INNER JOIN Catalog C ON EL.ReportID = C.ItemID 
  WHERE EL.TimeStart > DateAdd(d, -28, GetDate()) 
  GROUP BY EL.UserName 
  ORDER BY Count(*) DESC 

The most popular reports

 SELECT TOP 10 C.Path, C.Name, Count(*) AS ReportsRun, 
   AVG((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)) 
                                            AS AverageProcessingTime, 
   Max((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)) 
                                            AS MaximumProcessingTime, 
   Min((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)) 
                                            AS MinimumProcessingTime 
  FROM ExecutionLog EL 
   INNER JOIN Catalog C ON EL.ReportID = C.ItemID 
  WHERE EL.TimeStart > DateAdd(d, -28, GetDate()) 
  GROUP BY C.Path, C.Name 
  ORDER BY Count(*) DESC 

Failed Reports

 SELECT TOP 20 C.Path, C.Name, EL.UserName, EL.Status, EL.TimeStart, 
   EL.[RowCount], EL.ByteCount, 
   (EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)/1000 AS TotalSeconds, 
   EL.TimeDataRetrieval, EL.TimeProcessing, EL.TimeRendering 
  FROM ExecutionLog EL 
   INNER JOIN Catalog C ON EL.ReportID = C.ItemID 
  WHERE EL.Status <> 'rsSuccess' 
  ORDER BY TimeStart DESC 

There are countless other variations, but this should be enough to get you going.
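
As one example of the kind of variation you can build, the sketch below breaks activity down by day over the same 28-day window, which is handy for spotting which days are hammering the report server (it assumes SQL Server 2008 or later for the date cast):

 SELECT CAST(EL.TimeStart AS date) AS RunDate, Count(*) AS ReportsRun, 
   AVG((EL.TimeDataRetrieval + EL.TimeProcessing + EL.TimeRendering)) 
                                            AS AverageProcessingTime 
  FROM ExecutionLog EL 
  WHERE EL.TimeStart > DateAdd(d, -28, GetDate()) 
  GROUP BY CAST(EL.TimeStart AS date) 
  ORDER BY RunDate 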

[Update: 05/09/2011 – Thanks to Jonathan [Twitter|Website] for pointing out a typo!]

© Alex Whittles, Purple Frog Systems Ltd