0845 643 64 63

Monthly Archives: December 2013

SSAS Tabular – NUMA and CPU Cores Performance

[UPDATE] After further investigation, I found that the tests in this post were inacurate and the results unreliable. Updated NUMA test results here

In my last post (SSAS Tabular Performance – DefaultSegmentRowCount) I presented some analysis of the query performance impact of changing the DefaultSegmentRowCount setting. This post describes the next tests that I ran on the same system, investigating the impact of restricting SSAS to just 1 NUMA node instead of the 2 avaiable on the server.

It’s well known that SSAS Tabular is not NUMA aware, so it’s common to see advice recommending affiliating SSAS to a single NUMA node to improve performance.

From what I’d read, I was expecting that by affiliating SSAS to a single NUMA node that the query performance would improve slightly, maybe 10-30%.

Recap of the setup:

  • 7.6Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes) server with 144Gb RAM
  • SQL Server 2012 SP1 CU7 Enterprise
  • 167m rows of data in primary fact
  • 80m distinct CustomerKey values in primary fact
  • No cube partitioning
  • DefaultSegmentRowCount: 2097152
  • ProcessingTimeboxSecPerMRow: 0
  • CPU core affinity configured using Windows System Resource Manager (see John Sirman’s great guide to using WSRM with SSAS)

I ran profiler, checking the ‘Query End’ duration on a simple distinct count of CustomerKey, with no other filters or attributes involved.

TabularQueryTimeByCores

You can see that dropping from 32 cores across 2 NUMA nodes down to 16 cores on a single node had almost no impact at all.

Within a single NUMA node, the performance dramatically improved as the number of cores increased, but as soon as a second NUMA node is added, the performance flat lines, with no further significant improvement no matter how many cores are added.

As per my last post – I’m sure there are other things afoot with this server, so this behaviour may not be representative of other setups, however it again reinforces advice you will have already seen elsewhere, that with SSAS Tabular – avoid NUMA hardware…

Frog-Blog out

SSAS Tabular performance – DefaultSegmentRowCount

I’m currently investigating a poorly performing Tabular model, and came across some interesting test results which seem to contradict the advice in Microsoft’s Performance Tuning of Tabular Models white paper.

Some background:

  • 7.6Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes) server with 144Gb RAM
  • SQL Server 2012 SP1 CU7 Enterprise
  • 167m rows of data in primary fact
  • 80m distinct CustomerKey values in primary fact
  • No cube partitioning

A simple distinct count in DAX of the CustomerKey, with no filtering, is taking 42 seconds on a cold cache. Far too slow for a tabular model. Hence the investigation.

p88 of the Performance Tuning of Tabular Models white paper discusses the DefaultSegmentRowCount, explaining that it defaults to 8m, and that there should be a correlation between the number of cores and the number of segments. [The number of segments calculated as the number of rows divided by the segment size].

It also indicates that a higher segment size may increase compression, and consequently query performance.

Calculating the number of segments for our data set, gives us the following options:

Rows 167,000,000
Segment Size # Segments
1048576 169
2097152 80
4194304 40
[default] 8388608 20
16777216 10
33554432 5
67108864 3

So, with 32 cores to play with, we should be looking at the default segment size (8m) or maybe reduce it to 4m to get 40 segments. But the extra compression with 16m segment size may be of benefit. So I ran some timing tests on the distinct count measure, and the results are quite interesting.

DefaultSegmentRowSize

It clearly shows that in this environment, reducing the DefaultSegmentRowSize property down to 2m improved the query performance (on a cold cache) from 42s down to 27s – 36% improvement. As well as this, processing time was reduced, as was compression.

This setting creates 80 segments, 2.5 times the number of cores available, but achieved the best performance. Note that the server’s ProcessingTimeboxSecPerMRow setting has been set to 0 to allow for maximum compression.

There’s more to this systems’s performance problems than just this, NUMA for a start, but thought I’d throw this out there in case anyone else is blindly following the performance tuning white paper without doing your own experimentation.

Each environment, data set and server spec is different, so if you need to eek out the last ounce of performance, run your own tests on the SSAS settings and see for yourself.

Frog-Blog Out

[Update: Follow up post exploring the performance impact of NUMA on this server]

The Frog Blog

I'm Alex Whittles.

I specialise in designing and implementing SQL Server business intelligence solutions, and this is my blog! Just a collection of thoughts, techniques and ramblings on SQL Server, Cubes, Data Warehouses, MDX, DAX and whatever else comes to mind.

Data Platform MVP

Frog Blog Out
twitter
rssicon