[UPDATE] After further investigation, I found that the tests in this post were inaccurate and the results unreliable. Updated NUMA test results here
In my last post (SSAS Tabular Performance – DefaultSegmentRowCount) I presented some analysis of the query performance impact of changing the DefaultSegmentRowCount setting. This post describes the next tests that I ran on the same system, investigating the impact of restricting SSAS to just 1 NUMA node instead of the 2 available on the server.
It’s well known that SSAS Tabular is not NUMA aware, so it’s common to see advice recommending affiliating SSAS to a single NUMA node to improve performance.
From what I’d read, I was expecting that by affiliating SSAS to a single NUMA node, query performance would improve slightly, maybe by 10-30%.
Recap of the setup:
- 7.6GB SSAS Tabular cube, running on a 2-CPU, 32-core server (Xeon E5-2650 2GHz, 2 NUMA nodes) with 144GB RAM
- SQL Server 2012 SP1 CU7 Enterprise
- 167m rows of data in primary fact
- 80m distinct CustomerKey values in primary fact
- No cube partitioning
- DefaultSegmentRowCount: 2097152
- ProcessingTimeboxSecPerMRow: 0
- CPU core affinity configured using Windows System Resource Manager (see John Sirman’s great guide to using WSRM with SSAS)
I ran profiler, checking the ‘Query End’ duration on a simple distinct count of CustomerKey, with no other filters or attributes involved.
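For reference, the shape of that test query expressed as a rough DAX sketch is below; the ‘Fact Sales’ table name is a hypothetical placeholder rather than the real model’s, and the actual queries were simply captured via Profiler:

```
-- Rough sketch of the simple test query: a single distinct count of
-- CustomerKey over the whole fact table, with no filters or attributes.
-- 'Fact Sales' is a hypothetical table name, for illustration only.
EVALUATE
ROW ( "Distinct Customers", DISTINCTCOUNT ( 'Fact Sales'[CustomerKey] ) )
```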
You can see that dropping from 32 cores across 2 NUMA nodes down to 16 cores on a single node had almost no impact at all.
Within a single NUMA node, performance improved dramatically as the number of cores increased, but as soon as a second NUMA node was added, performance flatlined, with no further significant improvement no matter how many cores were added.
As per my last post, I’m sure there are other things afoot with this server, so this behaviour may not be representative of other setups. However, it again reinforces the advice you will have already seen elsewhere: with SSAS Tabular, avoid NUMA hardware…
Frog-Blog out
Very interesting. What SSAS memory limits did you use when affinitized to one NUMA node?
HardMemoryLimit: 50
LowMemoryLimit: 35
TotalMemoryLimit: 50
VertipaqMemoryLimit: 45
I do have another (28GB) tabular cube on there, so memory is a little tight at the moment when limiting to a single NUMA node – hoping for a bigger server with faster memory soon!
Create one SSAS instance per NUMA node and split the databases between the instances… but the real issue is what happens when your database is larger than the RAM local to a single NUMA node.
Yeah that’s what we’re seeing. The problem is in this environment we only have a single large database, so we can’t split it – and the hardware was not originally specified for use with the Tabular model so it’s far from ideal.
Just goes to show that you need to know what you’ll be using it for before buying a server!
Hi Alex,
looking at this graph I was wondering: at some point is the CPU power no longer the bottleneck? How much data are you scanning in memory?
Hi Henk.
Both queries were scanning approx 900m records, with 200m distinct customers.
The two queries show different CPU profiles: one CPU limited, one not. The simple ‘single result’ query, which just asked for the total distinct count across the entire table, kept improving as cores were added (within a single NUMA node). The more complex query split the results up into about 600 cells, using a number of different attributes from different tables. Its performance levelled out at 8-10 cores.
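To illustrate the difference between the two shapes, here are rough DAX sketches; all table and column names are hypothetical placeholders, not the actual model’s:

```
-- 1) Simple 'single result' query: one distinct count over the whole table.
--    (Hypothetical table and column names throughout.)
EVALUATE
ROW ( "Distinct Customers", DISTINCTCOUNT ( 'Fact Sales'[CustomerKey] ) )

-- 2) More complex query: the same distinct count split across attributes
--    from several dimension tables, producing a few hundred result cells.
EVALUATE
SUMMARIZE (
    'Fact Sales',
    'Date'[Calendar Year],
    'Product'[Category],
    'Geography'[Region],
    "Distinct Customers", DISTINCTCOUNT ( 'Fact Sales'[CustomerKey] )
)
```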
My assumption at the moment is that this performance is being constrained by the memory speed (only a frustrating 667MHz on this server). I’m hopefully getting hold of an 1866MHz RAM server to test this theory.
Henk – correction, my comments apply to the later tests (https://purplefrogsystems.com/2014/02/ssas-tabular-performance-numa-update/), not these.
This was a smaller dataset, and the CPU performance tests were flawed, as explained on the update post.
We have performance degradation when processing Tabular models (max database size 37GB) on physical servers (512GB RAM) but not on virtual machines (128GB RAM). While your tests were conducted to measure query performance, can we say the same is true for processing: is processing performance on physical servers (with a separate SSAS instance) degraded because SSAS is not NUMA aware?
Hi Kay
That’s an interesting question. Certainly VMs can help overcome NUMA limitations, because the NUMA node affiliation can be set for a VM. However, they commonly introduce other performance issues of their own, so it’s a balance.
Regarding processing, I can’t say for sure as I haven’t conducted tests, but my expectation would be that yes, processing is also likely to be impacted by NUMA, simply because all memory operations that cross NUMA nodes are slower, regardless of the specific workload. NUMA-aware systems overcome this by aligning the data held in memory with the CPU accessing that data, so that they’re on the same node.
This would be particularly problematic for SSAS Tabular 2014 and earlier, as it is only able to process a single partition of each table at a time. In Tabular 2016 onwards, partitions can be processed in parallel, so the parallelisation offered by a multi-NUMA environment may outweigh the possible performance degradation of each core.
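For illustration, from 2016 onwards parallel processing can be requested with a TMSL script along these lines; the database, table and partition names here are hypothetical, TMSL is plain JSON (so the explanation sits here rather than in comments), and maxParallelism simply caps how many operations run concurrently:

```
{
  "sequence": {
    "maxParallelism": 2,
    "operations": [
      {
        "refresh": {
          "type": "dataOnly",
          "objects": [
            { "database": "SalesTabular", "table": "Fact Sales", "partition": "Sales 2015" },
            { "database": "SalesTabular", "table": "Fact Sales", "partition": "Sales 2016" }
          ]
        }
      }
    ]
  }
}
```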
Regards
Alex
Thanks Alex for your feedback. We will try to push towards testing NUMA Affiliation in our labs to weigh its outcome. Will keep you posted here.
This has been tested on VM/labs. Processing times improved by 60%.
Kay, can you provide some more details about the tests you made? I would really appreciate. Thanks!
Hi and thanks for this post,
unfortunately the blog post mentioned above (John Sirman’s great guide to using WSRM with SSAS) isn’t available any more :/…
Does anybody know where this post or a similar one can be found?
Regards, Marcel
Hi Marcel
Thanks for pointing this out, I’ve updated the link to the new location, and it now works again.
Thanks
Alex