How does the SSAS 2012 Tabular model performance change when you add more CPU sockets / NUMA nodes?
In my last post (SSAS Tabular NUMA and CPU Cores Performance) I presented the results of some testing I’d been doing on the scalability of the SSAS 2012 Tabular model, specifically the performance of distinct count measures over large data volumes (50-200m distinct customers).
The conclusion was that moving from 1 NUMA node (CPU socket) to 2 had no impact on query performance, so the 2nd CPU was entirely wasted. This contradicted other advice and recommendations, which indicated that adding a second node would actually make performance worse.
After discussing the issue with a member of the SSAS development team, they advised that the method I was using to disable cores was flawed, and that we shouldn’t be using Windows System Resource Manager (WSRM). So I re-ran the tests, disabling cores (and their associated memory) using MSConfig, simulating a physical core removal from the server.
The test results were immediately different…
The hardware setup was the same as before, but with a larger data set:
- 30GB SSAS Tabular cube, running on a 2-CPU, 32-core (Xeon E5-2650 @ 2GHz, 2 NUMA nodes, hyperthreaded) server with 144GB RAM
- SQL Server 2012 SP1 CU8 Enterprise (+ a further hotfix that resolves a problem with distinct counts >2m)
- 900m rows of data in primary fact
- 200m distinct CustomerKey values in primary fact
- No cube partitioning
- DefaultSegmentRowCount: 2097152
- ProcessingTimeboxSecPerMRow: 0
- CPU cores and associated memory disabled using MSConfig
The two test queries were
- Query 1: Simple, single value result of the total distinct customer count
- Query 2: More complex distinct count query, sliced by a number of different attributes to give approx 600 result cells
As soon as the core count is increased above 16 (i.e. the 2nd CPU is introduced), query performance drops significantly: the queries take 1.45x and 2x the time to run, with the simple query taking almost exactly double.
These results now support other theories floating around the blogosphere: adding extra CPUs not only fails to help Tabular performance, it actually hinders it significantly.
As before, the DefaultSegmentRowCount setting gave the best performance at 2m and 4m; raising it further seemed to degrade performance.
Frog Blog out
Hi Alex,
I have read both your articles but I’m stuck and unable to determine “why”.
Server Layout/Details:
****QA****
1 node cluster
2 SSAS (2012) instances – TAB & AS
4 procs x 12 cores = 48 logical CPUs (0-47)
Dell PowerEdge M910, E7540 @ 2.00GHz, with 225GB RAM
OS: Windows Server 2008 R2 Enterprise
History:
The BI team came to the conclusion, after talking to Marco Russo (thank you very much sir for your time), that the MDX query would need to run on 1 proc, i.e. NumaNode0. They saw a huge difference in query execution, from 7.3 sec to ~4 sec. This was achieved by setting the process affinity to Node0, as in test B below.
Test:
A) Baseline test
1) Clear cache (XMLA)
2) Ran MDX => ~7+ sec
B) Change Affinity test
1) Connected to the TAB instance (RDP)
2) Started Task Manager
3) Identified the process for TAB; right click, Set Affinity
4) Unchecked all CPUs
5) Selected CPUs 0-11 (Node0)
6) Clear cache (XMLA)
7) Ran MDX => ~3.6-4.2 sec
C) WSRM test (http://blogs.msdn.com/b/sqlcat/archive/2013/11/05/forcing-numa-node-affinity-for-analysis-services-tabular-databases.aspx)
1) Followed steps 2-4
2) Brought the TAB resource offline/online
3) Clear cache (XMLA)
4) Ran MDX => ~7+ sec
Any direction/help would be appreciated
The next question would be: how would we use MSConfig in this scenario?
Thanks in advance!
AK
Hi AK
Firstly, the NUMA problem cannot be resolved just by setting processor affinity, as the problem is actually memory. With a NUMA architecture, each CPU is allocated its own share of RAM; if you set the processor affinity of a process, performance will still degrade significantly if that CPU tries to access RAM that is allocated to a different CPU.
Secondly, there are reports of people getting WSRM to work, but in my experience it was not successful. It looked like it worked, but performance tests showed that it didn’t – the same as you’re seeing. The only methods advised by the SSAS development team that I spoke to were to physically remove the CPU, or to use MSConfig to disable all CPUs (apart from the first) and their associated memory. This is as close as you can get to physical CPU removal.
There seems to be no reliable, supported way of setting the NUMA affinity in the Tabular model, so you should only run Tabular on a single socket server. I expect this to change in the future, as there are plenty of core affinity settings in SSAS tabular, they’re just not implemented yet. And hopefully Tabular will become NUMA aware before too long.
To use MSConfig.exe, click on the Boot tab, then click ‘Advanced Options’. Set ‘Number of processors’ to the number of cores (threads) available in CPU0, and ‘Maximum memory’ to the RAM allocated to that CPU (i.e. total RAM divided by the number of CPUs).
Then reboot, and Windows will only see what you’ve allowed it to see.
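If it helps, here’s a sketch of the equivalent BCDEdit commands run from an elevated prompt (as I understand it, MSConfig just writes these boot options behind the scenes; the numbers below assume my test server, i.e. 16 threads and 72GB visible out of 32 threads / 144GB, so verify the values against your own hardware):

```shell
# Sketch only: BCDEdit equivalents of the MSConfig boot options described above.
# Assumes one node of a 2-socket, 32-thread, 144GB server should stay visible,
# i.e. 16 threads and 144GB / 2 = 72GB of RAM. Reboot after running.
bcdedit /set {current} numproc 16                    # "Number of processors"
bcdedit /set {current} truncatememory 0x1200000000   # "Maximum memory": 72GB in bytes

# To revert later:
# bcdedit /deletevalue {current} numproc
# bcdedit /deletevalue {current} truncatememory
```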
Alex
Alex is right, you cannot *guarantee* that NUMA affinity will work. It usually works because SSAS is a Windows process, and Windows tries to allocate memory on the same NUMA node where the thread requesting the allocation is running. If you have enough free memory it works, but it is important that you set the affinity mask *before* the process starts allocating memory.
So you should try restarting the SSAS service and, *before* anyone connects, setting the affinity mask of the process.
Using a virtual machine with Hyper-V seems a more reliable way to guarantee you allocate memory on the same node.
Reading AK’s test, I suspect that test C was done without restarting the service, just detaching/attaching the database. This could be the reason why memory was allocated on another node.
Marco
I’m a NUMA-Noob.
I’m wondering why the option of going to the BIOS and disabling NUMA is not tried / discussed. Before I ask my IT department to do that for me, I’d like to know if there is some ‘obvious’ reason not to do that as one of the test scenarios.
I have a server* dedicated to one instance of SSAS Tabular: one socket, Intel E5-2680; 8 cores, 16 threads. Task Manager shows it as having 8 NUMA nodes. My understanding was that each *socket* represents one NUMA node, not each *core*…so there’s my first confusion.
I’d also like opinions as to whether/when there might be a benefit to disabling NUMA, so I don’t open a useless Helpdesk ticket.
Thank you in advance for any insights or clarifications.
Chris
* Hosted in VMWare
Windows Server 2012 R2
48 GB RAM
1 x Intel E5-2680
Hi Chris.
Good question. I left it out of the post as the test results were not very revealing. I found an average 8% drop in performance when disabling the NUMA setting.
Task Manager, by default, will show you one chart per core/thread, not one per NUMA node. You can change this to a per-NUMA-node display by right-clicking on the chart.
If you are genuinely seeing 8 NUMA nodes then someone may have been playing around with your K-Group (processor group) settings…
Alex
I should add though, that your environment, test data, cube structure and test cases etc. will be different to mine, therefore please do your own performance testing!
If you find anything interesting, please do post a follow up comment here.
Chris,
Maybe the VMware hosting is the root of the 8-socket representation in Task Manager. However, I remember a customer hosted on VMware that didn’t see that. I don’t know more because I’m not in touch with that customer now.
Marco
Alex – good advice. All I need now is admin rights to the VMware server.
Marco – Correct! Apparently VMware’s vNUMA, by default, gives every *core* a virtual socket.
http://blogs.vmware.com/vsphere/2013/10/does-corespersocket-affect-performance.html?utm_source=longwhiteclouds
Don’t know if this is good or bad. Testing to follow.
Thank you to both of you!
Chris
I realize this is an old thread, but did someone have a chance to measure the impact of NUMA on tabular SSAS database processing performance? If so, what was the test setup like and what were the results? All the benchmarks published online seem to focus on querying rather than processing performance. Thanks!
Hi Vitaly
I didn’t measure processing performance, but as Tabular doesn’t support parallel processing of partitions it’s unlikely that NUMA would make much difference.
However from SQL 2016 Tabular will support parallel processing of partitions, so it may make a difference in the future.
If you find any results from your own testing, please do share them as I’d be interested to hear.
Regards
Alex
Thanks Alex. Partitions from different tables are still processed in parallel in SSAS 2012 Tabular. With a sufficiently high number of partitioned tables in the model the impact of faster parallel processing may add up. We will try to do some benchmarks internally and see where the data leads.
I appreciate the pointer regarding parallel same-table partition processing being added to SSAS 2016. This is great news and it’s a long overdue feature. Now Microsoft just needs to make the SSAS Tabular engine NUMA-aware so we can stop having to use complicated workarounds and get better scalability on modern hardware.
Yes that’s a very good point, and I suppose even with a single table being processed on a single thread, if the memory used happens to belong to a different NUMA node then there would be performance degradation.
So I’ll revise my earlier comment and say ‘possibly’!
So I have read this about 5 or 6 times and need some assistance. We are experiencing very poor performance in a VMware environment and are currently working with MS Support to troubleshoot. They indicated the VM was undersized (of course), based on profiler traces etc. We are currently at the following config:
4 virtual sockets with 2 cores each, giving 8 virtual CPUs.
32GB RAM
After reading this should I request the VM staff to set to the following:
1 virtual socket and 16 cores? Sorry I am a bit of a noob here as well with SSAS Tabular on VMWare. Is there an article somewhere specific to VMWare config and SSAS Tabular?
Thanks
Mike
Hi Mike
If it were a physical server then I would definitely say yes, you’d want a single socket with 16 cores. However VMware may muddy the waters somewhat; it depends how VMware handles NUMA, and I’m not an expert on this I’m afraid. I always avoid VMs for SSAS due to performance degradation.
I’d just tell the VM guys that you need 16 cores on a guaranteed single NUMA node. Then let them worry about how to make that happen.
Regards
Alex
Does SQL 2016 version add numa awareness to the tabular models? I can’t seem to find an answer anywhere online.
Unfortunately not, as far as I know SSAS 2016 Tabular is still not NUMA aware.
Hi Alex. Finally, SSAS is NUMA aware. http://byobi.com/blog/2016/11/analysis-services-tabular-is-finally-numa-aware/
This is huge. Time to update your post. (:
SSAS 2016 Tabular is now NUMA aware.
It would be awesome if you could run your tests again?