How does the SSAS 2012 Tabular model performance change when you add more CPU sockets / NUMA nodes?

In my last post (SSAS Tabular NUMA and CPU Cores Performance) I presented the results of some testing I’d been doing on the scalability of the SSAS 2012 Tabular model. Specifically with the performance of distinct count measures over large data volumes (50-200m distinct customers).

The conclusion was that moving from 1 NUMA node (CPU socket) to 2 had no impact on query performance, so the 2nd CPU is entirely wasted. This actually contradicted other advice and recommendations that indicated that adding a second node would actually make the performance worse.

After discussing the issue with a member of the SSAS development team, they advised that the method I was using to disable cores was flawed, and that we shouldn’t be using Windows System Resource Manager. So I re-ran the tests disabling cores (and associated memory) using MSConfig, simulating a physical core removal from the server.

The test results were immediately different…

TabularNUMACoresTestThe hardware setup was the same as before, but with a larger data set:

  • 30Gb SSAS tabular cube, running on a 2 x CPU 32 core (Xeon E5-2650 2Ghz, 2 x NUMA nodes, hyperthreaded) server with 144Gb RAM
  • SQL Server 2012 SP1 CU8 Enterprise (+ a further hotfix that resolves a problem with distinct counts >2m)
  • 900m rows of data in primary fact
  • 200m distinct CustomerKey values in primary fact
  • No cube partitioning
  • DefaultSegmentRowCount: 2097152
  • ProcessingTimeboxSecPerMRow: 0
  • CPU cores and associated memory disabled using MSConfig

The two test queries were

  • Query 1: Simple, single value result of the total distinct customer count
  • Query 2: More complex distinct count query, sliced by a number of different attributes to give approx 600 result cells

As soon as the cores are increased above 16 (i.e. the 2nd CPU is introduced), the queries take 1.45x and 2x the time to run. Query performance drops significantly. The simple query takes almost exactly double the time.

These results now support other theories floating around the blogosphere, that adding extra CPUs not only doesn’t help the tabular performance, it actually significantly hinders it.

As before, the default segment count setting gave the best performance at 2m and 4m. Raising it seemed to degrade performance.

Frog Blog out

Tags: , , ,