I’m currently investigating a poorly performing Tabular model, and came across some interesting test results which seem to contradict the advice in Microsoft’s Performance Tuning of Tabular Models white paper.
Some background:
- 7.6GB SSAS Tabular cube, running on a 2-CPU, 32-core server (Xeon E5-2650 @ 2GHz, 2 NUMA nodes) with 144GB RAM
- SQL Server 2012 SP1 CU7 Enterprise
- 167m rows of data in primary fact
- 80m distinct CustomerKey values in primary fact
- No cube partitioning
A simple DAX distinct count of CustomerKey, with no filtering, takes 42 seconds on a cold cache. That’s far too slow for a Tabular model, hence the investigation.
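For reference, the test is essentially just an unfiltered distinct count run as a DAX query; a minimal sketch (the table name ‘Fact’ is illustrative):

```
// Cold-cache test query: distinct count of CustomerKey over the whole
// fact table, with no filters applied ('Fact' is an illustrative name).
EVALUATE
ROW (
    "Distinct Customers",
    DISTINCTCOUNT ( Fact[CustomerKey] )
)
```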
p88 of the Performance Tuning of Tabular Models white paper discusses the DefaultSegmentRowCount property, explaining that it defaults to 8m, and that there should be a correlation between the number of cores and the number of segments [the number of segments being calculated as the number of rows divided by the segment size].
It also indicates that a higher segment size may improve compression, and consequently query performance.
Calculating the number of segments for our data set gives us the following options:
Rows: 167,000,000

| Segment Size | # Segments |
|---|---|
| 1,048,576 (1m) | 160 |
| 2,097,152 (2m) | 80 |
| 4,194,304 (4m) | 40 |
| 8,388,608 (8m, default) | 20 |
| 16,777,216 (16m) | 10 |
| 33,554,432 (32m) | 5 |
| 67,108,864 (64m) | 3 |
So, with 32 cores to play with, we should be looking at the default segment size (8m), or perhaps reducing it to 4m to get 40 segments; but the extra compression from a 16m segment size might also be of benefit. I ran some timing tests on the distinct count measure, and the results are quite interesting.
The results clearly show that in this environment, reducing the DefaultSegmentRowCount property down to 2m improved the query performance (on a cold cache) from 42s down to 27s, a 36% improvement. Processing time was also reduced, as was the database size.
This setting creates 80 segments, 2.5 times the number of cores available, yet achieves the best performance. Note that the server’s ProcessingTimeboxSecPerMRow setting has been set to 0 to allow for maximum compression.
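If you want to verify how many segments you actually end up with after processing, the storage engine DMVs can be queried from an MDX query window in SSMS; a minimal sketch (column list trimmed to the interesting ones):

```
-- Lists the segments behind each column in the current Tabular database,
-- including the number of rows stored in each segment.
SELECT
    DIMENSION_NAME,   -- table name
    COLUMN_ID,        -- column (or internal structure) name
    SEGMENT_NUMBER,
    RECORDS_COUNT     -- rows in this segment
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
```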
There’s more to this system’s performance problems than just this, NUMA for a start, but I thought I’d throw this out there in case anyone else is blindly following the performance tuning white paper without doing their own experimentation.
Every environment, data set and server spec is different, so if you need to eke out the last ounce of performance, run your own tests on the SSAS settings and see for yourself.
Frog-Blog Out
[Update: Follow-up post exploring the performance impact of NUMA on this server: https://purplefrogsystems.com/2013/12/ssas-tabular-numa-and-cpu-cores-performance/]
Very interesting results! Thanks for the write up.
It would be interesting to retry those tests with Tabular affinitized to one NUMA node as the results may surprise you:
http://blogs.msdn.com/b/sqlcat/archive/2013/11/05/forcing-numa-node-affinity-for-analysis-services-tabular-databases.aspx
It would also be interesting to order your SQL query on CustomerKey to see if that helps improve compression (and consequently query performance).
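For example, something along these lines in the partition source query (table and column names are illustrative):

```
-- Sorting the partition source query by the high-cardinality key can help
-- the engine find longer runs of repeated values within each segment.
-- dbo.FactSales is an illustrative table name.
SELECT *
FROM dbo.FactSales
ORDER BY CustomerKey;
```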
Hi Greg
Thanks for the comment. I’m already sorting the source data by CustomerKey. I’ve tried with and without the sorting and didn’t notice any difference with this data set.
I’ve also been running tests on the NUMA affinitization over the last couple of days, with some more interesting results – I’ll post the summary here later.
Thanks
Alex
Hi Greg
I’ve added a new post with the NUMA test results (https://purplefrogsystems.com/2013/12/ssas-tabular-numa-and-cpu-cores-performance/).
Regards
Alex
OK, so the NUMA node tests I did were slightly flawed. Updated results are here (https://purplefrogsystems.com/2014/02/ssas-tabular-performance-numa-update/).
However, even in the new tests I still found the best performance was with a DefaultSegmentRowCount of 2m or 4m.
Maybe it’s just the specific test queries I’m running (entirely distinct count based) and the very high cardinality of the data I’m using.
Regards
Alex
I’ve been doing some testing (in the midst of it now):
I have a complex report, and changing from 16m segment size to 2m let the report run in 1/3 the time. MASSIVE reduction.
The larger segment size had seemed better on a single calculation.
However, this is perhaps related to distinct count specifically? My calcs use it quite extensively; the calc that showed better performance at the larger segment size did not.
Yeah, all of my testing was on distinct counts, so that may well be the case. I’ve not done the same testing on normal measures; it would be interesting to see what those results are…
You should measure the performance with SSAS 2012 SP1 CU9.
The DISTINCTCOUNT measure improved in a significant way, especially if the number of distinct values in a column is large (more than 1-2 million). This might have an impact on segment size, because a larger segment usually improves compression and performance, but you might have observed a strange situation because of the previous DISTINCTCOUNT performance issue.
Hi Marco
Funny you should mention that… That bug was found and fixed as a direct result of working with Microsoft support on this series of tests, whilst investigating the slow performance.
That fix definitely does help distinct count queries where each individual result cell exceeds a value of 2m; unfortunately it doesn’t improve anything if your result cells are <2m.
I did perform some brief testing on CU8 + the hotfix, and the optimal segment size was still 2m for my data set.
Of course, each dataset will behave differently, and should be tested rather than relying on one specific example on a blog 🙂
It’s a small world! 🙂
However, if you have 32 cores available, you should have at least 32 segments (VertiPaq uses one thread per segment). Looking at the chart in more detail, it’s “strange” that the size of the database increases with larger segments, but this might be related to the high number of distinct values you have in the CustomerKey column (80m) and to the distribution of other data in other segments. We have worked on a 4-billion-row fact table where the largest dimension was much smaller (100k rows), and in that case the white paper’s indications worked well.
The problem is that every database is different, and the parameters might vary depending on the distribution of the data.
BTW: have you tried processing the data using a particular sort order? I don’t think it would be beneficial for a 1:2 ratio between customers and transaction rows (1 customer to 2 transactions), but it would be useful to know the result if you tested it.
Thanks!
The source query for this did have the data sorted by the distinct count key, but I’m afraid I’ve not got any test results with it sorted any other way. I’d heard that sorting could make a difference (thanks to some helpful pointers from Mark Stacey), so I applied the sort before I started any meaningful testing.
If I can get time on the MTC test server, I’ll test different segment sizes again to see if the results change. The problem is too many tests to run, not enough time to do it!
Anyone have access to a cloning machine?…
Tried with this.Clone() but it didn’t work, sorry 🙂
Ha ha – if you figure it out please let me know…
Just a comment on sort order: in my tests I see differences in resultant size, but at least for the distinct count tests (where I’ve been focusing), I haven’t seen what I’d consider significant query speed differences.
Hi.
I’ve been doing some testing of a distinct count measure on SSAS 2012 with the latest Cumulative Update, build 12.0.4427.24.
I created two test measures, one using a distinct count and another using a SUMX over a DISTINCT set of values, adding 1 for each.
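Something along these lines (the table and column names are illustrative):

```
// Illustrative definitions of the two measures being compared
// ('Fact' and CustomerKey are placeholder names).
DistinctCustomers := DISTINCTCOUNT ( Fact[CustomerKey] )

DistinctCustomersSumX := SUMX ( DISTINCT ( Fact[CustomerKey] ), 1 )
```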
Curiously, the SUMX performs faster than the distinct count, even though for the DISTINCTCOUNT the FE is used at 95%, while for the SUMX the FE and SE operations are split 50/50…
Does anyone know if this is expected behaviour? Could it be because of segment size / compression / fragmentation?
Thank you
Hello Guys,
I’m starting out with these Tabular cubes and I need to find out what my hardware needs are to run the server. I also need to know how to troubleshoot performance issues. Does anyone know a good link or document that I can read? Also, is there a tool to figure out some metrics of the Analysis Services server?
You might take a look at this article about hardware selection for SSAS Tabular.
I forgot the link: https://www.sqlbi.com/articles/choose-the-right-hardware-for-analysis-services-tabular/