
Azure Synapse Series: Hash Distribution and Shuffle

For this post I’m going to presume you’ve already taken a look at distributing your data using a hash column, and you’re not experiencing the performance you’re expecting. (If you’re not already aware of what this is, take a look at the following link to learn the basics of what a distributed table is and why you need it in Azure Synapse. I’ll be here when you get back.)

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute

So now that we’re on the same page, one of the common issues encountered is understanding what Hash Distribution actually means and how it functions. In basic terms, the column you choose to distribute by is converted into a hash using a deterministic hash function, which produces the same output for any identical input value. Rows whose hash column(s) match are therefore placed on the same compute node. This stops data being passed between compute nodes during joins, which massively improves query performance.
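
For reference, this is roughly what a hash distributed table looks like in a Synapse dedicated SQL pool (the table and column names below are just illustrative):

-- Distribute the fact table on CustomerKey: rows with the same key
-- always land in the same distribution.
CREATE TABLE dbo.FactSales
(
    SaleId      bigint        NOT NULL,
    CustomerKey varchar(50)   NOT NULL,
    SaleAmount  decimal(18,2) NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);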

Great! So using the knowledge above I’ve created two tables using hash distribution, where the data I’m hashing by holds the same values in each table. Let’s run an estimated execution plan to see how the joins work!

Shuffle 100%?

Oh dear. Shuffle. On the surface, shuffle is not necessarily a bad thing and may even be expected, but not in this scenario. So what is shuffle and why can it be bad?

Shuffle occurs when part of a distributed table is moved to a different node during query execution. To do this, a hash value is computed using the join columns, the node that owns that hash value is located, and the row is sent to that node for processing. If the tables you’re joining on have different hash distributions you’d expect this to happen (hence why it’s not always a bad thing!), but in this case I’ve used the same value to hash both of my tables, so what’s gone wrong?

Two words: Data Types

In my query below I have created two variables of the same data type (varchar(50)), given them the same text value, (Test Hash Text), and then run the hashing function of SHA2_256 over them to see the results. Running this query against your database will provide you with the same result every time when using the same hashing function.

declare @varchar1 varchar(50), @varchar2 varchar(50)
set @varchar1 = 'Test Hash Text'
set @varchar2 = 'Test Hash Text'

select @varchar1 as [Value], hashbytes('SHA2_256', @varchar1) as [Hash]
union all
select @varchar2 as [Value], hashbytes('SHA2_256', @varchar2) as [Hash]

As you can see here, the resulting Hash is the same for both text values.

Matching Hash = Good

But what if I change the data type? Here I’ve added a third variable with an nvarchar(50) data type instead, but with the same text value.

declare @varchar1 varchar(50), @varchar2 varchar(50), @nvarchar nvarchar(50)
set @varchar1 = 'Test Hash Text'
set @varchar2 = 'Test Hash Text'
set @nvarchar = 'Test Hash Text'

select @varchar1 as [Value], hashbytes('SHA2_256', @varchar1) as [Hash]
union all
select @varchar2 as [Value], hashbytes('SHA2_256', @varchar2) as [Hash]
union all
select @nvarchar as [Value], hashbytes('SHA2_256', @nvarchar) as [Hash]

As you can see here, the resulting Hash is different for the different data type even though the value is the same.

Different Hash = Bad

So what does this mean for our Synapse distribution? Even if you’re distributing your tables on the same values, a data type mismatch means you’ll end up with a different hash and, almost certainly, a different node for your data. (With only 60 distributions to go around, some rows may land in the same place by chance, but not the majority of your data.)

To fix this, create a new computed column in your table in Synapse that has the data type you want to use across all tables sharing this column, and hash distribute by that new column. The easiest way to do this is with the Create Table As Select (CTAS) statement, creating a new table containing all of the data plus the new column. Add the new column to the select statement with a CAST/CONVERT to the correct data type, and use it for hash distribution. Additional reading on CTAS is available here:

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse
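
As a rough sketch of that approach (the table and column names are hypothetical): suppose dbo.FactOrders stores CustomerKey as nvarchar(50), while the rest of your model distributes on varchar(50). The CTAS might look something like this:

-- Rebuild the table with an extra column cast to the agreed data type,
-- and hash distribute on that new column instead.
CREATE TABLE dbo.FactOrders_Conformed
WITH
(
    DISTRIBUTION = HASH(CustomerKeyConformed),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT  fo.*,
        CAST(fo.CustomerKey AS varchar(50)) AS CustomerKeyConformed
FROM    dbo.FactOrders AS fo;

Once the new table is built you can rename it into place, and as long as every table you join to distributes on the same data type, the hashes will match again.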

So now that I’ve put this in place, what does my execution plan look like?

No Shuffle!

That’s much better! No shuffle where there shouldn’t be one, and queries against this large dataset now take less than 5% of the time they took before.

Any questions? Let me know in the comments below or contact me on twitter: http://www.twitter.com/PurpleFrogReiss

ADF Switch Activity – A neat solution for nested IFs.

Whilst carrying out some work for a client using Azure Data Factory I was presented with the challenge of triggering different activities depending on the result of a stored procedure.

In fact the challenge posed was to…

  • Execute ‘Copy A’ activity if the result of a stored procedure returned (A),
  • Execute ‘Copy B’ activity if the result of a stored procedure returned (B),
  • Execute ‘Copy C’ activity if the result of a stored procedure returned (C)
  • Finally execute an error stored procedure if the result of the stored procedure returned (D).

Prior to the Switch activity I could achieve this using 4 ‘IF’ activities connected to a lookup activity, as shown in the snip below using my ‘Wait’ example pipeline.

However a neater solution is to use the ‘Switch’ activity to do this work instead. I’ll now jump straight into a worked example to show you how I achieved this.

I created a demo.RandomNumberGenerator stored procedure in our Purple Frog demo database, which declares a variable and sets it to a random number between 1 and 4. The stored procedure then uses this variable in a case statement to return a string; so if the @randomnumber variable returns 1, the case statement returns ‘Wait 1 Minute’.
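
A minimal sketch of a procedure like that (the exact implementation here is an assumption on my part; the ADFAction column name matches what the Lookup activity refers to later) could be:

CREATE PROCEDURE demo.RandomNumberGenerator
AS
BEGIN
    -- Random integer between 1 and 4
    DECLARE @randomnumber int = CAST(RAND() * 4 AS int) + 1;

    SELECT CASE @randomnumber
               WHEN 1 THEN 'Wait 1 Minute'
               WHEN 2 THEN 'Wait 2 Minutes'
               WHEN 3 THEN 'Wait 3 Minutes'
               WHEN 4 THEN 'Wait 4 Minutes'
           END AS ADFAction;
END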

I then used a lookup activity in ADF to return the result of this stored procedure which I then used to ‘feed’ my ‘Switch’ activity.

When I preview this lookup activity it just returns the result of the stored procedure. When this is run time and time again it returns a random ‘ADFAction’ (as named in my case statement), generated by the RAND() SQL function. So in the scenario above it returned an action of ‘Wait 1 Minute’. If I were to preview the stored procedure again it might return ‘Wait 4 Minutes’ instead.

I can then connect my switch activity to this lookup activity using the ‘success’ green output connector.

I now need to add some dynamic content to refer to the output of my lookup activity. I don’t need to create dynamic nested IF content to say if my stored procedure equals ‘Wait 1 Minute’ then do this, if my stored procedure equals ‘Wait 2 Minutes’ then do this… all I need to do is refer to the output of the connected lookup activity ‘Random Number Lookup Activity’.

So in my example this is simply @activity(‘Random Number Lookup Activity’).output.firstrow.adfaction. ADF will then match the output of the stored procedure to the case statements I provide it. The default activity is just a set of activities that are executed when the expression evaluation isn’t satisfied, i.e. the output of my stored procedure doesn’t match any of the case statements I provide it.

In the snip below I have provided 4 case statements which match all of the possible outputs from my stored procedure. Essentially there isn’t any need for a default activity, as my stored procedure will always return 1 of 4 results, but it’s useful to know for future reference.

Within each case statement there is an activity which I would like ADF to perform if it matches the result of my stored procedure. In my example these are just ‘Wait’ activities. So if the stored procedure returns ‘Wait 1 Minute’, I’ve put a ‘Wait’ activity within the case statement to wait 1 minute (or 60 seconds).

When I first ran this pipeline in debug mode I could see that the output of the stored procedure was ‘Wait 2 Minutes’. The switch activity then matched this to the case statements I provided and performed the activity within the ‘Wait 2 Minutes’ case statement and triggered my ‘Wait 2 Minutes’ activity.

When I debugged another run my stored procedure returned ‘Wait 4 Minutes’. The switch activity then executed my ‘Wait 4 Minutes’ activity. Perfect!

So it’s pretty much as simple as that!

I hope this helps all you ADF’ers out there!

My experience in obtaining an MCSE

Background

I have recently completed my MCSE in Data Management and Analytics, and I wanted to share my experience of working towards and passing the exams that have led to me getting this MCSE. In this post I will cover some of the challenges I faced and provide some tips and advice, hopefully others following the same certification path, or other paths, will find this useful.

I am a business intelligence developer at Purple Frog with several years of experience working with data. I started off as a data analyst, moved into reporting, and more recently have been working on ETL, data warehousing and cubes. I have been working with SQL throughout my various roles, so the best place for me to start my certification path was with exam 70-761: Querying Data with Transact-SQL. The path I’ve taken up to this point is:

MCSA: SQL 2016 Database Development
– 70-761: Querying Data with Transact-SQL
– 70-762: Developing SQL Databases

MCSE: Data Management & Analytics
– 70-767: Implementing a SQL Data Warehouse (from the Elective Exam Pool)

The learning material

Although the first exam (70-761) fitted in quite well with my SQL background (I probably knew about 75% of the material beforehand), there was still some work required to get to the stage where I felt confident of passing the exam. For me, the best and primary resource for learning the material has been the Exam Ref books, for example “Exam Ref 70-761 Querying Data with Transact-SQL”. These books are structured so that the content is split into the sections covered by the exam, with a chapter for every skill measured.

The one downside to the Exam Ref books is that they can feel quite wordy if you’re relying on the book alone, so I found it really useful to supplement them with videos and demos on the topics where I needed a greater understanding. In addition, practice exercises helped me to further understand the different concepts, as I was able to try what I had learnt and see where I was going wrong.

The final resource I found useful was Microsoft Docs (https://docs.microsoft.com/en-us/), which is a really good reference point for key facts; for example, I found the page on Change Data Capture really useful for my latest exam (70-767).

The exam

There are the obvious tips, such as sleeping early the night before and getting to the exam centre with time to spare, but I wanted to share some of the exam techniques I found useful while sitting these exams.

My top tip is to check that you have covered, and are comfortable with, all the skills measured in the exam. The skills measured for each exam can be found in the “Browse Certifications and Exams” section of the Microsoft Learning website (example shown below for exam 70-761). The skills are also stated in the Exam Ref books and, as mentioned before, the chapters in the book are based on the skills measured in the exam.

This was taken from https://docs.microsoft.com/en-us/learn/certifications/exams/70-761

What’s also shown in the skills measured above is the weighting of questions per skill in the exam. This helps you work out whether you need to focus on a weaker area because that area makes up a big part of the exam.

Time shouldn’t be an issue in the exam if you’ve prepared well; however, some questions are not worded in the best way and can catch you out, so do take the time to read each question properly, and keep an eye on the time remaining every 5-10 questions.

You have the option to flag questions and review them again later (note that some questions cannot be flagged); make use of these flags for questions you are unsure of. This can be particularly useful if a later question gives you a clue or reminds you of the answer to a question flagged earlier. Alternatively, you should be provided with a pen and a wipeable board where you can make notes, so note down the question number and topic and come back to it later.

Next steps

I am currently studying towards exam 70-768: Developing SQL Data Models. This will develop my understanding of data modelling and working with cubes, and will also earn me the MCSA: SQL 2016 BI Development certification. With these current certifications being retired in the near future, the next plan is to work towards the Microsoft Certified: Azure Data Engineer Associate certification.

I hope you have found this blog useful and that you can use some of the tips mentioned in your study plans. All the best!

SQL LAST_VALUE() wrong results

You may have come across the following scenario when using the LAST_VALUE() function.

You want to find the first or last value loaded in a result set, so you use the FIRST_VALUE() and LAST_VALUE() functions like below:
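
A minimal version of that query, using a hypothetical five-row sample, would be:

SELECT  Letter,
        FIRST_VALUE(Letter) OVER (ORDER BY Letter) AS FirstValue,
        LAST_VALUE(Letter)  OVER (ORDER BY Letter) AS LastValue
FROM (VALUES ('A'), ('B'), ('C'), ('D'), ('E')) AS t(Letter);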

You expect to get “A” for every record in the FirstValue column and “E” for every record in the LastValue column.
However you see the following output:
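
Using the sample rows from the sketch above, that output looks like this:

Letter  FirstValue  LastValue
A       A           A
B       A           B
C       A           C
D       A           D
E       A           E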

LAST_VALUE() on its own uses the default window frame, which only covers the rows from the first row up to the current row, for example:

  • Row 1 – The only value in scope is A.
    Hence, FIRST_VALUE() & LAST_VALUE() both return A.
  • Row 2 – The values now in scope are A & B.
    Hence, FIRST_VALUE() returns A and LAST_VALUE() returns B.
  • Row 3 – The values now in scope are A, B, C.
    Hence, FIRST_VALUE() returns A and LAST_VALUE() returns C.
  • And so on…

To return the actual last value, add the additional clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This ensures that for each row the LAST_VALUE() function looks at all rows; from the very first row to the very last row. Now, you have the following code (the original code with an extra column using the additional clause):
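
Using the same sample rows, that corrected query would be something like:

SELECT  Letter,
        FIRST_VALUE(Letter) OVER (ORDER BY Letter) AS FirstValue,
        LAST_VALUE(Letter)  OVER (ORDER BY Letter) AS LastValue,
        LAST_VALUE(Letter)  OVER (ORDER BY Letter
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING) AS LastValue2
FROM (VALUES ('A'), ('B'), ('C'), ('D'), ('E')) AS t(Letter);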

This gives you the following output, where LastValue2 shows the true last value for the result set:
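
Again using the sample rows from the sketch:

Letter  FirstValue  LastValue  LastValue2
A       A           A          E
B       A           B          E
C       A           C          E
D       A           D          E
E       A           E          E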

SQL Server 2016 Licensing Changes, SP1, v.Next CTP1, etc.

By now you’ve probably heard the exciting news announced at today’s Microsoft Connect conference; SQL Server licensing is changing, significantly, from SQL 2016 SP1.

The functionality of the SQL Server Standard, Web & Express editions is being brought up to [almost, 99%] match that of Enterprise Edition. This has a number of critical impacts:

  1. Developers can write applications using a single set of SQL Server functionality, and can now make use of previously Enterprise only features without imposing the cost on their customers.
  2. The customer now decides whether to use Express, Web, Std or Ent based on scalability, not functionality.
  3. It follows the pricing model already used by most cloud services; pay for usage rather than features.
  4. This brings security to the masses – possibly one of the biggest impacts is that now we can use Row Level Security, Dynamic Data Masking and Always Encrypted in ALL editions of the engine – huge news.

There are some limitations to this though:

  1. It currently only applies to the SQL engine, not to SSIS, SSAS or SSRS – Please Microsoft, follow suit with the BI tools asap!!!
  2. Some features are not [yet] available in the Express and/or Local DB editions, including Change Data Capture, In Memory OLTP and Polybase to name a few.
  3. The trigger to move to Enterprise will now be [almost] purely driven by size and performance requirements.

As if that isn’t enough, other huge announcements include:

  • SQL Server 2016 SP1 is now available
  • New functionality in 2016 SP1, including
    • CREATE OR ALTER functionality for Stored Procs, Views, Functions, etc.
    • DBCC CLONEDATABASE – To easily clone a database schema without the data (love this!!) – see the quick syntax sketch after this list
    • NUMA support for Analysis Services Tabular – I’ll be checking this out in more detail soon and will blog my findings
  • SQL Server v.Next CTP1 (Community Technology Preview) is now available, which includes SQL Server running on Linux! Note that only the engine is available on Linux at this time, not the BI tools, but watch this space – the plan is to offer matching functionality on both
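
For reference, here’s a quick sketch of the CREATE OR ALTER and DBCC CLONEDATABASE syntax mentioned above (the object and database names are just examples):

-- CREATE OR ALTER: no more IF EXISTS ... DROP boilerplate in deployment scripts
CREATE OR ALTER PROCEDURE dbo.DemoProc
AS
BEGIN
    SELECT 1 AS Result;
END;
GO

-- DBCC CLONEDATABASE: creates a schema-only copy of a database, without the data
DBCC CLONEDATABASE (MyDatabase, MyDatabase_Clone);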

 

For full details, check out the Microsoft blog post here

/Frog-Blog-Out

Excel doesn’t open CSV files correctly from sp_send_dbmail

A nice little nugget today for anyone who uses SQL dbmail to send data in csv attachments.

Sometimes, depending on your language settings, CSV files don’t open correctly in Excel. Excel ignores the ‘,’ and puts all the data in column A. You then have to go through the hassle of closing the file and opening it through the flat file wizard.

 

There’s a very nice solution to this that I found in this thread: insert a new row at the top of the file containing “sep=,”. This forces Excel to understand that it is a comma delimited file, and ensures that it will open correctly.

The only problem, how do you insert a header line when using dbmail?…

[Caution – ugly hack alert…]

You can alter the name of the first column to include this header text. We simply rename “Column1” to “sep=,{CR}{LF}Column1”. Then, when dbmail prints out the column headers in the file, the Column1 name will be split over two lines, preceded by “sep=,”.

Excel treats this first row as an instruction and does not display it, just uses it to make sure it formats the data correctly.

So how do we do this in SQL?

DECLARE @Column1Name VARCHAR(255)
DECLARE @Query VARCHAR(2048)
SET @Column1Name = '[sep=,' + CHAR(13) + CHAR(10) + 'Column1]'
SET @Query = 'SELECT Column1 AS ' + @Column1Name + ', Column2, Column3 FROM myTable'

EXEC msdb.dbo.sp_send_dbmail
 @profile_name='XXX'            --your dbmail profile name
--,[Other parameters as required]
,@query=@Query
,@attach_query_result_as_file=1
,@query_attachment_filename='QueryResults.csv'
,@query_result_separator=','    --enforce csv
,@query_result_no_padding=1     --trim
,@query_result_width=32767      --stop wordwrap

And there you go. It’s a bit of a hack, but works well.

Frog-Blog-Out

Analysis Services Tabular or Multidimensional? A performance comparison

Can SSAS Multidimensional be faster than SSAS Tabular for distinct counts on large datasets?

We’ve all seen how fast the SSAS Tabular model can work – it’s an in-memory, heavily compressed analytics database. However you may have seen my previous posts on Tabular and NUMA, which show that at the moment Tabular is limited to a single CPU. This obviously limits its scalability and maximum performance.

The situation

A large project I’m working on, for a multinational telecoms provider, has a problem with the query performance of their current Tabular model. It’s fine for smaller datasets but does not scale well, specifically with distinct count operations. The distinct count is the single biggest critical factor for performance in this project, and with in the region of 50m distinct customers, this is a sizeable problem. After working with Microsoft to investigate the Tabular performance we identified a bug in SSAS, which has been resolved in SQL 2012 SP1 CU9, but it still doesn’t run fast enough on the limited hardware.

The Multidimensional model is NUMA aware (although how well it is supported depends on your workload), so it can make better use of the hardware, but we all know how poorly MOLAP scales with distinct counts due to the storage engine IO bottleneck. So what is the answer?

What about Solid State?

The reason the multidimensional model often experiences performance degradation for distinct count operations is the heavy disk IO required. Aggregations can’t be used effectively, so the SSAS storage engine has to work through the entire dataset on disk. With the speed of normal disks, especially when on a SAN, this is not great.

There are now a number of solid state storage options available that provide enterprise grade storage with some very impressive performance.

So the question is, can a multidimensional cube running on solid state storage outperform an equivalent in-memory Tabular cube?

The tests

I ran various query performance tests with (MOLAP) cubes from 0.5Tb to 2Tb. The results in this post are focusing entirely on the 0.5Tb tests, with an identical dataset and equivalent test queries on a Tabular cube.

A big thank you to the following:

  • Millicom – for the test dataset, and for allowing me to publish the tests results here
  • Violin – for the use of a storage array for the test
  • Microsoft – for the loan of a server in the MTC (Microsoft Technology Center) in Reading.

The equipment for the multidimensional cube:

  • HP DL580
  • 4 x Intel Xeon E7-4850 2Ghz
  • 40 cores, hyperthreaded to 80
  • 256Gb RAM, 1067Mhz
  • Violin 6616 SLC, 1m IOPS, 4Gbps
  • SQL Server 2012 SP1 CU9
  • Distinct count measure groups matrix partitioned by month, and 60 non-overlapping buckets of CustomerKey
  • Basic usage based optimisation applied to aggregations. There is a high likelihood this could be improved further

The equipment for the Tabular cube

Note that the different number of CPUs is fair: Tabular is not able to benefit from multiple CPUs, whereas Multidimensional is. The factors that are unequal are the threads per CPU (16 vs 20) and memory speed (666Mhz vs 1067Mhz), so please bear this in mind when looking at the performance comparison results.

The dataset is 6 monthly snapshots of 60m customers in two primary fact tables with 300m and 1.4bn rows in each.

The 7 query tests cover a range of workloads from very simple to very complex. Queries 1&2 do not include distinct count measures, all other queries include a distinct customer count from one of the primary fact tables, with varying levels of complexity, granularity and result set size. Tests 1-4 are manually created MDX queries, 5-7 are real-world Excel workbooks provided by users, each containing 1-6 large pivot tables.

Each test was run on a cold SSAS cache, and with the windows file cache also cleared before every test. Every test was run 3 times, with the average taken.

Test scenarios:

  • Tabular, using the fastest performance measured from all previous configuration tests
  • Multidimensional, using all CPUs and local Raid 5 disk storage
  • Multidimensional, using a single CPU (single NUMA node) and Violin storage
  • Multidimensional, using all CPUs and Violin storage

The Results

[Results table]

And when plotted on a chart…

[Results chart]

The first 3 tests are so quick that they don’t show up on the above chart, but you can see these results in the table above.

Conclusion

It’s clear from these tests that solid state storage can breathe new life into an IO constrained multidimensional cube. This really applies to any case where you’re making heavy use of distinct counts over a large, high cardinality dataset. In this case the average performance gain over the same disk based cube was 2.6x.

Multidimensional on solid state storage can be on average 1.9x faster than the Tabular model, when working with large high cardinality distinct count operations.

Although other research and evidence suggests that Multidimensional cubes don’t perform well with multiple NUMA nodes, these tests show that if your workload is very heavily constrained by the storage engine, you can gain significant performance from multiple CPUs/NUMA nodes.

And Finally…

As well as basic timing tests using Profiler, I also monitored the CPU and disk throughput in Perfmon. The following two screenshots show the difference between the multidimensional cube on disk and then the same cube on Violin. This is for test 4, which showed the most significant performance improvement from disk to solid state.

[Perfmon trace: multidimensional cube on local disk]

[Perfmon trace: multidimensional cube on Violin storage]

These traces show test 4, which has the following query structure:

SELECT {   [Measures].[Fully Additive Measure 1]
         , [Measures].[Fully Additive Measure 2]
         , [Measures].[Fully Additive Measure 3]
         , [Measures].[Distinct Measure 4]} ON COLUMNS ,
NON EMPTY {[Dimension 1].[Attribute 1].[Attribute 1].MEMBERS  --6 members
         * [Dimension 2].[Attribute 2].[Attribute 2].MEMBERS  --12 members
         * [Dimension 3].[Attribute 3].[Attribute 3].MEMBERS  --5 members
         * DESCENDANTS ([Date].[Calendar].[Year].&[2013], [Date].[Calendar].[Month])} ON ROWS  --6 members
FROM [Cube]

This returns 700 distinct tuples from the crossjoin, with an average of 400k distinct customers per tuple.

This shows the potential benefit of maximising the storage IO throughput. In the first image, the IO throughput being constrained to under 100Mbps means that only 3% of the CPU can be utilised. In the second trace, by providing up to 1.7Gb per second throughput, the CPU is allowed to work at an average of 60% for the duration of the query, with a clear impact on the query performance.

These charts also show the reduced IO latency (IO response time), with the average IO response time in the first trace sitting between 40-60ms for the disk IO, and barely registering on the scale of the chart for the Violin array. I found the IO response for the Violin tests was averaging 450μsec. The SLC unit runs at ~200μsec latency at the 4k chunk size, so the 8k chunks we’re asking for fully complete in the 450μsec that we saw.

There’s plenty more details and tests that I’m running, I may post further details if I get time. Until then…

Frog-Blog-Out

Update #1 06/06/2014:
In the interests of completeness and fairness, I subsequently re-ran the Tabular model performance tests on the same (Microsoft supplied) server as the MOLAP model. The only difference being that to maximise the performance I restricted the server to a single CPU/NUMA node.
Somewhat surprisingly, the performance difference between Tabular and MOLAP/Violin was even greater. Multidimensional performed 3.7x faster, compared with 1.9x faster on the original Tabular server.

Update #2 06/06/2014:
If you’re planning on implementing a large scale Tabular model, do take the above tests into consideration, however please be aware that there are reasons why the Tabular performance is poor here: the incredibly high cardinality of the distinct count field, and the fact that the data can’t be split up, remodelled or simplified. This will not apply to all data scenarios.
Please take a look at a recent white paper by Alberto Ferrari at SQLBI, in which he discusses a number of ways of optimising large distinct count datasets to suit the Tabular model. He’s achieved some very impressive results. Unfortunately these methods would not benefit the scenario described in this post, but it shows what can be done with Tabular if the data suits.
http://www.sqlbi.com/articles/using-tabular-models-in-a-large-scale-commercial-solution

LogParser Performance Inserting into SQL Server

Recently I’ve been using LogParser a lot to import log files, such as W3C logs from an IIS web server. The distasteful format of these files is made palatable by the awesomeness that is LogParser; it just takes care of it for you. (Check out this SQLBits session on LogParser by Jonathan Allen (Twitter | Blog) for a good intro to LogParser)

However I’ve been suffering with very poor performance with large files, so started to investigate further.

Exporting a 5Mb IIS log to a csv file was taking 0.4 seconds, whereas exporting the same log to SQL Server was taking over 15 times longer. This isn’t a problem for a small file, but processing a 0.5Gb log file is somewhat of a different matter!

Performing a trace on SQL Server during an import revealed that each record being inserted was wrapped up in its own transaction. Importing 100,000 records into SQL was generating 100,000 insert statements and 100,000 transactions. In the trace below you can see the transactions either side of the insert (using exec sp_cursor…).

Log Parser Transactions

This was executed using the following command

  • LogParser.exe "SELECT * INTO staging.test FROM Test.log" -i:IISW3C -o:sql -server:localhost -database:Staging -cleartable:ON

 

I have to admit that I’m surprised by this – I assumed that LogParser would open up a single transaction for the bulk insert. It turns out however that you can control this behaviour using LogParser’s transactionRowCount parameter.

The ‘-transactionRowCount’ parameter defaults to 0, which auto commits every single row. It can be changed to any number you like, or to -1, which will commit all rows in a single transaction. The choice is yours.

By changing this parameter, the performance difference can be huge on a large file.

The following is an extract of the trace on the same file when executed using the following command

  • LogParser.exe "SELECT * INTO staging.test FROM Test.log" -i:IISW3C -o:sql -server:localhost -database:Staging -cleartable:ON -transactionRowCount:-1

Log Parser Single Transaction

You can clearly see that only a single transaction is opened, with all inserts batched together.

For processing large files this can make a very significant difference. For an 11Mb IIS log with just over 18k records on one of my test servers:

  • Default behaviour: 38.26 seconds
  • Single transaction: 2.92 seconds

This is a performance improvement of 92% – which you can’t grumble about.

If you’re calling LogParser from C# script within SSIS (as I usually do) then you can set this same behaviour using the transactionRowCount property of the LogQueryClassClass object (before you call ExecuteBatch!).

 

Frog-Blog-Out

Database Management PhD survey

If you’re involved in database management, can you spare 20 minutes to complete a survey?

Victoria Holt is studying for her PhD, and would very much appreciate it if you could help her out with her survey.

Go on… Fill it in whilst you have a break over a cup of tea & a mince pie…

Fill in the survey here

On behalf of Victoria, thanks

Rock Paper Scissors Lizard Spock

(Image courtesy of GraphJam.com)

I want to share with you some genius level work by Phil Quinn (blog | twitter), who is furthering the boundaries of SQL Server.

 

Phil has taken the spatial drawing code from an earlier Frog-blog and used it to create a fully functional Rock Paper Scissors Lizard Spock game, just using the power of SQL Server. Not even the brightest boffins at Microsoft could ever have imagined that SQL Server would be able to provide such an awesome service to the World.

 

Check out Phil’s post and download the query here

 

If you want to hear Phil speak, he’s doing a talk for SQL Midlands on 22nd November in Birmingham. His talk will be on using XML in SQL Server, but I’m hoping that we’ll also be treated to a bit of Rock Paper Scissors Lizard Spock action…

 

 

Frog-Blog-Out
