Recently I’ve been using LogParser a lot to import log files, such as W3C logs from an IIS web server. The distasteful format of these files is made palatable by the awesomeness that is LogParser; it just takes care fo it for you. (Check out this SQLBits session on LogParser by Jonathan Allen (Twitter | Blog) for a good intro to LogParser)
However I’ve been suffering with very poor performance with large files, so started to investigate further.
Exporting a 5Mb IIS log to a csv file was taking 0.4 seconds, whereas exporting the same log to SQL Server was taking over 15 times longer. This isn’t a problem for a small file, but processing a 0.5Gb log file is somewhat of a different matter!
Performing a trace on SQL Server during an import was revealing that each record being inserted was wrapped up in its own transaction. Importing 100,000 records into SQL was generating 100,000 insert statements, and 100,000 transactions. In the trace below you can see the transactions either side of the insert (using exec sp_cursor…).
This was executed using the following command
- LogParser.exe “SELECT * INTO staging.test FROM Test.log” -i:IISW3C -o:sql -server:localhost -database:Staging -cleartable:ON
I have to admit that I’m surprised by this – I assumed that LogParser would open up a single transaction for the bulk insert. It turns out however that you can control this behaviour using LogParser’s transactionRowCount parameter.
The ‘-transactionRowCount’ parameter defaults to 0, which auto commits every single row. It can be changed to any number you like, or to -1, which will commit all rows in a single transaction. The choice is yours.
By changing this parameter, the performance difference can be huge on a large file.
The following is an extract of the trace on the same file when executed using the following command
- LogParser.exe “SELECT * INTO staging.test FROM Test.log” -i:IISW3C -o:sql -server:localhost -database:Staging -cleartable:ON -transactionRowCount:-1
You can clearly see that only a single transaction is opened, with all inserts batched together.
For processing large files this can make a very significant different. For an 11Mb IIS log with just over 18k records on one of my test servers:
- Default behaviour: 38.26 seconds
- Single transaction: 2.92 seconds
This is a performance improvement of 92% – which you can’t grumble about.
If you’re calling LogParser from C# script within SSIS (as I usually do) then you can set this same behaviour using the transactionRowCount property of the LogQueryClassClass object (before you call ExecuteBatch!).