Recently I’ve been using LogParser a lot to import log files, such as W3C logs from an IIS web server. The distasteful format of these files is made palatable by the awesomeness that is LogParser; it just takes care of it for you. (Check out this SQLBits session on LogParser by Jonathan Allen (Twitter | Blog) for a good intro to LogParser.)
However, I’ve been suffering from very poor performance with large files, so I started to investigate further.
Exporting a 5MB IIS log to a CSV file took 0.4 seconds, whereas exporting the same log to SQL Server took over 15 times longer. This isn’t a problem for a small file, but processing a 0.5GB log file is a different matter!
Performing a trace on SQL Server during an import revealed that each record being inserted was wrapped up in its own transaction. Importing 100,000 records into SQL was generating 100,000 insert statements and 100,000 transactions. In the trace below you can see the transactions either side of the insert (using exec sp_cursor…).
This was executed using the following command:
- LogParser.exe "SELECT * INTO staging.test FROM Test.log" -i:IISW3C -o:sql -server:localhost -database:Staging -cleartable:ON
I have to admit that I’m surprised by this; I assumed that LogParser would open up a single transaction for the bulk insert. It turns out, however, that you can control this behaviour using LogParser’s transactionRowCount parameter.
The -transactionRowCount parameter defaults to 0, which auto-commits every single row. It can be changed to any number you like, or to -1, which will commit all rows in a single transaction. The choice is yours.
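For example, to commit in batches rather than per row (10,000 here is just an illustrative batch size, not a recommendation), you could run:
- LogParser.exe "SELECT * INTO staging.test FROM Test.log" -i:IISW3C -o:sql -server:localhost -database:Staging -cleartable:ON -transactionRowCount:10000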
By changing this parameter, the performance difference can be huge on a large file.
Below is an extract of the trace on the same file, this time executed using the following command:
- LogParser.exe "SELECT * INTO staging.test FROM Test.log" -i:IISW3C -o:sql -server:localhost -database:Staging -cleartable:ON -transactionRowCount:-1
You can clearly see that only a single transaction is opened, with all inserts batched together.
For processing large files this can make a very significant difference. For an 11MB IIS log with just over 18k records on one of my test servers:
- Default behaviour: 38.26 seconds
- Single transaction: 2.92 seconds
This is a performance improvement of 92%, which you can’t grumble about.
If you’re calling LogParser from C# script within SSIS (as I usually do) then you can set this same behaviour using the transactionRowCount property of the SQL output context object, before you pass it to ExecuteBatch on the LogQueryClassClass object.
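As a rough illustration, here’s a minimal sketch of that approach via the MSUtil COM interop (add a COM reference to the LogParser type library). The COMIISW3CInputContextClassClass and COMSQLOutputContextClassClass names are my assumption, following the same interop naming convention as the LogQueryClassClass used in this post; verify them against your own COM reference.

using MSUtil;

public void Main()
{
    LogQueryClassClass logQuery = new LogQueryClassClass();
    // assumed interop names for the IISW3C input and SQL output contexts
    COMIISW3CInputContextClassClass input = new COMIISW3CInputContextClassClass();
    COMSQLOutputContextClassClass output = new COMSQLOutputContextClassClass();

    output.server = "localhost";
    output.database = "Staging";
    output.clearTable = true;
    output.transactionRowCount = -1; // -1 = commit all rows in one transaction

    // ExecuteBatch runs the query and writes the records straight to the output context
    logQuery.ExecuteBatch("SELECT * INTO staging.test FROM Test.log", input, output);

    Dts.TaskResult = (int)ScriptResults.Success;
}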
Frog-Blog-Out
Hi, I am also using LogParser to parse the event logs of a remote machine, but the transactionRowCount flag is not helping me reduce the time. Using it or not makes no difference. Can you help me out?
Hi Yogesh
Have you run a Profiler trace on the database to find out how many rows are being inserted per transaction?
This setting will only help if it’s currently inserting one row per transaction, so run the trace with and without the setting and compare the results.
Regards
Alex
Yes, I have run the Profiler trace with both settings and I get the same trace as shown in your second screenshot. The only difference was a SQL:BatchStarting and SQL:BatchCompleted event at either end when using -transactionRowCount:-1.
I should add that I am fetching the logs from a remote machine.
Two weeks ago the same query without transactionRowCount was taking approx. 7 minutes, and with transactionRowCount:-1 approx. 48 seconds. But now both queries are taking the same 7 minutes.
Do you have any idea where I am making a mistake?
Query: logparser -i:EVT "select * into tableName from \\serverName\application" -o:SQL -driver:"SQL Server" -server:serverName1 -database:databaseName
The TransactionRowCount flag works when I fetch the logs from my local machine; it reduces the time from 12 seconds to 4 seconds.
But I still don’t understand why the flag doesn’t help when fetching the logs remotely.
Please help me out. Thanks!
I can’t think of any reason why this would cause you a problem, other than the raw throughput of data from your source.
The TransactionRowCount flag affects the insert into the destination, not the speed at which data can be received from the source.
So I’d recommend running some tests on the bandwidth or throughput, outside of LogParser.
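For example, within LogParser itself you can isolate the source side by reading the same remote event log but writing to a local CSV, so the timing excludes the SQL Server destination entirely (a sketch reusing the source path from your query; out.csv is just a placeholder):
- logparser -i:EVT "SELECT * INTO out.csv FROM \\serverName\application" -o:CSV
If that takes roughly as long as the SQL import, the bottleneck is the remote read, not the insert.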
Regards
Alex
Thanks for your help!
Hi Alex,
I am using the below script in an SSIS Script Task. When I run the package, I get the following error:
DTS Script Task has encountered an exception in user code:
Exception has been thrown by the target of an invocation.
Could you please let me know how I can fix this issue?
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using MSUtil;

public void Main()
{
    ILogRecordset rsLP = null;
    ILogRecord rowLP = null;
    LogQueryClassClass LogParser = null;
    COMTSVInputContextClassClass W3Clog = null;
    string strSQL = null;

    LogParser = new LogQueryClassClass();
    W3Clog = new COMTSVInputContextClassClass();
    strSQL = "SELECT * FROM D:\\access_201501080100.bak";

    // run the query against W3C log
    rsLP = LogParser.Execute(strSQL, W3Clog);
    rowLP = rsLP.getRecord();
    //Unitsprocessed = rsLP.inputUnitsProcessed;

    Dts.TaskResult = (int)ScriptResults.Success;
}
It’s difficult to spot where the problem is from that generic exception, so your best option is to add error handling and logging to the script so that you can see the underlying error message.
Take a look at https://msdn.microsoft.com/en-us/library/ms136131.aspx for an example of how to do this.
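For example, wrapping the LogParser calls in a try/catch and raising the real error through the SSIS event interface would show the underlying message (a sketch based on your script; the FireError arguments are illustrative):

try
{
    // the COM call is where the generic "target of an invocation" exception originates
    rsLP = LogParser.Execute(strSQL, W3Clog);
    rowLP = rsLP.getRecord();
    Dts.TaskResult = (int)ScriptResults.Success;
}
catch (Exception ex)
{
    // surface the real error (including any InnerException) in the SSIS log
    Dts.Events.FireError(0, "LogParser script", ex.ToString(), String.Empty, 0);
    Dts.TaskResult = (int)ScriptResults.Failure;
}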
Regards
Alex
I am trying to parse a 50MB log file using LogParser. It’s not a typical log file that can be parsed with any of LogParser’s built-in input formats; it’s a programmatically generated WebLogic application log file.
Without me specifying an input format, LogParser assumes TEXTLINE and responds after 11 to 12 seconds for a query that searches for an ID in the log file.
With a custom input format (-i:COM), it takes about 20 seconds for the same file (using the same query).
Do you have any suggestions as to how the performance can be improved? Clearly, 20 or 12 seconds isn’t good.
Try using the NCSA input format to interpret the file.
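For example (a sketch; weblogic.log is just a placeholder for your file name):
- logparser -i:NCSA "SELECT TOP 10 * FROM weblogic.log"
If the file parses, you can then rewrite your ID search against the field names LogParser reports for the NCSA format.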
Regards, Alex