<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Purple Frog Systems &#187; Integration Services</title>
	<atom:link href="http://www.purplefrogsystems.com/blog/category/integration-services/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.purplefrogsystems.com/blog</link>
	<description>Business Intelligence Consultancy</description>
	<lastBuildDate>Tue, 15 May 2012 11:49:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Automating T-SQL Merge to load Dimensions (SCD)</title>
		<link>http://www.purplefrogsystems.com/blog/2012/04/automating-t-sql-merge-to-load-dimensions-scd/</link>
		<comments>http://www.purplefrogsystems.com/blog/2012/04/automating-t-sql-merge-to-load-dimensions-scd/#comments</comments>
		<pubDate>Fri, 06 Apr 2012 10:55:38 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Merge]]></category>
		<category><![CDATA[SQLBits]]></category>
		<category><![CDATA[SSIS]]></category>

		<guid isPermaLink="false">http://www.purplefrogsystems.com/blog/?p=546</guid>
		<description><![CDATA[This is the 3rd post in the Frog-Blog series on the awesomeness of T-SQL Merge. Post 1: Introduction to T-SQL merge basics Post 2: Using T-SQL merge to load data warehouse dimensions In this post we&#8217;ll be looking at how we can automate the creation of the merge statement to reduce development time and improve [...]]]></description>
			<content:encoded><![CDATA[<p>This is the 3rd post in the Frog-Blog series on the awesomeness of T-SQL Merge.</p>
<ul>
<li>Post 1: <a href="/blog/2011/12/introduction-to-t-sql-merge-basics/">Introduction to T-SQL merge basics</a></li>
<li>Post 2: <a href="/blog/2012/01/using-t-sql-merge-to-load-data-warehouse-dimensions/">Using T-SQL merge to load data warehouse dimensions</a></li>
</ul>
<p>In this post we&#8217;ll be looking at how we can automate the creation of the merge statement to reduce development time and improve reliability and flexibility of the ETL process. I discussed this in the 2nd half of a talk I gave at the UK technical launch of SQL Server 2012 at <a href="http://www.SQLBits.com" target="_blank">SQLBits X</a>. Thank you to the great audience who came to that talk; this post is for your benefit and is a result of your feedback and requests.</p>
<h2>Why automate merge?</h2>
<p>As we saw in the <a href="/blog/2012/01/using-t-sql-merge-to-load-data-warehouse-dimensions/">previous post</a>, merge is an incredibly powerful tool when loading data into data warehouse dimensions (specifically SCDs &#8211; slowly changing dimensions). The whole process can be wrapped up into a very neat stored proc which can save a considerable amount of time compared with writing the equivalent functionality in SSIS. In the next installment of this series I&#8217;ll be discussing the performance of it compared to other methods of loading SCDs in SSIS (take a look at the SQLBits talk video [when it's released] for a preview!). Suffice it to say for now that in my [pretty comprehensive] tests it&#8217;s one of the fastest methods of loading SCDs.</p>
<p>If you missed the talk, you can <a href="/download/blog/LoadingSCDsInSSIS_SQLBitsX.pdf" target="_blank">download the slide deck here</a> whilst you&#8217;re waiting for the video.</p>
<p>The problem that stops a lot of people using merge is the perceived complexity of the statement. It can be very easy to get things wrong, with pretty bad consequences on your dimension data.</p>
<p>The easiest way to avoid this complexity and simplify the process is to not write merge statements, but let an automated procedure do it for you &#8211; Simples!</p>
<p>The other huge benefit is that, as we&#8217;ll see during this post, you can base the automation procedure on metadata, meaning that you can change the SCD functionality of your dimensions just by changing metadata, and not rewriting your code.</p>
<p>Note that in this post we&#8217;ll just be looking at <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension" target="_blank">Type 0 and 1 SCDs, not 2, 3 or 6</a>. This is to keep things simple. Once you&#8217;ve mastered type 0 and 1, it&#8217;s a logical next step to expand things to deal with type 2s.</p>
<h2>OK, so how do we do this?</h2>
<p>First of all we need to set up two tables to use. Let&#8217;s create a simple Customer dimension. Alongside this we also need a staging table. I&#8217;m a big fan of using schemas to differentiate tables, so we&#8217;ll create dim.Customer and etl.Customer as our two tables.</p>
<pre class="code">CREATE SCHEMA [dim] AUTHORIZATION [dbo]
GO
CREATE SCHEMA [etl] AUTHORIZATION [dbo]
GO

CREATE TABLE [dim].[Customer](
    [CustomerKey]   [int] IDENTITY(1,1) NOT NULL,
    [Email]         [varchar](255)      NOT NULL,
    [FirstName]     [varchar](50)       NOT NULL,
    [LastName]      [varchar](50)       NOT NULL,
    [DoB]           [date]              NOT NULL,
    [Sex]           [char](1)           NOT NULL,
    [MaritalStatus] [varchar](10)       NOT NULL,
    [FirstCreated]  [date]              NOT NULL,
    [IsRowCurrent]  [bit]               NOT NULL,
    [ValidFrom]     [datetime]          NOT NULL,
    [ValidTo]       [datetime]          NOT NULL,
    [LastUpdated]   [datetime]          NOT NULL
 CONSTRAINT [PK_DimCustomer] PRIMARY KEY CLUSTERED
(
	[CustomerKey] ASC
))
GO

CREATE TABLE [etl].[Customer](
    [Email]         [varchar](255)  NOT NULL,
    [FirstName]     [varchar](50)   NOT NULL,
    [LastName]      [varchar](50)   NOT NULL,
    [DoB]           [date]          NOT NULL,
    [Sex]           [char](1)       NOT NULL,
    [MaritalStatus] [varchar](10)   NOT NULL,
    [FirstCreated]  [date]          NOT NULL
)</pre>
<p>So the dim table contains our primary surrogate key, business key (email address in this case), customer details and a series of audit fields (IsRowCurrent, ValidFrom, etc.). The etl staging table only contains the business key and customer details.</p>
<p>We then need to store the details of each field, i.e. how each field should be interpreted &#8211; is it a primary key, business key, type 0 or 1, or an audit field? We need this so that we can put the correct fields into the correct place in the merge statement. You could create a table to store this information; however, I prefer to use the extended properties of the fields.</p>
<pre class="code">EXEC sys.sp_addextendedproperty @level2name=N'CustomerKey',  @value=N'PK' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'Email',        @value=N'BK' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'FirstName',    @value=N'1' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'LastName',     @value=N'1' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'DoB',          @value=N'1' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'Sex',          @value=N'1' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'MaritalStatus',@value=N'1' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'FirstCreated', @value=N'1' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'ValidFrom',    @value=N'Audit' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'ValidTo',      @value=N'Audit' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'IsRowCurrent', @value=N'Audit' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'
EXEC sys.sp_addextendedproperty @level2name=N'LastUpdated',  @value=N'Audit' ,
    @name=N'SCD', @level0type=N'SCHEMA',@level0name=N'Dim',
    @level1type=N'TABLE',@level1name=N'Customer', @level2type=N'COLUMN'</pre>
<p>Or you can just enter the extended property manually using SSMS.</p>
<p><img class="aligncenter size-full wp-image-553" title="SCDExtendedProperty" src="http://www.purplefrogsystems.com/blog/wp-content/uploads/2012/04/SCDExtendedProperty.png" alt="" width="600" height="415" /></p>
<p>The SSIS package should output all customer records into the etl table, with no regard for whether they are new customers, old customers, changed or not. The merge statement will take care of that.</p>
<h2>Automating Merge</h2>
<p>The first stage is to examine the structure of merge.</p>
<pre>   MERGE   <span style="color: #ff0000;">[DIMENSION TABLE]</span>  as Target
   USING   <span style="color: #ff0000;">[STAGING TABLE]</span>    as Source
      ON   <span style="color: #ff0000;">[LIST OF BUSINESS KEY FIELDS]</span>
   WHEN MATCHED AND
         Target.<span style="color: #ff0000;">[LIST OF TYPE 1 FIELDS]</span> &lt;&gt; Source.<span style="color: #ff0000;">[LIST OF TYPE 1 FIELDS]</span>
      THEN UPDATE SET
         <span style="color: #ff0000;">[LIST OF TYPE 1 FIELDS]</span> = Source.<span style="color: #ff0000;">[LIST OF TYPE 1 FIELDS]</span>
   WHEN NOT MATCHED THEN INSERT
         <span style="color: #ff0000;">[LIST OF ALL FIELDS]</span>
      VALUES
         Source.<span style="color: #ff0000;">[LIST OF ALL FIELDS]</span>
</pre>
<p>The text in black is the skeleton of the statement, with the text in red being the details specific to the dimension. It&#8217;s these red items which we need to retrieve from the metadata of the dimension in order to create the full merge statement.</p>
<p>We can retrieve the extended properties using the sys.extended_properties catalog view. This allows us to pull out a list of all fields which have a specific extended property set, e.g. all PK fields, all BK fields, all type 2 fields etc. If we then put a few of these queries into cursors, we can loop through them and build up a dynamic SQL query. Yes I know, dynamic SQL should be avoided and is evil etc., however&#8230; this use is an exception and does truly make the world a better place.</p>
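<p>To make the idea concrete, here is a minimal sketch of the string-building step. It is written in Python purely for illustration and is not the downloadable proc: the COLUMN_ROLES dictionary and the build_merge function are hypothetical names, with the dictionary standing in for the values you would read from the extended properties. It only covers the business key and type 0/1 fields, and it writes the inequality as != rather than the equivalent operator used in the skeleton above. The audit fields are deliberately left out, so against the dim.Customer table above (where they are NOT NULL) a real statement would also need to populate them.</p>
<pre class="code">
```python
# Hypothetical metadata: column name mapped to its SCD role, mirroring the
# extended properties set on dim.Customer earlier in this post.
COLUMN_ROLES = {
    "CustomerKey": "PK",
    "Email": "BK",
    "FirstName": "1",
    "LastName": "1",
    "DoB": "1",
    "Sex": "1",
    "MaritalStatus": "1",
    "FirstCreated": "1",
    "ValidFrom": "Audit",
    "ValidTo": "Audit",
    "IsRowCurrent": "Audit",
    "LastUpdated": "Audit",
}


def build_merge(target, source, roles):
    """Assemble a type 0/1 MERGE statement from column-role metadata."""
    business_keys = [col for col, role in roles.items() if role == "BK"]
    type0_cols = [col for col, role in roles.items() if role == "0"]
    type1_cols = [col for col, role in roles.items() if role == "1"]
    # Type 0 fields are inserted but never updated; the identity PK and the
    # audit fields are left out of this sketch entirely.
    insert_cols = business_keys + type0_cols + type1_cols

    on_clause = " AND ".join(
        f"Target.[{col}] = Source.[{col}]" for col in business_keys)
    changed = " OR ".join(
        f"Target.[{col}] != Source.[{col}]" for col in type1_cols)
    updates = ", ".join(f"[{col}] = Source.[{col}]" for col in type1_cols)
    col_list = ", ".join(f"[{col}]" for col in insert_cols)
    val_list = ", ".join(f"Source.[{col}]" for col in insert_cols)

    return "\n".join([
        f"MERGE {target} AS Target",
        f"USING {source} AS Source",
        f"   ON {on_clause}",
        f"WHEN MATCHED AND ({changed})",
        f"   THEN UPDATE SET {updates}",
        f"WHEN NOT MATCHED THEN INSERT ({col_list})",
        f"   VALUES ({val_list});",
    ])


merge_sql = build_merge("[dim].[Customer]", "[etl].[Customer]", COLUMN_ROLES)
print(merge_sql)
```
</pre>
<p>Running this prints a statement that follows the skeleton above. Changing a field&#8217;s role in the metadata changes the generated statement without touching the code, which is the whole point of driving the process from metadata.</p>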
<p>I&#8217;m not going to explain the resulting proc in minute detail, so instead please just <a href="/download/blog/GenerateMerge.sql" target="_blank">download it here</a> and work through it yourself. I will however explain a couple of items which are pretty important:</p>
<p>It&#8217;s important to keep the naming convention of your dimensions consistent. This doesn&#8217;t mean that every dimension must be identical: some may need inferred member support, some may need type 2 tracking fields (e.g. IsRowCurrent) and some may not. The critical thing is that all of your fields, if they do exist, should be named consistently. The automation proc can then look for specific field names and include them in the merge statement if necessary.</p>
<p>There is a parameter in the proc called @Execute. This offers the possibility of either executing the resulting merge statement directly, or just printing out the statement. If you only want to use this to automate the development process then this allows you to do just that: you can copy and paste the resulting statement into SSIS or into a stored proc.</p>
<h2>Result</h2>
<p>The automated generation of a T-SQL merge statement to handle type 0 &#038; 1 SCDs!<br />
Hopefully you can see how you can expand this to also cope with Type 2 SCDs, following the structure in my earlier posts.</p>
<p><a href="/download/blog/GenerateMerge.sql" target="_blank">Download the SQL scripts here</a><br />
&nbsp;<br />
Frog-Blog Out</p>
]]></content:encoded>
			<wfw:commentRss>http://www.purplefrogsystems.com/blog/2012/04/automating-t-sql-merge-to-load-dimensions-scd/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Pattern matching in SSIS using Regular Expressions and the Script component</title>
		<link>http://www.purplefrogsystems.com/blog/2011/07/pattern-matching-in-ssis-using-regular-expressions-and-the-script-component/</link>
		<comments>http://www.purplefrogsystems.com/blog/2011/07/pattern-matching-in-ssis-using-regular-expressions-and-the-script-component/#comments</comments>
		<pubDate>Fri, 08 Jul 2011 15:28:01 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Pattern Matching]]></category>
		<category><![CDATA[RegEx]]></category>
		<category><![CDATA[Regular Expression]]></category>
		<category><![CDATA[SSIS]]></category>

		<guid isPermaLink="false">http://www.purplefrogsystems.com/blog/?p=413</guid>
		<description><![CDATA[One of my favourite features of SSIS is the script component, and I know I’m not alone. Why? Because it brings the entire might of the .NET framework to SSIS, providing C# (in SQL 2008 onwards) and VB.NET extensibility for all those times where SSIS doesn’t quite have enough functionality out of the box. Such [...]]]></description>
			<content:encoded><![CDATA[<p>One of my favourite features of SSIS is the script component, and I know I’m not alone. Why? Because it brings the entire might of the .NET framework to SSIS, providing C# (in SQL 2008 onwards) and VB.NET extensibility for all those times where SSIS doesn’t quite have enough functionality out of the box.</p>
<h2>Such as?</h2>
<p>Well, a problem I’ve come across a number of times is string parsing: trying to search for and extract a specific pattern of characters from a larger text string. I’ve seen developers build crazy convoluted expressions in the derived column transform, some of which are very impressive in their complexity! This is a bad thing, not a good one. Although it shows a level of prowess in building SSIS expressions (not the most intuitive expression language!), it becomes a support nightmare for other developers.</p>
<p>Take for example extracting a house number from the first line of an address, we want to convert “29 Acacia Road” into “29”</p>
<p>Or extracting a version number from a product, converting “iPhone 4G, IOS 4.3.2, black” into “4.3.2”</p>
<p>Or extracting the HTML page from a URL, converting “http://www.wibble.com/folder/page.aspx” into “page.aspx”</p>
<h2>Regular Expressions</h2>
<p>The .NET framework includes full support for regular expressions (Regex), in the System.Text.RegularExpressions namespace. Regex provide an incredibly powerful way of defining and finding string patterns. They take some getting used to, but once you get the hang of them you can unleash their power in your SSIS dataflow pipelines.</p>
<p>To find out more about regular expressions, look at the following links</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Regular_expression" target="_Blank">Regex explanation</a></li>
<li><a href="http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/" target="_Blank">Regex cheat sheet</a></li>
<li><a href="http://www.regular-expressions.info/javascriptexample.html" target="_blank">Regex tester</a></li>
</ul>
<h2>Let’s look at an example</h2>
<p>Let’s take our first example from above, extracting a house number, converting “29 Acacia Road” into “29”.</p>
<p>The first thing we need to do is define our Regex search pattern. In this case we know that it must be at the start of the string, and must be an integer, made up of one or more digits 0-9.</p>
<p>The pattern for this is “^[0-9]+”, which is broken down as<br />
    ^ means the start of the line<br />
    [0-9] means any single digit<br />
    + means 1 or more of the preceding item.<br />
    i.e. 1 or more digits at the start of the line.</p>
<p>What if we wanted this to also cope with a single letter after the number? i.e. “221b Baker Street”</p>
<p>We can add “[A-Za-z]?” to our pattern, in which<br />
    [A-Za-z] means any single letter A-Z in either upper or lower case<br />
    ? means 0 or 1 occurrences of this</p>
<p>We should also add “\b” to the end of this, which indicates a word boundary. This means that 221b should be a whole word, not part of a larger sequence “221BakerSt”. We can wrap up the [A-Za-z]\b together into brackets so that the ? applies to the combination, so that any single letter must be the end of a word or it will be ignored. In this way “221BakerSt” will return 221, as will “221 Baker St”, whereas “221B Baker St” will return “221B”.</p>
<p>So our new pattern is “^[0-9]+([A-Za-z]\b)?”</p>
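<p>If you want to check the behaviour described above before building it into a package, a quick sketch like the following does the job. It uses Python’s re module purely as a convenient tester (the pattern syntax used here behaves the same way in .NET), and house_number is a hypothetical helper name.</p>
<pre class="code">
```python
import re

# The pattern from this post: leading digits, then optionally a single
# letter that must be followed by a word boundary.
HOUSE_NUMBER = re.compile(r"^[0-9]+([A-Za-z]\b)?")


def house_number(address):
    """Return the leading house number, or an empty string if there is none."""
    match = HOUSE_NUMBER.match(address)
    return match.group(0) if match else ""


samples = [
    "29 Acacia Road",
    "221b Baker Street",
    "221B Baker St",
    "221 Baker St",
    "221BakerSt",
    "Flat 3, High Street",
]
for address in samples:
    print(repr(address), "gives", repr(house_number(address)))
```
</pre>
<p>The last sample has no number at the start of the string, so it comes back as an empty string rather than picking up the 3 later in the address.</p>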
<p>You’ve probably gathered by now that regular expressions can get quite complicated. I’m not going to go into any more detail about them here, but hopefully this gives you some idea of what they can do. There’s plenty of reading on the web if you want to know more. You should also make use of the Regex expression tester in the link above &#8211; it will save you lots of debugging!</p>
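<p>As a rough sketch, the other two examples from the start of this post (the version number and the page name) could be handled with patterns along these lines. The two patterns and the first_match helper are illustrative assumptions rather than anything definitive, and Python is again used only as a quick tester.</p>
<pre class="code">
```python
import re

# Illustrative patterns (assumptions, not the only way to write them).
VERSION = re.compile(r"[0-9]+(\.[0-9]+)+")  # digits, then one or more ".digits" groups
PAGE = re.compile(r"[^/]+$")                # everything after the final "/"


def first_match(pattern, text):
    """Return the first match of pattern in text, or an empty string."""
    match = pattern.search(text)
    return match.group(0) if match else ""


version = first_match(VERSION, "iPhone 4G, IOS 4.3.2, black")
page = first_match(PAGE, "http://www.wibble.com/folder/page.aspx")
print(version)
print(page)
```
</pre>
<p>Note that the version pattern insists on at least one dot, which is what stops it from matching the “4” in “4G”.</p>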
<h2>How do we use Regular Expressions in SSIS?</h2>
<p>Well it turns out this is the easy bit, with the help of the script component.</p>
<p><strong>Step 1</strong> – Add a script component into your data flow pipeline and configure it as a transformation. I’m using C#, but you can use VB.NET if you want.</p>
<p><strong>Step 2</strong> – Give the script access to the input and output columns</p>
<p>Open the script component and select the input field from the “Input Columns” screen, in this case “Address1”. This can be ReadOnly.<br />
<img src="http://www.purplefrogsystems.com/blog/wp-content/uploads/2011/07/RegExSSIS1.png" alt="" title="RegExSSIS1" width="650" height="636" class="aligncenter size-full wp-image-419" /><br />
Go to the “Inputs and Outputs” screen and add an output column to “Output 0”. We want to set the datatype to string, length 10. This new field will contain the results of our Regex pattern matching.<br />
<img src="http://www.purplefrogsystems.com/blog/wp-content/uploads/2011/07/RegExSSIS2.png" alt="" title="RegExSSIS2" width="650" height="636" class="aligncenter size-full wp-image-421" /><br />
<strong>Step 3</strong> – Create the script code</p>
<p>Click on “Edit Script” on the Script screen which will open up Visual Studio.</p>
<p>Add a using directive for System.Text.RegularExpressions at the top of the script:</p>
<pre><span style="color: #008000;">
      using System.Text.RegularExpressions;
</span></pre>
<p>Then place the necessary code in the Input0_ProcessInputRow function.</p>
<pre><span style="color: #008000;">
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        //Replace each \ with \\ so that C# doesn't treat \ as escape character
        //Pattern: Start of string, any integers, 0 or 1 letter, end of word
        string sPattern = "^[0-9]+([A-Za-z]\\b)?";
        string sString = Row.Address1_IsNull ? "" : Row.Address1; //Use an empty string if NULL (reading a NULL column directly can throw)

        //Find any matches of the pattern in the string
        Match match = Regex.Match(sString, sPattern, RegexOptions.IgnoreCase);
        //If a match is found
        if (match.Success)
            //Return the first match into the new
            //HouseNumber field
            Row.HouseNumber = match.Groups[0].Value;
        else
            //If not found, leave the HouseNumber blank
            Row.HouseNumber = "";
    }
</span></pre>
<p><br/><br />
<img src="http://www.purplefrogsystems.com/blog/wp-content/uploads/2011/07/RegExSSIS3.png" alt="" title="RegExSSIS3" width="709" height="688" class="aligncenter size-full wp-image-422" /><br />
When you save and exit the script, any component downstream of the script component will have access to the new HouseNumber field.<br />
<img src="http://www.purplefrogsystems.com/blog/wp-content/uploads/2011/07/RegExSSIS41.png" alt="" title="RegExSSIS4" width="561" height="318" class="aligncenter size-full wp-image-425" /><br />
<strong>Frog-Blog Out</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.purplefrogsystems.com/blog/2011/07/pattern-matching-in-ssis-using-regular-expressions-and-the-script-component/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Speed up SSIS by using a slower query</title>
		<link>http://www.purplefrogsystems.com/blog/2011/02/speed-up-ssis-by-using-a-slower-query/</link>
		<comments>http://www.purplefrogsystems.com/blog/2011/02/speed-up-ssis-by-using-a-slower-query/#comments</comments>
		<pubDate>Thu, 17 Feb 2011 18:00:56 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SSIS]]></category>

		<guid isPermaLink="false">http://www.purplefrogsystems.com/blog/?p=326</guid>
		<description><![CDATA[This isn&#8217;t a technical blog post of my own, but a shout out to Rob Farley and an excellent blog post explaining how to use SQL&#8217;s OPTION (FAST x) hint. He explains how you can speed up an SSIS data flow by slowing down the source query. It may seem illogical at first, but you&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://sqlblog.com/blogs/rob_farley/archive/2011/02/17/the-ssis-tuning-tip-that-everyone-misses.aspx" target="_blank"><img class="alignright size-full wp-image-327" title="SQLBlogCom" src="http://www.purplefrogsystems.com/blog/wp-content/uploads/2011/02/SQLBlogCom.gif" alt="" width="115" height="80" /></a>This isn&#8217;t a technical blog post of my own, but a shout out to Rob Farley and an excellent blog post explaining how to use SQL&#8217;s OPTION (FAST x) hint. He explains how you can speed up an SSIS data flow by slowing down the source query. It may seem illogical at first, but you&#8217;ll understand after you go and read Rob&#8217;s post!</p>
<p>Read Rob&#8217;s post here: <a href="http://sqlblog.com/blogs/rob_farley/archive/2011/02/17/the-ssis-tuning-tip-that-everyone-misses.aspx" target="_blank">Speeding up SSIS using OPTION (FAST)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.purplefrogsystems.com/blog/2011/02/speed-up-ssis-by-using-a-slower-query/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Forecasting the Performance of SSIS packages</title>
		<link>http://www.purplefrogsystems.com/blog/2010/10/forecasting-the-performance-of-ssis-packages/</link>
		<comments>http://www.purplefrogsystems.com/blog/2010/10/forecasting-the-performance-of-ssis-packages/#comments</comments>
		<pubDate>Fri, 29 Oct 2010 12:52:39 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Package]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Projections]]></category>
		<category><![CDATA[SSIS]]></category>

		<guid isPermaLink="false">http://www.purplefrogsystems.com/blog/?p=272</guid>
		<description><![CDATA[SQL Server Integration Services (SSIS) packages are used in numerous scenarios for moving data from A to B. Often they are developed and tested against a cutdown, often static, subset of data. One of the problems with this is that yes you’re testing the functionality of the package as it’s being developed, but there’s no [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/images/blog/ssisperformanceprojections0.png" alt="" align="right" />SQL Server Integration Services (SSIS) packages are used in numerous scenarios for moving data from A to B. They are often developed and tested against a cut-down, usually static, subset of data. The problem with this is that although you’re testing the functionality of the package as it’s being developed, there’s no way to determine whether the performance will scale up to a full size production environment. This level of testing is more often than not ignored, resulting in packages being deployed to live which just can’t cope with the data volume, bringing down the load process.</p>
<p>We can divide performance checking into two:</p>
<ol>
<li>Load testing pre deployment</li>
<li>Continual monitoring and projections</li>
</ol>
<p>It’s vital to undertake performance load testing of packages before they’re deployed, or at least review the source queries and SSIS components and structure to ensure there’s nothing that’s likely to cause an exponentially increasing runtime. There are loads of blog posts about SSIS performance tuning so I won’t go into that here.</p>
<p>What I did want to talk about here was the importance of continual monitoring. A package that runs fine today may grind to a halt in a year’s time if the live data volume continues to increase. How do you check this, and how do you project data growth into the future to predict performance problems that haven’t happened yet?</p>
<p>The first step is to start tracking the time taken to run each package, and store this in a table. As a rule I always build this level of logging into my template packages when I’m defining the SSIS ETL framework. Again, there are heaps of articles on different ways to do this; check out one of <a href="http://blogs.technet.com/b/industry_insiders/archive/2005/06/30/407125.aspx" target="_blank">Jamie&#8217;s gems</a> as a starting point. The key outcome is that you end up with a start time and end time (and hence a duration) of each package every time it runs. If you don’t have any custom logging, you can always hack together the data from the sysssislog table if you’ve enabled it (and I hope you have..!).</p>
<p>Once you have the raw data available, leave the package to run for a month or two and then analyse the results in Excel to perform a simple projection. Just copy the data into Excel in a format similar to this. It doesn’t matter if you have duplicate dates.</p>
<table align="center">
<tbody>
<tr>
<td width="150px">Date</td>
<td width="150px">Duration (minutes)</td>
</tr>
<tr>
<td>18/08/2010</td>
<td>17</td>
</tr>
<tr>
<td>18/08/2010</td>
<td>16</td>
</tr>
<tr>
<td>19/08/2010</td>
<td>17</td>
</tr>
<tr>
<td>20/08/2010</td>
<td>18</td>
</tr>
<tr>
<td>21/08/2010</td>
<td>17</td>
</tr>
<tr>
<td colspan="2">…</td>
</tr>
</tbody>
</table>
<p>And then create a scatter chart</p>
<p style="text-align: center;"><img class="aligncenter" src="/images/blog/ssisperformanceprojections1.png" alt="" align="center" /></p>
<p>Format the X axis and make sure it’s set to be a date. You should end up with a chart similar to this.</p>
<p style="text-align: center;"><img class="aligncenter" src="/images/blog/ssisperformanceprojections2.png" alt="" align="center" /></p>
<p>Add a trend line to the chart by right clicking on one of the data points and click ‘add trendline’. Hopefully the trendline will be linear so choose that. If your data looks exponential then you really need to re-assess your package urgently!</p>
<p style="text-align: center;"><img class="aligncenter" src="/images/blog/ssisperformanceprojections3.png" alt="" align="center" /></p>
<p>There’s a nifty feature of Excel trendlines that allows you to forecast the trendline forward by x periods. If you set this to 365 it will project the package duration forward for a year. The reliability of this trendline will increase as the volume of sample data increases, i.e. if you run your packages for 3 months, you’ll be able to make better predictions than if you only run them for 2 weeks.</p>
<p style="text-align: center;"><img class="aligncenter" src="/images/blog/ssisperformanceprojections4.png" alt="" align="center" /></p>
<p>This clearly shows that although the package is currently taking 24 minutes to run, with the current data growth it will be taking approximately an hour in a year’s time.</p>
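<p>If you’d rather script the projection than chart it, the same linear trendline can be sketched in a few lines. This is a minimal illustration in Python, assuming a straight-line (least squares) fit. The RUNS data is made up for the example and the fit_line and project names are hypothetical, so substitute your own logged dates and durations.</p>
<pre class="code">
```python
from datetime import date, timedelta

# Hypothetical log extract: (run date, duration in minutes).
RUNS = [
    (date(2010, 8, 1), 20.0),
    (date(2010, 8, 11), 21.0),
    (date(2010, 8, 21), 22.0),
    (date(2010, 8, 31), 23.0),
]


def fit_line(runs):
    """Least-squares fit of duration against days since the first run.

    Duplicate dates are fine, but the runs must span at least two
    different dates or the slope is undefined.
    """
    first_day = min(run_date for run_date, _ in runs)
    xs = [(run_date - first_day).days for run_date, _ in runs]
    ys = [duration for _, duration in runs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return first_day, slope, intercept


def project(runs, days_ahead):
    """Project the duration days_ahead days after the most recent run."""
    first_day, slope, intercept = fit_line(runs)
    last_day = max(run_date for run_date, _ in runs)
    target_day = last_day + timedelta(days=days_ahead)
    return intercept + slope * (target_day - first_day).days


forecast = project(RUNS, 365)
print(f"Projected duration in a year: {forecast:.1f} minutes")
```
</pre>
<p>With this sample data the duration grows by a minute every ten days, so the projection a year after the last run comes out at just under an hour.</p>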
<p>When you do this for each package, you can quickly build up a picture of when you’re likely to run into trouble, and use this as justification for development resource to prevent the problems before they happen.</p>
<p>Frog-Blog Out</p>
]]></content:encoded>
			<wfw:commentRss>http://www.purplefrogsystems.com/blog/2010/10/forecasting-the-performance-of-ssis-packages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Server 2008 R2 &#8211; PowerPivot and Master Data Services</title>
		<link>http://www.purplefrogsystems.com/blog/2010/05/sql-server-2008-r2-powerpivot-and-master-data-services/</link>
		<comments>http://www.purplefrogsystems.com/blog/2010/05/sql-server-2008-r2-powerpivot-and-master-data-services/#comments</comments>
		<pubDate>Tue, 18 May 2010 20:13:48 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Analysis Services]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Reporting Services]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[DAX]]></category>
		<category><![CDATA[Master Data Services]]></category>
		<category><![CDATA[PowerPivot]]></category>

		<guid isPermaLink="false">http://www.purplefrogsystems.com/blog/?p=114</guid>
		<description><![CDATA[Purple Frog spent a very interesting day at Microsoft last week, at one of their many events promoting the launch of SQL Server 2008 R2. Rafal Lukawiecki presented an entertaining (as always!) and informative series of talks covering the release, focusing on the enhanced Business Intelligence tools available. The primary changes to note are Power [...]]]></description>
			<content:encoded><![CDATA[<p>Purple Frog spent a very interesting day at Microsoft last week, at one of their many events promoting the launch of SQL Server 2008 R2. Rafal Lukawiecki presented an entertaining (as always!) and informative series of talks covering the release, focusing on the enhanced Business Intelligence tools available.</p>
<p>The primary changes to note are</p>
<ul>
<li><strong>PowerPivot</strong> – An in-memory, client-side add-in to Excel that allows users to create virtual cubes on their desktop and analyse over 100m records of data virtually instantly</li>
<li><strong>DAX </strong>– A new expression language, designed for non-technical (probably more realistically, semi-technical) users to extend pivot tables and PowerPivot tables without having to learn MDX</li>
<li><strong>Report Components</strong> – In a report consisting of a couple of tables, a chart and a few gauges (gauges, sparklines &amp; maps are all new features of SSRS), you can save each element as a component and re-use it in different reports. This should result in much less duplication of work.</li>
<li><strong>Report Builder 3</strong> – A thin-client tool allowing end users to create Reporting Services reports. This is a big enhancement over its predecessor, as it is finally fully compatible with reports created in the Business Intelligence Development Studio (BIDS), including report components.</li>
<li><strong>Master Data Services</strong> – A centralised tool and database intended to provide governance of your organisation’s master data (centralised list of products, fiscal calendar, regions etc.).</li>
</ul>
<p>The enhancements to <strong>Reporting Services</strong> (SSRS) are very welcome, and should be of huge benefit to anyone either currently using SSRS or considering using it. I firmly believe that no comparable web-based reporting engine even comes close for SME organisations when looking at the whole picture, including cost of implementation, ease of use, flexibility and capability.</p>
<p><strong>Master Data Services</strong> as a concept has been around for a long time, but there has never been a tool available to organisations to effectively implement it. This is Microsoft’s first proper stab at delivering a workable solution, and although I’m a big fan of the concept, and have no doubt of its benefit to an SME, I’m yet to be convinced that the tool is ready for a large-scale corporate environment. Time will tell how scalable and manageable the system is, and credit has to go to Microsoft for starting the ball rolling.</p>
<p>The most impressive addition is without a doubt <strong>PowerPivot</strong>. In a nutshell, it’s a user-defined OLAP cube wrapped up within Excel 2010, running entirely in memory on a user’s workstation. If you’ve not yet played with it or seen a demo, I’ll try to elaborate for you… Think about loading Excel with 1 million rows, and then imagine sorting and filtering a number of those columns [cue going out to lunch whilst waiting for Excel to catch up]. With PowerPivot, you can sort and filter over 100 million rows of data almost in an instant – it’s very impressive indeed!</p>
<p>That’s the snazzy demo bit, but to reduce it to a glorified spreadsheet is very harsh indeed. It allows a user to import multiple data sources and combine them into a single dimensional data model. PowerPivot will then create your own personal cube, without you having to build a warehouse or know anything about MDX, dimension hierarchies, attribute relationships, granularity etc.</p>
<p>Microsoft’s vision and reason for creating this tool is self-service BI, allowing users to create their own cubes, data analysis environments and reporting systems. And this is where I start to have a problem…</p>
<p>I can’t remember the last time I designed a data warehouse where I did not find significant data quality problems: conflicting data, missing data, duplicated data etc. I also find it hard to think of a situation where an end user (even a power user) is sufficiently clued up about the intricacies of a source OLTP database to be able to extract the right data and know what to do with it. Or if they are, a dozen other people in different departments have a different idea about how things work, resulting in many different versions of the truth.</p>
<p>I’m therefore (for now!) sticking with the opinion that it is still absolutely vital for an organisation to provide a clean, consistent, dimensionally modelled data warehouse as the basis for their BI/MI infrastructure. Tools like PowerPivot then sit very nicely on top to provide an incredibly powerful and beneficial user experience, but to try and use the emergence of self-service BI tools to usher in a new ‘non-data warehouse’ era is a very dangerous route which I hope people will avoid.</p>
<p>In summary – this release brings with it a fantastic host of new tools, but with great power comes great responsibility…</p>
]]></content:encoded>
			<wfw:commentRss>http://www.purplefrogsystems.com/blog/2010/05/sql-server-2008-r2-powerpivot-and-master-data-services/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

