One of my favourite features of SSIS is the script component, and I know I’m not alone. Why? Because it brings the entire might of the .NET framework to SSIS, providing C# (in SQL 2008 onwards) and VB.NET extensibility for all those times where SSIS doesn’t quite have enough functionality out of the box.
Such as?
Well, a problem I’ve come across a number of times is string parsing: trying to search for and extract a specific pattern of characters from a larger text string. I’ve seen developers build crazy, convoluted expressions in the derived column transform, some of which are very impressive in their complexity! This is a bad thing, not a good one. Although it shows a level of prowess in building SSIS expressions (not the most intuitive expression language!), it becomes a support nightmare for other developers.
Take, for example, extracting a house number from the first line of an address: we want to convert “29 Acacia Road” into “29”.
Or extracting a version number from a product, converting “iPhone 4G, IOS 4.3.2, black” into “4.3.2”.
Or extracting the HTML page from a URL, converting “http://www.wibble.com/folder/page.aspx” into “page.aspx”.
Regular Expressions
The .NET framework includes full support for regular expressions (Regex), in the System.Text.RegularExpressions namespace. Regex provide an incredibly powerful way of defining and finding string patterns. They take some getting used to, but once you get the hang of them you can unleash their power in your SSIS dataflow pipelines.
To find out more about regular expressions, look at the following links
Let’s look at an example
Let’s take our first example from above, extracting a house number, converting “29 Acacia Road” into “29”.
The first thing we need to do is define our Regex search pattern. In this case we know that the house number must be at the start of the string and must be an integer, made up of one or more of the digits 0-9.
The pattern for this is “^[0-9]+”, which is broken down as
^ means the start of the string (or the start of each line, if RegexOptions.Multiline is set)
[0-9] means any single digit
+ means 1 or more of the preceding item.
i.e. 1 or more digits at the start of the string.
What if we wanted this to also cope with a single letter after the number? i.e. “221b Baker Street”
We can add “[A-Za-z]?” to our pattern, in which
[A-Za-z] means any character A-Z in either upper or lower case
? means 0 or 1 occurrences of this
We should also add “\b” to the end of this, which indicates a word boundary. This means that 221b should be a whole word, not part of a larger sequence such as “221BakerSt”. We can wrap [A-Za-z]\b together in brackets so that the ? applies to the combination, so that any single letter must be the end of a word or it will be ignored. In this way “221BakerSt” will return “221”, as will “221 Baker St”, whereas “221B Baker St” will return “221B”.
So our new pattern is “^[0-9]+([A-Za-z]\b)?”
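To see the difference between the two patterns, here is a minimal console sketch that runs both against a few sample addresses. The class and method names (PatternComparison, FirstMatch) are made up for illustration and are not part of SSIS.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Ordinary C# string literals, so the backslash in \b has to be doubled.
    public const string DigitsOnly = "^[0-9]+";
    public const string DigitsWithLetter = "^[0-9]+([A-Za-z]\\b)?";

    // Returns the text matched by the pattern, or "" if there is no match.
    public static string FirstMatch(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : "";
    }

    public static void Main()
    {
        string[] samples = { "29 Acacia Road", "221B Baker St", "221BakerSt", "221 Baker St" };
        foreach (string s in samples)
        {
            // Print the sample followed by the result of each pattern.
            Console.WriteLine("{0,-16} digits-only: {1,-5} with-letter: {2}",
                s, FirstMatch(s, DigitsOnly), FirstMatch(s, DigitsWithLetter));
        }
    }
}
```

Only “221B Baker St” should differ between the two columns: the first pattern stops at “221”, while the second also picks up the trailing letter because it sits at a word boundary.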
You’ve probably gathered by now that regular expressions can get quite complicated. I’m not going to go into any more detail about them here, but hopefully this gives you some idea of what they can do. There’s plenty of reading on the web if you want to know more. You should also make use of the Regex expression tester in the link above – it will save you lots of debugging!
How do we use Regular Expressions in SSIS?
Well it turns out this is the easy bit, with the help of the script component.
Step 1 – Add a script component into your data flow pipeline and configure it as a transformation. I’m using C#, but you can use VB.NET if you want.
Step 2 – Give the script access to the input and output columns
Open the script component and select the input field from the “Input Columns” screen, in this case “Address1”. This can be ReadOnly.

Go to the “Inputs and Outputs” screen and add an output column to “Output 0”. We want to set the datatype to string, length 10. This new field will contain the results of our Regex pattern matching.

Step 3 – Create the script code
Click on “Edit Script” on the Script screen which will open up Visual Studio.
Add a using directive for the System.Text.RegularExpressions namespace at the top of the script
using System.Text.RegularExpressions;
Then place the necessary code in the Input0_ProcessInputRow function.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    //Replace each \ with \\ so that C# doesn't treat \ as escape character
    //Pattern: Start of string, any integers, 0 or 1 letter, end of word
    string sPattern = "^[0-9]+([A-Za-z]\\b)?";
    string sString = Row.Address1 ?? ""; //Coalesce to empty string if NULL

    //Find any matches of the pattern in the string
    Match match = Regex.Match(sString, sPattern, RegexOptions.IgnoreCase);

    //If a match is found
    if (match.Success)
    {
        //Return the first match into the new
        //HouseNumber field
        Row.HouseNumber = match.Groups[0].Value;
    }
    else
    {
        //If not found, leave the HouseNumber blank
        Row.HouseNumber = "";
    }
}
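As a variation, the doubled backslash can be avoided with a C# verbatim string (@"..."), and the Regex object can be built once and reused for every row. The sketch below is a standalone version of the same logic so that it can be tried outside SSIS. The HouseNumberParser class and its Parse method are hypothetical names. In the script component you would call something like it from Input0_ProcessInputRow.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HouseNumberParser
{
    // Verbatim string: backslashes are literal, so \b is written only once.
    // The Regex is created a single time and shared by every call to Parse.
    private static readonly Regex HouseNumberRegex =
        new Regex(@"^[0-9]+([A-Za-z]\b)?", RegexOptions.Compiled);

    // Returns the leading house number, or "" for null, empty or non-matching input.
    public static string Parse(string address)
    {
        if (string.IsNullOrEmpty(address))
        {
            return "";
        }
        Match match = HouseNumberRegex.Match(address);
        return match.Success ? match.Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(Parse("221B Baker St"));   // house number with a letter
        Console.WriteLine(Parse("Rose Cottage"));    // no leading digits, so blank
    }
}
```

Keeping the parsing in a small method like this also makes it easy to test the pattern in a console project before pasting it into the script.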

When you save and exit the script, any component downstream of the script component will have access to the new HouseNumber field.
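The same approach should work for the other two examples from the start of this post. The patterns below are only one possible way of writing them, and the names (OtherExamples, VersionPattern, PagePattern, FirstMatch) are made up for this sketch, so test them against your own data first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OtherExamples
{
    // Assumed pattern: digits followed by one or more ".digits" groups, e.g. 4.3.2
    public const string VersionPattern = @"[0-9]+(\.[0-9]+)+";

    // Assumed pattern: everything after the last "/" up to the end of the string
    public const string PagePattern = @"[^/]+$";

    // Returns the first match of the pattern, or "" if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        Match match = Regex.Match(input ?? "", pattern);
        return match.Success ? match.Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatch("iPhone 4G, IOS 4.3.2, black", VersionPattern));
        Console.WriteLine(FirstMatch("http://www.wibble.com/folder/page.aspx", PagePattern));
    }
}
```

Note that the version pattern insists on at least one dot, which is why the “4” in “4G” is skipped. The page pattern is deliberately simple: a URL with no page (such as “http://www.wibble.com”) would return the host name, and any query string would be included, so a real implementation would probably need something stricter.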

Flog-Blog-Out


This is fantastic. Thank You Sooooo much for this information. It’s very powerful and works really well.
One question: How would you add another output column “StreetName” using the same “Address1” in your C# code?
Let’s assume that you want to output the following columns.
“HouseNumber”
“StreetName”
Where and how would you put that in your code?
Thank you and sorry for my ignorance about C#
F
Hi Fran
That’s a great question thanks.
You can output as many fields as you like from the code, just by adding more “Row.xxx = yyy;” lines.
The first thing you need to do is add a second output column (as per the second screenshot in the post). For the street you’d probably want a string of length 50 or similar.
Then, in the code, just add another line at the end setting the new field to the value you want by using Row.StreetName = xxx;
I’ve modified the code slightly to do this here…
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    //Replace each \ with \\ so that C# doesn't treat \ as escape character
    //Pattern: Start of string, any integers, 0 or 1 letter, end of word
    string sPattern = "^[0-9]+([A-Za-z]\\b)?";
    string sString = Row.Address1 ?? ""; //Coalesce to empty string if NULL
    string sHouseNumber = "";

    //Find any matches of the pattern in the string
    Match match = Regex.Match(sString, sPattern, RegexOptions.IgnoreCase);

    //If a match is found
    if (match.Success)
    {
        //Return the first match into the new
        //HouseNumber field
        sHouseNumber = match.Groups[0].Value;
    }

    Row.HouseNumber = sHouseNumber;

    //The match is anchored to the start of the string, so the street name is
    //whatever follows the house number (the whole string if none was found)
    Row.StreetName = sString.Substring(sHouseNumber.Length).Trim();
}
Good luck with it
Alex