Regular Expression to get tables from SQL query

If you’ve not come across regular expressions (RegEx) before then you’re missing out. They’re an incredibly powerful tool for matching, extracting or validating patterns of text.

I had a use case this week where I needed to take a SQL query, and quickly identify every table or view that was used in the query.

RegEx to the rescue!

(?<=((FROM[ \n\r]+)|(JOIN[ \n\r]+)|(APPLY[ \n\r]+)))((\.?([a-zA-Z0-9_]+|(\[[a-zA-Z0-9._[\]\s\$(){}+\?\<>|!]+\])|("[a-zA-Z0-9._[\]\s\$(){}+\?\<>|!]+")))+)

Let me break this down into chunks:

  1. We want any text that occurs after any of the following: FROM, JOIN, APPLY. We can use a ‘lookbehind’ assertion for this, and each of these words must be followed by one or more spaces, carriage returns or line feeds.
(?<=((FROM[ \n\r]+)|(JOIN[ \n\r]+)|(APPLY[ \n\r]+)))
  2. Then we want to have any number of repetitions of [an optional full stop, followed by] an entity name (entity being database, schema, table, view). The entity name is any combination of lower case letters, upper case letters, digits or underscores.
  3. We then extend this to say that if the entity is surrounded by [ ] or " " then other reserved characters are allowed as well. Most of these have to be escaped in RegEx using \



We can then put them all together to give us a list of tables. To test this I love Derek Slager’s awesome online RegEx test utility – makes testing any RegEx simple and quick.
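As a rough illustration of the same idea outside .NET, here is a minimal Python sketch. Python's re module only supports fixed-width lookbehind, so this version swaps the lookbehind for a capturing group after the keyword. It also simplifies the bracketed and quoted cases to "anything up to the closing delimiter". The function name extract_tables and the sample query are made up for the example.

```python
import re

# One entity name: a bare identifier, a [bracketed] name or a "quoted" name.
# Simplification (assumption): anything up to the closing ] or " is accepted.
ENTITY = r'(?:[A-Za-z0-9_]+|\[[^\]]+\]|"[^"]+")'

# Keyword, whitespace, then one or more entity names separated by full stops.
# The capturing group stands in for the .NET lookbehind used in the post.
TABLE_PATTERN = re.compile(
    r'\b(?:FROM|JOIN|APPLY)\s+(' + ENTITY + r'(?:\.' + ENTITY + r')*)',
    re.IGNORECASE,
)


def extract_tables(sql):
    """Return every name that directly follows FROM, JOIN or APPLY."""
    return [match.group(1) for match in TABLE_PATTERN.finditer(sql)]


if __name__ == "__main__":
    query = """
    SELECT c.Name, o.Total
    FROM dbo.Customer c
    INNER JOIN [Sales DB].[dbo].[Order Header] o ON o.CustomerId = c.Id
    CROSS APPLY dbo.fnItems(o.Id) i
    """
    for name in extract_tables(query):
        print(name)
```

Like the original pattern, this is a quick text scan rather than a SQL parser, so keywords inside comments or string literals will also be picked up.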

A big shout out to @RolandBouman, @MobileCK and @TheStephLocke for pointing out some gaps and optimisations.

That’s all for now, </FrogBlog Out>

BIML – What is it?

I’ve noticed a growing trend over the last year – the ever growing presence of BIML (Business Intelligence Markup Language). So what is it? What does it do? And do you need to learn it?

What is BIML?

Simply, it’s a way of defining the functionality of an SSIS (Integration Services) package. If you’ve ever opened an SSIS .dtsx file in notepad you’ll have seen a daunting mess of GUIDs that you really don’t want to play around with. BIML is a simple XML format that allows you to write an SSIS package in notepad. When you run a BIML script it creates SSIS packages for you. These can then be opened and edited in BIDS exactly the same as an SSIS package that you’d created manually.

To show the difference, first of all this is a sample BIML script:


Then, when this is compiled into an SSIS package it looks like this in the front end:

BIML Resulting Package

But it looks like this when you open the .dtsx package in notepad:


The BIML script is a little easier to digest!

But Why?

But why on earth would you want to do that, when you can just use the BIDS/Visual Studio GUI? The answer is C# and automation. You can mix C# code in with the BIML XML (in a similar way to PHP or old school ASP scripts). This allows you to have a single BIML script, which can apply itself to every item in a list, or every table in a database, and automatically generate all of your SSIS packages from a single template.

Yes, this is very cool stuff.

The following screenshot is the same script as above, but configured to loop through every table in the ‘dim’ schema of a data warehouse, creating a package that truncates the relevant dim table.

The C# script is highlighted in yellow for clarity.

With this, just running the script will create multiple SSIS packages at the click of a button.
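To make the "one template, many packages" idea concrete, here is a small Python sketch that stamps out a Biml document with one truncate package per table. This is only an illustration of the templating principle: real BimlScript embeds C# in the XML, and the table list, the "DataWarehouse" connection name and the build_biml function are all assumptions made for the example.

```python
from xml.sax.saxutils import escape, quoteattr

BIML_NAMESPACE = "http://schemas.varigence.com/biml.xsd"

# Template for a single package holding one Execute SQL task.
PACKAGE_TEMPLATE = """\
    <Package Name={name} ConstraintMode="Linear">
      <Tasks>
        <ExecuteSQL Name="Truncate" ConnectionName={connection}>
          <DirectInput>{statement}</DirectInput>
        </ExecuteSQL>
      </Tasks>
    </Package>
"""


def build_biml(tables, schema="dim", connection="DataWarehouse"):
    """Return one Biml document containing a truncate package per table.

    The connection name is a placeholder: a usable script would also need
    a matching entry in a Connections section, which is omitted here.
    """
    packages = "".join(
        PACKAGE_TEMPLATE.format(
            name=quoteattr("Truncate_" + table),
            connection=quoteattr(connection),
            statement=escape("TRUNCATE TABLE {}.{}".format(schema, table)),
        )
        for table in tables
    )
    return '<Biml xmlns="{}">\n  <Packages>\n{}  </Packages>\n</Biml>\n'.format(
        BIML_NAMESPACE, packages
    )


if __name__ == "__main__":
    # Hypothetical dimension tables; a real script would read these
    # from the database metadata instead.
    print(build_biml(["Customer", "Product", "Date"]))
```

In a real BimlScript the loop lives inside the .biml file itself, so there is no separate generation step.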

How do you create and run a script?

Firstly you need BIDS Helper. But you should have that anyway.

Create a new Integration Services project, then right click on the project and click ‘Add New Biml File’


This will add a BIML script file into the Miscellaneous folder of the project.

Once you’ve written a script you can test it (right click on the script and select ‘Check Biml for Errors’), or you can run the script, generating the SSIS packages, by clicking ‘Generate SSIS Packages’.

So, do you need to learn BIML?

I have no doubt that BIML is the future of SSIS. Once you see the full power of it then you’ll never want to go back to manually coding packages again.

If you’re an SSIS pro then there’s a good chance that your next job will require BIML. Or if a potential employer doesn’t ask for it, you can certainly improve your chances of getting the job by selling it (and your skills) to them.

At Purple Frog, all of our SSIS development is now 90% automated using BIML, leaving us more time to focus on the 10% of work that needs some custom tweaking or more enhanced logic.

What if you don’t like coding?

Well in that case, check out MIST from Varigence. It’s a GUI for BIML, and a lot more besides. If you’re going to be using BIML a lot then it may well be worth the investment.

<Frog-Blog Out>

Pattern matching in SSIS using Regular Expressions and the Script component

One of my favourite features of SSIS is the script component, and I know I’m not alone. Why? Because it brings the entire might of the .NET framework to SSIS, providing C# (in SQL 2008 onwards) and VB.NET extensibility for all those times where SSIS doesn’t quite have enough functionality out of the box.

Such as?

Well, a problem I’ve come across a number of times is string parsing: trying to search for and extract a specific pattern of characters from a larger text string. I’ve seen developers build crazy convoluted expressions in the derived column transform, some of which are very impressive in their complexity! This is a bad thing, not a good one. Although it shows a level of prowess in building SSIS expressions (not the most intuitive expression language!), it becomes a support nightmare for other developers.

Take for example extracting a house number from the first line of an address: we want to convert “29 Acacia Road” into “29”

Or extracting a version number from a product, converting “iPhone 4G, IOS 4.3.2, black” into “4.3.2”

Or extracting the HTML page from a URL, converting “http://www.wibble.com/folder/page.aspx” into “page.aspx”

Regular Expressions

The .NET framework includes full support for regular expressions (Regex), in the System.Text.RegularExpressions namespace. Regex provide an incredibly powerful way of defining and finding string patterns. They take some getting used to, but once you get the hang of them you can unleash their power in your SSIS dataflow pipelines.

To find out more about regular expressions, look at the following links


Let’s look at an example

Let’s take our first example from above, extracting a house number, converting “29 Acacia Road” into “29”.

The first thing we need to do is define our Regex search pattern. In this case we know that it must be at the start of the string, and must be an integer, i.e. one or more of the characters 0-9.

The pattern for this is “^[0-9]+”, which is broken down as
    ^ means the start of the line
    [0-9] means any single digit
    + means 1 or more of the preceding item.
    i.e. 1 or more digits at the start of the line.

What if we wanted this to also cope with a single letter after the number? e.g. “221b Baker Street”

We can add “[A-Za-z]?” to our pattern, in which
    [A-Za-z] means any character A-Z in either upper or lower case
    ? means 0 or 1 occurrences of this

We should also add “\b” to the end of this, which indicates a word boundary. This means that 221b should be a whole word, not part of a larger sequence “221BakerSt”. We can wrap up the [A-Za-z]\b together into brackets so that the ? applies to the combination, so that any single letter must be the end of a word or it will be ignored. In this way “221BakerSt” will return 221, as will “221 Baker St”, whereas “221B Baker St” will return “221B”.

So our new pattern is “^[0-9]+([A-Za-z]\b)?”
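A quick way to check the behaviour described above is to run the pattern against the sample addresses. The sketch below uses Python purely as a scratchpad (the SSIS implementation later in this post is C#). The helper name house_number is made up for the example, and a Python raw string is used so the backslash in \b needs no doubling.

```python
import re

# The pattern from the post: digits at the start of the string, optionally
# followed by a single letter that ends the word.
HOUSE_NUMBER = re.compile(r"^[0-9]+([A-Za-z]\b)?")


def house_number(address):
    """Return the leading house number, or an empty string if there is none."""
    match = HOUSE_NUMBER.match(address or "")
    return match.group(0) if match else ""


if __name__ == "__main__":
    for sample in ["29 Acacia Road", "221b Baker Street", "221BakerSt",
                   "221 Baker St", "221B Baker St", "Acacia Road"]:
        print(repr(sample), "->", repr(house_number(sample)))
```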

You’ve probably gathered by now that regular expressions can get quite complicated. I’m not going to go into any more detail about them here, but hopefully this gives you some idea of what they can do. There’s plenty of reading on the web if you want to know more. You should also make use of the Regex expression tester in the link above – it will save you lots of debugging!
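For the other two examples mentioned at the start of this post, patterns along the following lines would do the job. These two patterns are not from the original write-up; they are one plausible way to express each rule, sketched in Python, and the function names are hypothetical.

```python
import re

# Assumption: a version number is two or more groups of digits joined by dots.
VERSION = re.compile(r"[0-9]+(?:\.[0-9]+)+")

# Assumption: the page is everything after the last forward slash.
PAGE = re.compile(r"[^/]+$")


def extract_version(text):
    """Return the first dotted version number found, or an empty string."""
    match = VERSION.search(text)
    return match.group(0) if match else ""


def extract_page(url):
    """Return the final path segment of a URL, or an empty string."""
    match = PAGE.search(url)
    return match.group(0) if match else ""


if __name__ == "__main__":
    print(extract_version("iPhone 4G, IOS 4.3.2, black"))
    print(extract_page("http://www.wibble.com/folder/page.aspx"))
```

Note that the page pattern would also return any query string attached to the URL, so it may need tightening for real data.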

How do we use Regular Expressions in SSIS?

Well it turns out this is the easy bit, with the help of the script component.

Step 1 – Add a script component into your data flow pipeline, configure it as a data transform. I’m using C#, but you can use VB.NET if you want

Step 2 – Give the script access to the input and output columns

Open the script component and select the input field from the “Input Columns” screen, in this case “Address1”. This can be ReadOnly.

Go to the “Inputs and Outputs” screen and add an output column to “Output 0”. We want to set the datatype to string, length 10. This new field will contain the results of our Regex pattern matching.

Step 3 – Create the script code

Click on “Edit Script” on the Script screen which will open up Visual Studio.

Add a reference to System.Text.RegularExpressions at the top of the script

      using System.Text.RegularExpressions;

Then place the necessary code in the Input0_ProcessInputRow function.

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        //Replace each \ with \\ so that C# doesn't treat \ as escape character
        //Pattern: Start of string, one or more digits, 0 or 1 letter, end of word
        string sPattern = "^[0-9]+([A-Za-z]\\b)?";
        //Use an empty string if the input column is NULL
        string sString = Row.Address1_IsNull ? "" : Row.Address1;

        //Find any matches of the pattern in the string
        Match match = Regex.Match(sString, sPattern, RegexOptions.IgnoreCase);

        if (match.Success)
        {
            //If a match is found, return it into the new HouseNumber field
            Row.HouseNumber = match.Groups[0].Value;
        }
        else
        {
            //If not found, leave the HouseNumber blank
            Row.HouseNumber = "";
        }
    }

When you save and exit the script, any component downstream of the script component will have access to the new HouseNumber field.


Clearing SSRS Query cache

When developing SQL Server Reporting Services (SSRS) reports, BIDS caches the query results when you preview the report. This cache is then used next time you run a preview. This has the benefit of speeding up report development, but it does cause a problem when you want to test changing data.

A simple way of forcing the cache to refresh is to open the folder containing the .rdl report files, and delete the corresponding .rdl.data files. The next time you preview the report SSRS will be forced to requery the source.

To save time, I use the following macro to take care of it.

Press ALT+F8 to open the Macro Explorer, and add a new module called “RemoveRDLDataFiles” under MyMacros. Edit the file and add the following code to the file. (This is for SQL Server 2008, you may need to tweak the references for 2005).

    Imports System
    Imports EnvDTE
    Imports EnvDTE80
    Imports EnvDTE90
    Imports System.Diagnostics
    Imports System.IO

    Public Module RemoveRDLDataFiles

        Sub RemoveRDLDataFiles()
            Dim project As Project
            Dim Folder As String

            'Find the folder of the first active project in the solution
            project = DTE.ActiveSolutionProjects(0)
            Dim fi As New FileInfo(project.FullName.ToString)
            Folder = fi.DirectoryName

            'Delete every cached preview data file in that folder
            For Each FileFound As String In Directory.GetFiles(Folder, "*.rdl.data")
                File.Delete(FileFound)
            Next
        End Sub

    End Module

You can then run it by either double clicking on the macro, or assigning a keyboard shortcut to it (via Tools, Customize, Keyboard).
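If you would rather clear the cache from outside Visual Studio, the same clean-up is easy to script. The following Python sketch is an alternative to the macro, not part of it. The function name and the idea of passing the project folder as an argument are assumptions for the example.

```python
from pathlib import Path


def remove_rdl_data_files(project_folder):
    """Delete every *.rdl.data cache file directly inside project_folder.

    Returns the names of the files that were removed, in sorted order.
    """
    removed = []
    # sorted() lists the files up front, before anything is deleted
    for cache_file in sorted(Path(project_folder).glob("*.rdl.data")):
        cache_file.unlink()
        removed.append(cache_file.name)
    return removed


if __name__ == "__main__":
    import sys

    # Usage: python remove_rdl_data.py <path to report project folder>
    for name in remove_rdl_data_files(sys.argv[1]):
        print("Deleted", name)
```

As with the macro, only the cached .rdl.data files are touched; the .rdl report definitions themselves are left alone.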

Loan Amortisation in SQL Server (PMT, FV, IPMT, PPMT)

Whilst designing a data warehouse for a banking client recently, I needed to calculate projected future loan payments (including breaking this down by interest and capital payments) for every customer throughout the life of the loan.

In Excel this is a pretty simple job, as Microsoft helpfully provide a number of functions to do just that (namely PMT, FV, IPMT and PPMT). In SQL Server however we do not have the luxury of having ready made functions, so I set about making my own.

Initially I coded them up as SQL Server functions, only to find that the internal rounding that SQL performs renders the results too inaccurate to be usable. It was therefore necessary to write the functions in C# and wrap them up in a CLR library. SQL Server can then import them as scalar functions, to be used by any query that requires them.

To overcome this, the Purple Frog team have written a .NET CLR library which adds four loan amortisation functions to SQL Server. These can be called from within a query as scalar functions, such as:

SELECT dbo.PMT(@APR/12.0, @Term, @LoanValue, 0, 0)

The functions provided are:

  • PMT   (The monthly payment of a loan)
  • FV    (The future value of a loan at a given month)
  • IPMT  (The interest portion of the monthly payment at a given month)
  • PPMT  (The capital portion of the monthly payment at a given month)

These are designed to mirror the parameters and results of the Excel functions, and have been written in C# using Visual Studio 2008, and tested against SQL Server 2008 Enterprise.
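For readers who want to see the arithmetic behind these four functions, here is a minimal Python sketch using the standard annuity formulas and Excel's sign convention (money paid out is negative). It is not the Purple Frog CLR code. It assumes payments at the end of each period only (Excel's type 0), so the final parameter of the SQL functions is left out, and the parameter names are my own.

```python
def pmt(rate, nper, pv, future_value=0.0):
    """Fixed payment per period for a loan of pv over nper periods."""
    if rate == 0:
        return -(pv + future_value) / nper
    growth = (1 + rate) ** nper
    return -rate * (pv * growth + future_value) / (growth - 1)


def fv(rate, nper, payment, pv=0.0):
    """Future value after nper periods of a fixed payment."""
    if rate == 0:
        return -(pv + payment * nper)
    growth = (1 + rate) ** nper
    return -(pv * growth + payment * (growth - 1) / rate)


def ipmt(rate, per, nper, pv, future_value=0.0):
    """Interest portion of the payment in period per (1-based)."""
    payment = pmt(rate, nper, pv, future_value)
    # fv() gives the sign-flipped balance outstanding after per - 1 payments;
    # one period of interest on that balance is the interest portion.
    return fv(rate, per - 1, payment, pv) * rate


def ppmt(rate, per, nper, pv, future_value=0.0):
    """Capital portion of the payment in period per (1-based)."""
    return pmt(rate, nper, pv, future_value) - ipmt(rate, per, nper, pv, future_value)


if __name__ == "__main__":
    # Hypothetical loan: 10,000 over 12 months at 6% APR
    monthly_rate, term, loan = 0.06 / 12, 12, 10000
    print("Payment:", round(pmt(monthly_rate, term, loan), 2))
    for month in range(1, term + 1):
        print(month,
              round(ipmt(monthly_rate, month, term, loan), 2),
              round(ppmt(monthly_rate, month, term, loan), 2))
```

A useful sanity check on any implementation is that the capital portions over the full term add up to the original loan value, and that the future value after the final payment is zero.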

The functions perform surprisingly well; in tests I was able to calculate over 3 million monthly payments per minute, and that was on a relatively underpowered development server.

I must thank Kevin/MWVisa1 for writing a superb article explaining the finer points of the calculation process in his post here, from which the bulk of this code is derived.

You can download the C# code, or the pre-compiled binary from the Frog-Blog download section.
