Archive for the ‘SQL Server’ Category

Leap before you can land

Monday, July 26th, 2010

Leap before you can land…

Research scientists at the Southern Illinois University have concluded that primitive frogs learned to jump before they learned to land. The result being a rather inelegant crash landing. I’m sure the frogs were happy that they achieved their goal of getting from A to B, but I do feel sorry for them having to endure the presumably painful landing, not to mention the indignity of having the rest of the pond witness the proceedings!

What struck me about this story, aside from the obvious frog reference (albeit not purple!), was the parallel to Business Intelligence projects that we encounter time and time again. BI systems that are put in place, but whilst the project teams are basking in the glory of accomplishing their goal, often sporting a couple of bruises picked up along the way, the wider business looks on not really knowing what has just happened or understanding how the company or their department are going to benefit from this leap in technology.

Although BI, as both a business function and technical toolset, has long since reached maturity, its adoption within SMEs is still surprisingly in its youth. Most companies who embark on a BI project seem to do so either because they have been sold on a fantastic vision of the future of information management, or because they have a specific business need to solve. Whatever the reasons for initiating a BI project, there is often very little thought given to how this new information system is likely to impact the wider business, or how the system can be utilised to provide far more benefit than the original project scope called for.

There are two aspects to this impact, the pain and the gain.

The gain

When the company’s data is cleaned, consolidated and remodelled to make reporting and analysis more simple (one of the primary goals of a BI project), the tendency is to use that system to reproduce the company’s old static reports but in a slightly more glossy fashion, only faster and more accurate. This stops far short of what a good data warehouse or cube is capable of. Enter self-service BI.

Self-service BI essentially allows your employees to have a level of (secured) access to the underlying BI data, combined with the right tools to perform their own analysis. This enables them to not just consume their old static reports, but to pro-actively go digging for information, knowledge and a greater understanding of your company. Gone are the days when requiring a change to a report has to be documented, requested, specified, programmed, tested & released. A well designed data warehouse and cube, along with the right client tools (often just Excel), enables a user to just do it themselves with the bare minimum of training (if any), whilst retaining the confidence that the resulting report is accurate; the warehouse/cube take care of ensuring the underlying business logic is right.

The pain

A data warehouse is, by definition, a combined set of data from a variety of source systems. This means that any change to the source systems must be assessed for their impact on the warehouse. After putting a sparkly new BI infrastructure in place, you don’t want it to fall over because someone changed the structure of a source database or extract file! This adds a layer of change control that may not have existed before, and can cause some headaches.

There is usually a moderately large requirement for business users’ time, both to help the BI architect to understand and model the business properly, and also in testing.

Pain but no gain?

So, if the benefits such as self service BI (with dashboards, data mining, KPIs, PowerPivot etc. etc) are not adopted and utilised, the business is left having made an impressive technology leap, but is then stuck with the impact on source systems, a bill, and confusion from staff who don’t see the benefit of getting their old reports via a different method.

Evolution taught the frog that by landing properly, he can quickly jump again to go even further, and avoid the pain in the meantime.

SQL User Group Session 24 June 2010

Wednesday, June 9th, 2010

I’m excited to be presenting another session to the South Wales SQL Server User Group.

On Thursday 24th June 2010, Eversheds in Cardiff are kindly hosting the event, to run from 18:45 to 21:00.
The event is free, and you’ll even get pizza thrown in – what more can you ask for? Oh yes, some BI content…

My session will cover data warehouse modelling, including a number of hands on business case studies including transactional data, account balances and duration based data.

Please feel free to bring your own data modelling problems along and I’ll try and cover as many as I can.

Register for free here: http://www.sqlserverfaq.com/events/235/Data-warehouse-design-case-studies-Other-BI-related-session-TBC.aspx

Hope to see you there!

SQL Server 2008 R2 – PowerPivot and Master Data Services

Tuesday, May 18th, 2010

Purple Frog spent a very interesting day at Microsoft last week, at one of their many events promoting the launch of SQL Server 2008 R2. Rafal Lukewiecki presented an entertaining (as always!) and informative series of talks covering the release, focusing on the enhanced Business Intelligence tools available.

The primary changes to note are

  • Power Pivot – An in memory, client side add-in to Excel, that allows users to create virtual cubes on their desktop and analyse over 100m records of data virtually instantly
  • DAX – A new expression language, designed for non-technical (probably more realistically, semi-technical) users to extend pivot tables and power pivot tables without having to learn MDX
  • Report Components – In a report consisting of a couple of tables, a chart and a few gauges (gauges, sparklines & maps are all new features of SSRS), you can save each element as a component and re-use it in different reports. This should result in much less duplication of work.
  • Report Builder 3 – A thin-client tool allowing end users to create Reporting Services reports. This is a big enhancement over its predecessor s it is finally fully compatible with reports created in the Business Intelligence Development Studio (BIDS), including report components.
  • Master Data Services – A centralised tool and database intended to provide governance of your organisation’s master data (centralised list of products, fiscal calendar, regions etc.).

The enhancements to Reporting Services (SSRS) are very welcome, and should be of huge benefit to anyone either currently using SSRS or considering using it. I firmly believe that there are no comparable web based reporting engines that even come close for SME organisations when looking at the whole picture including cost of implementation, ease of use, flexibility and capability.

Master Data Services as a concept has been around for a long time, but there has never been a tool available to organisations to effectively implement it. This is Microsoft’s first proper stab at delivering a workable solution, and although I’m a big fan of the concept, and have no doubt of its benefit to a SME, I’m yet to be convinced that the tool is ready for a large scale corporate environment. Time will tell how scalable and manageable the system is, and credit has to go to Microsoft for starting the ball rolling.

The most impressive addition is without a doubt PowerPivot. In a nutshell, it’s a user defined OLAP cube wrapped up within Excel 2010, running entirely in memory on a user’s workstation. If you’ve not yet played with it or seen a demo, I’ll try and elaborate for you… Think about loading Excel with 1 million rows, and then imagine sorting and filtering a number of those columns [cue going out to lunch whilst waiting for Excel to catch up]. With PowerPivot, you can sort and filter over 100 million rows of data almost in an instant – it’s very impressive indeed!

That’s the snazzy demo bit, but to reduce it to a glorified spreadsheet is very harsh indeed. It allows a user to import multiple data sources and combine them together into a single dimensional data model, PowerPivot will create your own personal cube, without you having to build a warehouse, without knowing anything about MDX, dimension hierarchies, attribute relationships, granularity etc. etc.

Microsoft’s vision and reason for creating this tool is self-service BI, allowing users to create their own cubes, data analysis environments and reporting systems. And this is where I start to have a problem…

I can’t remember the last time I designed a data warehouse, where I did not find significant data quality problems, conflicting data, missing data, duplicated data etc.. I also find it hard to think of a situation where an end user (even a power user) is sufficiently clued up about the intricacies of a source OLTP database to be able to extract the right data and know what to do with it. Or if they are, a dozen other people in different departments have a different idea about how things work, resulting in many different versions of the truth.

I’m therefore (for now!) sticking with the opinion that it is still absolutely vital for an organisation to provide a clean, consistent, dimensionally modelled data warehouse as the basis for their BI/MI infrastructure. Tools like PowerPivot then sit very nicely on top to provide an incredibly powerful and beneficial user experience, but to try and use the emergence of self-service BI tools to usher in a new ‘non-data warehouse’ era is a very dangerous route which I hope people will avoid.

In summary – this release brings with it a fantastic host of new tools, but with great power comes great responsibility…

SQLBits and Microsoft BI conferences

Thursday, April 15th, 2010

Following the launch of SQL Server 2008 R2 this week, the ever dedicated SQL Bits team are putting on the next installment tomorrow with SQL Bits VI – ‘the Sixth Sets’. The Purple Froggers will of course be there, hopefully in time for a pre-conference bacon butty!

Also worth mentioning is ‘Microsoft Solutions for Business Intelligence’, a seminar that Microsoft are holding at their Reading campus on May 12 2010. Rafal Lukewiecki is speaking, anyone who has seen him before will know what an entertaining and knowlegable presenter he is. Whether you are currently implementing a BI solution or are just ‘BI curious’, a technical developer or a business manager, the day will provide an insight into the direction BI is heading in, and what we can expect in the years to come.

We hope to see you at either/both!

Loan Amortisation in SQL Server (PMT, FV, IPMT, PPMT)

Wednesday, March 17th, 2010

Whilst designing a data warehouse for a banking client recently, I needed to calculate projected future loan payments (including breaking this down by interest and capital payments) for every customer throughout the life of the loan.

In Excel this is a pretty simple job, as Microsoft helpfully provide a number of functions to do just that (namely PMT, FV, IPMT and PPMT). In SQL Server however we do not have the luxury of having ready made functions, so I set about making my own.

Initially I coded them up as SQL Server functions, only to find that the internal rounding that SQL performs renders the results too inacurate to be usable. It was therefore necessary to write the functions in a C#, and wrap them up in a CLR library. SQL Server can then import them as scalar functions, to be used by any query that requires them.

To overcome this, the Purple Frog team have written a .Net CLR library which add four loan amortisation functions to SQL Server, which can be called from within a query as scalar functions, such as:

SELECT dbo.PMT(@APR/12.0, @Term, @LoanValue, 0, 0)

The functions provided are:

  • PMT   (The monthly payment of a loan)
  • FV    (The future value of a loan at a given month)
  • IPMT  (The interest portion of the monthly payment at a given month)
  • PPMT  (The capital portion of the monthly payment at a given month)

These are designed to mirror the parameters and results of the Excel functions, and have been written in C# using Visual Studio 2008, and tested against SQL Server 2008 Enterprise.

The functions perform surprisingly well; in tests I was able to calculate over 3 million monthly payments per minute, and that was on a relatively underpowered development server.

I must thank Kevin/MWVisa1 for writing a superb article explaining the finer points of the calculation process in his post here, on which the bulk of this code is derived.

You can download the C# code, or the pre-compiled binary from the Frog-Blog download section.

SQL Bits 6 – Registration Open

Wednesday, March 17th, 2010

For anyone interested in SQL Server, the unmissable SQL Bits conference is returning for round six. “SQL Bits – The 6th Sets” is being held in central London for the first time, and unlike previous conferences it will be for one day only on Friday 16th April 2010.

It’s being held as part of the SQL Server 2008 R2 release, also happening that week in London.

Registration is now open, you can find out more at the SQL Bits website.

We highly recommend the event, hope to see you there…

Data Warehouse Presentation

Tuesday, February 23rd, 2010

Purple Frog is presenting a session on data warehouse design concepts at the South Wales SQL Server User Group on Thursday 25th February 2010.

If you’re in the area and want to come along you can register for free here. Eversheds are hosting the event in their Cardiff office.

Adam Morton will be demonstrating an ETL control framework in SSIS, and Alex Whittles will be discussing the concepts of data warehousing and the fundamental differences in architecture between OLTP and OLAP systems.

Hope to see you there!

Dynamic MDX in Reporting Services

Thursday, September 18th, 2008

After a couple of days off work last week with SQL Bits III and SQL 2008 Unleashed, it’s back to the grindstone this week; however before I get onto the serious stuff I’d like to say thank you to the organisers of both events. Watching you on stage headbanging to Rockstar – legendary! (You can see for yourself here and here…).

Anyway, back to MDX…

This post explains how you can build a dynamic MDX query in Reportins Services, customised to the users requirements. This can often bring some quite major performance benefits.

Lets say for example that you want to have a sales report grouped dynamically by either product, sales person, department or customer. Normally you would use a single static MDX query, and then add a dynamic grouping to the table in the report. This is fine, until you try it on a large dataset. If you only have 50 products, 2 salesmen, 5 departments and 100 customers, your MDX needs to return 50,000 records, the report then has to summarise all of this into the level of grouping you want. This renders the pre-calculated aggregations in OLAP pretty much worthless.

To get around this, you can generate your MDX dynamically, so that the query returns the data already grouped into the correct level. You can also use this to add extra filters to the query, but only when they are required.

To start with, lets see how we would do this normally with SQL. Assuming we’re working from a denormalised table such as this

Dynamic MDX Table

Dynamic SQL is pretty simple, instead of having your dataset query as

  SELECT SalesPerson,
      Sum(Sales) AS Sales
   FROM tblData
   GROUP BY SalesPerson

you can add a report parameter called GroupBy,

Dynamic MDX SQL

and then use an expression as your dataset

  ="SELECT "
      + Parameters!GroupBy.Value + " AS GroupName,
      Sum(Sales) AS Sales
   FROM tblData
   GROUP BY " + Parameters!GroupBy.Value

However MDX queries don’t let you use an expression in the dataset, so we have to work around that quite major limitation. To do this we make use of the OpenRowset command. You need to enable it in the surface area config tool, but once it’s enabled you can fire off an OpenRowset command to SQL Server, which will then pass it on to the cube. As the datasource connnection is to SQL Server not Analysis Services, it allows you to use an expression in the dataset.

  ="SELECT * FROM OpenRowset('MSOLAP',
    'DATA SOURCE=localhost; Initial Catalog=SalesTest;',
    'SELECT
      {[Measures].[Sales]} ON 0,
      NON EMPTY {[Product].[Product].[Product].Members} ON 1
      FROM Sales') "

You can then expand this to make it dynamic depending on the value of a parameter. Before we do this though, there are a couple of items I should point out.
1) As the expression can get quite large, I find it much easier to create the query from a custom code function
2) As SSRS can’t interpret the expression at runtime, you need to define the fields in your dataset manually (more on this later)

To use a custom code function, just change the dataset expression to

  =Code.CreateMDX(Parameters)

We pass in the parameters collection so that we can use the parameters to determine the query. Create a function called CreateMDX() in the code block

Dynamic MDX Code

You can then construct your MDX query within the code block.

  Public Function CreateMDX(ByVal params as Parameters) as string

   Dim mdx as string 

   mdx = "SELECT * FROM OpenRowset("
   mdx += " 'MSOLAP', "
   mdx += " 'DATA SOURCE=localhost; Initial Catalog=SalesTest;', "
   mdx += " ' SELECT {[Measures].[Sales]} ON 0, "
   mdx += "    NON EMPTY {[Product].[Product].[Product].Members} ON 1 "
   mdx += "   FROM Sales ' "
   mdx += ")"

   return mdx

End Function

We’re almost there…
The next problem is that the field names returned by the query are less than helpful. To fix this we just need to alias the fields in the query. I usually take the opportunity of casting the numerical fields so that the report treats them as such, rather than as a string.

  Public Function CreateMDX(ByVal params as Parameters)
                  as string

   Dim mdx as string 

   mdx = "SELECT "
   mdx += "  ""[Product].[Product].[Product].[MEMBER_CAPTION]"" AS GroupName, "
   mdx += "   Cast(""[Measures].[Sales]"" AS int) AS Sales "
   mdx += " FROM OpenRowset("
   mdx += " 'MSOLAP', "
   mdx += " 'DATA SOURCE=localhost; Initial Catalog=SalesTest;', "
   mdx += " ' SELECT {[Measures].[Sales]} ON 0, "
   mdx += "    NON EMPTY {[Product].[Product].[Product].Members} ON 1 "
   mdx += "   FROM Sales ' "
   mdx += ")"

   return mdx

End Function

(please do watch out for the quotes, double quotes and double double quotes, it can get a little confusing!)
We then need to tell the dataset which fields to expect from the query.

Dynamic MDX Fields

You can now use the dataset in your report.
However, the original point of this was to make the query dynamic… All we need to do to achieve this is expand the VB.Net code accordingly.

  Public Function CreateMDX(ByVal params as Parameters) as string

   Dim mdx as string 

   mdx = "SELECT "

IF params("GroupBy").Value.ToString()="Product" THEN
   mdx += "  ""[Product].[Product].[Product]"
ELSE IF params("GroupBy").Value.ToString()="SalesPerson" THEN
   mdx += "  ""[Sales Person].[Sales Person].[Sales Person]"
ELSE IF params("GroupBy").Value.ToString()="Customer" THEN
   mdx += "  ""[Customer].[Customer].[Customer]"
END IF

   mdx += ".[MEMBER_CAPTION]"" AS GroupName, " 

   mdx += "   Cast(""[Measures].[Sales]"" AS int) AS Sales "
   mdx += " FROM OpenRowset("
   mdx += " 'MSOLAP', "
   mdx += " 'DATA SOURCE=localhost; Initial Catalog=SalesTest;', "
   mdx += " ' SELECT {[Measures].[Sales]} ON 0, "

IF params("GroupBy").Value.ToString()="Product" THEN
   mdx += "  NON EMPTY {[Product].[Product].[Product]"
ELSE IF params("GroupBy").Value.ToString()="SalesPerson" THEN
   mdx += "  NON EMPTY {[Sales Person].[Sales Person].[Sales Person]"
ELSE IF params("GroupBy").Value.ToString()="Customer" THEN
   mdx += "  NON EMPTY {[Customer].[Customer].[Customer]"
END IF

   mdx += ".Members} ON 1 "

   mdx += "   FROM Sales ' "
   mdx += ")"

   return mdx

End Function

It’s certainly not that simple, and debugging can cause a few headaches, but you can benefit from a massive performance in complex reports if you’re prepared to put the work in.

Dynamic MDX Results

You can download the project files here

As always, please let me know how you get on with it, and shout if you have any queries…

Alex

SQLBits III and more…

Tuesday, September 2nd, 2008

Next week we’re lucky enough to have a double-whammy of SQL events in the UK, has Christmas come early?…

On Wednesday 10th September, Microsoft are running a live Technet event, SQL Server 2008 Unleashed, at their Reading campus. It looks like it should be a day well spent, with Prem Mehra delivering a keynote speech on large scale SQL Server systems, and other sessions including data warehouse improvements, upgrade paths, and a session focusing on BI. The event is now full, so I hope you’ve already booked your place…

Then, on Saturday 13th September, we’ve got the long awaited SQLBits III event in Hatfield. There are far too many sessions to list here, but with dedicated tracks for BI, Admin, Development and Katmai, there is certainly something for everyone. You can safely assume that I’ll be on the BI track all day!

I’ll be going to both events; I know a few of you regular frog-blogees are also going, so I look forward to meeting you at one or both.

Alex

Use Stored Procedure Output Parameters in SSRS

Tuesday, August 5th, 2008

It’s commonly known that there are a few frustrations when using stored procedures as the source for SSRS 2005 datasets, the inability to cope with temporary tables is always a key one, another is not being able to get at output parameters or return values.

This post explains the easiest way to get at the output parameters and return value.

Lets assume you are using the following stored proc:


IF OBJECT_ID('myProc') IS NOT NULL
  DROP PROCEDURE myProc
GO
CREATE PROCEDURE dbo.myProc
  @OutTest int OUTPUT
AS
  SET @OutTest = 42

  --Random test query
  SELECT 1 as x, 'rec1' as y, 'rec1' as z
  UNION
  SELECT 2, 'rec2', 'rec2'

  RETURN 1
GO

When we set the dataset to use myProc, we will get the three fields (x, y and z) returned, but not the @OutTest parameter, or the return value of 1.

To fix this all you need to do is create a wrapper procedure that converts the values into extra fields.


IF OBJECT_ID('myProcWrapper') IS NOT NULL
  DROP PROCEDURE myProcWrapper
GO
CREATE PROCEDURE dbo.myProcWrapper
AS

  CREATE TABLE #myTmp(
    x int,
    y varchar(20),
    z varchar(20)
  )

  DECLARE @ReturnValue int, @OutTest int

  INSERT INTO #myTmp
    EXEC @ReturnValue=dbo.myProc @OutTest OUTPUT

  SELECT x, y, z,
      @ReturnValue as RetVal,
      @OutTest as [Output]
    FROM #myTmp

  Drop Table #myTmp

  RETURN 1
GO

This runs the original procedure, and stores the values in a temporary table (yes, I know I know…!). It then appends the return value and output parameters as extra fields and then returns the whole lot. These values are repeated for every record returned, but all you have to do is take the value from the first record =First(Fields!Output.Value).

If you wanted to get rid of the temporary table, you could always replace it with a permenant table, but you’d have to be careful to keep each client instance seperate so results for different users didn’t interfere with each other. You could do this by adding a working column to the myTmp table contaning the username and/or workstation ID. Something like this…


IF OBJECT_ID('myTmp') IS NOT NULL
  DROP TABLE myTmp
GO

CREATE TABLE myTmp(
  x int,
  y varchar(20),
  z varchar(20),
  UserID varchar(50),
  Workstation varchar(50)
)
GO
ALTER TABLE dbo.myTmp ADD CONSTRAINT
  DF_myTmp_UserID DEFAULT SYSTEM_USER FOR UserID
GO
ALTER TABLE dbo.myTmp ADD CONSTRAINT
  DF_myTmp_Workstation DEFAULT Host_Name() FOR Workstation
GO

IF OBJECT_ID('myProcWrapper') IS NOT NULL
  DROP PROCEDURE myProcWrapper
GO
CREATE PROCEDURE dbo.myProcWrapper
AS

  DECLARE @ReturnValue int, @OutTest int

  DELETE FROM myTmp
    WHERE UserID = SYSTEM_USER
    AND Workstation = Host_Name()

  INSERT INTO myTmp (x,y,z)
    EXEC @ReturnValue=dbo.myProc @OutTest OUTPUT

  SELECT x, y, z,
      @ReturnValue as RetVal,
      @OutTest as [Output]
    FROM myTmp
    WHERE UserID = SYSTEM_USER
    AND Workstation = Host_Name()

  DELETE FROM myTmp
    WHERE UserID = SYSTEM_USER
    AND Workstation = Host_Name()

  RETURN 1
GO

This still has the benefit of not having to modify the original stored procedure, but now allows you to see the available fields in Reporting Services.