0845 643 64 63

Monthly Archives: December 2011

Introduction to T-SQL Merge Basics

A number of Frog-Blog posts over the next couple of months are going to make heavy use of the awesome SQL Server MERGE statement, introduced in SQL Server 2008. I thought it best to write an introduction post to provide the basics and groundwork for future posts.

So what does the Merge statement do?

Merge upserts data into a table. If, like me, you find the word ‘upsert’ more than a little ridiculous, we can scrap it in favour of a slightly more sensible description; It both inserts and updates data at the same time. Much better.

How does it do it?

In a similar way in which we’d use an UPDATE statement, we need to provide a source table and a destination table. We then give it a method of matching rows between them, usually the primary key or business/natural key (this can use multiple fields).

We can then specify a number of actions depending on whether a match is found or not.

If a match is found we may want to update the existing record.

If a match isn’t found then it’s likely that we’ll want to insert a new record.

What’s the syntax?

First we set the tables and the joining key(s)

MERGE [Destination Table] AS Target
   USING [Source Table] AS Source
     ON Target.KeyField = Source.KeyField

Then we define what we want to happen if we find a match (optional)

  WHEN MATCHED

If you want you can specify other conditions, such as checking whether anything has changed

     AND  (
         Target.Field1 <> Source.Field1
         OR Target.Field2 <> Source.Field2
         )

We can then perform an action. i.e. an update, delete, etc.

  THEN UPDATE SET
        Field1 = Source.Field1
      , Field2 = Source.Field2

Then we define what we want to happen if we don’t find a matching record (optional)

   WHEN NOT MATCHED

In this case, we want to insert the new record

      THEN INSERT (
             KeyField
           , Field1
           , Field2
        ) VALUES (
             Source.KeyField
           , Source.Field1
           , Source.Field2
      );

As this is the last part of the statement we have to terminate the merge with a semi colon.

Time for an example

We’ve got a customer table, and we receive an updated set of customer details which we want to import into our table. Assuming we’ve imported the new details into a staging table, we can use merge to do the work for us.

Use this script to create the necessary tables & data.

Our existing Customer table should look like this

And our staging table (new customer data) should look like

Firstly lets define the logic that we’re looking for.

  • Our new customer data doesn’t include the CustomerID, so we can’t match on the primary key. Instead we’ll use the email address as the key.
  • If the customer already exists then we want to check whether the fields are different, if they are then update the customer record.
  • If the customer doesn’t exist then insert them as a new customer record.
  • The Customer table contains audit fields which must be maintained. DTInserted should contain the datetime when the record was first created. DTUpdated should be updated whenever any of the data in a row is changed.

This is all performed using the following statement

MERGE Customer        AS [Target]
USING StagingCustomer AS [Source]
   ON Target.Email = Source.Email
WHEN MATCHED AND
    (
         Target.FirstName <> Source.FirstName
      OR Target.LastName  <> Source.LastName
      OR Target.Title     <> Source.Title
      OR Target.DoB       <> Source.DoB
    )
    THEN UPDATE SET
       FirstName = Source.FirstName
      ,LastName  = Source.LastName
      ,Title     = Source.Title
      ,DoB       = Source.DoB
      ,DTUpdated = GETDATE()
WHEN NOT MATCHED
    THEN INSERT (
        Email
       ,FirstName
       ,LastName
       ,Title
       ,DoB
       ,DTInserted
       ,DTUpdated
      ) VALUES (
        Source.Email
       ,Source.FirstName
       ,Source.LastName
       ,Source.Title
       ,Source.DoB
       ,GETDATE()
       ,GETDATE()
      );

It’s a long statement and looks complex at first glance, but as soon as you break it down it’s remarkably simple and easy to write.

If you run this, you should see 4 rows affected – 2 new rows inserted (Jack & Mary) and 2 rows updated (Phil & Jane). The new data has been inserted/updated into the Customer table and the audit fields have been kept up to date.

Note however that (if you look at the data in the sample scripts) Sarah & Andy exist in the Customer table but are missing from the new customer list in the update file. With the code above they just stay as they are in the Customer table, we assume they haven’t changed. What if we wanted to identify these customers and flag them as inactive or even delete them?

If we look at the ‘WHEN NOT MATCHED’ line, what this is actually doing is defaulting to ‘WHEN NOT MATCHED BY TARGET’. i.e. if you don’t find the record in the target table then do something.
We can add an extra section for ‘WHEN NOT MATCHED BY SOURCE’ which will allow us to specify an action when a row in our Customer table doesn’t have a matching record in the source staging data.

We can add:

WHEN NOT MATCHED BY SOURCE
    THEN DELETE

Which will simply delete the row in the Customer table, or we could update the record and terminate it using something like this.

WHEN NOT MATCHED BY SOURCE
    THEN UPDATE SET
        IsActive = 0

Don’t forget that the entire merge statement needs to be terminated with a semi colon, and we should also update the audit field. We also only want to update the record once, so check that the customer isn’t already terminated. We therefore end up with:

MERGE Customer        AS [Target]
USING StagingCustomer AS [Source]
   ON Target.Email = Source.Email
WHEN MATCHED AND
    (
         Target.FirstName <> Source.FirstName
      OR Target.LastName  <> Source.LastName
      OR Target.Title     <> Source.Title
      OR Target.DoB       <> Source.DoB
    )
    THEN UPDATE SET
       FirstName = Source.FirstName
      ,LastName  = Source.LastName
      ,Title     = Source.Title
      ,DoB       = Source.DoB
      ,DTUpdated = GETDATE()
WHEN NOT MATCHED BY TARGET
    THEN INSERT (
        Email
       ,FirstName
       ,LastName
       ,Title
       ,DoB
       ,IsActive
       ,DTInserted
       ,DTUpdated
      ) VALUES (
        Source.Email
       ,Source.FirstName
       ,Source.LastName
       ,Source.Title
       ,Source.DoB
       ,1
       ,GETDATE()
       ,GETDATE()
      )
WHEN NOT MATCHED BY SOURCE
    AND Target.IsActive=1
    THEN UPDATE SET
        IsActive = 0
       ,DTUpdated = GETDATE()
      ;

Which leaves our Customer table looking like this

There are many variations and uses for this, we’ll explore its use in data warehouse dimension loading over the next couple of posts.

For now, Frog-Blog out.

[UPDATE] The next blog post in this series is now available: using T-SQL Merge to load Type 2 SCDs into a data warehouse dimension.

Fast record count in SQL Server using sp_spaceused

Just a quick tidbit today, but a very useful one.
Ever need to do SELECT Count(*) on a very large table? It can take ages.

Instead, use sp_spaceused

EXEC sp_spaceused 'MyTable'

It returns you a useful result set like

name rows reserved data index_size unused
MyTable 19690290 13637488 KB 3001688 KB 9986416 KB 649384 KB

Not only does it also give you extra useful information about the table, but it comes back virtually instantly.

Columns:
  name – name of the object requested
  rows – the number of rows in the table (or service broker queue)
  reserved – the total reserved space for the object
  data – amount of space used by the data
  index_size – amount of spaced used by indexes in the object
  unused – total space reserved for the object but not yet used

You can also use sp_spaceused without any parameters, and it will return you the stats for the whole database.

Frog-Blog Out

The Frog Blog

I'm Alex Whittles.

I specialise in designing and implementing SQL Server business intelligence solutions, and this is my blog! Just a collection of thoughts, techniques and ramblings on SQL Server, Cubes, Data Warehouses, MDX, DAX and whatever else comes to mind.

Data Platform MVP

Frog Blog Out
twitter
rssicon