A number of Frog-Blog posts over the next couple of months are going to make heavy use of the awesome SQL Server MERGE statement, introduced in SQL Server 2008. I thought it best to write an introduction post to provide the basics and groundwork for future posts.
So what does the Merge statement do?
Merge upserts data into a table. If, like me, you find the word ‘upsert’ more than a little ridiculous, we can scrap it in favour of a slightly more sensible description; It both inserts and updates data at the same time. Much better.
How does it do it?
In a similar way in which we’d use an UPDATE statement, we need to provide a source table and a destination table. We then give it a method of matching rows between them, usually the primary key or business/natural key (this can use multiple fields).
We can then specify a number of actions depending on whether a match is found or not.
If a match is found we may want to update the existing record.
If a match isn’t found then it’s likely that we’ll want to insert a new record.
What’s the syntax?
First we set the tables and the joining key(s)
MERGE [Destination Table] AS Target USING [Source Table] AS Source ON Target.KeyField = Source.KeyField
Then we define what we want to happen if we find a match (optional)
WHEN MATCHED
If you want you can specify other conditions, such as checking whether anything has changed
AND ( Target.Field1 <> Source.Field1 OR Target.Field2 <> Source.Field2 )
We can then perform an action. i.e. an update, delete, etc.
THEN UPDATE SET Field1 = Source.Field1 , Field2 = Source.Field2
Then we define what we want to happen if we don’t find a matching record (optional)
WHEN NOT MATCHED
In this case, we want to insert the new record
THEN INSERT ( KeyField , Field1 , Field2 ) VALUES ( Source.KeyField , Source.Field1 , Source.Field2 );
As this is the last part of the statement we have to terminate the merge with a semi colon.
Time for an example
We’ve got a customer table, and we receive an updated set of customer details which we want to import into our table. Assuming we’ve imported the new details into a staging table, we can use merge to do the work for us.
Use this script to create the necessary tables & data.
Our existing Customer table should look like this
And our staging table (new customer data) should look like
Firstly lets define the logic that we’re looking for.
- Our new customer data doesn’t include the CustomerID, so we can’t match on the primary key. Instead we’ll use the email address as the key.
- If the customer already exists then we want to check whether the fields are different, if they are then update the customer record.
- If the customer doesn’t exist then insert them as a new customer record.
- The Customer table contains audit fields which must be maintained. DTInserted should contain the datetime when the record was first created. DTUpdated should be updated whenever any of the data in a row is changed.
This is all performed using the following statement
MERGE Customer AS [Target] USING StagingCustomer AS [Source] ON Target.Email = Source.Email WHEN MATCHED AND ( Target.FirstName <> Source.FirstName OR Target.LastName <> Source.LastName OR Target.Title <> Source.Title OR Target.DoB <> Source.DoB ) THEN UPDATE SET FirstName = Source.FirstName ,LastName = Source.LastName ,Title = Source.Title ,DoB = Source.DoB ,DTUpdated = GETDATE() WHEN NOT MATCHED THEN INSERT ( Email ,FirstName ,LastName ,Title ,DoB ,DTInserted ,DTUpdated ) VALUES ( Source.Email ,Source.FirstName ,Source.LastName ,Source.Title ,Source.DoB ,GETDATE() ,GETDATE() );
It’s a long statement and looks complex at first glance, but as soon as you break it down it’s remarkably simple and easy to write.
If you run this, you should see 4 rows affected – 2 new rows inserted (Jack & Mary) and 2 rows updated (Phil & Jane). The new data has been inserted/updated into the Customer table and the audit fields have been kept up to date.
Note however that (if you look at the data in the sample scripts) Sarah & Andy exist in the Customer table but are missing from the new customer list in the update file. With the code above they just stay as they are in the Customer table, we assume they haven’t changed. What if we wanted to identify these customers and flag them as inactive or even delete them?
If we look at the ‘WHEN NOT MATCHED’ line, what this is actually doing is defaulting to ‘WHEN NOT MATCHED BY TARGET’. i.e. if you don’t find the record in the target table then do something.
We can add an extra section for ‘WHEN NOT MATCHED BY SOURCE’ which will allow us to specify an action when a row in our Customer table doesn’t have a matching record in the source staging data.
We can add:
WHEN NOT MATCHED BY SOURCE THEN DELETE
Which will simply delete the row in the Customer table, or we could update the record and terminate it using something like this.
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET IsActive = 0
Don’t forget that the entire merge statement needs to be terminated with a semi colon, and we should also update the audit field. We also only want to update the record once, so check that the customer isn’t already terminated. We therefore end up with:
MERGE Customer AS [Target] USING StagingCustomer AS [Source] ON Target.Email = Source.Email WHEN MATCHED AND ( Target.FirstName <> Source.FirstName OR Target.LastName <> Source.LastName OR Target.Title <> Source.Title OR Target.DoB <> Source.DoB ) THEN UPDATE SET FirstName = Source.FirstName ,LastName = Source.LastName ,Title = Source.Title ,DoB = Source.DoB ,DTUpdated = GETDATE() WHEN NOT MATCHED BY TARGET THEN INSERT ( Email ,FirstName ,LastName ,Title ,DoB ,IsActive ,DTInserted ,DTUpdated ) VALUES ( Source.Email ,Source.FirstName ,Source.LastName ,Source.Title ,Source.DoB ,1 ,GETDATE() ,GETDATE() ) WHEN NOT MATCHED BY SOURCE AND Target.IsActive=1 THEN UPDATE SET IsActive = 0 ,DTUpdated = GETDATE() ;
Which leaves our Customer table looking like this
There are many variations and uses for this, we’ll explore its use in data warehouse dimension loading over the next couple of posts.
For now, Frog-Blog out.
[UPDATE] The next blog post in this series is now available: using T-SQL Merge to load Type 2 SCDs into a data warehouse dimension.
Alex, great post.
One of the most concise and straightforward explanations of MERGE usage I’ve seen.
What would be great is a follow up on using the OUTPUT clause for slowing changing dimensions (type 2,4 & 6) where the update rows in “WHEN MATCHED” also require an INSERT operation.
In the past I’ve used a script based on the DMV’s to auto generate these merge statements as they can become quite tedious and error prone on wide tables. The only caveat is that your source and target tables need to have matching column names.
There are also a few performance tricks if you’re running large datasets and have long and compound/multiple text fields constituting the KeyField in the “ON” section and the FirstName, LastName etc fields in the “WHEN MATCHED AND” section of your example. Primarily the use of a convert to varbinary, but it is only effective when paired up with appropriate indexes on both the datasets and is dependent of the ratio of new to changed rows in the source and the ratio of the source rows to target rows.
I really should get around to writing a post on this…
Thanks Karl
You’ve pretty much just summed up the next couple of posts in this series 🙂
In my MSc Dissertation I’ve been analysing the performance characteristics of loading type 2 SCDs using a number of methods in SSIS, including merge. I’ll be including the analysis here when I’ve completed it in the new year. I’ll also be covering it in a talk at SQLBits X (if I get selected – fingers crossed! Vote here: http://www.sqlbits.com/information/PublicSessions.aspx !!).
I’ll also be documenting my scripts to automate merge script creation. I tend to use the extended properties of columns to store the required behaviour – saves days of work when building a large system.
I’d be very interested to read about the performance tricks – that’s an area that I’ve not looked into, sounds very interesting. Could you post a link here when you get it done?
Thanks
Alex
Alex, your thesis sounds very interesting, maybe you can post a link if it is publicly available. thanks
Thanks, you can download it from the downloads section of this blog
https://purplefrogsystems.com/downloads/
[…] Contact Us « Introduction to T-SQL Merge Basics […]
Thanks. you have explained this command in a simple manner.
Hi Alex,
Your post really helped me figuring out how to merge two tables (I am new to SQLServer, and would otherwise have merged the data using Arrays in Excel).
Next step: how to incorporate the merge code into a View….
Cheers,
Julian
Netherlands
Hi Julian
Glad the post was useful. It’s not suitable to incorporate the Merge statement into a view – Views should generally be a basic reporting query to extract data, not used to perform data updates or changes. If you want to perform the merge repeatedly then you should create a Stored Procedure instead.
Regards
Alex
Hi Alex, thank you for pointing me in the right direction. Cheers, Julian
Need a WHEN MATCHED by TARGET do nothing example. . .
thanks!
hi .thanks for sharing
Excellent blog, you have saved my life with this. I was struggling with Merge but now I understand it. Just a quick question, if I wanted to apply some business validation rules to the source table before doing the actual merge, where would I do this. Furthermore, if i wanted to keep the “rubbish” data, how would I keep this and return this back to the user?
This page has been in my favourites for years now, it is one of the best explanations ever that i keep recommending to colleagues who have trouble with the syntax and something i regularly revisit to brush up on merge.