All too often there are questions asked about missing records that are identified by gaps in the identity column of a table.
If a table, ordered by the identity column, shows “1, 2, 3, 5”, record 4 must have been deleted right? Well, although that is a possibility, it’s not a safe assumption as there are other causes of these “missing” IDs.
Let’s start with a simple table, it only has 3 columns, the first of which being an Identity (1,1) column called ID. This means it starts at 1 and increments by 1 with each record. The other 2 columns are VARCHAR(20) NOT NULL columns for Forename and Surname.
We’ll put a few records in to show how the ID column self-populates:
Now, it’s very common for data to be inserted into a table transactionally. SQL Server will allocate all of the inserted records an ID, however, if the transaction is rolled back, although the records aren’t committed, the identities are not released.
Here is a demonstration of this in action, first we see the records added to the table and rolled back out (notice the SELECT statement is inside the ROLLBACK):
The IDs appear as expected, so let’s go ahead and COMMIT those records this time:
As you can see, IDs 4, 5 and 6 were not released and will not be written to the table, our new records start at 7.
Another example of when this occurs is when records are inserted that contain errors. Remember our simple table has a NOT NULL condition on both name fields, let’s try and write a NULL record:
We get an error as expected, so let’s fix it and write the record correctly:
Here you can see ID 10 was assigned to the erroneous INSERT, and when the error occurred, the ID was not released, our new record has skipped an ID again and started at 11.
Sometimes you will see gaps of 10,000 or more in an ID column, this can occur when SQL Server is processing batches of inserts. To aid performance, “chunks” of IDs will be pre-reserved for a transaction and then the excess is released after completion. For example, if you’re writing 8,000 records to a table in one go, SQL Server may allocate 10,000 IDs, then release 8,001 – 10,000 once the INSERT has finished. However, if the table is written to by another transaction simultaneously due to isolation level reasons, this record will be given ID 10,001. Also, if the server crashes during the bulk insert, the IDs are not let go.
This “skipping” of IDs is intended behaviour in SQL (ID columns in a dedicated SQL pool by their nature cannot be sequential). An identity column in SQL Server is designed to write a unique ID per record in each transaction and to auto-increment the seed on the table, nothing more. It doesn’t even guarantee the uniqueness of the column in the table. An ID column should never be explicitly used to track the number of records in a table.
There is a function called DBCC CHECKIDENT that can be used to report on the state of the ID column, below you can see we call the function, insert and rollback some records (to consume the IDs as mentioned above) and then check again:
The current COLUMN VALUE of the ID column is still 11 as the records weren’t committed, but the current assigned value for the table is now 15 as 12, 13, 14 and 15 were all consumed. We can test this by committing those records to the table:
As expected, the records are inserted starting at 16.
This function has more arguments that allow you to RESEED a table and manipulate the ID column, however I’m not going into depth here as there are a lot of risks in doing this and some research should be done before performing such an action.
There is more information available here: DBCC CHECKIDENT (Transact-SQL) – SQL Server | Microsoft Learn