What is index fragmentation and how does it occur?

It is important to know that SQL Server data is stored in data pages, with each page holding 8KB of data. There are two types of fragmentation, both are a result of these pages not being used as efficiently as possible.

When you UPDATE or INSERT data on a page that is already full, SQL Server creates a new page. The information from the original page will be split 50/50 with half being added to the new page, along with the new record. Doing this causes empty space at the end of the page(s). This empty space is known as Internal Fragmentation, it also occurs when you delete data from a page. Here is a diagram showing how this occurs:

Diagram showing how External Fragmentation occurs

This newly created page won’t sit in sequence with the original page(s) meaning the related data is now stored in a different area of the disk. This separated data is called External Fragmentation.

Why is fragmentation a bad thing?

Fragmented indexes can lead to performance issues. Say you have 1MB of data, this will theoretically fit on 128 data pages (1024KB / 8KB = 128 pages), however, due to internal fragmentation, this 1MB of data may exist on multiple, partial pages. When SQL Server runs a query against a table that involves an index scan, this scan must execute on every page. It would scan 128 full pages a lot faster than it would 200+ fragmented pages. Now also remember that the data that the query is looking for may exist in various places on the disk due to external fragmentation, the extra scans coupled with the increased I/O can cause huge performance issues.

The numbers used in the above example help keep the math easy, but imagine a table with billions of rows and you can start to see the scale of the potential performance issues.

How do I fix fragmentation?

There is a Table Valued Function called sys.dm_db_index_physical_stats that will allow you to analyse the level of fragmentation in the indexes. Once you have used this to identify the problematic indexes you can decide whether to reorganise or rebuild them.

Reorganising does exactly what is sounds like, it puts the pages back in order. This will help with the external fragmentation, but the content of the pages won’t change.

Rebuilding drops the index and rebuilds it from scratch, it’s a longer, more arduous process, but will resolve both internal and external defragmentation.

What is Fill-Factor?

Screenshot showing the location of Fill Factor settings

To try and combat fragmentation, many DBAs implement a custom fill factor. Fill factor is a value that determines how much of a page, as a percentage, will be used for storage and how much will be intentionally left free. For example, setting it to 80 will leave 20% of the page empty, but then, aren’t we just introducing internal fragmentation on purpose? Well yes, but now when you insert data, there is room on the page for it, rather than it being split onto a new page elsewhere on the disk. In other words, you sacrifice having a higher level of internal fragmentation to limit the amount of external fragmentation.

There is no “one size fits all” value when it comes to fill factor, so it should be set on an index level rather than a server level and should be set depending on what type of work that index is doing? A static table that is only used for reads would benefit more from a value closer to 0, (100% page use), than a table that has records inserted and deleted regularly. This is where a DBA’s skills are invaluable.