SQL Server - Change Data Capture (CDC)

 
 
  • Gérald Barré

Change Data Capture is a mechanism built into SQL Server since 2008 that is intended to keep a history of changes to data in one or more tables. Indeed in some cases, you want to save all the steps (creation, modification, deletion) to arrive at the data of a table at a given time.

#Enable Change Data Capture

Change Data Capture is not enabled by default, so use the following command:

SQL
USE Sample
GO
EXEC sys.sp_cdc_enable_db
GO

You can check the good execution of the command via the following query:

SQL
SELECT [name], database_id, is_cdc_enabled
FROM sys.databases

You can also see that SQL Server adds some new tables to the database:

The names of the tables are explicit, but here is an example of the data contained in each of them:

  • cdc.captured_columns: list of captured columns

  • cdc.change_tables: Tables whose capture is enabled

  • cdc.ddl_history: Contains all structure changes (DDL)

  • cdc.index_columns: Contains the indexes of the captured tables

  • cdc.lsn_time_mapping: Contains the link between the LSNs and the date.

#Enable Change Data Capture on a table

Once CDC is activated at the database level, it must be activated on the tables for which you want to follow the changes. For this we execute the following procedure:

SQL
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'Employee',
@role_name     = NULL
GO

If necessary you can filter the columns to track to reduce the amount of data saved:

SQL
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'Employee',
@role_name     = NULL,
@index_name    = N'PK_Employee',
@captured_column_list = N'[Id],[FirstName],[LastName]'
GO

Once again you can check that CDC is activated:

SQL
SELECT [name], is_tracked_by_cdc
FROM sys.tables

You can also see that a new table has been created:

And 2 jobs:

And you can also use the following command to view the configuration of CDC:

SQL
EXEC sys.sp_cdc_help_change_data_capture

Now any changes in the Employee table (insert, update, delete) will automatically add one or more rows to the table cdc.dbo_Employee_CT, without any modification of the requests.

#Querying

Let's start by doing some operations on the table:

SQL
INSERT INTO Employee (FirstName, LastName, hid) VALUES ('John', 'Doe', '/1/')

UPDATE Employee
SET FirstName = 'Jane'
WHERE Id = 20

UPDATE Employee
SET hid = '/2/'
WHERE Id = 20

You can see that the changes are directly saved in raw form in the table created by CDC:

SQL
SELECT * FROM [Sample].[cdc].[dbo_Employee_CT]

In this table we find:

  • The operation performed (__ $ operation)
    • 1 = Delete
    • 2 = Insert
    • 3 = Value of the line before an update
    • 4 = Value of the line after an Update
  • A mask showing the columns that have been modified based on their ordinal
    • 0x0F ⇒ column 1, 2, 3 and 4
    • 0x02 ⇒ column 2
    • 0x08 ⇒ column 4
  • The last columns correspond to the values ​​of the row in the table

In the example above, the 3 operations are:

  • Inserting the line
  • Update of the FirstName column from John to Jane
  • Update the hid column from 0x58 to 0x68

SQL Server provides some functions to query this table more easily:

  • cdc.fn_cdc_get_all_changes_<capture_instance>(from_lsn, to_lsn, row_filter_options) (doc): Returns all changes between 2 dates for all rows
  • cdc.fn_cdc_get_net_changes_<capture_instance>(from_lsn, to_lsn, row_filter_options) (doc): Returns only one change per row in the table. This change represents the final content of the line. For example if we make an insert and three update, the result will be the line resulting from these four operations.

These operations do not use the date format but LSNs. To obtain these we can use the following functions:

  • sys.fn_cdc_get_min_lsn('dbo_Employee')
  • sys.fn_cdc_get_max_lsn()
  • sys.fn_cdc_map_time_to_lsn('smallest greater than', GETDATE ())
  • sys.fn_cdc_map_time_to_lsn ('largest less than or equal', GETDATE ())

To illustrate here is an example listing the final changes to the table during the last 30 minutes:

SQL
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_Employee(
        sys.fn_cdc_map_time_to_lsn('smallest greater than', DATEADD(minute, -30, GETDATE())),
        sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE()),
        'all')

Another allowing to list all the changes of the table since the beginning:

SQL
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Employee(
        sys.fn_cdc_get_min_lsn('dbo_Employee'),
        sys.fn_cdc_get_max_lsn(),
        N'all');

Capturing data can quickly blow up the size of the database. This is why SQL Server adds the cdc. <database>_cleanup job by default. This job deletes the contents of the tables regularly. This task is configured by default to run every day and keep the contents of the last 3 days. For this it uses the procedure sys.sp_cdc_cleanup_change_table.

We took a tour of the Change Data Capture feature. As we have seen, this feature keeps the history of changes made to the data without having to code anything. Indeed a few lines of configuration are enough to get the desired result.

Do you have a question or a suggestion about this post? Contact me!

Follow me:
Enjoy this blog?Buy Me A Coffee💖 Sponsor on GitHub