Quickly insert millions of rows in SQL Server in .NET

  • Gérald Barré

A few days ago, I needed to insert about 3 million rows into a table. This is not a frequent operation, but I do it about once a month, and I want it to be as fast as possible to avoid locking the table for too long. The data comes from several CSV files that are processed by .NET code, so the rows exist as objects in memory rather than in a file SQL Server can read directly. This means I cannot use the BULK INSERT statement. Instead, I must use the SqlBulkCopy class from .NET. Let's see how to use it with .NET objects!

Let's use this table:

SQL
CREATE TABLE [dbo].[Customer](
    [Id] [uniqueidentifier] NOT NULL,
    [FirstName] [nvarchar](50) NULL,
    [LastName] [nvarchar](50) NULL,
    [DateOfBirth] [datetime2](7) NULL
)

And the corresponding class:

C#
public class Customer
{
    public Guid Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime DateOfBirth { get; set; }
}

For testing purposes, I've created a method that generates as many customer instances as needed:

C#
public static IEnumerable<Customer> Generate(int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return new Customer
        {
            Id = Guid.NewGuid(),
            FirstName = "FirstName" + i,
            LastName = "LastName" + i,
            DateOfBirth = DateTime.UtcNow
        };
    }
}

At this point, you have an IEnumerable<Customer>. Looking at the available overloads of SqlBulkCopy.WriteToServerAsync, you can pass an array, a DataTable, or a DbDataReader. Only the last option supports streaming, which is critical when inserting millions of rows. You do not want to load everything into memory before sending the first row to SQL Server. The key question is: how do you convert an IEnumerable<T> into a DbDataReader?
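For contrast, here is a sketch of what the DataTable route would look like (the column names and the customers sequence mirror the examples in this post). It works, but every row must be materialized in memory before the copy starts:

```csharp
// Builds a DataTable holding ALL rows up front. With millions of rows,
// everything is allocated before the first byte reaches SQL Server.
var table = new DataTable();
table.Columns.Add("Id", typeof(Guid));
table.Columns.Add("FirstName", typeof(string));
table.Columns.Add("LastName", typeof(string));
table.Columns.Add("DateOfBirth", typeof(DateTime));

foreach (var customer in customers) // fully enumerates the sequence
{
    table.Rows.Add(customer.Id, customer.FirstName, customer.LastName, customer.DateOfBirth);
}

// await bulkCopy.WriteToServerAsync(table, ct);
```

A DbDataReader avoids this buffering entirely: rows are pulled one at a time as SqlBulkCopy consumes them.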

DbDataReader exposes many methods: Read, GetOrdinal, IsDBNull, GetValue, GetByte, GetChar, GetString, and so on. In practice, you only need to implement the first four (plus the FieldCount property). Also, DbDataReader accesses values by ordinal (column index), whereas .NET objects expose values through named properties, so we need a mapping between property names and ordinals. Since property values are read once per column per row, millions of times in total, using compiled expressions instead of plain reflection avoids unnecessary overhead.
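To see why compiled expressions pay off, compare the two ways of reading a property for some customer instance. The reflection call below is the slow baseline that the compiled getter replaces (a sketch, not part of the reader itself):

```csharp
var property = typeof(Customer).GetProperty(nameof(Customer.FirstName));

// Reflection: the member is resolved dynamically on every call, which is
// noticeably slower when invoked millions of times.
object slow = property.GetValue(customer);

// Compiled expression: built once, then invoked like a plain delegate.
var x = Expression.Parameter(typeof(Customer), "x");
var getter = Expression.Lambda<Func<Customer, object>>(
    Expression.Convert(Expression.Property(x, property), typeof(object)), x).Compile();
object fast = getter(customer);
```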

C#
public class ObjectDataReader<T> : DbDataReader
{
    private readonly IEnumerator<T> _iterator;
    private readonly Dictionary<string, int> _propertyNameToOrdinal = new Dictionary<string, int>();
    private readonly Dictionary<int, string> _ordinalToPropertyName = new Dictionary<int, string>();
    private Func<T, object>[] _getPropertyValueFuncs;

    public ObjectDataReader(IEnumerator<T> items)
    {
        _iterator = items ?? throw new ArgumentNullException(nameof(items));

        Initialize();
    }

    private void Initialize()
    {
        var properties = typeof(T).GetProperties();
        _getPropertyValueFuncs = new Func<T, object>[properties.Length];

        int ordinal = 0;
        foreach (var property in properties)
        {
            string propertyName = property.Name;
            _propertyNameToOrdinal.Add(propertyName, ordinal);
            _ordinalToPropertyName.Add(ordinal, propertyName);

            // Build and compile the lambda: x => (object)x.PropertyName
            var parameterExpression = Expression.Parameter(typeof(T), "x");
            var func = Expression.Lambda<Func<T, object>>(
                  Expression.Convert(
                      Expression.Property(parameterExpression, propertyName),
                      typeof(object)),
                  parameterExpression)
                  .Compile();
            _getPropertyValueFuncs[ordinal] = func;

            ordinal++;
        }
    }
}

With that in place, the remaining methods are straightforward to implement:

C#
    public override bool Read()
    {
        return _iterator.MoveNext();
    }

    public override int GetOrdinal(string name)
    {
        if (_propertyNameToOrdinal.TryGetValue(name, out var ordinal))
            return ordinal;

        // DbDataReader's contract is to throw for unknown names; this also
        // surfaces a typo in a column mapping early, instead of a confusing
        // index error later.
        throw new IndexOutOfRangeException("Unknown property name: " + name);
    }

    public override object GetValue(int ordinal)
    {
        var func = _getPropertyValueFuncs[ordinal];
        return func(_iterator.Current);
    }

    public override bool IsDBNull(int ordinal)
    {
        return GetValue(ordinal) == null;
    }

    // code omitted for brevity
    // https://gist.github.com/meziantou/174e2791dec966be837746750b87d069
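One omitted member worth calling out: SqlBulkCopy also reads the FieldCount property to know how many columns to copy. Assuming the fields above, it can simply be:

```csharp
public override int FieldCount => _getPropertyValueFuncs.Length;
```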

That is all you need. Now you can test the code:

C#
var ct = CancellationToken.None; // or pass a real CancellationToken
using (var connection = new SqlConnection())
{
    connection.ConnectionString = "Server=(local);Database=Sample;Trusted_Connection=True;";
    await connection.OpenAsync(ct);

    using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, null))
    {
        var customers = Customer.Generate(5_000_000); // digit separator (C#7)
        using (var enumerator = customers.GetEnumerator())
        using (var customerReader = new ObjectDataReader<Customer>(enumerator))
        {
            // Mapping PropertyName - ColumnName
            bulkCopy.DestinationTableName = "Customer";
            bulkCopy.ColumnMappings.Add(nameof(Customer.Id), "Id");
            bulkCopy.ColumnMappings.Add(nameof(Customer.FirstName), "FirstName");
            bulkCopy.ColumnMappings.Add(nameof(Customer.LastName), "LastName");
            bulkCopy.ColumnMappings.Add(nameof(Customer.DateOfBirth), "DateOfBirth");

            bulkCopy.EnableStreaming = true;
            bulkCopy.BatchSize = 10000;
            bulkCopy.NotifyAfter = 1000;
            bulkCopy.SqlRowsCopied += (sender, e) => Console.WriteLine("RowsCopied: " + e.RowsCopied);

            await bulkCopy.WriteToServerAsync(customerReader, ct);
        }
    }
}

Conclusion

Using SqlBulkCopy, you can insert millions of rows in just a few seconds. With plain INSERT statements, the same operation would take several minutes. To keep memory usage low, stream the data to the database through a custom DbDataReader.

The full sample code is available on Gist: https://gist.github.com/meziantou/174e2791dec966be837746750b87d069

Do you have a question or a suggestion about this post? Contact me!
