Finding the maximum value in an array using vectorization

 
 
  • Gérald Barré

This post is part of the series 'SIMD'. Be sure to check out the rest of the blog posts of the series!

In the previous post, I described vectorization in .NET and how it can be useful to improve the performance of algorithms.

In this post, I'll explain how to use vectorization to find the maximum value of an array. The algorithm is already implemented in Enumerable.Max. You can find the original code on GitHub. I simplified the code to make it more readable and added comments to explain how it works.

C#
internal static int MaxIntegerSimd(ReadOnlySpan<int> span)
{
    int result;
    int index;

    if (Vector.IsHardwareAccelerated && span.Length >= Vector<int>.Count * 2)
    {
        // The span is at least two vectors long. Create a vector from the first N elements,
        // and then repeatedly compare that against the next vector from the span. At the end,
        // the resulting vector will contain the maximum values found, and we then need only
        // to find the max of those.
        var maxes = new Vector<int>(span);
        index = Vector<int>.Count; // 8 when AVX2 is supported
        do
        {
            // Vector.Max returns a new Vector with the maximum of each item:
            // result[0] = max(left[0], right[0])
            // result[1] = max(left[1], right[1])
            // ...
            // result[8] = max(left[8], right[8])
            //
            // Vector.Max is a single instruction (vpmaxsd), so it is much faster than
            // comparing each items manually in a for loop
            // https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_max_epi32&expand=3327&ig_expand=4537
            //
            // For instance,
            // left   [2, 5, 8, 4, 8, 8, 6, 2]
            // right  [8, 7, 1, 3, 9, 9, 0, 8]
            // result [8, 7, 8, 4, 9, 9, 6, 8]
            var right = new Vector<int>(span.Slice(index));
            maxes = Vector.Max(maxes, right);

            // Continue with the next 8 items of the span
            index += Vector<int>.Count;
        }
        while (index + Vector<int>.Count <= span.Length);

        // The "maxes" vector contains the 8 highest values.
        // We need to find the maximum values from the 8 values.
        result = maxes[0];
        for (int i = 1; i < Vector<int>.Count; i++)
        {
            if (maxes[i] > result)
            {
                result = maxes[i];
            }
        }
    }
    else
    {
        // The vectorization is not possible => use the basic implementation
        result = span[0];
        index = 1;
    }

    // Iterate through the remaining elements, comparing against the max.
    // This is needed when the number of items in the collection is not a
    // multiple of Vector<int>.Count.
    // The for loop is fast as the number of iteration is less than Vector<int>.Count.
    // note: the uint cast is a trick to help the JIT removing bound checks.
    for (int i = index; (uint)i < (uint)span.Length; i++)
    {
        if (span[i] > result)
            result = span[i];
    }

    return result;
}

#Benchmark

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22616
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.4.22252.9
  [Host]     : .NET 7.0.0 (7.0.22.22904), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.22904), X64 RyuJIT
MethodNMeanErrorStdDevRatio
Simple84.201 ns0.0600 ns0.0532 ns1.00
Vectorized85.631 ns0.0747 ns0.0699 ns1.34
Simple167.483 ns0.1118 ns0.1046 ns1.00
Vectorized164.126 ns0.0398 ns0.0372 ns0.55
Simple10049.090 ns0.9797 ns1.5252 ns1.00
Vectorized1009.330 ns0.1296 ns0.1212 ns0.19
Simple1000443.727 ns6.9752 ns6.5246 ns1.00
Vectorized100065.149 ns0.6617 ns0.6189 ns0.15
Simple100004,369.990 ns51.4527 ns45.6115 ns1.00
Vectorized10000572.290 ns5.7642 ns5.3918 ns0.13

#Additional resources

Do you have a question or a suggestion about this post? Contact me!

Follow me:
Enjoy this blog?Buy Me A Coffee💖 Sponsor on GitHub