This post is part of the series 'SIMD'. Be sure to check out the rest of the blog posts of the series!
I continue to look at code that uses vectorization in the .NET libraries. In this post, we'll look at the ReplacePlusWithSpace method from the ASP.NET Core code. This method replaces each + with a space, which is useful when unescaping URL query strings. It uses Vector128 with SSE2 and SSE4.1 instructions. As in previous posts, the comments in the code are mine:
C#
// source: https://github.com/dotnet/aspnetcore/blob/c65dac77cf6540c81860a42fff41eb11b9804367/src/Shared/QueryStringEnumerable.cs#L169
// Cache the delegate to avoid an instantiation each time, or a null check
// https://www.meziantou.net/performance-lambda-expressions-method-groups-and-delegate-caching.htm
private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
{
// You cannot pass the ReadOnlySpan<char> as the state argument of string.Create<TState>():
// ReadOnlySpan<char> is a ref struct, so it cannot be used as a generic type argument here.
// The following code doesn't compile:
// string.Create<ReadOnlySpan<char>>(10, span, (buffer, span) => { });
//
// The workaround is to get a pointer from the ReadOnlySpan and pass the pointer
// as the state of string.Create
fixed (char* ptr = &MemoryMarshal.GetReference(span))
{
return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
}
}
private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
{
// Convert the destination buffer to a pointer
fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
{
// Vector128<T> only supports numeric types, so you cannot use char.
// char and ushort are both 2 bytes long, so you can convert
// the pointers from (char*) to (ushort*)
var input = (ushort*)state.ToPointer();
var output = (ushort*)ptr;
var i = (nint)0;
var n = (nint)(uint)buffer.Length;
// Use Vector128<ushort> to process 8 characters at a time
if (Sse41.IsSupported && n >= Vector128<ushort>.Count)
{
// Create a Vector128 instance with all elements initialized to '+'
var vecPlus = Vector128.Create((ushort)'+');
// Create a Vector128 instance with all elements initialized to ' ' (space)
var vecSpace = Vector128.Create((ushort)' ');
do
{
// Load 8 chars from the input string
// vec: ['a', 'a', '+', 'a', '+', 'a', 'a', 'a']
var vec = Sse2.LoadVector128(input + i);
// Compare the chars with '+'. The result contains 0x0000 where the char
// is not equal to '+', and 0xFFFF where it is equal to '+'.
// The goal is to create a mask that indicates which characters to replace
// with a space
//
// vec: [ 'a' , 'a' , '+' , 'a' , '+' , 'a' , 'a' , 'a' ]
// vecPlus: [ '+' , '+' , '+' , '+' , '+' , '+' , '+' , '+' ]
// mask: [0x0000, 0x0000, 0xFFFF, 0x0000, 0xFFFF, 0x0000, 0x0000, 0x0000]
var mask = Sse2.CompareEqual(vec, vecPlus);
// Replace the chars with a space where the mask is set (0xFFFF)
// vec: [ 'a' , 'a' , '+' , 'a' , '+' , 'a' , 'a' , 'a' ]
// vecSpace: [ ' ' , ' ' , ' ' , ' ' , ' ' , ' ' , ' ' , ' ' ]
// mask: [0x0000, 0x0000, 0xFFFF, 0x0000, 0xFFFF, 0x0000, 0x0000, 0x0000]
// res: [ 'a' , 'a' , ' ' , 'a' , ' ' , 'a' , 'a' , 'a' ]
var res = Sse41.BlendVariable(vec, vecSpace, mask);
// Store the res vector to the output buffer
Sse2.Store(output + i, res);
// Process the next 8 chars
i += Vector128<ushort>.Count;
} while (i <= n - Vector128<ushort>.Count);
}
// Process the remaining characters (0 to 7 chars, or the whole input when SSE4.1 is not supported)
for (; i < n; ++i)
{
if (input[i] != '+')
{
output[i] = input[i];
}
else
{
output[i] = ' ';
}
}
}
}
#Replace specific instructions with Vector128 methods
The previous code uses hardware-specific intrinsics such as Sse41.BlendVariable. This works, but you need to check that the hardware supports them and provide a fallback implementation when it does not. .NET also provides cross-platform methods on Vector128 that use SIMD instructions when possible and fall back to a software implementation otherwise. This way, you don't need to handle that complexity yourself.
The previous code can be rewritten using the Vector128 static methods:
C#
if (n >= Vector128<ushort>.Count) // No need to check for hardware support
{
var vecPlus = Vector128.Create((ushort)'+');
var vecSpace = Vector128.Create((ushort)' ');
do
{
// Equivalent of Sse2.LoadVector128(input + i);
var vec = Vector128.Load(input + i);
// Equivalent of Sse2.CompareEqual(vec, vecPlus);
var mask = Vector128.Equals(vec, vecPlus);
// Equivalent of Sse41.BlendVariable(vec, vecSpace, mask);
var res = Vector128.ConditionalSelect(mask, vecSpace, vec);
// Equivalent of Sse2.Store(output + i, res);
res.Store(output + i);
i += Vector128<ushort>.Count;
} while (i <= n - Vector128<ushort>.Count);
}
for (; i < n; ++i)
{
if (input[i] != '+')
{
output[i] = input[i];
}
else
{
output[i] = ' ';
}
}
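To make the mask-and-select step easier to follow, here is a minimal sketch that applies it to a single vector of 8 characters. The SingleVectorDemo class and its ReplaceInOneVector method are hypothetical names that are not part of ASP.NET Core. The sketch uses spans instead of pointers for brevity and assumes .NET 7 or later:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper, for illustration only
public static class SingleVectorDemo
{
    // Replaces '+' with ' ' in exactly 8 chars using one Vector128<ushort>
    public static string ReplaceInOneVector(ReadOnlySpan<char> eightChars)
    {
        if (eightChars.Length != Vector128<ushort>.Count)
            throw new ArgumentException("Expected exactly 8 chars.", nameof(eightChars));

        // Reinterpret the chars as ushort values (both are 2 bytes long)
        var vec = Vector128.Create<ushort>(MemoryMarshal.Cast<char, ushort>(eightChars));

        // 0xFFFF where the element equals '+', 0x0000 elsewhere
        var mask = Vector128.Equals(vec, Vector128.Create((ushort)'+'));

        // Take the element from the space vector where the mask is set, from vec otherwise
        var res = Vector128.ConditionalSelect(mask, Vector128.Create((ushort)' '), vec);

        // Copy the 8 resulting elements back into a char buffer
        Span<char> output = stackalloc char[Vector128<ushort>.Count];
        res.CopyTo(MemoryMarshal.Cast<char, ushort>(output));
        return new string(output);
    }
}
```

With this sketch, SingleVectorDemo.ReplaceInOneVector("aa+a+aaa") returns "aa a aaa", which matches the example used in the comments of the SSE version.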
#Improve the method with AVX2
AVX2 operates on 256-bit vectors, so it can process 16 characters at a time instead of 8. The code is very similar to the SSE version. As with Vector128, you can use Vector256 to avoid providing a software fallback yourself. Let's see if this improves the performance.
C#
private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
{
fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
{
var input = (ushort*)state.ToPointer();
var output = (ushort*)ptr;
var i = (nint)0;
var n = (nint)(uint)buffer.Length;
// Process 16 chars per loop
if (Vector256.IsHardwareAccelerated && n >= Vector256<ushort>.Count)
{
var vecPlus = Vector256.Create((ushort)'+');
var vecSpace = Vector256.Create((ushort)' ');
do
{
var vec = Vector256.Load(input + i);
var mask = Vector256.Equals(vec, vecPlus);
var res = Vector256.ConditionalSelect(mask, vecSpace, vec);
res.Store(output + i);
i += Vector256<ushort>.Count;
} while (i <= n - Vector256<ushort>.Count);
}
// Process 8 chars per loop
if (Vector128.IsHardwareAccelerated && n - i >= Vector128<ushort>.Count)
{
var vecPlus = Vector128.Create((ushort)'+');
var vecSpace = Vector128.Create((ushort)' ');
do
{
var vec = Vector128.Load(input + i);
var mask = Vector128.Equals(vec, vecPlus);
var res = Vector128.ConditionalSelect(mask, vecSpace, vec);
res.Store(output + i);
i += Vector128<ushort>.Count;
} while (i <= n - Vector128<ushort>.Count);
}
// Process the remaining characters (0 to 7 chars, or the whole input when vectors are not hardware accelerated)
for (; i < n; ++i)
{
if (input[i] != '+')
{
output[i] = input[i];
}
else
{
output[i] = ' ';
}
}
}
}
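To see which loop handles which part of the input, the following sketch computes how many characters each stage would process for a given length. LoopSplit.Split is a hypothetical helper that only mirrors the loop conditions of the method above. It assumes both vector sizes are hardware accelerated unless you tell it otherwise:

```csharp
// Hypothetical helper, for illustration only: it performs no replacement,
// it only mirrors the loop bounds of the Vector256 version
public static class LoopSplit
{
    public static (int Vector256Chars, int Vector128Chars, int ScalarChars) Split(
        int length, bool use256 = true, bool use128 = true)
    {
        const int Count256 = 16; // Vector256<ushort>.Count
        const int Count128 = 8;  // Vector128<ushort>.Count

        var processed = 0;

        // The first loop runs while at least 16 chars remain
        var by256 = use256 ? length / Count256 * Count256 : 0;
        processed += by256;

        // The second loop runs while at least 8 chars remain
        var by128 = use128 ? (length - processed) / Count128 * Count128 : 0;
        processed += by128;

        // The scalar loop handles whatever is left
        return (by256, by128, length - processed);
    }
}
```

For example, LoopSplit.Split(30) returns (16, 8, 6): one Vector256 iteration, one Vector128 iteration, and 6 chars in the scalar loop. This can help when reading the benchmark results, because short inputs never reach the 256-bit loop.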
#Benchmark
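Let's compare all the previous implementations, plus string.Replace. string.Replace cannot be used in this case because the source is a ReadOnlySpan<char> and not a string, but it gives a good reference point for the performance of the other methods. In the results table, NoVector and Vector128_SSE appear to correspond to the Basic and Current benchmark methods. Here is the benchmark code: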
C#
[ReturnValueValidator]
public class ReplacePlusWithSpaceBenchmark
{
[ParamsSource(nameof(ValueSource))]
public string Value { get; set; } = null!;
public IEnumerable<string> ValueSource
{
get
{
for (int i = 0; i < 128; i += 6)
{
yield return string.Create(i, state: (object?)null, (span, state) =>
{
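// Use a different name than the outer loop variable: C# doesn't allow
// a local in a lambda to reuse the name of a local in an enclosing scope
for (var j = 0; j < span.Length; j++)
{
span[j] = j % 5 == 0 ? '+' : 'a';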
}
});
}
}
}
[Benchmark]
public string Basic() => BasicHelper.ReplacePlusWithSpace(Value);
[Benchmark]
public string StringReplace() => Value.Replace('+', ' ');
[Benchmark(Baseline = true)]
public string Current() => Vector128Helper_Sse.ReplacePlusWithSpace(Value);
[Benchmark]
public string Vector128() => Vector128Helper.ReplacePlusWithSpace(Value);
[Benchmark]
public string Vector256() => Vector256Helper.ReplacePlusWithSpace(Value);
}
public static class BasicHelper
{
private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
{
fixed (char* ptr = &MemoryMarshal.GetReference(span))
{
return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
}
}
private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
{
fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
{
var input = (ushort*)state.ToPointer();
var output = (ushort*)ptr;
var i = (nint)0;
var n = (nint)(uint)buffer.Length;
for (; i < n; ++i)
{
if (input[i] != '+')
{
output[i] = input[i];
}
else
{
output[i] = ' ';
}
}
}
}
}
public static class Vector128Helper_Sse
{
private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
{
fixed (char* ptr = &MemoryMarshal.GetReference(span))
{
return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
}
}
private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
{
fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
{
var input = (ushort*)state.ToPointer();
var output = (ushort*)ptr;
var i = (nint)0;
var n = (nint)(uint)buffer.Length;
if (Sse41.IsSupported && n >= Vector128<ushort>.Count)
{
var vecPlus = Vector128.Create((ushort)'+');
var vecSpace = Vector128.Create((ushort)' ');
do
{
var vec = Sse2.LoadVector128(input + i);
var mask = Sse2.CompareEqual(vec, vecPlus);
var res = Sse41.BlendVariable(vec, vecSpace, mask);
Sse2.Store(output + i, res);
i += Vector128<ushort>.Count;
} while (i <= n - Vector128<ushort>.Count);
}
for (; i < n; ++i)
{
if (input[i] != '+')
{
output[i] = input[i];
}
else
{
output[i] = ' ';
}
}
}
}
}
public static class Vector128Helper
{
private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
{
fixed (char* ptr = &MemoryMarshal.GetReference(span))
{
return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
}
}
public static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
{
fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
{
var input = (ushort*)state.ToPointer();
var output = (ushort*)ptr;
var i = (nint)0;
var n = (nint)(uint)buffer.Length;
if (n >= Vector128<ushort>.Count)
{
var vecPlus = Vector128.Create((ushort)'+');
var vecSpace = Vector128.Create((ushort)' ');
do
{
var vec = Vector128.Load(input + i);
var mask = Vector128.Equals(vec, vecPlus);
var res = Vector128.ConditionalSelect(mask, vecSpace, vec);
res.Store(output + i);
i += Vector128<ushort>.Count;
} while (i <= n - Vector128<ushort>.Count);
}
for (; i < n; ++i)
{
if (input[i] != '+')
{
output[i] = input[i];
}
else
{
output[i] = ' ';
}
}
}
}
}
public static class Vector256Helper
{
private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
{
fixed (char* ptr = &MemoryMarshal.GetReference(span))
{
return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
}
}
private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
{
fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
{
var input = (ushort*)state.ToPointer();
var output = (ushort*)ptr;
var i = (nint)0;
var n = (nint)(uint)buffer.Length;
if (Vector256.IsHardwareAccelerated && n >= Vector256<ushort>.Count)
{
var vecPlus = Vector256.Create((ushort)'+');
var vecSpace = Vector256.Create((ushort)' ');
do
{
var vec = Vector256.Load(input + i);
var mask = Vector256.Equals(vec, vecPlus);
var res = Vector256.ConditionalSelect(mask, vecSpace, vec);
res.Store(output + i);
i += Vector256<ushort>.Count;
} while (i <= n - Vector256<ushort>.Count);
}
if (Vector128.IsHardwareAccelerated && n - i >= Vector128<ushort>.Count)
{
var vecPlus = Vector128.Create((ushort)'+');
var vecSpace = Vector128.Create((ushort)' ');
do
{
var vec = Vector128.Load(input + i);
var mask = Vector128.Equals(vec, vecPlus);
var res = Vector128.ConditionalSelect(mask, vecSpace, vec);
res.Store(output + i);
i += Vector128<ushort>.Count;
} while (i <= n - Vector128<ushort>.Count);
}
for (; i < n; ++i)
{
if (input[i] != '+')
{
output[i] = input[i];
}
else
{
output[i] = ' ';
}
}
}
}
}
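As far as I know, the [ReturnValueValidator] attribute makes BenchmarkDotNet check that all benchmarks return the same value. If you want a similar sanity check outside of a benchmark run, here is a minimal sketch. ReplaceCheck and its methods are hypothetical names; the ReplacePlusWithSpace variant below is a pointer-free version written only for this check, not the ASP.NET Core code, and it is not tuned for performance:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper, for illustration only
public static class ReplaceCheck
{
    // Pointer-free variant: fills a temporary char[] instead of calling string.Create
    public static string ReplacePlusWithSpace(ReadOnlySpan<char> span)
    {
        var result = new char[span.Length];
        ReadOnlySpan<ushort> input = MemoryMarshal.Cast<char, ushort>(span);
        Span<ushort> output = MemoryMarshal.Cast<char, ushort>(result.AsSpan());

        var i = 0;
        if (Vector128.IsHardwareAccelerated)
        {
            var vecPlus = Vector128.Create((ushort)'+');
            var vecSpace = Vector128.Create((ushort)' ');
            for (; i <= input.Length - Vector128<ushort>.Count; i += Vector128<ushort>.Count)
            {
                var vec = Vector128.Create<ushort>(input.Slice(i, Vector128<ushort>.Count));
                var mask = Vector128.Equals(vec, vecPlus);
                Vector128.ConditionalSelect(mask, vecSpace, vec).CopyTo(output.Slice(i));
            }
        }

        // Scalar loop for the remaining chars
        for (; i < input.Length; i++)
        {
            output[i] = input[i] == '+' ? (ushort)' ' : input[i];
        }

        return new string(result);
    }

    // Same pattern as the benchmark's ValueSource: a '+' every 5 chars
    public static string BuildInput(int length)
    {
        var chars = new char[length];
        for (var j = 0; j < chars.Length; j++)
        {
            chars[j] = j % 5 == 0 ? '+' : 'a';
        }
        return new string(chars);
    }

    // Returns the first length for which the results differ, or -1 if they all match
    public static int FirstMismatch(int maxLength)
    {
        for (var length = 0; length <= maxLength; length++)
        {
            var value = BuildInput(length);
            if (ReplacePlusWithSpace(value) != value.Replace('+', ' '))
            {
                return length;
            }
        }
        return -1;
    }
}
```

Calling ReplaceCheck.FirstMismatch(127) covers every length from 0 to 127, so it exercises both the vectorized loop and the scalar tail. You could swap in any of the helpers from the benchmark code to check them the same way.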
INI
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22621
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.5.22307.18
[Host] : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
RyuJitX64 : .NET 7.0.0 (7.0.22.31201), X64 RyuJIT
Job=RyuJitX64 Jit=RyuJit Platform=X64
Toolchain=.NET 7.0.100-preview.6.22316.8
| Method | Value | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---|---:|---:|---:|---:|---:|---:|
| NoVector | length: 0 | 1.356 ns | 0.0242 ns | 0.0215 ns | 1.360 ns | 0.97 | 0.04 |
| StringReplace | | 2.303 ns | 0.0543 ns | 0.0508 ns | 2.292 ns | 1.65 | 0.07 |
| Vector128_SSE | | 1.395 ns | 0.0563 ns | 0.0526 ns | 1.398 ns | 1.00 | 0.00 |
| Vector128 | | 1.347 ns | 0.0294 ns | 0.0275 ns | 1.346 ns | 0.97 | 0.04 |
| Vector256 | | 1.344 ns | 0.0353 ns | 0.0330 ns | 1.338 ns | 0.96 | 0.04 |
| | | | | | | |
| NoVector | length: 6 | 8.114 ns | 0.1864 ns | 0.1744 ns | 8.094 ns | 0.92 | 0.02 |
| StringReplace | | 7.713 ns | 0.1334 ns | 0.1114 ns | 7.726 ns | 0.87 | 0.01 |
| Vector128_SSE | | 8.849 ns | 0.1527 ns | 0.1354 ns | 8.826 ns | 1.00 | 0.00 |
| Vector128 | | 8.122 ns | 0.1614 ns | 0.1510 ns | 8.132 ns | 0.92 | 0.02 |
| Vector256 | | 8.476 ns | 0.1463 ns | 0.1797 ns | 8.463 ns | 0.96 | 0.02 |
| | | | | | | |
| NoVector | length: 12 | 10.453 ns | 0.1788 ns | 0.1673 ns | 10.450 ns | 1.43 | 0.03 |
| StringReplace | | 9.892 ns | 0.2260 ns | 0.3837 ns | 9.723 ns | 1.37 | 0.07 |
| Vector128_SSE | | 7.319 ns | 0.1260 ns | 0.1052 ns | 7.349 ns | 1.00 | 0.00 |
| Vector128 | | 7.219 ns | 0.1584 ns | 0.1404 ns | 7.185 ns | 0.99 | 0.03 |
| Vector256 | | 7.931 ns | 0.2017 ns | 0.3141 ns | 7.896 ns | 1.08 | 0.06 |
| | | | | | | |
| NoVector | length: 18 | 14.643 ns | 0.3154 ns | 0.2950 ns | 14.686 ns | 2.05 | 0.04 |
| StringReplace | | 7.482 ns | 0.1681 ns | 0.1573 ns | 7.546 ns | 1.05 | 0.03 |
| Vector128_SSE | | 7.147 ns | 0.1176 ns | 0.1043 ns | 7.127 ns | 1.00 | 0.00 |
| Vector128 | | 7.218 ns | 0.1173 ns | 0.1040 ns | 7.215 ns | 1.01 | 0.02 |
| Vector256 | | 7.334 ns | 0.1345 ns | 0.1192 ns | 7.357 ns | 1.03 | 0.02 |
| | | | | | | |
| NoVector | length: 24 | 17.528 ns | 0.3382 ns | 0.3164 ns | 17.579 ns | 2.47 | 0.04 |
| StringReplace | | 10.534 ns | 0.2072 ns | 0.1938 ns | 10.548 ns | 1.49 | 0.03 |
| Vector128_SSE | | 7.095 ns | 0.1372 ns | 0.1216 ns | 7.112 ns | 1.00 | 0.00 |
| Vector128 | | 7.110 ns | 0.1326 ns | 0.1176 ns | 7.117 ns | 1.00 | 0.02 |
| Vector256 | | 7.111 ns | 0.1426 ns | 0.1264 ns | 7.079 ns | 1.00 | 0.03 |
| | | | | | | |
| NoVector | length: 30 | 20.800 ns | 0.3545 ns | 0.3316 ns | 20.804 ns | 2.10 | 0.05 |
| StringReplace | | 13.720 ns | 0.2893 ns | 0.2564 ns | 13.766 ns | 1.38 | 0.03 |
| Vector128_SSE | | 9.928 ns | 0.1773 ns | 0.1659 ns | 9.936 ns | 1.00 | 0.00 |
| Vector128 | | 10.372 ns | 0.2506 ns | 0.5175 ns | 10.269 ns | 1.04 | 0.05 |
| Vector256 | | 10.364 ns | 0.2515 ns | 0.4536 ns | 10.261 ns | 1.06 | 0.05 |
| | | | | | | |
| NoVector | length: 36 | 23.607 ns | 0.5093 ns | 0.5002 ns | 23.813 ns | 2.51 | 0.06 |
| StringReplace | | 9.577 ns | 0.1544 ns | 0.1369 ns | 9.622 ns | 1.01 | 0.02 |
| Vector128_SSE | | 9.428 ns | 0.2271 ns | 0.2125 ns | 9.533 ns | 1.00 | 0.00 |
| Vector128 | | 9.192 ns | 0.1516 ns | 0.1344 ns | 9.246 ns | 0.97 | 0.02 |
| Vector256 | | 9.470 ns | 0.1724 ns | 0.1613 ns | 9.471 ns | 1.00 | 0.03 |
| | | | | | | |
| NoVector | length: 42 | 26.764 ns | 0.4799 ns | 0.4489 ns | 26.736 ns | 2.94 | 0.08 |
| StringReplace | +aaaa(…)aaa+a [42] | 12.787 ns | 0.2885 ns | 0.2699 ns | 12.825 ns | 1.40 | 0.04 |
| Vector128_SSE | +aaaa(…)aaa+a [42] | 9.119 ns | 0.2222 ns | 0.2079 ns | 9.062 ns | 1.00 | 0.00 |
| Vector128 | +aaaa(…)aaa+a [42] | 8.866 ns | 0.2225 ns | 0.2185 ns | 8.886 ns | 0.97 | 0.03 |
| Vector256 | +aaaa(…)aaa+a [42] | 9.017 ns | 0.1865 ns | 0.1653 ns | 9.013 ns | 0.99 | 0.02 |
| | | | | | | |
| NoVector | length: 48 | 29.749 ns | 0.6195 ns | 0.6629 ns | 29.779 ns | 3.29 | 0.06 |
| StringReplace | +aaaa(…)aa+aa [48] | 9.144 ns | 0.2211 ns | 0.2716 ns | 9.142 ns | 1.00 | 0.03 |
| Vector128_SSE | +aaaa(…)aa+aa [48] | 9.017 ns | 0.1193 ns | 0.1116 ns | 9.034 ns | 1.00 | 0.00 |
| Vector128 | +aaaa(…)aa+aa [48] | 9.137 ns | 0.2230 ns | 0.2655 ns | 9.173 ns | 1.02 | 0.03 |
| Vector256 | +aaaa(…)aa+aa [48] | 8.659 ns | 0.1974 ns | 0.1846 ns | 8.644 ns | 0.96 | 0.02 |
| | | | | | | |
| NoVector | length: 54 | 32.712 ns | 0.7051 ns | 0.7240 ns | 32.727 ns | 2.84 | 0.10 |
| StringReplace | +aaaa(…)a+aaa [54] | 11.934 ns | 0.2786 ns | 0.2736 ns | 11.978 ns | 1.04 | 0.03 |
| Vector128_SSE | +aaaa(…)a+aaa [54] | 11.478 ns | 0.2334 ns | 0.3347 ns | 11.429 ns | 1.00 | 0.00 |
| Vector128 | +aaaa(…)a+aaa [54] | 11.415 ns | 0.2562 ns | 0.2517 ns | 11.439 ns | 0.99 | 0.04 |
| Vector256 | +aaaa(…)a+aaa [54] | 11.469 ns | 0.1760 ns | 0.1470 ns | 11.463 ns | 1.00 | 0.02 |
| | | | | | | |
| NoVector | length: 60 | 36.040 ns | 0.2408 ns | 0.2134 ns | 36.058 ns | 3.24 | 0.09 |
| StringReplace | +aaaa(…)+aaaa [60] | 15.078 ns | 0.3391 ns | 0.3769 ns | 15.224 ns | 1.36 | 0.05 |
| Vector128_SSE | +aaaa(…)+aaaa [60] | 11.122 ns | 0.2348 ns | 0.2610 ns | 11.033 ns | 1.00 | 0.00 |
| Vector128 | +aaaa(…)+aaaa [60] | 11.103 ns | 0.1660 ns | 0.1553 ns | 11.142 ns | 1.00 | 0.03 |
| Vector256 | +aaaa(…)+aaaa [60] | 11.104 ns | 0.2501 ns | 0.2456 ns | 11.110 ns | 1.00 | 0.03 |
| | | | | | | |
| NoVector | length: 66 | 44.250 ns | 0.9163 ns | 1.4266 ns | 44.660 ns | 4.01 | 0.17 |
| StringReplace | +aaaa(…)aaaa+ [66] | 11.166 ns | 0.2671 ns | 0.4462 ns | 11.161 ns | 1.03 | 0.04 |
| Vector128_SSE | +aaaa(…)aaaa+ [66] | 11.017 ns | 0.2536 ns | 0.2920 ns | 10.975 ns | 1.00 | 0.00 |
| Vector128 | +aaaa(…)aaaa+ [66] | 10.870 ns | 0.2487 ns | 0.2443 ns | 10.945 ns | 0.98 | 0.04 |
| Vector256 | +aaaa(…)aaaa+ [66] | 10.577 ns | 0.2520 ns | 0.2902 ns | 10.585 ns | 0.96 | 0.04 |
| | | | | | | |
| NoVector | length: 72 | 49.050 ns | 1.0280 ns | 2.4630 ns | 48.317 ns | 4.44 | 0.31 |
| StringReplace | +aaaa(…)aaa+a [72] | 14.954 ns | 0.3354 ns | 0.3993 ns | 14.937 ns | 1.30 | 0.04 |
| Vector128_SSE | +aaaa(…)aaa+a [72] | 11.471 ns | 0.2678 ns | 0.2976 ns | 11.399 ns | 1.00 | 0.00 |
| Vector128 | +aaaa(…)aaa+a [72] | 10.723 ns | 0.2582 ns | 0.6185 ns | 10.477 ns | 0.98 | 0.04 |
| Vector256 | +aaaa(…)aaa+a [72] | 9.981 ns | 0.1095 ns | 0.0971 ns | 9.976 ns | 0.87 | 0.03 |
| | | | | | | |
| NoVector | length: 78 | 47.917 ns | 0.3747 ns | 0.3322 ns | 47.879 ns | 3.74 | 0.07 |
| StringReplace | +aaaa(…)aa+aa [78] | 17.697 ns | 0.2877 ns | 0.2402 ns | 17.765 ns | 1.38 | 0.03 |
| Vector128_SSE | +aaaa(…)aa+aa [78] | 12.785 ns | 0.2556 ns | 0.2391 ns | 12.803 ns | 1.00 | 0.00 |
| Vector128 | +aaaa(…)aa+aa [78] | 12.841 ns | 0.2229 ns | 0.1976 ns | 12.776 ns | 1.00 | 0.03 |
| Vector256 | +aaaa(…)aa+aa [78] | 12.558 ns | 0.0856 ns | 0.0715 ns | 12.577 ns | 0.98 | 0.02 |
| | | | | | | |
| NoVector | length: 84 | 50.057 ns | 0.5688 ns | 0.5042 ns | 50.061 ns | 3.93 | 0.08 |
| StringReplace | +aaaa(…)a+aaa [84] | 13.632 ns | 0.3164 ns | 0.6533 ns | 13.516 ns | 1.09 | 0.06 |
| Vector128_SSE | +aaaa(…)a+aaa [84] | 12.739 ns | 0.2543 ns | 0.2379 ns | 12.680 ns | 1.00 | 0.00 |
| Vector128 | +aaaa(…)a+aaa [84] | 13.454 ns | 0.3139 ns | 0.6049 ns | 13.321 ns | 1.06 | 0.06 |
| Vector256 | +aaaa(…)a+aaa [84] | 11.861 ns | 0.2198 ns | 0.2056 ns | 11.848 ns | 0.93 | 0.03 |
| | | | | | | |
| NoVector | length: 90 | 54.707 ns | 1.1174 ns | 0.9906 ns | 54.314 ns | 4.47 | 0.10 |
| StringReplace | +aaaa(…)+aaaa [90] | 16.048 ns | 0.2490 ns | 0.2329 ns | 16.068 ns | 1.31 | 0.02 |
| Vector128_SSE | +aaaa(…)+aaaa [90] | 12.218 ns | 0.2353 ns | 0.2201 ns | 12.077 ns | 1.00 | 0.00 |
| Vector128 | +aaaa(…)+aaaa [90] | 12.531 ns | 0.1651 ns | 0.1289 ns | 12.543 ns | 1.02 | 0.02 |
| Vector256 | +aaaa(…)+aaaa [90] | 12.388 ns | 0.2572 ns | 0.4153 ns | 12.268 ns | 1.02 | 0.03 |
| | | | | | | |
| NoVector | length: 96 | 57.387 ns | 1.1987 ns | 1.4270 ns | 57.483 ns | 4.64 | 0.12 |
| StringReplace | +aaaa(…)aaaa+ [96] | 12.177 ns | 0.1435 ns | 0.1272 ns | 12.139 ns | 0.99 | 0.02 |
| Vector128_SSE | +aaaa(…)aaaa+ [96] | 12.346 ns | 0.1922 ns | 0.1798 ns | 12.315 ns | 1.00 | 0.00 |
| Vector128 | +aaaa(…)aaaa+ [96] | 12.476 ns | 0.2137 ns | 0.1999 ns | 12.439 ns | 1.01 | 0.02 |
| Vector256 | +aaaa(…)aaaa+ [96] | 11.463 ns | 0.2065 ns | 0.1932 ns | 11.414 ns | 0.93 | 0.02 |
| | | | | | | |
| NoVector | length: 102 | 61.076 ns | 1.0319 ns | 0.9148 ns | 60.989 ns | 3.09 | 0.07 |
| StringReplace | +aaa(…)aa+a [102] | 15.050 ns | 0.1900 ns | 0.1684 ns | 15.052 ns | 0.76 | 0.01 |
| Vector128_SSE | +aaa(…)aa+a [102] | 19.770 ns | 0.4355 ns | 0.3861 ns | 19.693 ns | 1.00 | 0.00 |
| Vector128 | +aaa(…)aa+a [102] | 14.953 ns | 0.2883 ns | 0.2697 ns | 15.006 ns | 0.76 | 0.02 |
| Vector256 | +aaa(…)aa+a [102] | 13.722 ns | 0.2105 ns | 0.1969 ns | 13.753 ns | 0.69 | 0.02 |
| | | | | | | |
| NoVector | length: 108 | 63.958 ns | 1.2594 ns | 1.3476 ns | 63.442 ns | 4.54 | 0.13 |
| StringReplace | +aaa(…)a+aa [108] | 18.385 ns | 0.3784 ns | 0.3539 ns | 18.310 ns | 1.30 | 0.03 |
| Vector128_SSE | +aaa(…)a+aa [108] | 14.150 ns | 0.2219 ns | 0.1967 ns | 14.085 ns | 1.00 | 0.00 |
| Vector128 | +aaa(…)a+aa [108] | 14.252 ns | 0.2817 ns | 0.2635 ns | 14.239 ns | 1.01 | 0.02 |
| Vector256 | +aaa(…)a+aa [108] | 14.180 ns | 0.3240 ns | 0.3858 ns | 14.108 ns | 1.01 | 0.03 |
| | | | | | | |
| NoVector | length: 114 | 68.325 ns | 1.3614 ns | 1.2735 ns | 68.216 ns | 4.82 | 0.11 |
| StringReplace | +aaa(…)+aaa [114] | 16.224 ns | 0.3027 ns | 0.2832 ns | 16.098 ns | 1.15 | 0.02 |
| Vector128_SSE | +aaa(…)+aaa [114] | 14.169 ns | 0.2171 ns | 0.2031 ns | 14.163 ns | 1.00 | 0.00 |
| Vector128 | +aaa(…)+aaa [114] | 13.950 ns | 0.3075 ns | 0.2726 ns | 13.925 ns | 0.99 | 0.02 |
| Vector256 | +aaa(…)+aaa [114] | 13.143 ns | 0.2048 ns | 0.1915 ns | 13.131 ns | 0.93 | 0.02 |
| | | | | | | |
| NoVector | length: 120 | 69.259 ns | 1.1361 ns | 1.0627 ns | 69.383 ns | 5.12 | 0.07 |
| StringReplace | +aaa(…)aaaa [120] | 16.914 ns | 0.3347 ns | 0.3131 ns | 16.787 ns | 1.25 | 0.02 |
| Vector128_SSE | +aaa(…)aaaa [120] | 13.560 ns | 0.1446 ns | 0.1208 ns | 13.584 ns | 1.00 | 0.00 |
| Vector128 | +aaa(…)aaaa [120] | 15.498 ns | 0.5173 ns | 1.4247 ns | 14.943 ns | 1.26 | 0.13 |
| Vector256 | +aaa(…)aaaa [120] | 13.000 ns | 0.2672 ns | 0.2369 ns | 13.008 ns | 0.96 | 0.02 |
| | | | | | | |
| NoVector | length: 126 | 72.409 ns | 1.3006 ns | 1.1530 ns | 72.176 ns | 4.46 | 0.08 |
| StringReplace | | 20.305 ns | 0.2024 ns | 0.1690 ns | 20.280 ns | 1.25 | 0.01 |
| Vector128_SSE | | 16.229 ns | 0.0853 ns | 0.0712 ns | 16.219 ns | 1.00 | 0.00 |
| Vector128 | | 18.434 ns | 0.4324 ns | 1.2613 ns | 18.124 ns | 1.17 | 0.11 |
| Vector256 | | 17.148 ns | 0.3881 ns | 0.7478 ns | 17.190 ns | 1.04 | 0.03 |