Replace characters in a string using Vectorization
This post is part of the series 'SIMD'. Be sure to check out the rest of the posts in the series!
- Faster Guid comparisons using Vectors (SIMD) in .NET
- Finding the maximum value in an array using vectorization
- Replace characters in a string using Vectorization (this post)
I continue to look at code that uses vectorization in the .NET libraries. In this post, we'll look at the ReplacePlusWithSpace method from the ASP.NET Core code. This method replaces + with a space. This is needed to decode query strings, where a space can be encoded as +. The method uses Vector128 with SSE2 and SSE4.1 instructions. As in previous posts, the comments in the code are mine:
// source: https://github.com/dotnet/aspnetcore/blob/c65dac77cf6540c81860a42fff41eb11b9804367/src/Shared/QueryStringEnumerable.cs#L169

// Cache the delegate in a static field to avoid allocating a new delegate on each call
// (and the null check the compiler emits when it caches a lambda)
// https://www.meziantou.net/performance-lambda-expressions-method-groups-and-delegate-caching.htm
private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
{
    // ReadOnlySpan<char> is a ref struct, so it cannot be used as the TState
    // type argument of string.Create<TState>(). The following code doesn't compile:
    // string.Create<ReadOnlySpan<char>>(10, span, (buffer, span) => { });
    //
    // The workaround is to pin the ReadOnlySpan, get a pointer to its first element,
    // and pass that pointer (as an IntPtr) as the state of string.Create.
    // The pointer stays valid because string.Create calls the delegate
    // synchronously, inside the fixed block.
    fixed (char* ptr = &MemoryMarshal.GetReference(span))
    {
        return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
    }
}

private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
{
    // Convert the destination buffer to a pointer
    fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
    {
        // Vector128<T> only supports numeric types, so you cannot use char.
        // char and ushort are both 2 bytes long, so you can reinterpret
        // the pointers as (ushort*) instead of (char*)
        var input = (ushort*)state.ToPointer();
        var output = (ushort*)ptr;
        var i = (nint)0;
        var n = (nint)(uint)buffer.Length;

        // Use Vector128<ushort> to process 8 characters at a time
        if (Sse41.IsSupported && n >= Vector128<ushort>.Count)
        {
            // Create a Vector128 instance with all elements initialized to '+'
            var vecPlus = Vector128.Create((ushort)'+');

            // Create a Vector128 instance with all elements initialized to ' '
            var vecSpace = Vector128.Create((ushort)' ');

            do
            {
                // Load 8 chars from the input string
                // vec: ['a', 'a', '+', 'a', '+', 'a', 'a', 'a']
                var vec = Sse2.LoadVector128(input + i);

                // Compare the chars with '+'. Each element of the result is 0xFFFF
                // when the char is equal to '+', and 0x0000 otherwise.
                // The result is a mask that indicates which characters to replace
                // with a space.
                //
                // vec:     [ 'a'  ,  'a'  ,  '+'  ,  'a'  ,  '+'  ,  'a'  ,  'a'  ,  'a'  ]
                // vecPlus: [ '+'  ,  '+'  ,  '+'  ,  '+'  ,  '+'  ,  '+'  ,  '+'  ,  '+'  ]
                // mask:    [0x0000, 0x0000, 0xFFFF, 0x0000, 0xFFFF, 0x0000, 0x0000, 0x0000]
                var mask = Sse2.CompareEqual(vec, vecPlus);

                // Take the space where the mask is set (0xFFFF) and keep the
                // original char where it is not (0x0000)
                // vec:      [ 'a'  ,  'a'  ,  '+'  ,  'a'  ,  '+'  ,  'a'  ,  'a'  ,  'a'  ]
                // vecSpace: [ ' '  ,  ' '  ,  ' '  ,  ' '  ,  ' '  ,  ' '  ,  ' '  ,  ' '  ]
                // mask:     [0x0000, 0x0000, 0xFFFF, 0x0000, 0xFFFF, 0x0000, 0x0000, 0x0000]
                // res:      [ 'a'  ,  'a'  ,  ' '  ,  'a'  ,  ' '  ,  'a'  ,  'a'  ,  'a'  ]
                var res = Sse41.BlendVariable(vec, vecSpace, mask);

                // Store the res vector to the output buffer
                Sse2.Store(output + i, res);

                // Process the next 8 chars
                i += Vector128<ushort>.Count;
            } while (i <= n - Vector128<ushort>.Count);
        }

        // Process the remaining characters (from 0 to 7 chars)
        for (; i < n; ++i)
        {
            if (input[i] != '+')
            {
                output[i] = input[i];
            }
            else
            {
                output[i] = ' ';
            }
        }
    }
}
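Why replace + with a space? In query strings encoded as application/x-www-form-urlencoded (the format used by HTML forms), a space can be written as +, while a literal + is percent-encoded as %2B. Here's a minimal sketch of decoding one query string component using string.Replace and Uri.UnescapeDataString. The FormDecodingSketch class and its DecodeComponent method are hypothetical names for illustration, not ASP.NET Core APIs. The point it shows is the order of operations: the + replacement has to happen before percent-decoding.

```csharp
using System;

// Hypothetical helper for illustration; not an ASP.NET Core API.
public static class FormDecodingSketch
{
    // Decodes one application/x-www-form-urlencoded component (a query string name or value).
    public static string DecodeComponent(string encoded)
    {
        // 1. '+' stands for a space. Replace it first so that a '+' produced
        //    by decoding "%2B" in step 2 is not turned into a space.
        string withSpaces = encoded.Replace('+', ' ');

        // 2. Decode the percent-escapes ("%2B" -> '+', "%20" -> ' ', ...)
        return Uri.UnescapeDataString(withSpaces);
    }
}
```

For instance, DecodeComponent("a+b%2Bc") returns "a b+c". Doing the two steps in the opposite order would return "a b c", turning the encoded + into a space.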
#Replace platform-specific instructions with Vector128 methods
The previous code calls x86-specific intrinsics such as Sse41.BlendVariable directly. This works, but you need to check whether the hardware supports these instructions, and you need to provide a fallback implementation when it doesn't. Since .NET 7, the static methods of Vector128 use SIMD instructions when possible and fall back to a software implementation otherwise, so you don't need to handle this complexity yourself. As a bonus, the same code is also accelerated on Arm64.
The previous code can be rewritten using the Vector128 static methods:
// No hardware check needed: Vector128 methods fall back to a software implementation
if (n >= Vector128<ushort>.Count)
{
    var vecPlus = Vector128.Create((ushort)'+');
    var vecSpace = Vector128.Create((ushort)' ');
    do
    {
        // Equivalent of Sse2.LoadVector128(input + i);
        var vec = Vector128.Load(input + i);

        // Equivalent of Sse2.CompareEqual(vec, vecPlus);
        var mask = Vector128.Equals(vec, vecPlus);

        // Equivalent of Sse41.BlendVariable(vec, vecSpace, mask);
        var res = Vector128.ConditionalSelect(mask, vecSpace, vec);

        // Equivalent of Sse2.Store(output + i, res);
        res.Store(output + i);

        i += Vector128<ushort>.Count;
    } while (i <= n - Vector128<ushort>.Count);
}

for (; i < n; ++i)
{
    if (input[i] != '+')
    {
        output[i] = input[i];
    }
    else
    {
        output[i] = ' ';
    }
}
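If you'd rather avoid unsafe code, pointers, and fixed statements, the same Vector128 loop can be written with managed references. This is a sketch assuming .NET 7 or later (for Vector128.LoadUnsafe and StoreUnsafe); the SafeReplaceSketch class and its span-to-span signature are my own choices, not the ASP.NET Core API. MemoryMarshal.Cast reinterprets the chars as ushort without copying, which plays the role of the (ushort*) cast. Unlike the code above, it also checks Vector128.IsHardwareAccelerated so that only the scalar loop runs when the vector path would be emulated in software.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper: a pointer-free sketch of the Vector128 version (.NET 7+).
public static class SafeReplaceSketch
{
    // Writes source to destination, replacing every '+' with ' '.
    public static void ReplacePlusWithSpace(ReadOnlySpan<char> source, Span<char> destination)
    {
        if (destination.Length < source.Length)
            throw new ArgumentException("Destination is too short.", nameof(destination));

        // Reinterpret char as ushort without copying (Vector128<char> is not supported)
        ReadOnlySpan<ushort> input = MemoryMarshal.Cast<char, ushort>(source);
        Span<ushort> output = MemoryMarshal.Cast<char, ushort>(destination);

        ref ushort inputRef = ref MemoryMarshal.GetReference(input);
        ref ushort outputRef = ref MemoryMarshal.GetReference(output);

        nuint i = 0;
        nuint n = (nuint)input.Length;
        nuint width = (nuint)Vector128<ushort>.Count; // 8 chars

        if (Vector128.IsHardwareAccelerated && n >= width)
        {
            Vector128<ushort> vecPlus = Vector128.Create((ushort)'+');
            Vector128<ushort> vecSpace = Vector128.Create((ushort)' ');
            do
            {
                // No bounds checks here: the loop condition guarantees i + width <= n
                Vector128<ushort> vec = Vector128.LoadUnsafe(ref inputRef, i);
                Vector128<ushort> mask = Vector128.Equals(vec, vecPlus);

                // Bitwise select: vecSpace bits where mask bits are set
                // (whole 0xFFFF lanes here), vec bits elsewhere
                Vector128.ConditionalSelect(mask, vecSpace, vec).StoreUnsafe(ref outputRef, i);
                i += width;
            } while (i <= n - width);
        }

        // Scalar loop for the remaining chars (or all of them)
        for (; i < n; i++)
        {
            ushort c = input[(int)i];
            output[(int)i] = c == '+' ? (ushort)' ' : c;
        }
    }
}
```

Building a string from it still needs a buffer, for example a char[] passed to new string(...), so this trades the pointer trick for an extra copy.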
#Improve the method with AVX2
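AVX2 provides instructions that work on 256-bit vectors, so you can process 16 characters at a time instead of 8. The code is very similar to the SSE version. As with Vector128, you can use Vector256 to avoid writing a software fallback yourself. The code below only uses Vector256 when Vector256.IsHardwareAccelerated is true, then uses Vector128 for the remaining characters (or for all of them when Vector256 isn't accelerated). Let's see if this improves the performance.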
private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
{
    fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
    {
        var input = (ushort*)state.ToPointer();
        var output = (ushort*)ptr;
        var i = (nint)0;
        var n = (nint)(uint)buffer.Length;

        // Process 16 chars per loop
        if (Vector256.IsHardwareAccelerated && n >= Vector256<ushort>.Count)
        {
            var vecPlus = Vector256.Create((ushort)'+');
            var vecSpace = Vector256.Create((ushort)' ');
            do
            {
                var vec = Vector256.Load(input + i);
                var mask = Vector256.Equals(vec, vecPlus);
                var res = Vector256.ConditionalSelect(mask, vecSpace, vec);
                res.Store(output + i);
                i += Vector256<ushort>.Count;
            } while (i <= n - Vector256<ushort>.Count);
        }

        // Process 8 chars per loop
        if (Vector128.IsHardwareAccelerated && n - i >= Vector128<ushort>.Count)
        {
            var vecPlus = Vector128.Create((ushort)'+');
            var vecSpace = Vector128.Create((ushort)' ');
            do
            {
                var vec = Vector128.Load(input + i);
                var mask = Vector128.Equals(vec, vecPlus);
                var res = Vector128.ConditionalSelect(mask, vecSpace, vec);
                res.Store(output + i);
                i += Vector128<ushort>.Count;
            } while (i <= n - Vector128<ushort>.Count);
        }

        // Process the remaining characters
        for (; i < n; ++i)
        {
            if (input[i] != '+')
            {
                output[i] = input[i];
            }
            else
            {
                output[i] = ' ';
            }
        }
    }
}
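When reading the benchmark results, it helps to know how many characters each loop handles for a given length. The following sketch reproduces only the index arithmetic of the AVX2 version, without touching any data. LoopSplitSketch and Split are hypothetical names, and the sketch assumes both Vector256 and Vector128 are hardware accelerated.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical helper: reproduces only the index arithmetic of the AVX2 version,
// assuming both Vector256 and Vector128 are hardware accelerated.
public static class LoopSplitSketch
{
    // Returns how many chars each loop processes for a buffer of length n.
    public static (int Vector256Chars, int Vector128Chars, int ScalarChars) Split(int n)
    {
        int width256 = Vector256<ushort>.Count; // 16 chars
        int width128 = Vector128<ushort>.Count; // 8 chars
        int i = 0;

        // Equivalent to: if (n >= 16) do { ... } while (i <= n - 16);
        while (n - i >= width256)
        {
            i += width256;
        }
        int vector256Chars = i;

        // Runs at most once, because fewer than 16 chars remain
        while (n - i >= width128)
        {
            i += width128;
        }
        int vector128Chars = i - vector256Chars;

        // The scalar loop handles the rest (0 to 7 chars)
        return (vector256Chars, vector128Chars, n - i);
    }
}
```

For example, Split(42) returns (32, 8, 2): two 16-char iterations, one 8-char iteration, and 2 chars for the scalar loop. For lengths below 8, such as the length 6 inputs of the benchmark, only the scalar loop runs.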
#Benchmark
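Let's compare all the previous implementations, plus string.Replace. string.Replace cannot be used in this case because the source is a ReadOnlySpan<char>, not a string, but it is a good reference point for the performance of an optimized implementation. In the results, NoVector is the scalar loop (BasicHelper), and Vector128_SSE is the current ASP.NET Core implementation (Vector128Helper_Sse), used as the baseline.
Benchmark code: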
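using System;
using System.Buffers;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using BenchmarkDotNet.Attributes;

// Checks that all benchmarks return the same value
[ReturnValueValidator]
public class ReplacePlusWithSpaceBenchmark
{
    [ParamsSource(nameof(ValueSource))]
    public string Value { get; set; } = null!;

    // Strings of length 0, 6, 12, ..., 126 where every fifth char, starting with the first, is '+'
    public IEnumerable<string> ValueSource
    {
        get
        {
            for (int length = 0; length < 128; length += 6)
            {
                yield return string.Create(length, state: (object?)null, (span, state) =>
                {
                    for (var i = 0; i < span.Length; i++)
                    {
                        span[i] = i % 5 == 0 ? '+' : 'a';
                    }
                });
            }
        }
    }

    [Benchmark]
    public string NoVector() => BasicHelper.ReplacePlusWithSpace(Value);

    [Benchmark]
    public string StringReplace() => Value.Replace('+', ' ');

    [Benchmark(Baseline = true)]
    public string Vector128_SSE() => Vector128Helper_Sse.ReplacePlusWithSpace(Value);

    [Benchmark]
    public string Vector128() => Vector128Helper.ReplacePlusWithSpace(Value);

    [Benchmark]
    public string Vector256() => Vector256Helper.ReplacePlusWithSpace(Value);
}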
public static class BasicHelper
{
    private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(span))
        {
            return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
        }
    }

    private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
        {
            var input = (ushort*)state.ToPointer();
            var output = (ushort*)ptr;
            var i = (nint)0;
            var n = (nint)(uint)buffer.Length;

            for (; i < n; ++i)
            {
                if (input[i] != '+')
                {
                    output[i] = input[i];
                }
                else
                {
                    output[i] = ' ';
                }
            }
        }
    }
}
public static class Vector128Helper_Sse
{
    private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(span))
        {
            return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
        }
    }

    private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
        {
            var input = (ushort*)state.ToPointer();
            var output = (ushort*)ptr;
            var i = (nint)0;
            var n = (nint)(uint)buffer.Length;

            if (Sse41.IsSupported && n >= Vector128<ushort>.Count)
            {
                var vecPlus = Vector128.Create((ushort)'+');
                var vecSpace = Vector128.Create((ushort)' ');
                do
                {
                    var vec = Sse2.LoadVector128(input + i);
                    var mask = Sse2.CompareEqual(vec, vecPlus);
                    var res = Sse41.BlendVariable(vec, vecSpace, mask);
                    Sse2.Store(output + i, res);
                    i += Vector128<ushort>.Count;
                } while (i <= n - Vector128<ushort>.Count);
            }

            for (; i < n; ++i)
            {
                if (input[i] != '+')
                {
                    output[i] = input[i];
                }
                else
                {
                    output[i] = ' ';
                }
            }
        }
    }
}
public static class Vector128Helper
{
    private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(span))
        {
            return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
        }
    }

    private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
        {
            var input = (ushort*)state.ToPointer();
            var output = (ushort*)ptr;
            var i = (nint)0;
            var n = (nint)(uint)buffer.Length;

            if (n >= Vector128<ushort>.Count)
            {
                var vecPlus = Vector128.Create((ushort)'+');
                var vecSpace = Vector128.Create((ushort)' ');
                do
                {
                    var vec = Vector128.Load(input + i);
                    var mask = Vector128.Equals(vec, vecPlus);
                    var res = Vector128.ConditionalSelect(mask, vecSpace, vec);
                    res.Store(output + i);
                    i += Vector128<ushort>.Count;
                } while (i <= n - Vector128<ushort>.Count);
            }

            for (; i < n; ++i)
            {
                if (input[i] != '+')
                {
                    output[i] = input[i];
                }
                else
                {
                    output[i] = ' ';
                }
            }
        }
    }
}
public static class Vector256Helper
{
    private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(span))
        {
            return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
        }
    }

    private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
        {
            var input = (ushort*)state.ToPointer();
            var output = (ushort*)ptr;
            var i = (nint)0;
            var n = (nint)(uint)buffer.Length;

            if (Vector256.IsHardwareAccelerated && n >= Vector256<ushort>.Count)
            {
                var vecPlus = Vector256.Create((ushort)'+');
                var vecSpace = Vector256.Create((ushort)' ');
                do
                {
                    var vec = Vector256.Load(input + i);
                    var mask = Vector256.Equals(vec, vecPlus);
                    var res = Vector256.ConditionalSelect(mask, vecSpace, vec);
                    res.Store(output + i);
                    i += Vector256<ushort>.Count;
                } while (i <= n - Vector256<ushort>.Count);
            }

            if (Vector128.IsHardwareAccelerated && n - i >= Vector128<ushort>.Count)
            {
                var vecPlus = Vector128.Create((ushort)'+');
                var vecSpace = Vector128.Create((ushort)' ');
                do
                {
                    var vec = Vector128.Load(input + i);
                    var mask = Vector128.Equals(vec, vecPlus);
                    var res = Vector128.ConditionalSelect(mask, vecSpace, vec);
                    res.Store(output + i);
                    i += Vector128<ushort>.Count;
                } while (i <= n - Vector128<ushort>.Count);
            }

            for (; i < n; ++i)
            {
                if (input[i] != '+')
                {
                    output[i] = input[i];
                }
                else
                {
                    output[i] = ' ';
                }
            }
        }
    }
}
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22621
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.5.22307.18
[Host] : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
RyuJitX64 : .NET 7.0.0 (7.0.22.31201), X64 RyuJIT
Job=RyuJitX64 Jit=RyuJit Platform=X64
Toolchain=.NET 7.0.100-preview.6.22316.8
Method | Value | Mean | Error | StdDev | Median | Ratio | RatioSD |
---|---|---|---|---|---|---|---|
NoVector | length: 0 | 1.356 ns | 0.0242 ns | 0.0215 ns | 1.360 ns | 0.97 | 0.04 |
StringReplace |  | 2.303 ns | 0.0543 ns | 0.0508 ns | 2.292 ns | 1.65 | 0.07 |
Vector128_SSE |  | 1.395 ns | 0.0563 ns | 0.0526 ns | 1.398 ns | 1.00 | 0.00 |
Vector128 |  | 1.347 ns | 0.0294 ns | 0.0275 ns | 1.346 ns | 0.97 | 0.04 |
Vector256 |  | 1.344 ns | 0.0353 ns | 0.0330 ns | 1.338 ns | 0.96 | 0.04 |
NoVector | length: 6 | 8.114 ns | 0.1864 ns | 0.1744 ns | 8.094 ns | 0.92 | 0.02 |
StringReplace |  | 7.713 ns | 0.1334 ns | 0.1114 ns | 7.726 ns | 0.87 | 0.01 |
Vector128_SSE |  | 8.849 ns | 0.1527 ns | 0.1354 ns | 8.826 ns | 1.00 | 0.00 |
Vector128 |  | 8.122 ns | 0.1614 ns | 0.1510 ns | 8.132 ns | 0.92 | 0.02 |
Vector256 |  | 8.476 ns | 0.1463 ns | 0.1797 ns | 8.463 ns | 0.96 | 0.02 |
NoVector | length: 12 | 10.453 ns | 0.1788 ns | 0.1673 ns | 10.450 ns | 1.43 | 0.03 |
StringReplace |  | 9.892 ns | 0.2260 ns | 0.3837 ns | 9.723 ns | 1.37 | 0.07 |
Vector128_SSE |  | 7.319 ns | 0.1260 ns | 0.1052 ns | 7.349 ns | 1.00 | 0.00 |
Vector128 |  | 7.219 ns | 0.1584 ns | 0.1404 ns | 7.185 ns | 0.99 | 0.03 |
Vector256 |  | 7.931 ns | 0.2017 ns | 0.3141 ns | 7.896 ns | 1.08 | 0.06 |
NoVector | length: 18 | 14.643 ns | 0.3154 ns | 0.2950 ns | 14.686 ns | 2.05 | 0.04 |
StringReplace |  | 7.482 ns | 0.1681 ns | 0.1573 ns | 7.546 ns | 1.05 | 0.03 |
Vector128_SSE |  | 7.147 ns | 0.1176 ns | 0.1043 ns | 7.127 ns | 1.00 | 0.00 |
Vector128 |  | 7.218 ns | 0.1173 ns | 0.1040 ns | 7.215 ns | 1.01 | 0.02 |
Vector256 |  | 7.334 ns | 0.1345 ns | 0.1192 ns | 7.357 ns | 1.03 | 0.02 |
NoVector | length: 24 | 17.528 ns | 0.3382 ns | 0.3164 ns | 17.579 ns | 2.47 | 0.04 |
StringReplace |  | 10.534 ns | 0.2072 ns | 0.1938 ns | 10.548 ns | 1.49 | 0.03 |
Vector128_SSE |  | 7.095 ns | 0.1372 ns | 0.1216 ns | 7.112 ns | 1.00 | 0.00 |
Vector128 |  | 7.110 ns | 0.1326 ns | 0.1176 ns | 7.117 ns | 1.00 | 0.02 |
Vector256 |  | 7.111 ns | 0.1426 ns | 0.1264 ns | 7.079 ns | 1.00 | 0.03 |
NoVector | length: 30 | 20.800 ns | 0.3545 ns | 0.3316 ns | 20.804 ns | 2.10 | 0.05 |
StringReplace |  | 13.720 ns | 0.2893 ns | 0.2564 ns | 13.766 ns | 1.38 | 0.03 |
Vector128_SSE |  | 9.928 ns | 0.1773 ns | 0.1659 ns | 9.936 ns | 1.00 | 0.00 |
Vector128 |  | 10.372 ns | 0.2506 ns | 0.5175 ns | 10.269 ns | 1.04 | 0.05 |
Vector256 |  | 10.364 ns | 0.2515 ns | 0.4536 ns | 10.261 ns | 1.06 | 0.05 |
NoVector | length: 36 | 23.607 ns | 0.5093 ns | 0.5002 ns | 23.813 ns | 2.51 | 0.06 |
StringReplace |  | 9.577 ns | 0.1544 ns | 0.1369 ns | 9.622 ns | 1.01 | 0.02 |
Vector128_SSE |  | 9.428 ns | 0.2271 ns | 0.2125 ns | 9.533 ns | 1.00 | 0.00 |
Vector128 |  | 9.192 ns | 0.1516 ns | 0.1344 ns | 9.246 ns | 0.97 | 0.02 |
Vector256 |  | 9.470 ns | 0.1724 ns | 0.1613 ns | 9.471 ns | 1.00 | 0.03 |
NoVector | length: 42 | 26.764 ns | 0.4799 ns | 0.4489 ns | 26.736 ns | 2.94 | 0.08 |
StringReplace |  | 12.787 ns | 0.2885 ns | 0.2699 ns | 12.825 ns | 1.40 | 0.04 |
Vector128_SSE |  | 9.119 ns | 0.2222 ns | 0.2079 ns | 9.062 ns | 1.00 | 0.00 |
Vector128 |  | 8.866 ns | 0.2225 ns | 0.2185 ns | 8.886 ns | 0.97 | 0.03 |
Vector256 |  | 9.017 ns | 0.1865 ns | 0.1653 ns | 9.013 ns | 0.99 | 0.02 |
NoVector | length: 48 | 29.749 ns | 0.6195 ns | 0.6629 ns | 29.779 ns | 3.29 | 0.06 |
StringReplace |  | 9.144 ns | 0.2211 ns | 0.2716 ns | 9.142 ns | 1.00 | 0.03 |
Vector128_SSE |  | 9.017 ns | 0.1193 ns | 0.1116 ns | 9.034 ns | 1.00 | 0.00 |
Vector128 |  | 9.137 ns | 0.2230 ns | 0.2655 ns | 9.173 ns | 1.02 | 0.03 |
Vector256 |  | 8.659 ns | 0.1974 ns | 0.1846 ns | 8.644 ns | 0.96 | 0.02 |
NoVector | length: 54 | 32.712 ns | 0.7051 ns | 0.7240 ns | 32.727 ns | 2.84 | 0.10 |
StringReplace |  | 11.934 ns | 0.2786 ns | 0.2736 ns | 11.978 ns | 1.04 | 0.03 |
Vector128_SSE |  | 11.478 ns | 0.2334 ns | 0.3347 ns | 11.429 ns | 1.00 | 0.00 |
Vector128 |  | 11.415 ns | 0.2562 ns | 0.2517 ns | 11.439 ns | 0.99 | 0.04 |
Vector256 |  | 11.469 ns | 0.1760 ns | 0.1470 ns | 11.463 ns | 1.00 | 0.02 |
NoVector | length: 60 | 36.040 ns | 0.2408 ns | 0.2134 ns | 36.058 ns | 3.24 | 0.09 |
StringReplace |  | 15.078 ns | 0.3391 ns | 0.3769 ns | 15.224 ns | 1.36 | 0.05 |
Vector128_SSE |  | 11.122 ns | 0.2348 ns | 0.2610 ns | 11.033 ns | 1.00 | 0.00 |
Vector128 |  | 11.103 ns | 0.1660 ns | 0.1553 ns | 11.142 ns | 1.00 | 0.03 |
Vector256 |  | 11.104 ns | 0.2501 ns | 0.2456 ns | 11.110 ns | 1.00 | 0.03 |
NoVector | length: 66 | 44.250 ns | 0.9163 ns | 1.4266 ns | 44.660 ns | 4.01 | 0.17 |
StringReplace |  | 11.166 ns | 0.2671 ns | 0.4462 ns | 11.161 ns | 1.03 | 0.04 |
Vector128_SSE |  | 11.017 ns | 0.2536 ns | 0.2920 ns | 10.975 ns | 1.00 | 0.00 |
Vector128 |  | 10.870 ns | 0.2487 ns | 0.2443 ns | 10.945 ns | 0.98 | 0.04 |
Vector256 |  | 10.577 ns | 0.2520 ns | 0.2902 ns | 10.585 ns | 0.96 | 0.04 |
NoVector | length: 72 | 49.050 ns | 1.0280 ns | 2.4630 ns | 48.317 ns | 4.44 | 0.31 |
StringReplace |  | 14.954 ns | 0.3354 ns | 0.3993 ns | 14.937 ns | 1.30 | 0.04 |
Vector128_SSE |  | 11.471 ns | 0.2678 ns | 0.2976 ns | 11.399 ns | 1.00 | 0.00 |
Vector128 |  | 10.723 ns | 0.2582 ns | 0.6185 ns | 10.477 ns | 0.98 | 0.04 |
Vector256 |  | 9.981 ns | 0.1095 ns | 0.0971 ns | 9.976 ns | 0.87 | 0.03 |
NoVector | length: 78 | 47.917 ns | 0.3747 ns | 0.3322 ns | 47.879 ns | 3.74 | 0.07 |
StringReplace |  | 17.697 ns | 0.2877 ns | 0.2402 ns | 17.765 ns | 1.38 | 0.03 |
Vector128_SSE |  | 12.785 ns | 0.2556 ns | 0.2391 ns | 12.803 ns | 1.00 | 0.00 |
Vector128 |  | 12.841 ns | 0.2229 ns | 0.1976 ns | 12.776 ns | 1.00 | 0.03 |
Vector256 |  | 12.558 ns | 0.0856 ns | 0.0715 ns | 12.577 ns | 0.98 | 0.02 |
NoVector | length: 84 | 50.057 ns | 0.5688 ns | 0.5042 ns | 50.061 ns | 3.93 | 0.08 |
StringReplace |  | 13.632 ns | 0.3164 ns | 0.6533 ns | 13.516 ns | 1.09 | 0.06 |
Vector128_SSE |  | 12.739 ns | 0.2543 ns | 0.2379 ns | 12.680 ns | 1.00 | 0.00 |
Vector128 |  | 13.454 ns | 0.3139 ns | 0.6049 ns | 13.321 ns | 1.06 | 0.06 |
Vector256 |  | 11.861 ns | 0.2198 ns | 0.2056 ns | 11.848 ns | 0.93 | 0.03 |
NoVector | length: 90 | 54.707 ns | 1.1174 ns | 0.9906 ns | 54.314 ns | 4.47 | 0.10 |
StringReplace |  | 16.048 ns | 0.2490 ns | 0.2329 ns | 16.068 ns | 1.31 | 0.02 |
Vector128_SSE |  | 12.218 ns | 0.2353 ns | 0.2201 ns | 12.077 ns | 1.00 | 0.00 |
Vector128 |  | 12.531 ns | 0.1651 ns | 0.1289 ns | 12.543 ns | 1.02 | 0.02 |
Vector256 |  | 12.388 ns | 0.2572 ns | 0.4153 ns | 12.268 ns | 1.02 | 0.03 |
NoVector | length: 96 | 57.387 ns | 1.1987 ns | 1.4270 ns | 57.483 ns | 4.64 | 0.12 |
StringReplace |  | 12.177 ns | 0.1435 ns | 0.1272 ns | 12.139 ns | 0.99 | 0.02 |
Vector128_SSE |  | 12.346 ns | 0.1922 ns | 0.1798 ns | 12.315 ns | 1.00 | 0.00 |
Vector128 |  | 12.476 ns | 0.2137 ns | 0.1999 ns | 12.439 ns | 1.01 | 0.02 |
Vector256 |  | 11.463 ns | 0.2065 ns | 0.1932 ns | 11.414 ns | 0.93 | 0.02 |
NoVector | length: 102 | 61.076 ns | 1.0319 ns | 0.9148 ns | 60.989 ns | 3.09 | 0.07 |
StringReplace |  | 15.050 ns | 0.1900 ns | 0.1684 ns | 15.052 ns | 0.76 | 0.01 |
Vector128_SSE |  | 19.770 ns | 0.4355 ns | 0.3861 ns | 19.693 ns | 1.00 | 0.00 |
Vector128 |  | 14.953 ns | 0.2883 ns | 0.2697 ns | 15.006 ns | 0.76 | 0.02 |
Vector256 |  | 13.722 ns | 0.2105 ns | 0.1969 ns | 13.753 ns | 0.69 | 0.02 |
NoVector | length: 108 | 63.958 ns | 1.2594 ns | 1.3476 ns | 63.442 ns | 4.54 | 0.13 |
StringReplace |  | 18.385 ns | 0.3784 ns | 0.3539 ns | 18.310 ns | 1.30 | 0.03 |
Vector128_SSE |  | 14.150 ns | 0.2219 ns | 0.1967 ns | 14.085 ns | 1.00 | 0.00 |
Vector128 |  | 14.252 ns | 0.2817 ns | 0.2635 ns | 14.239 ns | 1.01 | 0.02 |
Vector256 |  | 14.180 ns | 0.3240 ns | 0.3858 ns | 14.108 ns | 1.01 | 0.03 |
NoVector | length: 114 | 68.325 ns | 1.3614 ns | 1.2735 ns | 68.216 ns | 4.82 | 0.11 |
StringReplace |  | 16.224 ns | 0.3027 ns | 0.2832 ns | 16.098 ns | 1.15 | 0.02 |
Vector128_SSE |  | 14.169 ns | 0.2171 ns | 0.2031 ns | 14.163 ns | 1.00 | 0.00 |
Vector128 |  | 13.950 ns | 0.3075 ns | 0.2726 ns | 13.925 ns | 0.99 | 0.02 |
Vector256 |  | 13.143 ns | 0.2048 ns | 0.1915 ns | 13.131 ns | 0.93 | 0.02 |
NoVector | length: 120 | 69.259 ns | 1.1361 ns | 1.0627 ns | 69.383 ns | 5.12 | 0.07 |
StringReplace |  | 16.914 ns | 0.3347 ns | 0.3131 ns | 16.787 ns | 1.25 | 0.02 |
Vector128_SSE |  | 13.560 ns | 0.1446 ns | 0.1208 ns | 13.584 ns | 1.00 | 0.00 |
Vector128 |  | 15.498 ns | 0.5173 ns | 1.4247 ns | 14.943 ns | 1.26 | 0.13 |
Vector256 |  | 13.000 ns | 0.2672 ns | 0.2369 ns | 13.008 ns | 0.96 | 0.02 |
NoVector | length: 126 | 72.409 ns | 1.3006 ns | 1.1530 ns | 72.176 ns | 4.46 | 0.08 |
StringReplace |  | 20.305 ns | 0.2024 ns | 0.1690 ns | 20.280 ns | 1.25 | 0.01 |
Vector128_SSE |  | 16.229 ns | 0.0853 ns | 0.0712 ns | 16.219 ns | 1.00 | 0.00 |
Vector128 |  | 18.434 ns | 0.4324 ns | 1.2613 ns | 18.124 ns | 1.17 | 0.11 |
Vector256 |  | 17.148 ns | 0.3881 ns | 0.7478 ns | 17.190 ns | 1.04 | 0.03 |