This post is part of the series 'Strings in .NET'. Be sure to check out the rest of the blog posts of the series!
In a previous post, I explained why \d is different from [0-9]. In this post, I'll explain why the regex [a-zA-Z] differs from [a-z] when using the IgnoreCase option.
```csharp
using System;
using System.Text.RegularExpressions;

var regex1 = new Regex("^[a-zA-Z]+$");
var regex2 = new Regex("^[a-z]+$", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
Console.WriteLine(regex1.IsMatch("Test")); // true
Console.WriteLine(regex2.IsMatch("Test")); // true
```
However, you get different results if you use the Kelvin sign:
```csharp
// Kelvin sign (U+212A), written as an escape so it is not mistaken for the ASCII letter K
Console.WriteLine(regex1.IsMatch("\u212A")); // false
Console.WriteLine(regex2.IsMatch("\u212A")); // true
```
When the RegexOptions.IgnoreCase option is specified, comparisons between the input and the pattern are case-insensitive. To support this, the Regex engine uses case mappings. A match occurs if two characters are considered equivalent according to the casing rules (e.g., one is the lowercase version of the other).
In this case, char.ToLowerInvariant('\u212A') (Kelvin Sign) returns 'k' (Latin Small Letter K, U+006B). Therefore, when using the IgnoreCase option, [a-z] matches the Kelvin sign because 'k' is in the range a-z. Without IgnoreCase, no case mapping is applied, and the Kelvin sign (U+212A) falls in neither the a-z nor the A-Z range, so the regex [a-zA-Z] does not match it.
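You can check this mapping yourself with a few lines. This is a minimal sketch, assuming a default .NET setup with C# top-level statements; the variable names are only illustrative, and the result may differ on runtimes configured with ASCII-only invariant casing.

```csharp
using System;

// Kelvin sign (U+212A), written as an escape so it cannot be confused with ASCII 'K' (U+004B).
char kelvin = '\u212A';
char lowered = char.ToLowerInvariant(kelvin);

// Print the code points before and after lowercasing.
Console.WriteLine($"U+{(int)kelvin:X4} -> U+{(int)lowered:X4}");

// According to the mapping described above, the Kelvin sign and the ASCII 'K'
// both lowercase to the same character, 'k'.
Console.WriteLine(lowered == 'k');
Console.WriteLine(char.ToLowerInvariant('K') == lowered);
```

Printing the code points rather than the characters makes the difference visible, since the Kelvin sign and the Latin capital K look identical in most fonts.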
To conclude, the following regular expressions are equivalent:
```csharp
new Regex("^[a-z]+$", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
new Regex("^[A-Za-z\u212A]+$");
```
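If you want to verify this equivalence on your own runtime, one option is to test every UTF-16 code unit as a one-character input and list any character on which the two patterns disagree. The following is a rough sketch, not an exhaustive proof: it only covers single-character inputs, and the variable names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

var ignoreCase = new Regex("^[a-z]+$", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
var explicitSet = new Regex("^[A-Za-z\u212A]+$");

// Try every UTF-16 code unit as a one-character string and count disagreements.
int differences = 0;
for (int codeUnit = char.MinValue; codeUnit <= char.MaxValue; codeUnit++)
{
    string input = ((char)codeUnit).ToString();
    if (ignoreCase.IsMatch(input) != explicitSet.IsMatch(input))
    {
        differences++;
        Console.WriteLine($"U+{codeUnit:X4} is matched by only one of the two patterns");
    }
}

Console.WriteLine($"Characters with different results: {differences}");
```

If the two patterns are equivalent on your runtime, the loop reports no differing characters. Any character it does print would indicate that the casing table of that runtime contains a mapping not covered here.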
If you are curious about how .NET knows which case mappings to use, you can read the code of the GenerateRegexCasingTable tool on GitHub.