This post is part of the series 'Strings in .NET'. Be sure to check out the rest of the blog posts of the series!
According to the .NET documentation, \d matches any decimal digit. However, the definition of a "decimal digit" depends on the provided options:
- Without RegexOptions.ECMAScript (default): \d is equivalent to \p{Nd}, i.e., any character from the Unicode category "Decimal digit"
- With RegexOptions.ECMAScript: \d is equivalent to [0-9]
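If you want to check these two equivalences on your own runtime, here is a minimal sketch. It compares the patterns for every UTF-16 code unit and counts the disagreements. The loop only covers U+0000 to U+FFFF because .NET regexes match one UTF-16 code unit at a time. The variable names are mine, not part of any API.

```csharp
using System;
using System.Text.RegularExpressions;

// Build each regex once and reuse it for every character.
var digit = new Regex(@"\d");
var unicodeNd = new Regex(@"\p{Nd}");
var ecmaDigit = new Regex(@"\d", RegexOptions.ECMAScript);
var asciiRange = new Regex("[0-9]");

int defaultMismatches = 0;
int ecmaMismatches = 0;

// Walk every UTF-16 code unit (U+0000 to U+FFFF).
for (int i = 0; i <= char.MaxValue; i++)
{
    string s = ((char)i).ToString();

    // Default options: \d should agree with \p{Nd}.
    if (digit.IsMatch(s) != unicodeNd.IsMatch(s)) defaultMismatches++;

    // ECMAScript option: \d should agree with [0-9].
    if (ecmaDigit.IsMatch(s) != asciiRange.IsMatch(s)) ecmaMismatches++;
}

Console.WriteLine("Default mismatches: " + defaultMismatches);
Console.WriteLine("ECMAScript mismatches: " + ecmaMismatches);
```

Both counters should stay at zero if the equivalences hold.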
The Unicode category "Decimal digit" (Nd) contains the ASCII digits such as 0, 1, or 2, but also digits from other scripts such as ٣, ٧, ൩, or ໓. The full list contains 610 characters, although the exact count depends on the Unicode version. Here are some of the ranges:
0x0030-0x0039, // ASCII
0x0660-0x0669, // Arabic-Indic
0x06f0-0x06f9, // Eastern Arabic-Indic
0x0966-0x096f, // Devanagari
0x09e6-0x09ef, // Bengali
0x0a66-0x0a6f, // Gurmukhi
0x0ae6-0x0aef, // Gujarati
0x0b66-0x0b6f, // Oriya
0x0c66-0x0c6f, // Telugu
0x0ce6-0x0cef, // Kannada
0x0d66-0x0d6f, // Malayalam
0x0e50-0x0e59, // Thai
0x0ed0-0x0ed9, // Lao
0x0f20-0x0f29, // Tibetan
0x1040-0x1049, // Myanmar
0x17e0-0x17e9, // Khmer
0x1810-0x1819, // Mongolian
0x1946-0x194f, // Limbu
0xff10-0xff19, // Fullwidth
0x1d7ce-0x1d7d7, // Mathematical Bold
0x1d7d8-0x1d7e1, // Mathematical Double-Struck
0x1d7e2-0x1d7eb, // Mathematical Sans-Serif
0x1d7ec-0x1d7f5, // Mathematical Sans-Serif Bold
0x1d7f6-0x1d7ff // Mathematical Monospace
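You can rebuild a list like this one from the Unicode data shipped with your runtime. The following sketch walks every code point, keeps those in the DecimalDigitNumber category, and groups them into contiguous ranges. I'm not printing an expected count here because it depends on the Unicode version your .NET runtime uses.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Contiguous runs of code points whose category is DecimalDigitNumber (Nd).
var ranges = new List<(int First, int Last)>();
int total = 0;

for (int cp = 0; cp <= 0x10FFFF; cp++)
{
    // Surrogate code points are not characters on their own; skip them.
    if (cp >= 0xD800 && cp <= 0xDFFF) continue;

    // For code points above U+FFFF this string is a surrogate pair,
    // and CharUnicodeInfo reports the category of the pair as a whole.
    string s = char.ConvertFromUtf32(cp);
    if (CharUnicodeInfo.GetUnicodeCategory(s, 0) != UnicodeCategory.DecimalDigitNumber) continue;

    total++;
    int lastIndex = ranges.Count - 1;
    if (lastIndex >= 0 && ranges[lastIndex].Last == cp - 1)
        ranges[lastIndex] = (ranges[lastIndex].First, cp); // extend the current run
    else
        ranges.Add((cp, cp));                              // start a new run
}

foreach (var (first, last) in ranges)
    Console.WriteLine($"0x{first:x4}-0x{last:x4}");

Console.WriteLine($"{total} decimal digits known to this runtime");
```

One caveat: digits above U+FFFF, such as the mathematical ones, are stored as two UTF-16 code units in a .NET string. As far as I know, the regex engine examines each code unit separately, so \d does not match them even though they belong to the Nd category.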
Here are some examples to demonstrate the differences:
C#
using System.Text.RegularExpressions;

// ASCII DIGIT: \u0030 - \u0039
Regex.IsMatch("0123456789", "\\d{10}"); // True
Regex.IsMatch("0123456789", "[0-9]{10}"); // True
// DEVANAGARI DIGIT: \u0966 - \u096F
Regex.IsMatch("०१२३४५६७८९", "\\d{10}"); // True
Regex.IsMatch("०१२३४५६७८९", "[0-9]{10}"); // False
// RegexOptions.ECMAScript
Regex.IsMatch("0123456789", "\\d{10}", RegexOptions.ECMAScript); // True
Regex.IsMatch("०१२३४५६७८९", "\\d{10}", RegexOptions.ECMAScript); // False
The next time you want to match a digit in a regex, ensure you know whether you want to match [0-9] or \p{Nd}.
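One way to make that choice visible in your code is to spell out the character class instead of relying on \d. Here is a small sketch; IsAsciiNumber and IsUnicodeNumber are hypothetical helper names I made up for this example. It also shows why the choice matters: int.TryParse only understands ASCII digits, so a string accepted by \p{Nd} may still fail to parse.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helpers: each name states which digit set it accepts.
// \z anchors at the very end of the input.
bool IsAsciiNumber(string s) => Regex.IsMatch(s, @"^[0-9]+\z");
bool IsUnicodeNumber(string s) => Regex.IsMatch(s, @"^\p{Nd}+\z");

string ascii = "2024";
string devanagari = "२०२४"; // U+0968 U+0966 U+0968 U+096A

Console.WriteLine(IsAsciiNumber(ascii));        // True
Console.WriteLine(IsAsciiNumber(devanagari));   // False
Console.WriteLine(IsUnicodeNumber(ascii));      // True
Console.WriteLine(IsUnicodeNumber(devanagari)); // True

// int.TryParse accepts ASCII digits only, so this returns false.
bool parsed = int.TryParse(devanagari, NumberStyles.None, CultureInfo.InvariantCulture, out _);
Console.WriteLine(parsed);                      // False

// If you do accept Unicode digits, convert them one by one.
int value = 0;
foreach (char c in devanagari)
    value = value * 10 + CharUnicodeInfo.GetDecimalDigitValue(c);
Console.WriteLine(value);                       // 2024
```

If the matched text is later parsed as a number, [0-9] is probably the safer default.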