Regex - Denial of Service (ReDoS)

 
 
  • Gérald Barré

This post is part of the series 'Vulnerabilities'. Be sure to check out the rest of the blog posts of the series!

.NET regexes are very powerful. You can use very complicated patterns to match lots of things. But that power also comes with potential problems.

Some patterns require far more resources to match some strings than others. For instance, the following regex, which an older version of the .NET Framework used to validate email addresses, may take seconds or even minutes to process some specific strings:

C#
// ⚠ Do not use this regex in your application
// This regex **was** part of the .NET Framework
var regex = new Regex(
               @"^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$",
               RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

// Takes more than 30s on my computer
regex.IsMatch("t@t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.t.c%20");

The problem with this regex is its execution complexity: on some inputs it triggers catastrophic backtracking. You should avoid patterns that cause excessive backtracking. You can reproduce the issue with a simpler regex: (a+)+b. If you try to match aaaaaaaab, you may notice that there are lots of ways for (a+)+ to consume the eight "a" characters (2^7 = 128, and the number doubles for each additional "a"). When the rest of the pattern cannot match, for instance when the final "b" is missing, the regex engine may need to try all of them, which takes lots of time. The time is exponential relative to the input size. This website explains backtracking in detail. You could also check this Cloudflare post-mortem.
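To observe the growth yourself, here is a minimal sketch that times (a+)+b against runs of "a" followed by a character that makes the match fail. The class and method names are hypothetical, and the 2-second timeout is only there to keep the experiment bounded. Exact timings depend on your machine and runtime version, and newer runtimes may optimize some patterns, so treat the numbers as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // Runs (a+)+b against n 'a' characters followed by 'c', so the match has to fail.
    // Returns the elapsed time, or null when the regex timeout was hit first.
    public static TimeSpan? TimeFailingMatch(int n, TimeSpan timeout)
    {
        var regex = new Regex(@"(a+)+b", RegexOptions.None, timeout);
        var input = new string('a', n) + "c";

        var stopwatch = Stopwatch.StartNew();
        try
        {
            regex.IsMatch(input);
            return stopwatch.Elapsed;
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // Input sizes are arbitrary; increase them slowly to see the curve.
        foreach (var n in new[] { 8, 16, 20, 24 })
        {
            var elapsed = TimeFailingMatch(n, TimeSpan.FromSeconds(2));
            var result = elapsed is null
                ? "timed out"
                : $"{elapsed.Value.TotalMilliseconds:F1} ms";
            Console.WriteLine($"n={n}: {result}");
        }
    }
}
```

Each additional "a" should roughly double the time until the timeout kicks in.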

If you use the above regex to match an email address in your web application, a malicious user can use all your CPU by sending a few requests with the crafted email address. Thus, it's going to slow down your website or make it inaccessible. You can prevent this attack by specifying a timeout for the regex execution.

C#
var regex = new Regex(
    @"...",
    RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture,
    TimeSpan.FromSeconds(1)); // 👈 Set the maximum execution time for the Regex

try
{
    regex.IsMatch("...");
}
catch (RegexMatchTimeoutException)
{
    // handle the error
}
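How you handle the exception depends on the application. For input validation, one reasonable option is to reject any value that exhausts the time budget. The sketch below assumes that policy; SafeRegex and IsMatchOrFalse are hypothetical names, not part of .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeRegex
{
    // Hypothetical helper: returns false instead of throwing when the match times out.
    public static bool IsMatchOrFalse(Regex regex, string input)
    {
        try
        {
            return regex.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // Assumption: input that exhausts the time budget is treated as invalid.
            // A real application would probably also log the event.
            return false;
        }
    }

    public static void Main()
    {
        var regex = new Regex(@"^(a+)+b$", RegexOptions.None, TimeSpan.FromMilliseconds(200));

        Console.WriteLine(IsMatchOrFalse(regex, "aaab"));                     // True
        Console.WriteLine(IsMatchOrFalse(regex, new string('a', 40) + "c")); // False
    }
}
```

The second call returns false whether the engine times out or finishes without a match, so the caller never sees the exception.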

Or you can apply the timeout globally using a variable in the app domain:

C#
AppDomain domain = AppDomain.CurrentDomain;
domain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromSeconds(1));
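As a quick check, the following sketch reads the MatchTimeout property to see which timeout a Regex instance ends up with. It assumes the default is set at application startup, before any Regex is used, because the runtime is expected to read the value only once. The class name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GlobalTimeoutDemo
{
    public static void Main()
    {
        // Assumption: this runs before the first Regex is used in the process.
        AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromSeconds(1));

        // No timeout argument: the instance should pick up the app domain default.
        var withDefault = new Regex("a+");

        // An explicit timeout takes precedence over the default.
        var withExplicit = new Regex("a+", RegexOptions.None, TimeSpan.FromMilliseconds(250));

        Console.WriteLine($"default:  {withDefault.MatchTimeout}");
        Console.WriteLine($"explicit: {withExplicit.MatchTimeout}");
    }
}
```

If the first line does not report one second, a Regex was probably used before the default was set.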

Adding a timeout will prevent the regex from running for too long. However, you should also think about simplifying the pattern. When possible, you should avoid:

  • Grouping with repetition
  • Inside the repeated group:
    • Repetition
    • Alternation with overlapping branches (e.g. (a|aa)+)
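As an illustration of such a simplification, the nested pattern (a+)+b accepts exactly the same strings as a+b, which leaves the engine only one way to consume a run of "a". The sketch below compares both on a few inputs and then runs the simplified pattern on a long failing input. The names are hypothetical, the patterns are anchored for the comparison, and the timeout is kept as a safety net.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimplifiedPatternDemo
{
    private static readonly TimeSpan Timeout = TimeSpan.FromSeconds(1);

    // Nested repetition: a repeated group that itself contains a repetition.
    public static readonly Regex Nested = new Regex(@"^(a+)+b$", RegexOptions.None, Timeout);

    // Same set of accepted strings, written with a single repetition.
    public static readonly Regex Flat = new Regex(@"^a+b$", RegexOptions.None, Timeout);

    public static void Main()
    {
        // Short inputs: both patterns should agree on every one of them.
        foreach (var input in new[] { "ab", "aaaab", "b", "aac", "" })
        {
            Console.WriteLine(
                $"\"{input}\": nested={Nested.IsMatch(input)}, flat={Flat.IsMatch(input)}");
        }

        // Long failing input: only tried with the simplified pattern.
        var hostile = new string('a', 5000) + "c";
        Console.WriteLine($"flat on hostile input: {Flat.IsMatch(hostile)}");
    }
}
```

The same reasoning applies to (a|aa)+, which accepts the same strings as a+.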

Last but not least, you should think about replacing the regex with a classic text parser. For instance, Microsoft has replaced the previous regex with a very simple check that is sufficient for most cases:

C#
// https://github.com/dotnet/runtime/blob/master/src/libraries/System.ComponentModel.Annotations/src/System/ComponentModel/DataAnnotations/EmailAddressAttribute.cs
bool found = false;
for (int i = 0; i < valueAsString.Length; i++)
{
    if (valueAsString[i] == '@')
    {
        if (found || i == 0 || i == valueAsString.Length - 1)
            return false;

        found = true;
    }
}

return found;

This way, the check is very fast and no crafted value can slow it down. It makes a single pass over the string, so only the length of the string affects the validation time.

#Getting warnings in the IDE using a Roslyn Analyzer

You can find regexes that are created or executed without a timeout in your applications using a Roslyn analyzer. The good news is the free analyzer I've made already contains rules for that: https://github.com/meziantou/Meziantou.Analyzer.

You can install the Visual Studio extension or the NuGet package to analyze your code.

#Conclusion

Regexes are very powerful. But with great power comes great responsibility. Be very careful when you write a regex so that you don't introduce a Regular expression Denial of Service (ReDoS). You can do this by avoiding patterns that cause excessive backtracking, as explained in this post, and by setting a timeout.
