Enumerating files using Globbing and System.IO.Enumeration

 
 
  • Gérald Barré

Glob patterns (Wikipedia) are a common way to specify which files to include or exclude. For instance, **/*.csproj matches any file with the .csproj extension, in any directory. They are widely used in .gitignore files, bash, and PowerShell.

.NET Core 2.1 introduced a high-performance, customizable file enumeration API via the System.IO.Enumeration namespace. You can read more in the design document on GitHub: Extensible File Enumeration. The main type is FileSystemEnumerator<T>, which enumerates all items in a folder. It exposes two methods to customize enumeration:

  • ShouldIncludeEntry(ref FileSystemEntry entry) determines whether the specified file system entry should be included in the results
  • ShouldRecurseIntoEntry(ref FileSystemEntry entry) determines whether the specified file system entry should be recursed
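To make the two hooks concrete, here is a minimal sketch that uses only System.IO.Enumeration, without any globbing library. The TextFileEnumerator class and the temporary folder layout are hypothetical and exist only for illustration: the enumerator returns .txt files and never descends into folders named bin.

```csharp
using System;
using System.IO;
using System.IO.Enumeration;

// Create a small temporary tree so the sketch can run anywhere
var root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(Path.Combine(root, "src"));
Directory.CreateDirectory(Path.Combine(root, "bin"));
File.WriteAllText(Path.Combine(root, "src", "a.txt"), "");
File.WriteAllText(Path.Combine(root, "src", "b.png"), "");
File.WriteAllText(Path.Combine(root, "bin", "c.txt"), "");

try
{
    var options = new EnumerationOptions { RecurseSubdirectories = true };
    using var enumerator = new TextFileEnumerator(root, options);
    while (enumerator.MoveNext())
    {
        // Only src/a.txt passes both filters
        Console.WriteLine(enumerator.Current);
    }
}
finally
{
    Directory.Delete(root, recursive: true);
}

// Hypothetical enumerator (not part of any library)
sealed class TextFileEnumerator : FileSystemEnumerator<string>
{
    public TextFileEnumerator(string directory, EnumerationOptions? options = null)
        : base(directory, options)
    {
    }

    protected override bool ShouldIncludeEntry(ref FileSystemEntry entry)
    {
        // FileName is a ReadOnlySpan<char>, so this check does not allocate a string
        return !entry.IsDirectory
            && entry.FileName.EndsWith(".txt", StringComparison.OrdinalIgnoreCase);
    }

    protected override bool ShouldRecurseIntoEntry(ref FileSystemEntry entry)
    {
        // Prune whole subtrees named "bin"
        return !entry.FileName.Equals("bin", StringComparison.OrdinalIgnoreCase);
    }

    protected override string TransformEntry(ref FileSystemEntry entry)
    {
        // The full path is only built for entries that passed the filter
        return entry.ToFullPath();
    }
}
```

The filters work on spans and the full path is only materialized in TransformEntry, so rejected entries cost no string allocation.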

The FileSystemEntry struct exposes properties such as the file/folder name, length, attributes, and containing directory. To filter files with a glob pattern, the globbing library must accept the directory and file name as separate values rather than a combined path. Building the full path would allocate a string, which hurts performance. The library must also determine whether to recurse into a given directory.

The Meziantou.Framework.Globbing library provides these methods, making it a great fit for the FileSystemEnumerator<T> API.

Please consider upvoting the following GitHub issue if you want globbing to be built into .NET: Feature Request: File System Globbing

#Glob features supported by Meziantou.Framework.Globbing

The library supports the following glob features:

  • * matches any number of characters including none
  • ? matches a single character
  • [abc] matches one character listed in the brackets
  • [!abc] matches one character not listed in the brackets
  • [a-z] matches one character in the given range
  • [!a-z] matches one character outside the given range
  • {abc,123} matches one of the literals
  • ** matches zero or more directories
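The following sketch pairs each feature with one path that should match and one that should not, according to the rules above. It relies on the Glob.Parse and IsMatch calls shown later in this post; the Meziantou.Framework.Globbing namespace is assumed to match the package name, and the sample paths are made up for illustration.

```csharp
using System;
using Meziantou.Framework.Globbing; // namespace assumed to match the package name

// (pattern, path that should match, path that should not), per the rules listed above
var samples = new (string Pattern, string Match, string NoMatch)[]
{
    ("*.txt",         "readme.txt",    "readme.md"),
    ("file?.txt",     "file1.txt",     "file12.txt"),
    ("[abc].txt",     "a.txt",         "d.txt"),
    ("[!abc].txt",    "d.txt",         "a.txt"),
    ("[a-z].txt",     "q.txt",         "5.txt"),
    ("[!a-z].txt",    "5.txt",         "q.txt"),
    ("{abc,123}.txt", "123.txt",       "xyz.txt"),
    ("src/**/*.txt",  "src/a/b/c.txt", "test/a/b/c.txt"),
};

foreach (var (pattern, match, noMatch) in samples)
{
    Glob glob = Glob.Parse(pattern, GlobOptions.None);

    // Print the actual results rather than assuming them
    Console.WriteLine($"{pattern}: {match} -> {glob.IsMatch(match)}, {noMatch} -> {glob.IsMatch(noMatch)}");
}
```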

#How to use Meziantou.Framework.Globbing

First, you need to reference the NuGet package:

csproj (MSBuild project file)
<Project>
    <ItemGroup>
        <PackageReference Include="Meziantou.Framework.Globbing" Version="1.0.4" />
    </ItemGroup>
</Project>
  • Parse a Glob pattern

    C#
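    Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
    // Same pattern, matched case-insensitively
    Glob caseInsensitiveGlob = Glob.Parse("src/**/*.txt", GlobOptions.IgnoreCase);
    
    // TryParse reports success through its return value
    var isValid = Glob.TryParse("src/**/*.txt", GlobOptions.None, out Glob parsedGlob);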
  • IsMatch tests whether a file matches the glob pattern

    C#
    Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.IgnoreCase);
    glob.IsMatch("src/abc.txt"); // true
    glob.IsMatch("Src/test/abc.txt"); // true
    glob.IsMatch("src/test/abc.png"); // false
    glob.IsMatch("test/test/ab.txt"); // false
    
    // Support spans
    ReadOnlySpan<char> path = "src/test/ab.txt";
    glob.IsMatch(path);
  • IsPartialMatch tests whether the path matches the beginning of the glob pattern. This allows knowing if you should recurse into a directory.

    C#
    Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
    glob.IsPartialMatch("src/test"); // true
    glob.IsPartialMatch("tests/"); // false
    
    // Support spans
    ReadOnlySpan<char> path = "src/test";
    glob.IsPartialMatch(path); // true
  • Enumerate files that match a glob pattern:

    C#
    // Enumerate the files under rootDirectory that match the glob
    Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
    foreach(var file in glob.EnumerateFiles("rootDirectory"))
    {
        Console.WriteLine(file);
    }
    C#
    // Using System.IO.EnumerationOptions
    Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
    var enumerationOptions = new EnumerationOptions
    {
        IgnoreInaccessible = true,
        AttributesToSkip = FileAttributes.Hidden,
    };
    foreach(var file in glob.EnumerateFiles("dir", enumerationOptions))
    {
        Console.WriteLine(file);
    }
  • Enumerate files that match a glob pattern collection (like a .gitignore file)

    C#
    GlobCollection globs = new GlobCollection(
        Glob.Parse("src/**/*.{txt,md}", GlobOptions.None),
        Glob.Parse("!src/dummy/readme.{txt,md}", GlobOptions.None)); // exclude 'src/dummy/readme.txt' and 'src/dummy/readme.md'
    
    foreach(var file in globs.EnumerateFiles("rootDirectory"))
    {
        Console.WriteLine(file);
    }
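To show how IsMatch and IsPartialMatch fit together, here is a rough sketch of a manual directory walk. It is not how the library implements EnumerateFiles; the Walk and ToGlobPath helpers are hypothetical, the namespace is assumed to match the package name, and paths are normalized to / on the assumption that the glob expects the separator used in the patterns above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Meziantou.Framework.Globbing; // namespace assumed to match the package name

Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
foreach (var match in Walk(glob, Directory.GetCurrentDirectory()))
{
    Console.WriteLine(match);
}

// Hypothetical helper: walks the tree and prunes directories with IsPartialMatch
static IEnumerable<string> Walk(Glob pattern, string root)
{
    var pending = new Stack<string>();
    pending.Push(root);
    while (pending.Count > 0)
    {
        var current = pending.Pop();
        foreach (var file in Directory.EnumerateFiles(current))
        {
            if (pattern.IsMatch(ToGlobPath(root, file)))
                yield return file;
        }

        foreach (var directory in Directory.EnumerateDirectories(current))
        {
            // Skip subtrees that cannot contain a match
            if (pattern.IsPartialMatch(ToGlobPath(root, directory)))
                pending.Push(directory);
        }
    }
}

// Hypothetical helper: path relative to the root, using '/' as the separator
static string ToGlobPath(string root, string path)
    => Path.GetRelativePath(root, path).Replace(Path.DirectorySeparatorChar, '/');
```

Unlike the enumerator-based approach, this version allocates a string for every file and directory it visits, which is the cost the FileSystemEnumerator<T> integration avoids.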

##Implementing a custom FileSystemEnumerator<T>

The library includes a GlobFileSystemEnumerator<T> that inherits from System.IO.Enumeration.FileSystemEnumerator<T> and filters files using a Glob or GlobCollection instance. You can subclass it to customize enumeration further, for example to filter by file attributes (hidden, read-only), size, or last access date.

C#
// GlobFileSystemEnumerator<T> inherits from System.IO.Enumeration.FileSystemEnumerator<T>
public class MyCustomFileSystemEnumerator : GlobFileSystemEnumerator<string>
{
    // Public constructor so the enumerator can be instantiated directly
    public MyCustomFileSystemEnumerator(Glob glob, string directory, EnumerationOptions? options = null)
        : base(glob, directory, options)
    {
    }

    // FileSystemEntry documentation: https://learn.microsoft.com/en-us/dotnet/api/system.io.enumeration.filesystementry?WT.mc_id=DT-MVP-5003978
    protected override bool ShouldRecurseIntoEntry(ref FileSystemEntry entry)
    {
        // TODO custom logic

        // base.ShouldRecurseIntoEntry uses glob.IsPartialMatch
        return base.ShouldRecurseIntoEntry(ref entry);
    }

    protected override bool ShouldIncludeEntry(ref FileSystemEntry entry)
    {
        // TODO custom filter logic
        // For instance, exclude files that are larger than 10,000 bytes
        if (!entry.IsDirectory && entry.Length > 10_000)
            return false;

        // base.ShouldIncludeEntry uses glob.IsMatch
        return base.ShouldIncludeEntry(ref entry);
    }

    protected override string TransformEntry(ref FileSystemEntry entry)
    {
        return entry.ToFullPath();
    }
}

Then, you can use the custom enumerator:

C#
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
using var enumerator = new MyCustomFileSystemEnumerator(glob, @"c:\sample", new EnumerationOptions());
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}

#Benchmarks

The code of the benchmarks is available on GitHub: https://github.com/meziantou/Meziantou.Framework/tree/master/benchmarks/GlobbingBenchmarks

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i5-6600 CPU 3.30GHz (Skylake), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT

##Benchmark Glob.IsMatch

For most scenarios, Meziantou.Framework.Globbing and DotNet.Glob perform similarly. They use different optimizations, so results can vary depending on the glob pattern. That said, both are fast enough for most use cases. Here are a few benchmark results:

| Method | Pattern | Path | Mean | Allocated |
|---|---|---|---|---|
| Meziantou_Globbing | *.txt | file0001.txt | 42 ns | - |
| DotNet_Globbing_Glob | *.txt | file0001.txt | 39 ns | - |
| Kthompson_Glob_Compiled | *.txt | file0001.txt | 239 ns | 64 B |
| Meziantou_Globbing | **/*.txt | folde(…)1.txt [41] | 35 ns | - |
| DotNet_Globbing_Glob | **/*.txt | folde(…)1.txt [41] | 191 ns | - |
| Kthompson_Glob_Compiled | **/*.txt | folde(…)1.txt [41] | 863 ns | 304 B |
| Meziantou_Globbing | */file.txt | test0(…)1.txt [40] | 67 ns | - |
| DotNet_Globbing_Glob | */file.txt | test0(…)1.txt [40] | 190 ns | - |
| Kthompson_Glob_Compiled | */file.txt | test0(…)1.txt [40] | 429 ns | 304 B |
| Meziantou_Globbing | src/**/*.csproj | src/s(…)sproj [76] | 51 ns | - |
| DotNet_Globbing_Glob | src/**/*.csproj | src/s(…)sproj [76] | 226 ns | - |
| Kthompson_Glob_Compiled | src/**/*.csproj | src/s(…)sproj [76] | 1,612 ns | 336 B |
| Meziantou_Globbing | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 160 ns | - |
| DotNet_Globbing_Glob | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 84 ns | - |
| Kthompson_Glob_Compiled | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 529 ns | 352 B |

##Benchmark Glob.EnumerateFiles

  • flat is a single folder that contains 100,000 files with different extensions.
  • hierarchy contains about 7,000 files and 1,300 folders, with a maximum depth of 4.

DotNet.Glob doesn't have a method to enumerate files, so it is not included here.

| Method | Folder | Pattern | Mean | Allocated |
|---|---|---|---|---|
| Meziantou_Globbing | flat | */file.txt | 89,168 μs | 17775 KB |
| Kthompson_Glob_Compiled | flat | */file.txt | 246,550 μs | 44812 KB |
| Meziantou_Globbing | flat | *.txt | 86,141 μs | 17774 KB |
| Kthompson_Glob_Compiled | flat | *.txt | 260,551 μs | 44810 KB |
| Meziantou_Globbing | flat | file*.txt | 85,974 μs | 17774 KB |
| Kthompson_Glob_Compiled | flat | file*.txt | 242,755 μs | 44813 KB |
| Meziantou_Globbing | flat | folde(…),ico} [43] | 72,350 μs | 3 KB |
| Kthompson_Glob_Compiled | flat | folde(…),ico} [43] | 69,561 μs | 5 KB |
| Meziantou_Globbing | hierarchy | */file.txt | 82,576 μs | 733 KB |
| Kthompson_Glob_Compiled | hierarchy | */file.txt | 283,598 μs | 5077 KB |
| Meziantou_Globbing | hierarchy | *.txt | 76 μs | 1 KB |
| Kthompson_Glob_Compiled | hierarchy | *.txt | 237 μs | 13 KB |
| Meziantou_Globbing | hierarchy | file*.txt | 78 μs | 2 KB |
| Kthompson_Glob_Compiled | hierarchy | file*.txt | 234 μs | 13 KB |
| Meziantou_Globbing | hierarchy | folde(…),ico} [43] | 39,924 μs | 368 KB |
| Kthompson_Glob_Compiled | hierarchy | folde(…),ico} [43] | 137,261 μs | 2611 KB |
