Enumerating files using Globbing and System.IO.Enumeration

 
 
  • Gérald Barré

Glob patterns (Wikipedia) are a very common way to specify a list of files to include or exclude. For instance, **/*.csproj matches any file with the .csproj extension. You can use glob patterns in many places, such as a .gitignore file, bash, or PowerShell.

.NET Core 2.1 introduced a new API for customizable and high-performance file enumerations with the new namespace System.IO.Enumeration. You can read more about these new APIs in the design document on GitHub: Extensible File Enumeration. The main type is FileSystemEnumerator<T> which allows enumerating all items of a folder. It provides 2 methods to customize the enumeration:

  • ShouldIncludeEntry(ref FileSystemEntry entry) determines whether the specified file system entry should be included in the results
  • ShouldRecurseIntoEntry(ref FileSystemEntry entry) determines whether the enumeration should recurse into the specified file system entry
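
You can try the two hooks without writing a subclass, because the same namespace also contains FileSystemEnumerable<T>, which takes the same decisions as delegates. The following is a minimal sketch that uses only the base class library. The FindProjectFiles helper and the choice to skip the bin and obj folders are illustrative assumptions, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

static class EnumerationSample
{
    // Hypothetical helper: returns the full path of every .csproj file under root,
    // without descending into "bin" or "obj" folders.
    public static IEnumerable<string> FindProjectFiles(string root)
    {
        return new FileSystemEnumerable<string>(
            root,
            (ref FileSystemEntry entry) => entry.ToFullPath(),
            new EnumerationOptions { RecurseSubdirectories = true })
        {
            // Same role as FileSystemEnumerator<T>.ShouldIncludeEntry
            ShouldIncludePredicate = (ref FileSystemEntry entry) =>
                !entry.IsDirectory &&
                entry.FileName.EndsWith(".csproj", StringComparison.OrdinalIgnoreCase),

            // Same role as FileSystemEnumerator<T>.ShouldRecurseIntoEntry
            ShouldRecursePredicate = (ref FileSystemEntry entry) =>
                !entry.FileName.Equals("bin", StringComparison.OrdinalIgnoreCase) &&
                !entry.FileName.Equals("obj", StringComparison.OrdinalIgnoreCase),
        };
    }
}
```

Both predicates work on entry.FileName, which is a ReadOnlySpan<char>, so a string is only allocated for the entries that are actually returned.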

The FileSystemEntry struct exposes many properties of the entry, such as the file or folder name, the file length, the file attributes, and the containing directory. To filter entries with a glob pattern, the globbing library must be able to match the directory and the file name as separate values. Building the full path first would allocate a string, which should be avoided for performance reasons. The library must also be able to tell whether it is necessary to recurse into a folder.

The library Meziantou.Framework.Globbing provides these methods, so it works well with the FileSystemEnumerator<T> API!

Please consider upvoting the following GitHub issue if you want globbing to be built into .NET: Feature Request: File System Globbing

#Glob features supported by Meziantou.Framework.Globbing

The library supports the following glob features:

  • * matches any number of characters including none
  • ? matches a single character
  • [abc] matches one character given in the bracket
  • [!abc] matches any character not in the brackets
  • [a-z] matches one character from the range given in the bracket
  • [!a-z] matches one character not in the range given in the bracket
  • {abc,123} matches one of the literals
  • ** matches zero or more directories
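
To see how these features behave, you can run a few patterns against sample paths. This is a small sketch, assuming the Glob.Parse and IsMatch methods shown later in this post. The Matches and PrintSamples helpers and the sample paths are made up for illustration, and the program prints whatever the library returns rather than asserting a result.

```csharp
using System;
using Meziantou.Framework.Globbing;

static class GlobFeaturesSample
{
    // Hypothetical helper: parses the pattern and tests one path against it.
    public static bool Matches(string pattern, string path)
        => Glob.Parse(pattern, GlobOptions.None).IsMatch(path);

    public static void PrintSamples()
    {
        // (pattern, path) pairs chosen for illustration, one per feature in the list
        var samples = new (string Pattern, string Path)[]
        {
            ("*.txt", "readme.txt"),
            ("file?.txt", "file1.txt"),
            ("file[abc].txt", "filea.txt"),
            ("file[!abc].txt", "filea.txt"),
            ("file[0-9].txt", "file5.txt"),
            ("file[!0-9].txt", "file5.txt"),
            ("{abc,123}.txt", "123.txt"),
            ("src/**/*.txt", "src/a/b/c.txt"),
        };

        foreach (var (pattern, path) in samples)
        {
            Console.WriteLine($"{pattern,-18} {path,-16} {Matches(pattern, path)}");
        }
    }
}
```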

#How to use Meziantou.Framework.Globbing

First, you need to reference the NuGet package:

csproj (MSBuild project file)
<Project>
    <ItemGroup>
        <PackageReference Include="Meziantou.Framework.Globbing" Version="1.0.4" />
    </ItemGroup>
</Project>
  • Parse a Glob pattern

    C#
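    // Case-sensitive matching
    Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
    // Case-insensitive matching
    Glob globIgnoreCase = Glob.Parse("src/**/*.txt", GlobOptions.IgnoreCase);
    
    // TryParse reports whether the pattern is valid through its return value
    bool isValid = Glob.TryParse("src/**/*.txt", GlobOptions.None, out Glob parsedGlob);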
  • IsMatch tests whether a file matches the glob pattern

    C#
    Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.IgnoreCase);
    glob.IsMatch("src/abc.txt"); // true
    glob.IsMatch("Src/test/abc.txt"); // true
    glob.IsMatch("src/test/abc.png"); // false
    glob.IsMatch("test/test/ab.txt"); // false
    
    // Support spans
    ReadOnlySpan<char> path = "src/test/ab.txt";
    glob.IsMatch(path);
  • IsPartialMatch tests whether the path matches the beginning of the glob pattern. This allows knowing if you should recurse into a directory.

    C#
    Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
    glob.IsPartialMatch("src/test"); // true
    glob.IsPartialMatch("tests/"); // false
    
    // Support spans
    ReadOnlySpan<char> path = "src/test";
    glob.IsPartialMatch(path); // true
  • Enumerate files that match a glob pattern:

    C#
    // Enumerate the files under "rootDirectory" that match the glob
    Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
    foreach(var file in glob.EnumerateFiles("rootDirectory"))
    {
        Console.WriteLine(file);
    }
    C#
    // Using System.IO.EnumerationOptions
    Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
    var enumerationOptions = new EnumerationOptions
    {
        IgnoreInaccessible = true,
        AttributesToSkip = FileAttributes.Hidden,
    };
    foreach(var file in glob.EnumerateFiles("dir", enumerationOptions))
    {
        Console.WriteLine(file);
    }
  • Enumerate files that match a glob pattern collection (like a .gitignore file)

    C#
    GlobCollection globs = new GlobCollection(
        Glob.Parse("src/**/*.{txt,md}", GlobOptions.None),
        Glob.Parse("!src/dummy/readme.{txt,md}", GlobOptions.None)); // exclude 'src/dummy/readme.txt' and 'src/dummy/readme.md'
    
    foreach(var file in globs.EnumerateFiles("rootDirectory"))
    {
        Console.WriteLine(file);
    }
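
To make the relationship between IsMatch and IsPartialMatch concrete, here is a hand-written directory walk that uses both. It is only a sketch under stated assumptions: the ManualGlobWalk class and its helpers are hypothetical, paths are normalized to '/' separators before matching, and every path is materialized as a string. Use glob.EnumerateFiles in real code, since it avoids those allocations.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Meziantou.Framework.Globbing;

static class ManualGlobWalk
{
    // Hypothetical helper: yields the full paths of the files under root whose
    // root-relative path (with '/' separators) matches the glob.
    public static IEnumerable<string> Find(Glob glob, string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            var current = pending.Pop();

            foreach (var file in Directory.EnumerateFiles(current))
            {
                // IsMatch decides whether a file is part of the result
                if (glob.IsMatch(ToRelative(root, file)))
                    yield return file;
            }

            foreach (var directory in Directory.EnumerateDirectories(current))
            {
                // IsPartialMatch lets the walk skip subtrees that cannot contain a match
                if (glob.IsPartialMatch(ToRelative(root, directory)))
                    pending.Push(directory);
            }
        }
    }

    private static string ToRelative(string root, string path)
        => Path.GetRelativePath(root, path).Replace(Path.DirectorySeparatorChar, '/');
}
```

With the pattern src/**/*.txt, the walk never opens a sibling folder such as tests, because IsPartialMatch rejects it before any of its files are listed.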

##Implementing a custom FileSystemEnumerator<T>

The library provides an implementation of System.IO.Enumeration.FileSystemEnumerator<T> that filters files using a Glob or GlobCollection instance. You can inherit from this class if you need to customize the way it enumerates files. For instance, you can filter files based on their attributes (hidden, read-only, etc.), their size, or their last access date.

C#
using System.IO;
using System.IO.Enumeration;
using Meziantou.Framework.Globbing;

// GlobFileSystemEnumerator<T> inherits from System.IO.Enumeration.FileSystemEnumerator<T>
public sealed class MyCustomFileSystemEnumerator : GlobFileSystemEnumerator<string>
{
    public MyCustomFileSystemEnumerator(Glob glob, string directory, EnumerationOptions? options = null)
        : base(glob, directory, options)
    {
    }

    // FileSystemEntry documentation: https://learn.microsoft.com/en-us/dotnet/api/system.io.enumeration.filesystementry?WT.mc_id=DT-MVP-5003978
    protected override bool ShouldRecurseIntoEntry(ref FileSystemEntry entry)
    {
        // TODO custom logic

        // base.ShouldRecurseIntoEntry uses glob.IsPartialMatch
        return base.ShouldRecurseIntoEntry(ref entry);
    }

    protected override bool ShouldIncludeEntry(ref FileSystemEntry entry)
    {
        // TODO custom filter logic
        // For instance, exclude files that are bigger than 10,000 bytes
        if (!entry.IsDirectory && entry.Length > 10_000)
            return false;

        // base.ShouldIncludeEntry uses glob.IsMatch
        return base.ShouldIncludeEntry(ref entry);
    }

    // Convert each included entry to its full path
    protected override string TransformEntry(ref FileSystemEntry entry)
    {
        return entry.ToFullPath();
    }
}

Then, you can use the custom enumerator:

C#
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
using var enumerator = new MyCustomFileSystemEnumerator(glob, @"c:\sample", new EnumerationOptions());
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}
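
If you prefer foreach or LINQ over a manual MoveNext loop, you can wrap any FileSystemEnumerator<T> in a lazy sequence. This is a sketch that uses only the base class library. ToEnumerable is a hypothetical helper, and AllEntriesEnumerator is a stand-in that exists only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

static class EnumeratorExtensions
{
    // Hypothetical adapter: exposes a FileSystemEnumerator<T> as a lazy sequence
    // and disposes the enumerator when the iteration ends.
    public static IEnumerable<T> ToEnumerable<T>(Func<FileSystemEnumerator<T>> factory)
    {
        using var enumerator = factory();
        while (enumerator.MoveNext())
        {
            yield return enumerator.Current;
        }
    }
}

// Stand-in enumerator: returns the full path of every entry in a single folder.
sealed class AllEntriesEnumerator : FileSystemEnumerator<string>
{
    public AllEntriesEnumerator(string directory, EnumerationOptions? options = null)
        : base(directory, options)
    {
    }

    protected override string TransformEntry(ref FileSystemEntry entry)
        => entry.ToFullPath();
}
```

The same call should work with the custom enumerator, for instance EnumeratorExtensions.ToEnumerable<string>(() => new MyCustomFileSystemEnumerator(glob, @"c:\sample")). The factory delegate ensures that a new enumerator is created each time the sequence is iterated.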

#Benchmarks

The code of the benchmarks is available on GitHub: https://github.com/meziantou/Meziantou.Framework/tree/master/benchmarks/GlobbingBenchmarks

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i5-6600 CPU 3.30GHz (Skylake), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT

##Benchmark Glob.IsMatch

For most scenarios, Meziantou.Framework.Globbing and DotNet.Glob perform very similarly. They use different optimizations, so performance can vary based on the glob pattern. That being said, both are fast enough for most use cases. Here are a few performance tests:

| Method | Pattern | Path | Mean | Allocated |
|---|---|---|---|---|
| Meziantou_Globbing | *.txt | file0001.txt | 42 ns | - |
| DotNet_Globbing_Glob | *.txt | file0001.txt | 39 ns | - |
| Kthompson_Glob_Compiled | *.txt | file0001.txt | 239 ns | 64 B |
| Meziantou_Globbing | **/*.txt | folde(…)1.txt [41] | 35 ns | - |
| DotNet_Globbing_Glob | **/*.txt | folde(…)1.txt [41] | 191 ns | - |
| Kthompson_Glob_Compiled | **/*.txt | folde(…)1.txt [41] | 863 ns | 304 B |
| Meziantou_Globbing | */file.txt | test0(…)1.txt [40] | 67 ns | - |
| DotNet_Globbing_Glob | */file.txt | test0(…)1.txt [40] | 190 ns | - |
| Kthompson_Glob_Compiled | */file.txt | test0(…)1.txt [40] | 429 ns | 304 B |
| Meziantou_Globbing | src/**/*.csproj | src/s(…)sproj [76] | 51 ns | - |
| DotNet_Globbing_Glob | src/**/*.csproj | src/s(…)sproj [76] | 226 ns | - |
| Kthompson_Glob_Compiled | src/**/*.csproj | src/s(…)sproj [76] | 1,612 ns | 336 B |
| Meziantou_Globbing | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 160 ns | - |
| DotNet_Globbing_Glob | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 84 ns | - |
| Kthompson_Glob_Compiled | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 529 ns | 352 B |

##Benchmark Glob.EnumerateFiles

  • flat is a single folder that contains 100,000 files with different extensions.
  • hierarchy contains about 7,000 files and 1,300 folders with a maximum depth of 4.

DotNet.Glob doesn't have a method to enumerate files, so it is not included here.

| Method | Folder | Pattern | Mean | Allocated |
|---|---|---|---|---|
| Meziantou_Globbing | flat | */file.txt | 89,168 μs | 17775 KB |
| Kthompson_Glob_Compiled | flat | */file.txt | 246,550 μs | 44812 KB |
| Meziantou_Globbing | flat | *.txt | 86,141 μs | 17774 KB |
| Kthompson_Glob_Compiled | flat | *.txt | 260,551 μs | 44810 KB |
| Meziantou_Globbing | flat | file*.txt | 85,974 μs | 17774 KB |
| Kthompson_Glob_Compiled | flat | file*.txt | 242,755 μs | 44813 KB |
| Meziantou_Globbing | flat | folde(…),ico} [43] | 72,350 μs | 3 KB |
| Kthompson_Glob_Compiled | flat | folde(…),ico} [43] | 69,561 μs | 5 KB |
| Meziantou_Globbing | hierarchy | */file.txt | 82,576 μs | 733 KB |
| Kthompson_Glob_Compiled | hierarchy | */file.txt | 283,598 μs | 5077 KB |
| Meziantou_Globbing | hierarchy | *.txt | 76 μs | 1 KB |
| Kthompson_Glob_Compiled | hierarchy | *.txt | 237 μs | 13 KB |
| Meziantou_Globbing | hierarchy | file*.txt | 78 μs | 2 KB |
| Kthompson_Glob_Compiled | hierarchy | file*.txt | 234 μs | 13 KB |
| Meziantou_Globbing | hierarchy | folde(…),ico} [43] | 39,924 μs | 368 KB |
| Kthompson_Glob_Compiled | hierarchy | folde(…),ico} [43] | 137,261 μs | 2611 KB |
