Glob patterns (Wikipedia) are a common way to specify files to include or exclude. For instance, **/*.csproj matches any file with the .csproj extension. They are widely used in .gitignore files, bash, and PowerShell.
.NET Core 2.1 introduced a high-performance, customizable file enumeration API via the System.IO.Enumeration namespace. You can read more in the design document on GitHub: Extensible File Enumeration. The main type is FileSystemEnumerator<T>, which enumerates all items in a folder. It exposes two methods to customize enumeration:
- ShouldIncludeEntry(ref FileSystemEntry entry) determines whether the specified file system entry should be included in the results
- ShouldRecurseIntoEntry(ref FileSystemEntry entry) determines whether the specified file system entry should be recursed
The FileSystemEntry struct exposes properties such as the file/folder name, length, attributes, and containing directory. To filter files with a glob pattern, the globbing library must accept the directory and file name as separate values rather than a combined path. Building the full path would allocate a string, which hurts performance. The library must also determine whether to recurse into a given directory.
The Meziantou.Framework.Globbing library provides these methods, making it a great fit for the FileSystemEnumerator<T> API.
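To make the two extension points concrete, here is a minimal sketch that uses only the base class library, with no globbing involved. The ExtensionFileEnumerator type and the hard-coded node_modules exclusion are hypothetical, chosen only for illustration. It shows that the file name is available as a span, so filtering does not need to build the full path first.

```csharp
using System;
using System.IO;
using System.IO.Enumeration;

// Hypothetical example type (not part of any library): yields the full path
// of every file whose name ends with the given extension.
public sealed class ExtensionFileEnumerator : FileSystemEnumerator<string>
{
    private readonly string _extension;

    public ExtensionFileEnumerator(string directory, string extension)
        : base(directory, new EnumerationOptions { RecurseSubdirectories = true })
    {
        _extension = extension;
    }

    protected override bool ShouldIncludeEntry(ref FileSystemEntry entry)
    {
        // entry.FileName is a ReadOnlySpan<char>: this check allocates nothing
        return !entry.IsDirectory
            && entry.FileName.EndsWith(_extension, StringComparison.OrdinalIgnoreCase);
    }

    protected override bool ShouldRecurseIntoEntry(ref FileSystemEntry entry)
    {
        // Arbitrary rule for the example: never descend into "node_modules"
        return !entry.FileName.SequenceEqual("node_modules".AsSpan());
    }

    protected override string TransformEntry(ref FileSystemEntry entry)
    {
        // A string is allocated only for entries that passed the filter
        return entry.ToFullPath();
    }
}
```

You consume it like any enumerator: create it in a using statement, call MoveNext() in a loop, and read Current. A glob-based enumerator follows the same shape, with the two predicates delegating to the glob instead of the hard-coded rules above.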
Please consider upvoting the following GitHub issue if you want globbing to be built into .NET: Feature Request: File System Globbing
#Glob features supported by Meziantou.Framework.Globbing
The library supports the following glob features:
- `*` matches any number of characters, including none
- `?` matches a single character
- `[abc]` matches one character given in the brackets
- `[!abc]` matches any character not in the brackets
- `[a-z]` matches one character from the range given in the brackets
- `[!a-z]` matches one character not in the range given in the brackets
- `{abc,123}` matches one of the literals
- `**` matches zero or more directories
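The first two wildcards can be illustrated with a tiny span-based matcher. This is only a sketch of the semantics, not the library's implementation. The SimpleWildcard type is hypothetical and handles only * and ?. It also assumes, as most glob dialects do, that neither wildcard crosses a / separator.

```csharp
using System;

// Hypothetical teaching aid: supports only '*' and '?'.
public static class SimpleWildcard
{
    public static bool IsMatch(ReadOnlySpan<char> pattern, ReadOnlySpan<char> text)
    {
        if (pattern.IsEmpty)
            return text.IsEmpty;

        if (pattern[0] == '*')
        {
            // Try every length the star may consume: 0, 1, 2, ... characters,
            // stopping at the end of the text or at a directory separator.
            for (int i = 0; ; i++)
            {
                if (IsMatch(pattern.Slice(1), text.Slice(i)))
                    return true;
                if (i == text.Length || text[i] == '/')
                    return false;
            }
        }

        if (text.IsEmpty)
            return false;

        // '?' accepts any single character except '/'; anything else is a literal
        bool charMatches = pattern[0] == '?' ? text[0] != '/' : pattern[0] == text[0];
        return charMatches && IsMatch(pattern.Slice(1), text.Slice(1));
    }
}
```

For example, SimpleWildcard.IsMatch("file?.txt", "file1.txt") returns true, while SimpleWildcard.IsMatch("*.txt", "src/readme.txt") returns false because the star stops at the separator. Spanning directories is the job of **, which this sketch does not implement.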
#How to use Meziantou.Framework.Globbing
First, you need to reference the NuGet package:
csproj (MSBuild project file)
<Project>
  <ItemGroup>
    <PackageReference Include="Meziantou.Framework.Globbing" Version="1.0.4" />
  </ItemGroup>
</Project>
Parse a Glob pattern
C#
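// Case-sensitive pattern
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
// Case-insensitive pattern
Glob ignoreCaseGlob = Glob.Parse("src/**/*.txt", GlobOptions.IgnoreCase);
// Non-throwing variant: the return value indicates whether the pattern is valid
bool isValid = Glob.TryParse("src/**/*.txt", GlobOptions.None, out var parsedGlob);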
IsMatch tests whether a file matches the glob pattern
C#
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.IgnoreCase);
glob.IsMatch("src/abc.txt"); // true
glob.IsMatch("Src/test/abc.txt"); // true
glob.IsMatch("src/test/abc.png"); // false
glob.IsMatch("test/test/ab.txt"); // false
// Support spans
ReadOnlySpan<char> path = "src/test/ab.txt";
glob.IsMatch(path);
IsPartialMatch tests whether the path matches the beginning of the glob pattern. This tells you whether a directory can contain matching files, and therefore whether you should recurse into it.
C#
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
glob.IsPartialMatch("src/test"); // true
glob.IsPartialMatch("tests/"); // false
// Support spans
ReadOnlySpan<char> path = "src/test";
glob.IsPartialMatch(path); // true
Enumerate files that match a glob pattern:
C#
// Enumerate files that match the glob in the folder rootDirectory
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
foreach (var file in glob.EnumerateFiles("rootDirectory"))
{
    Console.WriteLine(file);
}
C#
// Using System.IO.EnumerationOptions
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
var enumerationOptions = new EnumerationOptions
{
    IgnoreInaccessible = true,
    AttributesToSkip = FileAttributes.Hidden,
};
foreach (var file in glob.EnumerateFiles("dir", enumerationOptions))
{
    Console.WriteLine(file);
}
Enumerate files that match a glob pattern collection (like a .gitignore file)
C#
GlobCollection globs = new GlobCollection(
    Glob.Parse("src/**/*.{txt,md}", GlobOptions.None),
    Glob.Parse("!src/dummy/readme.{txt,md}", GlobOptions.None)); // exclude 'src/dummy/readme.txt' and 'src/dummy/readme.md'
foreach (var file in globs.EnumerateFiles("rootDirectory"))
{
    Console.WriteLine(file);
}
##Implementing a custom FileSystemEnumerator<T>
The library includes a GlobFileSystemEnumerator<T> that inherits from System.IO.Enumeration.FileSystemEnumerator<T> and filters files using a Glob or GlobCollection instance. You can subclass it to customize enumeration further, for example to filter by file attributes (hidden, read-only), size, or last access date.
C#
// GlobFileSystemEnumerator<T> inherits from System.IO.Enumeration.FileSystemEnumerator<T>
// The class is concrete and has a public constructor so that it can be instantiated directly
public sealed class MyCustomFileSystemEnumerator : GlobFileSystemEnumerator<string>
{
    public MyCustomFileSystemEnumerator(Glob glob, string directory, EnumerationOptions? options = null)
        : base(glob, directory, options)
    {
    }

    // FileSystemEntry documentation: https://learn.microsoft.com/en-us/dotnet/api/system.io.enumeration.filesystementry?WT.mc_id=DT-MVP-5003978
    protected override bool ShouldRecurseIntoEntry(ref FileSystemEntry entry)
    {
        // TODO custom logic
        // base.ShouldRecurseIntoEntry uses glob.IsPartialMatch
        return base.ShouldRecurseIntoEntry(ref entry);
    }

    protected override bool ShouldIncludeEntry(ref FileSystemEntry entry)
    {
        // TODO custom filter logic
        // For instance, exclude files that are bigger than 10,000 bytes
        // (entry.IsDirectory is the bool flag; entry.Directory is the parent path as a span)
        if (!entry.IsDirectory && entry.Length > 10_000)
            return false;

        // base.ShouldIncludeEntry uses glob.IsMatch
        return base.ShouldIncludeEntry(ref entry);
    }

    protected override string TransformEntry(ref FileSystemEntry entry)
    {
        return entry.ToFullPath();
    }
}
Then, you can use the custom enumerator:
C#
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
using var enumerator = new MyCustomFileSystemEnumerator(glob, @"c:\sample", new EnumerationOptions());
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}
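The other filters mentioned earlier (attributes, size, last access date) all map to FileSystemEntry properties. As a rough sketch of those checks in isolation, the following uses the delegate-based FileSystemEnumerable<T> from the base class library instead of a subclass, and no glob. The EntryFilters type, the method name, and its parameters are hypothetical. The same conditions could be placed in a ShouldIncludeEntry override.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

// Hypothetical helper, for illustration only.
public static class EntryFilters
{
    // Full paths of files under 'directory' that are not hidden or read-only,
    // are at most 'maxLength' bytes, and were last accessed at or after 'accessedSince'.
    public static IEnumerable<string> EnumerateSmallRecentFiles(
        string directory, long maxLength, DateTimeOffset accessedSince)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            IgnoreInaccessible = true,
            AttributesToSkip = FileAttributes.Hidden, // also skips hidden directories
        };

        return new FileSystemEnumerable<string>(
            directory,
            (ref FileSystemEntry entry) => entry.ToFullPath(),
            options)
        {
            ShouldIncludePredicate = (ref FileSystemEntry entry) =>
                !entry.IsDirectory
                && (entry.Attributes & FileAttributes.ReadOnly) == 0
                && entry.Length <= maxLength
                && entry.LastAccessTimeUtc >= accessedSince,
        };
    }
}
```

Keep in mind that last access times depend on the file system configuration and may not be updated on every read, so date-based filters are best treated as approximate.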
#Benchmarks
The code of the benchmarks is available on GitHub: https://github.com/meziantou/Meziantou.Framework/tree/master/benchmarks/GlobbingBenchmarks
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i5-6600 CPU 3.30GHz (Skylake), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
##Benchmark Glob.IsMatch
For most scenarios, Meziantou.Framework.Globbing and DotNet.Glob perform similarly. They use different optimizations, so results can vary depending on the glob pattern. That said, both are fast enough for most use cases. Here are a few benchmark results:
| Method | Pattern | Path | Mean | Allocated |
|---|---|---|---|---|
| Meziantou_Globbing | *.txt | file0001.txt | 42 ns | - |
| DotNet_Globbing_Glob | *.txt | file0001.txt | 39 ns | - |
| Kthompson_Glob_Compiled | *.txt | file0001.txt | 239 ns | 64 B |
| Meziantou_Globbing | **/*.txt | folde(…)1.txt [41] | 35 ns | - |
| DotNet_Globbing_Glob | **/*.txt | folde(…)1.txt [41] | 191 ns | - |
| Kthompson_Glob_Compiled | **/*.txt | folde(…)1.txt [41] | 863 ns | 304 B |
| Meziantou_Globbing | **/file*.txt | test0(…)1.txt [40] | 67 ns | - |
| DotNet_Globbing_Glob | **/file*.txt | test0(…)1.txt [40] | 190 ns | - |
| Kthompson_Glob_Compiled | **/file*.txt | test0(…)1.txt [40] | 429 ns | 304 B |
| Meziantou_Globbing | src/**/*.csproj | src/s(…)sproj [76] | 51 ns | - |
| DotNet_Globbing_Glob | src/**/*.csproj | src/s(…)sproj [76] | 226 ns | - |
| Kthompson_Glob_Compiled | src/**/*.csproj | src/s(…)sproj [76] | 1,612 ns | 336 B |
| Meziantou_Globbing | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 160 ns | - |
| DotNet_Globbing_Glob | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 84 ns | - |
| Kthompson_Glob_Compiled | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 529 ns | 352 B |
##Benchmark Glob.EnumerateFiles
- flat is a single folder that contains 100,000 files with different extensions.
- hierarchy contains about 7,000 files and 1,300 folders with a maximum depth of 4.
DotNet.Glob doesn't have a method to enumerate files, so it is not included here.
| Method | Folder | Pattern | Mean | Allocated |
|---|---|---|---|---|
| Meziantou_Globbing | flat | **/file*.txt | 89,168 μs | 17775 KB |
| Kthompson_Glob_Compiled | flat | **/file*.txt | 246,550 μs | 44812 KB |
| Meziantou_Globbing | flat | *.txt | 86,141 μs | 17774 KB |
| Kthompson_Glob_Compiled | flat | *.txt | 260,551 μs | 44810 KB |
| Meziantou_Globbing | flat | file*.txt | 85,974 μs | 17774 KB |
| Kthompson_Glob_Compiled | flat | file*.txt | 242,755 μs | 44813 KB |
| Meziantou_Globbing | flat | folde(…),ico} [43] | 72,350 μs | 3 KB |
| Kthompson_Glob_Compiled | flat | folde(…),ico} [43] | 69,561 μs | 5 KB |
| Meziantou_Globbing | hierarchy | **/file*.txt | 82,576 μs | 733 KB |
| Kthompson_Glob_Compiled | hierarchy | **/file*.txt | 283,598 μs | 5077 KB |
| Meziantou_Globbing | hierarchy | *.txt | 76 μs | 1 KB |
| Kthompson_Glob_Compiled | hierarchy | *.txt | 237 μs | 13 KB |
| Meziantou_Globbing | hierarchy | file*.txt | 78 μs | 2 KB |
| Kthompson_Glob_Compiled | hierarchy | file*.txt | 234 μs | 13 KB |
| Meziantou_Globbing | hierarchy | folde(…),ico} [43] | 39,924 μs | 368 KB |
| Kthompson_Glob_Compiled | hierarchy | folde(…),ico} [43] | 137,261 μs | 2611 KB |