Enumerating files using Globbing and System.IO.Enumeration
Glob patterns (Wikipedia) are a very common way to specify a list of files to include or exclude. For instance, **/*.csproj matches any file with the .csproj extension. You can use glob patterns in many places, such as .gitignore files, bash, or PowerShell.
.NET Core 2.1 introduced a new API for customizable, high-performance file enumeration in the System.IO.Enumeration namespace. You can read more about these APIs in the design document on GitHub: Extensible File Enumeration. The main type is FileSystemEnumerator<T>, which enumerates all items of a folder. It provides two methods to customize the enumeration:
- ShouldIncludeEntry(ref FileSystemEntry entry) determines whether the specified file system entry should be included in the results
- ShouldRecurseIntoEntry(ref FileSystemEntry entry) determines whether the enumerator should recurse into the specified file system entry
The FileSystemEntry struct exposes many properties of the entry, such as the file or folder name, the file length, the file attributes, and the containing directory. To filter files using a glob pattern, the globbing library must provide methods that take the directory and the file name as separate values. Indeed, building the full path would allocate a string, which should be avoided for performance reasons. The library must also be able to tell whether the enumerator needs to recurse into a folder.
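To make the two customization points concrete, here is a minimal sketch that uses only the built-in System.IO.Enumeration types, without any globbing library. The function name EnumerateByExtension and the decision to skip "obj" folders are illustrative assumptions, not something taken from the library discussed in this post.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

// Hypothetical helper: lazily yields the full path of every file under 'root'
// whose name ends with 'extension' (case-insensitive).
static IEnumerable<string> EnumerateByExtension(string root, string extension)
{
    var options = new EnumerationOptions
    {
        RecurseSubdirectories = true,
        IgnoreInaccessible = true,
    };

    return new FileSystemEnumerable<string>(
        root,
        // The transform runs only for entries that passed the include predicate,
        // so the full path string is allocated only for matching files.
        (ref FileSystemEntry entry) => entry.ToFullPath(),
        options)
    {
        // FileName is a ReadOnlySpan<char>: this check does not allocate.
        ShouldIncludePredicate = (ref FileSystemEntry entry) =>
            !entry.IsDirectory &&
            entry.FileName.EndsWith(extension, StringComparison.OrdinalIgnoreCase),

        // Illustrative choice: do not descend into "obj" folders.
        ShouldRecursePredicate = (ref FileSystemEntry entry) =>
            !entry.FileName.Equals("obj", StringComparison.Ordinal),
    };
}

// Usage: print every .csproj file below the current directory.
foreach (var path in EnumerateByExtension(Directory.GetCurrentDirectory(), ".csproj"))
{
    Console.WriteLine(path);
}
```

Both predicates receive the entry by reference and work on spans, which is why a glob library that wants to plug into this API needs to match on the directory and file name separately rather than on a full path string.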
The library Meziantou.Framework.Globbing provides these methods, so it works well with the FileSystemEnumerator<T> API!
Please consider upvoting the following GitHub issue if you want globbing to be built into .NET: Feature Request: File System Globbing
#Glob features supported by Meziantou.Framework.Globbing
The library supports these glob features:
- * matches any number of characters, including none
- ? matches a single character
- [abc] matches one character given in the bracket
- [!abc] matches any character not in the brackets
- [a-z] matches one character from the range given in the bracket
- [!a-z] matches one character not in the range given in the bracket
- {abc,123} matches one of the literals
- ** matches zero or more directories
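To illustrate the semantics of the two simplest wildcards, here is a minimal sketch of a matcher that handles only * and ? on a single file name. It is a textbook backtracking algorithm written for this example, not the implementation used by Meziantou.Framework.Globbing, and it deliberately ignores brackets, braces, ** and directory separators. The name WildcardMatch is hypothetical.

```csharp
using System;

// Hypothetical helper: matches a file name against a pattern that may contain
// '*' (any run of characters, including none) and '?' (exactly one character).
static bool WildcardMatch(ReadOnlySpan<char> pattern, ReadOnlySpan<char> text)
{
    int p = 0, t = 0;
    int starPattern = -1; // index of the last '*' seen in the pattern
    int starText = 0;     // position in the text where that '*' started matching

    while (t < text.Length)
    {
        if (p < pattern.Length && pattern[p] == '*')
        {
            // Remember the star and first try to match it against nothing.
            starPattern = p++;
            starText = t;
        }
        else if (p < pattern.Length && (pattern[p] == '?' || pattern[p] == text[t]))
        {
            p++;
            t++;
        }
        else if (starPattern >= 0)
        {
            // Backtrack: let the last '*' consume one more character.
            p = starPattern + 1;
            t = ++starText;
        }
        else
        {
            return false;
        }
    }

    // Trailing stars can match the empty string.
    while (p < pattern.Length && pattern[p] == '*')
    {
        p++;
    }

    return p == pattern.Length;
}

Console.WriteLine(WildcardMatch("*.txt", "file0001.txt"));     // True
Console.WriteLine(WildcardMatch("file?.txt", "file10.txt"));   // False
```

Because the function takes spans, it can be called with strings or with slices of a longer path without allocating.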
#How to use Meziantou.Framework.Globbing
First, you need to reference the NuGet package:
<Project>
<ItemGroup>
<PackageReference Include="Meziantou.Framework.Globbing" Version="1.0.4" />
</ItemGroup>
</Project>
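Parse a glob pattern:
// Case-sensitive pattern
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);

// Case-insensitive pattern
Glob globIgnoreCase = Glob.Parse("src/**/*.txt", GlobOptions.IgnoreCase);

// TryParse reports whether the pattern could be parsed
bool isValid = Glob.TryParse("src/**/*.txt", GlobOptions.None, out Glob parsedGlob);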
IsMatch tests whether a file matches the glob pattern:
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.IgnoreCase);
glob.IsMatch("src/abc.txt");      // true
glob.IsMatch("Src/test/abc.txt"); // true
glob.IsMatch("src/test/abc.png"); // false
glob.IsMatch("test/test/ab.txt"); // false

// Spans are supported
ReadOnlySpan<char> path = "src/test/ab.txt";
glob.IsMatch(path);
IsPartialMatch tests whether the path matches the beginning of the glob pattern. This tells you whether you should recurse into a directory:
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
glob.IsPartialMatch("src/test"); // true
glob.IsPartialMatch("tests/");   // false

// Spans are supported
ReadOnlySpan<char> path = "src/test";
glob.IsPartialMatch(path); // true
Enumerate files that match a glob pattern:
// Enumerate files that match the glob under the root directory
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
foreach (var file in glob.EnumerateFiles("rootDirectory"))
{
    Console.WriteLine(file);
}

// Using System.IO.EnumerationOptions
Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);
var enumerationOptions = new EnumerationOptions
{
    IgnoreInaccessible = true,
    AttributesToSkip = FileAttributes.Hidden,
};
foreach (var file in glob.EnumerateFiles("dir", enumerationOptions))
{
    Console.WriteLine(file);
}
Enumerate files that match a glob pattern collection (like a .gitignore file):
GlobCollection globs = new GlobCollection(
    Glob.Parse("src/**/*.{txt,md}", GlobOptions.None),
    Glob.Parse("!src/dummy/readme.{txt,md}", GlobOptions.None)); // exclude 'src/dummy/readme.txt' and 'src/dummy/readme.md'
foreach (var file in globs.EnumerateFiles("rootDirectory"))
{
    Console.WriteLine(file);
}
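The idea behind IsPartialMatch (deciding whether a directory can still contain matching files) can be sketched on path segments. The following is a deliberately simplified illustration, not the library's algorithm: it assumes '/' separators and only understands literal segments, a lone * segment, and **. The name IsPartialMatchSketch is hypothetical.

```csharp
using System;

// Hypothetical helper: returns true when files below 'directory' could still
// match 'pattern'. The last pattern segment is assumed to describe the file name.
static bool IsPartialMatchSketch(string pattern, string directory)
{
    var patternSegments = pattern.Split('/', StringSplitOptions.RemoveEmptyEntries);
    var directorySegments = directory.Split('/', StringSplitOptions.RemoveEmptyEntries);

    for (var i = 0; i < directorySegments.Length; i++)
    {
        // An empty pattern cannot match anything.
        if (i >= patternSegments.Length)
            return false;

        // "**" matches zero or more directories, so anything below may match.
        if (patternSegments[i] == "**")
            return true;

        // The last pattern segment names the file, so the directory is too deep.
        if (i == patternSegments.Length - 1)
            return false;

        // A lone "*" accepts any directory name; other segments are compared literally.
        if (patternSegments[i] != "*" &&
            !string.Equals(patternSegments[i], directorySegments[i], StringComparison.Ordinal))
            return false;
    }

    return true;
}

Console.WriteLine(IsPartialMatchSketch("src/**/*.txt", "src/test")); // True
Console.WriteLine(IsPartialMatchSketch("src/**/*.txt", "tests/"));   // False
```

Returning false early is what lets an enumerator skip whole subtrees instead of visiting every file and rejecting it afterwards.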
##Implementing a custom FileSystemEnumerator<T>
The library provides an implementation of System.IO.Enumeration.FileSystemEnumerator<T> that filters files using a Glob or GlobCollection instance. You can inherit from this class if you need to customize the way it enumerates files. For instance, you can filter files based on their attributes (hidden, read-only, etc.), their size, or their last access date.
using System.IO;
using System.IO.Enumeration;
using Meziantou.Framework.Globbing;

// GlobFileSystemEnumerator<T> inherits from System.IO.Enumeration.FileSystemEnumerator<T>
public sealed class MyCustomFileSystemEnumerator : GlobFileSystemEnumerator<string>
{
    public MyCustomFileSystemEnumerator(Glob glob, string directory, EnumerationOptions? options = null)
        : base(glob, directory, options)
    {
    }

    // FileSystemEntry documentation: https://learn.microsoft.com/en-us/dotnet/api/system.io.enumeration.filesystementry?WT.mc_id=DT-MVP-5003978
    protected override bool ShouldRecurseIntoEntry(ref FileSystemEntry entry)
    {
        // TODO custom logic
        // base.ShouldRecurseIntoEntry uses glob.IsPartialMatch
        return base.ShouldRecurseIntoEntry(ref entry);
    }

    protected override bool ShouldIncludeEntry(ref FileSystemEntry entry)
    {
        // TODO custom filter logic
        // For instance, exclude files that are bigger than 10,000 bytes
        if (!entry.IsDirectory && entry.Length > 10_000)
            return false;

        // base.ShouldIncludeEntry uses glob.IsMatch
        return base.ShouldIncludeEntry(ref entry);
    }

    protected override string TransformEntry(ref FileSystemEntry entry)
    {
        return entry.ToFullPath();
    }
}
Then, you can use the custom enumerator:
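Glob glob = Glob.Parse("src/**/*.txt", GlobOptions.None);

// FileSystemEnumerator<T> only descends into subdirectories when RecurseSubdirectories is true
var options = new EnumerationOptions { RecurseSubdirectories = true };
using var enumerator = new MyCustomFileSystemEnumerator(glob, @"c:\sample", options);
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}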
#Benchmarks
The code of the benchmarks is available on GitHub: https://github.com/meziantou/Meziantou.Framework/tree/master/benchmarks/GlobbingBenchmarks
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i5-6600 CPU 3.30GHz (Skylake), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
##Benchmark Glob.IsMatch
For most scenarios, Meziantou.Framework.Globbing and DotNet.Glob perform similarly. They apply different optimizations, so results vary with the glob pattern. That being said, both are fast enough for most use cases. Here are a few performance tests:
Method | Pattern | Path | Mean | Allocated |
---|---|---|---|---|
Meziantou_Globbing | *.txt | file0001.txt | 42 ns | - |
DotNet_Globbing_Glob | *.txt | file0001.txt | 39 ns | - |
Kthompson_Glob_Compiled | *.txt | file0001.txt | 239 ns | 64 B |
Meziantou_Globbing | **/*.txt | folde(…)1.txt [41] | 35 ns | - |
DotNet_Globbing_Glob | **/*.txt | folde(…)1.txt [41] | 191 ns | - |
Kthompson_Glob_Compiled | **/*.txt | folde(…)1.txt [41] | 863 ns | 304 B |
Meziantou_Globbing | */file.txt | test0(…)1.txt [40] | 67 ns | - |
DotNet_Globbing_Glob | */file.txt | test0(…)1.txt [40] | 190 ns | - |
Kthompson_Glob_Compiled | */file.txt | test0(…)1.txt [40] | 429 ns | 304 B |
Meziantou_Globbing | src/**/*.csproj | src/s(…)sproj [76] | 51 ns | - |
DotNet_Globbing_Glob | src/**/*.csproj | src/s(…)sproj [76] | 226 ns | - |
Kthompson_Glob_Compiled | src/**/*.csproj | src/s(…)sproj [76] | 1,612 ns | 336 B |
Meziantou_Globbing | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 160 ns | - |
DotNet_Globbing_Glob | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 84 ns | - |
Kthompson_Glob_Compiled | folder[0-1]/**/f{ab,il}[aei]*.{txt,png,ico} | folde(…)1.txt [41] | 529 ns | 352 B |
##Benchmark Glob.EnumerateFiles
- flat is a single folder that contains 100,000 files with different extensions.
- hierarchy contains about 7,000 files and 1,300 folders with a max depth of 4.

DotNet.Glob doesn't have a method to enumerate files, so it is not included here.
Method | Folder | Pattern | Mean | Allocated |
---|---|---|---|---|
Meziantou_Globbing | flat | */file.txt | 89,168 μs | 17775 KB |
Kthompson_Glob_Compiled | flat | */file.txt | 246,550 μs | 44812 KB |
Meziantou_Globbing | flat | *.txt | 86,141 μs | 17774 KB |
Kthompson_Glob_Compiled | flat | *.txt | 260,551 μs | 44810 KB |
Meziantou_Globbing | flat | file*.txt | 85,974 μs | 17774 KB |
Kthompson_Glob_Compiled | flat | file*.txt | 242,755 μs | 44813 KB |
Meziantou_Globbing | flat | folde(…),ico} [43] | 72,350 μs | 3 KB |
Kthompson_Glob_Compiled | flat | folde(…),ico} [43] | 69,561 μs | 5 KB |
Meziantou_Globbing | hierarchy | */file.txt | 82,576 μs | 733 KB |
Kthompson_Glob_Compiled | hierarchy | */file.txt | 283,598 μs | 5077 KB |
Meziantou_Globbing | hierarchy | *.txt | 76 μs | 1 KB |
Kthompson_Glob_Compiled | hierarchy | *.txt | 237 μs | 13 KB |
Meziantou_Globbing | hierarchy | file*.txt | 78 μs | 2 KB |
Kthompson_Glob_Compiled | hierarchy | file*.txt | 234 μs | 13 KB |
Meziantou_Globbing | hierarchy | folde(…),ico} [43] | 39,924 μs | 368 KB |
Kthompson_Glob_Compiled | hierarchy | folde(…),ico} [43] | 137,261 μs | 2611 KB |
#Additional resources
- NuGet package
- Source code on GitHub
- Report an issue or suggest new features
- FileSystemEnumerator<TResult> Class