Description
System.Formats.Tar.TarReader does not handle GNU sparse format 1.0 entries encoded via PAX extended attributes. When reading such entries, TarEntry.Name returns the internal placeholder path (containing GNUSparseFile.0) instead of the real file name, and TarEntry.Length returns the stored (sparse) size rather than the real file size.
GNU sparse format 1.0 stores the real name and size in PAX extended attributes:
GNU.sparse.name — the real file path
GNU.sparse.realsize — the real file size
TarHeader.ReplaceNormalAttributesWithExtended() processes standard PAX attributes like path, size, mtime, etc., but does not process GNU.sparse.name or GNU.sparse.realsize.
How this occurs in practice
macOS ships bsdtar (libarchive), which detects sparse files by default during archive creation. .NET DLLs on APFS have zero-filled PE alignment sections that APFS stores as filesystem holes, causing bsdtar to treat them as sparse and encode them with the GNU sparse PAX format.
The tar command producing the affected archive was:
tar -cf - . | pigz > output.tar.gz
When .NET's TarReader reads the resulting archive, about 46% of its entries (91 of 199 in the SDK tarball used for the repro) have incorrect names containing GNUSparseFile.0.
Reproduction Steps
Option 1 — With an affected tar.gz file
Download an affected tarball (a .NET SDK built on macOS):
dotnet-sdk-11.0.100-ci-osx-x64.tar.gz
Then run the repro program (below) against it.
Option 2 — Create a sparse tar.gz on macOS
On a Mac, create a sparse file and archive it:
# Create a file with sparse holes
dd if=/dev/zero of=sparse.bin bs=1 count=0 seek=1048576
echo "hello" >> sparse.bin
# Archive it (bsdtar detects sparse by default)
tar -czf sparse.tar.gz sparse.bin
Then read it on any platform with the repro program below.
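Where no Mac is available, a synthetic archive can approximate the same entry shape. The following is a minimal sketch, not the output of bsdtar: it assumes the GNU sparse 1.0 layout described in the GNU tar manual (a placeholder header name, GNU.sparse.* PAX attributes, and a data block that begins with a newline-delimited sparse map padded to 512 bytes). The helper name BuildSyntheticSparseArchive, the file name sparse.bin, and the sizes are illustrative assumptions, and a real bsdtar archive may differ in details such as which other PAX attributes are present.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;
using System.Text;

// Hypothetical helper: builds an in-memory PAX archive with one entry shaped
// like a GNU sparse 1.0 entry. All names and sizes here are illustrative.
static MemoryStream BuildSyntheticSparseArchive()
{
    byte[] payload = Encoding.ASCII.GetBytes("hello\n");
    const long holeSize = 1048576;

    // Sparse map: entry count, then offset/length pairs, one decimal number
    // per line, NUL-padded to a 512-byte block; the stored data follows.
    byte[] map = Encoding.ASCII.GetBytes($"1\n{holeSize}\n{payload.Length}\n");
    byte[] data = new byte[512 + payload.Length];
    map.CopyTo(data, 0);
    payload.CopyTo(data, 512);

    var attributes = new Dictionary<string, string>
    {
        ["GNU.sparse.major"] = "1",
        ["GNU.sparse.minor"] = "0",
        ["GNU.sparse.name"] = "sparse.bin",
        ["GNU.sparse.realsize"] = (holeSize + payload.Length).ToString(),
    };

    var stream = new MemoryStream();
    using (var writer = new TarWriter(stream, TarEntryFormat.Pax, leaveOpen: true))
    {
        // The header name carries the placeholder path, as in the affected archives.
        var entry = new PaxTarEntry(TarEntryType.RegularFile, "GNUSparseFile.0/sparse.bin", attributes)
        {
            DataStream = new MemoryStream(data),
        };
        writer.WriteEntry(entry);
    }

    stream.Position = 0;
    return stream;
}

using MemoryStream archive = BuildSyntheticSparseArchive();
using var reader = new TarReader(archive);
while (reader.GetNextEntry() is PaxTarEntry pax)
{
    // Print what TarReader reports next to what the attributes say.
    Console.WriteLine($"entry.Name          : {pax.Name}");
    Console.WriteLine($"entry.Length        : {pax.Length}");
    Console.WriteLine($"GNU.sparse.name     : {pax.ExtendedAttributes["GNU.sparse.name"]}");
    Console.WriteLine($"GNU.sparse.realsize : {pax.ExtendedAttributes["GNU.sparse.realsize"]}");
}
```

On an affected runtime, entry.Name would be expected to show the GNUSparseFile.0 placeholder and entry.Length the stored size; a runtime that handles the format should report the values from the attributes instead.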
Repro Program
Program.cs:
using System.Formats.Tar;
using System.IO.Compression;
if (args.Length == 0)
{
    Console.Error.WriteLine("Usage: dotnet run -- <path-to-tarball.tar.gz>");
    return 1;
}
string path = args[0];
if (!File.Exists(path))
{
    Console.Error.WriteLine($"File not found: {path}");
    return 1;
}
Console.WriteLine($"Reading: {path}");
Console.WriteLine();
int totalEntries = 0;
int sparseEntries = 0;
using FileStream fs = File.OpenRead(path);
using GZipStream gz = new(fs, CompressionMode.Decompress);
using TarReader reader = new(gz);
while (reader.GetNextEntry() is TarEntry entry)
{
    totalEntries++;
    if (entry is PaxTarEntry pax
        && pax.ExtendedAttributes.TryGetValue("GNU.sparse.name", out string? realName))
    {
        sparseEntries++;
        if (sparseEntries <= 5)
        {
            Console.WriteLine($"Entry #{totalEntries}:");
            Console.WriteLine($" entry.Name (WRONG): {entry.Name}");
            Console.WriteLine($" GNU.sparse.name : {realName}");
            if (pax.ExtendedAttributes.TryGetValue("GNU.sparse.realsize", out string? realSize))
            {
                Console.WriteLine($" entry.Length : {entry.Length}");
                Console.WriteLine($" GNU.sparse.realsize: {realSize}");
            }
            Console.WriteLine();
        }
    }
}
Console.WriteLine($"Total entries : {totalEntries}");
Console.WriteLine($"Sparse entries: {sparseEntries}");
if (sparseEntries > 0)
{
    Console.WriteLine();
    Console.WriteLine("BUG: TarReader exposes internal 'GNUSparseFile.0' placeholder paths");
    Console.WriteLine(" instead of using the real name from GNU.sparse.name.");
}
return sparseEntries > 0 ? 1 : 0;
tar-repro.csproj:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net9.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
</Project>
Expected behavior
For entries with GNU.sparse.name and GNU.sparse.realsize PAX extended attributes:
entry.Name should return the value of GNU.sparse.name (e.g., ./shared/Microsoft.NETCore.App/11.0.0-ci/Microsoft.CSharp.dll)
entry.Length should return the value of GNU.sparse.realsize (e.g., 1115136)
Actual behavior
entry.Name returns the internal placeholder path (e.g., ./shared/Microsoft.NETCore.App/11.0.0-ci/GNUSparseFile.0/Microsoft.CSharp.dll)
entry.Length returns the stored/sparse size (e.g., 791040)
Example output from the repro against the linked tarball:
Reading: dotnet-sdk-11.0.100-ci-osx-x64.tar.gz
Entry #9:
entry.Name (WRONG): ./shared/Microsoft.NETCore.App/11.0.0-ci/GNUSparseFile.0/Microsoft.CSharp.dll
GNU.sparse.name : ./shared/Microsoft.NETCore.App/11.0.0-ci/Microsoft.CSharp.dll
entry.Length : 791040
GNU.sparse.realsize: 1115136
Total entries : 199
Sparse entries: 91
BUG: TarReader exposes internal 'GNUSparseFile.0' placeholder paths
instead of using the real name from GNU.sparse.name.
Suggested Fix
In TarHeader.ReplaceNormalAttributesWithExtended(), add handling for the GNU sparse PAX attributes after the existing standard attribute processing:
// GNU sparse format 1.0 stores the real name and size in extended attributes.
// The header's name field contains an internal placeholder like "GNUSparseFile.0/...".
if (ExtendedAttributes.TryGetValue("GNU.sparse.name", out string? gnuSparseName))
{
    _name = gnuSparseName;
}
if (TarHelpers.TryGetStringAsBaseTenLong(ExtendedAttributes, "GNU.sparse.realsize", out long gnuSparseRealSize))
{
    _size = gnuSparseRealSize;
}
Configuration
- Affects all .NET versions with System.Formats.Tar (net7.0+)
- All platforms when reading archives created on macOS (or any system using bsdtar/libarchive with sparse detection)
- The archive creation side can work around this with tar --no-read-sparse, but TarReader should handle this format correctly regardless
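On the reading side, a caller that cannot control how the archive was created could recover the real metadata from the PAX attributes itself. This is a minimal sketch of such a workaround, not a proposed API: GetEffectiveName and GetEffectiveLength are hypothetical helper names, and the sample entry is constructed in memory purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.Globalization;

// Hypothetical helper (not part of System.Formats.Tar): prefer GNU.sparse.name
// over the placeholder header name when the attribute is present.
static string GetEffectiveName(TarEntry entry) =>
    entry is PaxTarEntry pax
    && pax.ExtendedAttributes.TryGetValue("GNU.sparse.name", out string? realName)
        ? realName
        : entry.Name;

// Hypothetical helper: prefer GNU.sparse.realsize when it parses as a
// non-negative decimal number; otherwise fall back to the reported length.
static long GetEffectiveLength(TarEntry entry) =>
    entry is PaxTarEntry pax
    && pax.ExtendedAttributes.TryGetValue("GNU.sparse.realsize", out string? text)
    && long.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out long realSize)
        ? realSize
        : entry.Length;

// Illustrative entry built in memory; values mirror the example in this report.
var sample = new PaxTarEntry(
    TarEntryType.RegularFile,
    "GNUSparseFile.0/Microsoft.CSharp.dll",
    new Dictionary<string, string>
    {
        ["GNU.sparse.name"] = "Microsoft.CSharp.dll",
        ["GNU.sparse.realsize"] = "1115136",
    });

Console.WriteLine(GetEffectiveName(sample));   // prints "Microsoft.CSharp.dll"
Console.WriteLine(GetEffectiveLength(sample)); // prints "1115136"
```

This only corrects the reported name and size. On affected versions, entry.DataStream would still expose the stored bytes (the sparse map followed by the non-hole data) rather than the expanded file, so callers that extract content would need additional handling.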
Impact
This is a real-world issue affecting .NET CI/CD infrastructure. Archives produced by macOS build agents contain GNU sparse PAX entries for .NET DLLs, and downstream tools using TarReader to process these archives (e.g., for code signing) encounter incorrect paths, leading to build failures.