lean-zip

Lean 4 bindings for zlib and Zstandard compression, plus tar and ZIP archive support.

Provides whole-buffer and streaming APIs for zlib, gzip, raw deflate, and Zstandard formats, CRC-32/Adler-32 checksums, tar archives (.tar and .tar.gz with PAX/GNU extension support), and ZIP archives (with ZIP64 support).

Requirements

  • Lean 4 (tested with v4.20.0 through v4.29.0-rc1)
  • zlib development headers (zlib-dev, zlib1g-dev, or equivalent)
  • libzstd development headers (libzstd-dev or equivalent)
  • pkg-config (for header discovery on NixOS and similar)

On NixOS (or any system where zlib/zstd aren't in the default library path), the project includes a shell.nix that provides all C dependencies:

nix-shell    # then run lake build, lake exe test, etc. inside the shell

Or use direnv for automatic environment activation:

direnv allow   # one-time setup; environment then activates on cd

You can also set ZLIB_CFLAGS and ZSTD_CFLAGS manually to point at the headers if you prefer.
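For the manual route, a sketch of what that looks like (the paths below are examples only; point them at wherever your headers actually live, e.g. as reported by `pkg-config --cflags zlib libzstd`):

```shell
# Example header locations; substitute your installation prefix.
export ZLIB_CFLAGS="-I/opt/zlib/include"
export ZSTD_CFLAGS="-I/opt/zstd/include"
# then build as usual:
# lake build
```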

Usage

Add to your lakefile.lean:

require "kim-em" / "lean-zip"
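If you would rather pin a Git revision than track the registry entry, Lake's `from git` form can be used instead (the `main` branch name here is an assumption; pin whichever tag or commit you need):

```lean
require «lean-zip» from
  git "https://github.com/kim-em/lean-zip" @ "main"
```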

Compression

import Zip

-- Zlib format
let compressed ← Zlib.compress data
let original ← Zlib.decompress compressed

-- Gzip format (compatible with gzip/gunzip)
let gzipped ← Gzip.compress data (level := 6)
let original ← Gzip.decompress gzipped

-- Raw deflate (no header/trailer, used internally by ZIP)
let deflated ← RawDeflate.compress data
let original ← RawDeflate.decompress deflated

-- Zstandard (modern, fast, excellent compression ratio)
let compressed ← Zstd.compress data (level := 3)
let original ← Zstd.decompress compressed
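Putting the calls above together, a minimal round-trip sketch (assuming, as the snippets suggest, that the calls run in `IO` over `ByteArray`s):

```lean
import Zip

-- Compress and decompress a buffer, checking that the round trip
-- reproduces the original bytes exactly.
def roundTrip (data : ByteArray) : IO Unit := do
  let compressed ← Zstd.compress data (level := 3)
  let restored ← Zstd.decompress compressed
  unless restored == data do
    throw <| IO.userError "round-trip mismatch"
```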

Streaming

For data too large to fit in memory:

-- Stream between IO.FS.Streams (64KB chunks, bounded memory)
Gzip.compressStream inputStream outputStream (level := 6)
Gzip.decompressStream inputStream outputStream

-- Zstd streaming
Zstd.compressStream inputStream outputStream (level := 3)
Zstd.decompressStream inputStream outputStream

-- File helpers
let gzPath ← Gzip.compressFile "/path/to/file"         -- writes /path/to/file.gz
let outPath ← Gzip.decompressFile "/path/to/file.gz"   -- writes /path/to/file

Zstd.compressFile "/path/to/file"                       -- writes /path/to/file.zst
Zstd.decompressFile "/path/to/file.zst"                 -- writes /path/to/file

Low-level streaming state

let state ← Gzip.DeflateState.new (level := 6)
let compressed ← state.push chunk1
let compressed2 ← state.push chunk2
let final ← state.finish  -- must call exactly once

let state ← Zstd.CompressState.new (level := 3)
let compressed ← state.push chunk
let final ← state.finish
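Tying the low-level state to streams by hand, one possible pump loop (a sketch, assuming `push` and `finish` return the produced bytes as `ByteArray`, as the snippets above suggest):

```lean
-- Read 64KB chunks from `input`, push each through the deflate state,
-- and flush the final block once the input is exhausted.
partial def pump (state : Gzip.DeflateState)
    (input output : IO.FS.Stream) : IO Unit := do
  let chunk ← input.read 65536
  if chunk.isEmpty then
    output.write (← state.finish)
  else
    output.write (← state.push chunk)
    pump state input output
```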

Checksums

let crc ← Checksum.crc32 0 data         -- CRC-32
let adler ← Checksum.adler32 1 data     -- Adler-32
-- Incremental: pass previous result as init
let crc2 ← Checksum.crc32 crc moreData
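The incremental form composes: feeding two chunks, with the intermediate value as the `init` of the second call, should agree with one pass over the concatenation. A small sketch of that property:

```lean
-- Chunked CRC-32 should agree with the whole-buffer result.
def crcChunkedMatches (a b : ByteArray) : IO Bool := do
  let whole   ← Checksum.crc32 0 (a ++ b)
  let part    ← Checksum.crc32 0 a
  let chunked ← Checksum.crc32 part b
  pure (whole == chunked)
```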

Tar archives

-- Create .tar.gz from a directory (streaming, bounded memory)
Tar.createTarGz "/tmp/archive.tar.gz" "/path/to/dir"

-- Extract .tar.gz
Tar.extractTarGz "/tmp/archive.tar.gz" "/tmp/output"

-- Create/extract raw .tar via IO.FS.Stream
Tar.createFromDir stream dir
Tar.extract stream outDir

-- List entries without extracting
let entries ← Tar.list stream

Tar supports UStar, PAX extended headers (for long paths, large files, UTF-8), and GNU long name/link extensions. Paths exceeding UStar limits are automatically encoded with PAX headers on creation.

ZIP archives

-- Create from explicit file list
Archive.create "/tmp/archive.zip" #[
  ("name-in-zip.txt", "/path/on/disk.txt"),
  ("subdir/file.bin", "/other/file.bin")
]

-- Create from directory
Archive.createFromDir "/tmp/archive.zip" "/path/to/dir"

-- Extract all files
Archive.extract "/tmp/archive.zip" "/tmp/output"

-- Extract a single file by name
let data ← Archive.extractFile "/tmp/archive.zip" "name-in-zip.txt"

-- List entries
let entries ← Archive.list "/tmp/archive.zip"

ZIP supports stored (method 0) and deflated (method 8) entries with automatic method selection, CRC32 verification, and ZIP64 extensions for archives exceeding 4GB or 65535 entries.

Building

lake build

Testing

lake build test && .lake/build/bin/test

Benchmarking

A benchmark driver is included for use with hyperfine:

lake build bench

# Single benchmark
hyperfine 'lake exe bench inflate 1048576 prng 6'

# Compare patterns
hyperfine 'lake exe bench inflate 1048576 constant 6' \
          'lake exe bench inflate 1048576 prng 6'

# Sweep sizes
hyperfine --parameter-list size 1024,65536,1048576 \
          'lake exe bench inflate {size} prng 6'

Operations: inflate, gzip, zlib, crc32, adler32, crc32-ffi, adler32-ffi. Patterns: constant, cyclic, prng. Level: 0-9 (default 6).

Known Limitations

  • TOCTOU in extraction: archive extraction creates parent directories then writes files; a local attacker could replace a directory with a symlink between these steps. Fixing this requires openat()/O_NOFOLLOW, which Lean's stdlib doesn't expose.
  • No streaming output limit: whole-buffer decompression accepts a max_output cap, but the streaming API (push/finish) has no built-in limit — callers processing untrusted data should track total output themselves.
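For the second point, a caller-side cap can be wrapped around any push-style state. A sketch only: `Gzip.InflateState` is a hypothetical stand-in for whatever streaming decompression state the API actually exposes.

```lean
-- Decompress chunks while refusing to emit more than `maxOut` bytes
-- in total. `Gzip.InflateState` is hypothetical; adapt to the real API.
partial def pushBounded (state : Gzip.InflateState) (maxOut : Nat)
    (chunks : List ByteArray) (total : Nat := 0) : IO ByteArray := do
  match chunks with
  | [] =>
    let fin ← state.finish
    if total + fin.size > maxOut then
      throw <| IO.userError "decompressed output exceeds limit"
    else
      pure fin
  | c :: rest =>
    let out ← state.push c
    if total + out.size > maxOut then
      throw <| IO.userError "decompressed output exceeds limit"
    else
      return out ++ (← pushBounded state maxOut rest (total + out.size))
```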

License

Apache-2.0. See LICENSE.
