Integration Tests Coverage Guide

A reference guide to what the gh-aw-firewall integration tests cover and how they relate to real-world usage in GitHub Agentic Workflows.

Last updated: February 2026

Quick Navigation

Area	Tests	Doc
Domain filtering, DNS, network security	6 files, ~50 tests	domain-network.md
Chroot sandbox, languages, package managers	5 files, ~70 tests	chroot.md
Protocol support, credentials, tokens	8 files, ~100 tests	protocol-security.md
Containers, volumes, git, env vars	7 files, ~45 tests	container-ops.md
CI workflows, smoke tests, build-test	27 workflows	ci-smoke.md
Test fixtures and infrastructure	6 helper files	test-infra.md

Overview

The test suite is organized in four tiers:

┌─────────────────────────────────────────────────────┐
│  Smoke Tests (4 workflows)                          │
│  Smoke workflows (Claude, Copilot, Codex, Chroot)   │
│  running inside AWF sandbox                         │
├─────────────────────────────────────────────────────┤
│  Build-Test Workflows (8 workflows)                 │
│  Real projects (Go, Rust, Java, Node, etc.)         │
│  built and tested through the firewall proxy        │
├─────────────────────────────────────────────────────┤
│  Integration Tests (26 files, ~265 tests)           │
│  End-to-end AWF container execution with            │
│  domain filtering, chroot, security assertions      │
├─────────────────────────────────────────────────────┤
│  Unit Tests (19 files)                              │
│  Individual module testing (parser, config, logger) │
└─────────────────────────────────────────────────────┘

Test Counts by Category

Category	Files	Approx Tests	CI Workflow
Domain/Network	6	50	None
Chroot	5	70	`test-chroot.yml` (4 jobs)
Protocol/Security	8	100	None
Container/Ops	7	45	None
Unit Tests	19	200	`test-coverage.yml`
Smoke Tests	4	N/A	Per-workflow (scheduled + PR)
Build-Test	8	N/A	Per-workflow (PR + dispatch)

What's Covered

1. Chroot Filesystem Isolation (Strong)

The chroot tests are the most mature, run in CI, and cover critical scenarios:

Language runtimes: Python, Node.js, Go, Java, .NET, Ruby, Rust all verified accessible through chroot
Package managers: pip, npm, cargo, maven, dotnet, gem, go modules — all tested for registry connectivity
Security properties: NET_ADMIN/SYS_CHROOT capability drop, Docker socket hidden, non-root execution
/proc filesystem: Dynamic mount verified for JVM and .NET CLR compatibility
Shell features: Pipes, redirects, command substitution, compound commands all work in chroot

CI coverage: 4 parallel jobs in test-chroot.yml exercise these tests on every PR.
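
The capability-drop property above can be checked from inside a sandbox by reading the `CapEff` line of `/proc/self/status`. The following is a minimal sketch of that idea, not the code the chroot tests actually use: the helper names `effectiveCapsLow32` and `hasCapability` and the sample status text are assumptions made for illustration. The bit numbers (12 for `CAP_NET_ADMIN`, 18 for `CAP_SYS_CHROOT`) come from the Linux capability definitions.

```typescript
// Capability bit numbers as defined in linux/capability.h.
const CAP_NET_ADMIN = 12;
const CAP_SYS_CHROOT = 18;

// Extract the low 32 bits of the effective capability mask from the text of
// /proc/<pid>/status. Both capabilities of interest live below bit 32, so the
// last 8 hex digits are enough. (Hypothetical helper, for illustration only.)
function effectiveCapsLow32(status: string): number {
  const match = /^CapEff:\s*([0-9a-fA-F]+)\s*$/m.exec(status);
  if (match === null) {
    throw new Error("CapEff line not found in status text");
  }
  return parseInt(match[1].slice(-8), 16);
}

// True when the capability bit is set in the low 32 bits of the mask.
function hasCapability(low32: number, capBit: number): boolean {
  if (capBit < 0 || capBit > 31) {
    throw new RangeError("this sketch only handles capability bits 0-31");
  }
  return ((low32 >>> capBit) & 1) === 1;
}

// Sample status text (illustrative values, not captured from an AWF run).
const sampleStatus = "Name:\tsh\nCapEff:\t00000000a80025fb\n";
const caps = effectiveCapsLow32(sampleStatus);
console.log(hasCapability(caps, CAP_NET_ADMIN), hasCapability(caps, CAP_SYS_CHROOT));
// prints: false false
```

In a real assertion the status text would come from running `cat /proc/self/status` inside the sandbox, and the test would expect both checks to be false.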

2. Credential Isolation (Strong)

Multi-layered defense tested at each level:

Credential file hiding: Docker config, GitHub CLI tokens, npmrc auth tokens all verified hidden via /dev/null overlays
Exfiltration resistance: base64 encoding, xxd pipelines, and grep patterns are all tested and return empty output
Chroot bypass prevention: Specific regression test for the vulnerability where credentials were hidden at /host$HOME but still accessible at $HOME
API proxy sidecar: Agent gets placeholder tokens; real keys held by proxy. Healthchecks for OpenAI, Anthropic, Copilot
One-shot token library: LD_PRELOAD intercepts getenv(), caches value, clears from environment. Tested in both container and chroot modes
Token unsetting from /proc/1/environ: GITHUB_TOKEN, OPENAI_API_KEY, ANTHROPIC_API_KEY all verified cleared
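
The one-shot token behaviour described above (first read returns the value, the value is cached, and the variable is cleared from the environment) can be modelled in a few lines. This is only a sketch of the semantics: the real library is a native `LD_PRELOAD` shim around `getenv()`, and the `OneShotEnv` class, its method names, and the sample token value are hypothetical.

```typescript
// Hypothetical model of one-shot token semantics. It is NOT the real
// implementation, which intercepts getenv() in native code via LD_PRELOAD.
class OneShotEnv {
  // Private copies of values that have already been read once.
  private cache: { [name: string]: string } = {};

  constructor(private readonly env: { [name: string]: string | undefined }) {}

  get(name: string): string | undefined {
    if (Object.prototype.hasOwnProperty.call(this.cache, name)) {
      return this.cache[name];
    }
    const value = this.env[name];
    if (value !== undefined) {
      this.cache[name] = value; // keep a copy for the legitimate caller
      delete this.env[name];    // scrub the environment other readers would see
    }
    return value;
  }
}

// Illustrative usage with a placeholder value.
const fakeEnv: { [name: string]: string | undefined } = { GITHUB_TOKEN: "placeholder-token" };
const oneShot = new OneShotEnv(fakeEnv);
console.log(oneShot.get("GITHUB_TOKEN")); // prints: placeholder-token
console.log("GITHUB_TOKEN" in fakeEnv);   // prints: false
```

The point of the model is the ordering: once the first read has happened, anything that inspects the environment directly finds nothing, while the original caller can still obtain the cached value.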

3. Multi-Engine Smoke Tests (Strong)

Real AI agents running through the full AWF pipeline:

Claude: GitHub MCP, Playwright browser automation, file I/O, bash tools
Copilot: Same as Claude, plus web-fetch and agentic-workflows tools
Codex: GH CLI safe inputs, Tavily web search, discussion interactions

4. Multi-Language Build-Test (Strong)

8 language ecosystems tested with real open-source projects:

Bun, C++, Deno, .NET, Go, Java, Node.js, Rust
Each clones a test repo, installs dependencies, builds, and runs tests through AWF

5. Exit Code Propagation (Good)

15 tests cover exit codes 0-255, command exit codes, and pipeline behavior. This is critical for CI/CD integration, where a non-zero exit code signals failure.
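
To illustrate what exit code propagation means in practice, the sketch below runs shell commands directly with Node's `child_process.spawnSync` and reads back their status. It does not invoke AWF; the helper name `exitCodeOf` is an assumption, and the real tests would wrap the same commands in the firewall and expect identical codes.

```typescript
import { spawnSync } from "child_process";

// Run a command through `sh -c` and return its exit code.
// spawnSync reports `null` when the process was terminated by a signal.
function exitCodeOf(command: string): number | null {
  return spawnSync("sh", ["-c", command]).status;
}

console.log(exitCodeOf("true"));         // prints: 0
console.log(exitCodeOf("exit 42"));      // prints: 42
console.log(exitCodeOf("exit 255"));     // prints: 255
console.log(exitCodeOf("false | true")); // prints: 0
```

The last case shows the pipeline behavior: without `pipefail`, a pipeline reports the status of its final command, so a failure earlier in the pipe is not visible to the caller.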

Coverage Heat Map

A visual overview of what's tested vs. not:

Feature                          Unit  Integration  CI   Smoke  Build-Test
─────────────────────────────────────────────────────────────────────────
Domain allow-list                 ✅      ✅         ❌    ✅      ✅
Domain deny-list (--block-domains) ❌      ❌         ❌    ❌      ❌
Wildcard patterns                 ✅      ✅         ❌    ❌      ❌
Empty domains (air-gapped)        ❌      ✅         ❌    ❌      ❌
DNS server restriction            ✅      ⚠️ *       ❌    ❌      ❌
Network security (SSRF, bypass)   ❌      ✅         ❌    ❌      ❌
Chroot languages                  ❌      ✅         ✅    ✅      ✅
Chroot package managers           ❌      ✅         ✅    ❌      ✅
Chroot /proc filesystem           ❌      ✅         ✅    ❌      ❌
Chroot edge cases                 ❌      ✅         ✅    ❌      ❌
Credential hiding                 ❌      ✅         ❌    ❌      ❌
Token unsetting                   ❌      ✅         ❌    ❌      ❌
One-shot tokens (LD_PRELOAD)      ❌      ✅         ❌    ❌      ❌
API proxy sidecar                 ❌      ✅         ❌    ❌      ❌
Protocol support (HTTP/HTTPS)     ❌      ✅         ❌    ❌      ❌
IPv6                              ❌      ✅         ❌    ❌      ❌
Exit code propagation             ❌      ✅         ❌    ❌      ❌
Error handling                    ❌      ✅         ❌    ❌      ❌
Volume mounts                     ❌      ✅         ❌    ❌      ❌
Container workdir                 ❌      ✅         ❌    ❌      ❌
Git operations                    ❌      ✅         ❌    ❌      ❌
Environment variables             ❌      ✅         ❌    ❌      ❌
--env-all                         ❌      ❌         ❌    ❌      ❌
SSL Bump                          ✅      ❌         ❌    ❌      ❌
Log commands                      ✅      ⚠️ *       ❌    ❌      ❌
Docker unavailability             ❌      ✅         ❌    ❌      ❌
Docker warning stub               ❌      ❌ **      ❌    ❌      ❌
Setup action (action.yml)         ❌      ❌         ✅    ❌      ❌
Container security scan           ❌      ❌         ✅    ❌      ❌
Dependency audit                  ❌      ❌         ✅    ❌      ❌

* ⚠️ = Tests exist but have significant gaps (see detailed docs)
** = Tests exist but are skipped

Test Infrastructure Summary

How Tests Run

Serial execution (maxWorkers: 1) — Docker network/container conflicts prevent parallelism
120-second timeout per test — container lifecycle takes 15-25 seconds
Batch runner groups commands sharing the same config into single containers — reduces ~73 startups to ~27 for chroot tests
Custom Jest matchers: toSucceed(), toFail(), toExitWithCode(), toTimeout(), toAllowDomain(), toBlockDomain()
4-stage cleanup: pre-test TypeScript cleanup → AWF normal exit → AWF signal handlers → CI always-cleanup
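
The batching idea above, one container per distinct configuration rather than one per command, can be sketched as a grouping step. The `TestCommand` shape, the `allowDomains` field, and the `groupByConfig` function are hypothetical names chosen for this example; the real batch runner is described in test-infra.md.

```typescript
// Hypothetical shape of a queued test command. The real runner's types are
// not shown in this guide.
interface TestCommand {
  name: string;
  command: string;
  allowDomains: string[];
}

// Group commands whose configuration is identical, so that each group can
// share a single container startup. Groups keep first-seen order.
function groupByConfig(commands: TestCommand[]): TestCommand[][] {
  const groups: { [key: string]: TestCommand[] } = {};
  const order: string[] = [];
  for (const cmd of commands) {
    // Canonical key: sorted domain list, so ordering differences do not
    // split otherwise identical configs.
    const key = cmd.allowDomains.slice().sort().join(",");
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
      groups[key] = [];
      order.push(key);
    }
    groups[key].push(cmd);
  }
  return order.map((key) => groups[key]);
}

const queued: TestCommand[] = [
  { name: "python version", command: "python3 --version", allowDomains: [] },
  { name: "pip index", command: "pip index versions requests", allowDomains: ["pypi.org"] },
  { name: "node version", command: "node --version", allowDomains: [] },
];
console.log(groupByConfig(queued).length); // prints: 2
```

Here three commands need only two container startups, which is the same effect the guide reports at larger scale for the chroot tests.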

Infrastructure Limitations

Docker + sudo required — no lightweight local testing
Batch runner loses individual stderr (merged via 2>&1)
Log-based matchers require keepContainers: true
Aggressive docker prune in cleanup can affect non-AWF containers
No retry logic for flaky network tests

See test-infra.md for full infrastructure analysis.
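
The stderr limitation listed above follows directly from how `2>&1` works. As a small demonstration, independent of AWF and using only Node's `child_process.spawnSync`, the sketch below runs the same two commands with and without the redirect; the helper name `run` is an assumption.

```typescript
import { spawnSync } from "child_process";

// Run a shell snippet and capture both output streams as text.
function run(command: string): { stdout: string; stderr: string } {
  const result = spawnSync("sh", ["-c", command], { encoding: "utf8" });
  return { stdout: result.stdout, stderr: result.stderr };
}

// Streams kept separate: the error line is attributable to stderr.
const separate = run("echo out; echo err >&2");

// Streams merged with 2>&1: everything arrives on stdout, stderr is empty.
const merged = run("{ echo out; echo err >&2; } 2>&1");

console.log(JSON.stringify(separate)); // prints: {"stdout":"out\n","stderr":"err\n"}
console.log(JSON.stringify(merged));   // prints: {"stdout":"out\nerr\n","stderr":""}
```

Once the streams are merged, a test can no longer assert on stderr alone, which is the trade-off the batch runner accepts in exchange for fewer container startups.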

Detailed Analysis Documents

Each document provides per-test-case analysis with plain-language descriptions, real-world mappings, and gap identification:

Domain & Network Tests — Domain filtering, DNS, network security, localhost
Chroot Tests — Sandbox isolation, languages, package managers, /proc, edge cases
Protocol & Security Tests — HTTP/HTTPS, IPv6, API proxy, credentials, tokens, exit codes
Container & Operations Tests — Workdir, volumes, git, env vars, logging, Docker availability
CI & Smoke Tests — All 27 CI/smoke/build-test workflows analyzed
Test Infrastructure — Runner architecture, batch pattern, cleanup strategy, limitations
