
Commit e40350f

aksOps and claude authored
feat(supply-chain): production-readiness PR 3 — bundle integrity + secret hygiene + scanner pin (#108)
Third of 5 production-readiness PRs. Closes the air-gap drift, missing bundle integrity, and unpinned scanner-version audit findings.

Why
---
The bundle deployment model assumes an air-gapped target — but pre-PR-3 the launcher scripts fell back to `curl -fL https://repo1.maven.org/...` when the CLI JAR wasn't bundled, and bundles shipped without any integrity manifest. The `.gitignore` had narrow secret patterns, the `.dockerignore` had no secret patterns at all (and does NOT inherit `.gitignore`), and Semgrep ran unpinned (Scorecard Pinned-Dependencies flag).

Changes
-------
* **`codeiq bundle` SHA-256 manifest** (`BundleCommand`). Every entry is hashed via streaming `MessageDigest` as it writes through `ZipOutputStream` — no double-read for hundred-MB graph DBs. A final `checksums.sha256` entry in standard GNU coreutils format (`<64-hex> <path>` per line) lets receivers verify with `sha256sum -c checksums.sha256`. The manifest itself is excluded from itself (would be circular); receivers verify `checksums.sha256` integrity out-of-band (Sigstore / GPG / GitHub Release SHA-256).
* **No public-internet calls in `serve.sh` / `serve.bat`**. The Maven Central download fallback is removed; both scripts fail fast with a "place the JAR in this directory or re-bundle with --include-jar" message. `serve.sh` automatically runs `sha256sum -c --quiet checksums.sha256` before launch (skip with CODEIQ_SKIP_VERIFY=1 for trusted internal flows). `serve.bat` does not yet have a Windows-native equivalent — tracked.
* **Pinned Semgrep version** in `.github/workflows/security.yml`: `pip install semgrep` → `pip install 'semgrep==1.161.0'` (latest stable as of 2026-04-28). Bumps via Dependabot pip ecosystem.
* **Tightened secret-pattern exclusions**.
  - `.gitignore`: `.env` / `.env.local` → `.env.*` (catches `.env.prod`, `.env.test`, ...) plus explicit globs for `*.jks`, `*.p12`, `*.pfx`, `*.keystore`, `id_{rsa,ecdsa,ed25519,dsa}`, `credentials.{json,yaml}`, `secrets.{json,yaml}`, `*.serviceaccount.json`.
  - `.dockerignore`: mirrors the same rules. Docker resolves COPY against the build context which includes untracked working-tree files; .dockerignore does not inherit .gitignore.
* **Bundle verification runbook** in `shared/runbooks/release.md` §4a. Documents consumer-side `sha256sum -c` workflow with CODEIQ_SKIP_VERIFY semantics and the out-of-band signing pattern.

Test coverage
-------------
* `BundleCommandTest#bundleCreatesZipWithCorrectStructure`: 4 new asserts — `serve.sh` contains no `curl` / `maven.org` (defense against re-introduction), `checksums.sha256` exists, format-conforms to `<64-hex> <path>`, excludes itself.
* Full suite: 3672 tests / 0 failures / 0 errors / 32 skipped.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 82ded68 commit e40350f

8 files changed

Lines changed: 287 additions & 36 deletions


.dockerignore

Lines changed: 28 additions & 0 deletions
@@ -1,9 +1,37 @@
+# Build artifacts & VCS
 .git
 target
 node_modules
+
+# Documentation (not needed in image)
 *.md
 docs/
 tests/
 .github/
 helm/
+
+# Codeiq workspace under src/codeiq/ (development scratchpad)
 src/codeiq/
+
+# Secrets — explicit defense-in-depth; .dockerignore does NOT inherit
+# .gitignore (Docker resolves COPY against the build context, which
+# includes uncommitted/working-tree files). Audit RAN-46 §3.
+.env
+.env.*
+*.pem
+*.key
+*.jks
+*.p12
+*.pfx
+*.keystore
+id_rsa
+id_ecdsa
+id_ed25519
+id_dsa
+credentials.json
+credentials.yaml
+secrets.json
+secrets.yaml
+*.serviceaccount.json
+.aws/
+.codeiq/

.github/workflows/security.yml

Lines changed: 7 additions & 2 deletions
@@ -90,8 +90,13 @@ jobs:
       - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
         with:
           python-version: '3.12'
-      - name: Install semgrep
-        run: python -m pip install --quiet --upgrade pip semgrep
+      - name: Install semgrep (pinned for reproducibility)
+        # Pinned per OpenSSF Scorecard `Pinned-Dependencies` (RAN-46 §5).
+        # Bump via Dependabot pip ecosystem on a documented cadence; floating
+        # `semgrep` was previously flagged by Scorecard. pip is left unpinned
+        # — setup-python@v6 ships a current vendored pip, and the Scorecard
+        # rule fires only on user-installed packages.
+        run: python -m pip install --quiet 'semgrep==1.161.0'
       - name: Run semgrep (security-audit + owasp-top-ten + java)
         run: |
           semgrep scan \

.gitignore

Lines changed: 25 additions & 1 deletion
@@ -28,10 +28,34 @@ Thumbs.db
 *.mv.db

 # Environment & secrets
+# Broad .env* glob catches .env, .env.local, .env.prod, .env.test, .env.* — all
+# variants. Pre-PR-3 we only excluded the first two and several .env.<env>
+# variants would have committed silently.
 .env
-.env.local
+.env.*
+# Java keystores & PKCS#12 archives — high-value secrets that have shown up in
+# audits; never commit, even encrypted.
+*.jks
+*.p12
+*.pfx
+*.keystore
+# Generic credential / private-key patterns
 *.pem
 *.key
+# SSH private keys (public *.pub keys are fine).
+id_rsa
+id_ecdsa
+id_ed25519
+id_dsa
+# AWS / cloud credentials
+.aws/credentials
+credentials.json
+credentials.yaml
+secrets.json
+secrets.yaml
+# Service-account JSON (GCP / Firebase) — typically named *.serviceaccount.json.
+*-serviceaccount.json
+*.serviceaccount.json

 # Logs
 *.log
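
The ignore rules above can be sanity-checked against sample file names. The following is a minimal sketch, not part of the commit: it uses Java's NIO `glob:` matcher, which only approximates gitignore/dockerignore semantics (it is adequate for plain file-name patterns like these, not for directory rules such as `.aws/`). The class name `SecretGlobSketch` and the method `isExcluded` are hypothetical.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.util.List;

/** Hypothetical helper: checks bare file names against the file-name globs from the diff. */
final class SecretGlobSketch {

    // File-name patterns copied from the .dockerignore hunk; directory rules are left out
    // because Java's glob matcher does not model them.
    private static final List<String> PATTERNS = List.of(
            ".env", ".env.*", "*.pem", "*.key", "*.jks", "*.p12", "*.pfx", "*.keystore",
            "id_rsa", "id_ecdsa", "id_ed25519", "id_dsa",
            "credentials.json", "credentials.yaml", "secrets.json", "secrets.yaml",
            "*.serviceaccount.json");

    private static final FileSystem FS = FileSystems.getDefault();

    private static final List<PathMatcher> MATCHERS = PATTERNS.stream()
            .map(p -> FS.getPathMatcher("glob:" + p))
            .toList();

    /** Returns true when the bare file name matches at least one pattern. */
    static boolean isExcluded(String fileName) {
        return MATCHERS.stream().anyMatch(m -> m.matches(FS.getPath(fileName)));
    }

    public static void main(String[] args) {
        for (String name : List.of(".env", ".env.prod", ".envrc", "release.jks", "id_rsa.pub")) {
            System.out.println(name + " excluded=" + isExcluded(name));
        }
    }
}
```

Under these assumptions `.env.prod` is covered by `.env.*`, while `.env` needs its own line and a name such as `.envrc` matches neither pattern, so the pair `.env` plus `.env.*` is narrower than a literal `.env*` glob would be.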

CHANGELOG.md

Lines changed: 48 additions & 0 deletions
@@ -365,6 +365,54 @@ for that specific tag for the per-commit details.
 topology tool as a targeted Cypher query so the snapshot isn't needed.
 The cache is the bridge; the rewrite reduces peak memory.

+- **Production-readiness PR 3 of 5 — supply chain & bundle integrity.**
+Closes the air-gap drift, missing bundle integrity, and unpinned
+scanner versions audit findings.
+- **`codeiq bundle` SHA-256 manifest.** Every entry in `bundle.zip`
+(manifest, scripts, graph DB files, H2 cache, source tree, flow.html,
+optional CLI JAR) is now hashed as it streams through the
+`ZipOutputStream`, and a `checksums.sha256` entry is written last in
+standard GNU coreutils format. Receivers verify with
+`sha256sum -c checksums.sha256`. The hash is computed by feeding each
+chunk to both the SHA-256 digest and the ZIP stream — no double-read
+even for multi-hundred-MB graph databases. Order is deterministic
+(sorted dir walks + sorted git ls-files), so the resulting
+`checksums.sha256` is byte-stable.
+- **No public-internet calls in launcher scripts.** `serve.sh` and
+`serve.bat` previously fell back to `curl -fL https://repo1.maven.org/...`
+when the CLI JAR wasn't bundled — incompatible with the air-gapped
+deploy model documented in `~/.claude/rules/build.md`. The Maven
+Central download is removed; if the JAR is missing, the launcher
+fails fast and tells the operator to either `--include-jar` when
+bundling or stage from an internal artifact mirror. `serve.sh` also
+runs `sha256sum -c --quiet checksums.sha256` automatically before
+launching (skip with `CODEIQ_SKIP_VERIFY=1`).
+- **Pinned Semgrep version.** `.github/workflows/security.yml` was
+`pip install semgrep` (floating) — Scorecard's
+`Pinned-Dependencies` flagged it. Now pinned to `semgrep==1.161.0`
+(latest stable as of 2026-04-28). Bumps go through Dependabot's pip
+ecosystem on a documented cadence.
+- **Tightened secret-pattern exclusions.** `.gitignore` previously
+only matched `.env` / `.env.local` — gaps for `.env.prod`,
+`.env.test`, JKS / P12 keystores, SSH private keys, and
+cloud-credential JSON. Broadened to `.env.*` plus explicit globs
+for `*.jks`, `*.p12`, `*.pfx`, `*.keystore`, `id_{rsa,ecdsa,ed25519,dsa}`,
+`credentials.{json,yaml}`, `secrets.{json,yaml}`,
+`*.serviceaccount.json`. `.dockerignore` mirrors the same rules
+(Docker resolves COPY against the build context, which includes
+untracked working-tree files; .dockerignore does not inherit
+.gitignore).
+- **Bundle verification runbook.** `shared/runbooks/release.md` §4a
+documents consumer-side `sha256sum -c` workflow, including the
+deliberate exclusion of `checksums.sha256` from itself (would be
+circular) and the Sigstore/GPG out-of-band signing that backs
+`checksums.sha256` against tampering.
+- **Tests:** `BundleCommandTest#bundleCreatesZipWithCorrectStructure`
+extended with 4 new asserts: serve.sh contains no `curl`/`maven.org`
+references (defense against re-introduction), `checksums.sha256`
+exists, format-conforms to `<64-hex> <path>`, and excludes itself.
+Full suite: 3672 tests / 0 failures / 0 errors.
+
 ## [0.1.0] - 2026-03-28

 First general-availability cut. See the
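
The manifest rules described in the changelog entry (one line per entry, deterministic order, manifest not listed in itself) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `BundleCommand`: the class `ChecksumManifestSketch` and its methods are hypothetical, and the two-character separator is the one GNU `sha256sum` itself emits (a space plus a mode character), which is an assumption here because the commit text renders the separator as a single space.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of rendering and shape-checking a checksums.sha256 manifest. */
final class ChecksumManifestSketch {

    static final String MANIFEST_NAME = "checksums.sha256";

    // 64 lowercase hex chars, a space, a mode character (space or '*'), then the path.
    private static final Pattern LINE = Pattern.compile("[0-9a-f]{64} [ *].+");

    /** Renders digests in map iteration order, skipping the manifest's own name. */
    static String render(Map<String, String> digestsByPath) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : digestsByPath.entrySet()) {
            if (MANIFEST_NAME.equals(entry.getKey())) {
                continue; // listing the manifest inside itself would be circular
            }
            out.append(entry.getValue()).append("  ").append(entry.getKey()).append('\n');
        }
        return out.toString();
    }

    /** Returns true when every line has the digest-separator-path shape. */
    static boolean isWellFormed(String manifest) {
        return manifest.lines().allMatch(line -> LINE.matcher(line).matches());
    }

    /** Builds a small sample with placeholder digests (not real hashes). */
    static String demo() {
        Map<String, String> digests = new LinkedHashMap<>(); // insertion order is preserved
        digests.put("manifest.json", "a".repeat(64));
        digests.put("serve.sh", "b".repeat(64));
        digests.put(MANIFEST_NAME, "c".repeat(64)); // dropped by render()
        return render(digests);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Because a `LinkedHashMap` iterates in insertion order, the output is byte-stable as long as entries are inserted in a deterministic order, which is the property the changelog attributes to the sorted directory walks.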

CLAUDE.md

Lines changed: 5 additions & 0 deletions
@@ -447,6 +447,11 @@ bean for code paths that haven't been ported yet.
 - **`Files.probeContentType` is best-effort** — JDK 25 on Linux uses `/etc/mime.types` + magic-byte fallback. It returns `null` if the type can't be determined; treat that as "let it through" (the byte cap in `SafeFileReader` still bounds size). The allowlist for `/api/file` is `text/*` + `application/{json,xml,x-yaml,javascript}` — extending requires adding to the explicit list in `GraphController.readFile`.
 - **Sanitize user-controlled values before logging.** `BearerAuthFilter.sanitizeForLog(String)` strips `\p{Cntrl}` and truncates at 256 chars. Use it on anything tainted by `request.getRequestURI()`, `request.getMethod()`, headers, etc. before passing to a logger. CodeQL `java/log-injection` will flag direct `log.warn("... {} ...", request.getRequestURI())` as a vuln.
 - **`mcp.limits.max_depth` is a NEW field on `McpLimitsConfig`** (default 10). Audit #10 / C3 — the original audit assumed it existed but it didn't. When adding new MCP traversal tools, cap depth via `Math.min(callerSupplied, maxDepth)` before passing to Cypher. The REST endpoint already had this guard via `config.getMaxDepth()` from `CodeIqConfig`; the MCP path now mirrors it via `McpLimitsConfig.maxDepth()`.
+- **`codeiq bundle` writes `checksums.sha256` LAST and excludes itself.** `BundleCommand#writeChecksumsManifest` runs after every other entry has been written, then the digests collected in `LinkedHashMap<String,String> checksums` are emitted as `<sha256> <path>\n` per line — exactly GNU coreutils `sha256sum` format, so receivers verify with `sha256sum -c checksums.sha256`. The manifest itself is intentionally NOT in the digest list (would be circular); to verify `checksums.sha256` against tampering, sign the bundle.zip out-of-band (Sigstore, GPG, or compare to the GitHub Release SHA-256). Don't try to "fix" the circular omission by hashing checksums.sha256 into the manifest — that turns into a cat-and-mouse loop.
+- **`writeFileHashed` reads each file once, feeding both the SHA-256 and the ZIP stream.** Hundreds-of-MB graph DBs / CLI JARs can't be double-read for a separate hash pass. The 8KB chunk size in `BundleCommand` is small enough to keep memory flat regardless of file size; do NOT collect bytes into a `byte[]` and then split for "convenience".
+- **`serve.sh` and `serve.bat` MUST NOT contain network calls.** Audit RAN-46 §3 — air-gapped deploy model. Pre-PR-3 these scripts had `curl -fL https://repo1.maven.org/...` to download the CLI JAR on first run; that's gone. Receivers must `--include-jar` when bundling or stage the JAR from an internal mirror. There's a regression test in `BundleCommandTest#bundleCreatesZipWithCorrectStructure` that asserts `serve.sh` contains neither `curl` nor `maven.org` — keep that test green.
+- **`.dockerignore` does NOT inherit `.gitignore`.** Docker resolves COPY against the build context, which includes uncommitted/untracked working-tree files. `.gitignore` only stops things being staged; it has no effect on what `docker build` sees. Mirror the secret-pattern globs explicitly in `.dockerignore` (`.env*`, `*.jks`, `id_rsa`, `credentials.{json,yaml}`, etc.). Pre-PR-3 the `.dockerignore` was 9 lines and would have shipped a `.env.prod` straight into a published image.
+- **Semgrep is pinned to `semgrep==1.161.0`** in `.github/workflows/security.yml`. Bumps go through Dependabot's pip ecosystem on a documented cadence — `pip install --upgrade semgrep` (floating) was previously flagged by Scorecard `Pinned-Dependencies`. Don't unpin to "always get latest"; a CI-time auto-bump on a security-scanner can break the build silently when the new release adds rules.

 ## Supply-chain observability (OpenSSF)
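
The single-pass technique that the `writeFileHashed` note describes can be sketched as below. This is a hedged illustration, not the project's implementation: the class `SinglePassHashSketch` and both method names are hypothetical, and only the 8 KB chunk size and the digest-plus-ZIP idea come from the notes above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: hash a stream while writing it into a ZIP, reading it only once. */
final class SinglePassHashSketch {

    /** Copies {@code in} into a new ZIP entry and returns the SHA-256 of the bytes as hex. */
    static String writeEntryHashed(ZipOutputStream zip, String entryName, InputStream in) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            zip.putNextEntry(new ZipEntry(entryName));
            byte[] chunk = new byte[8192]; // fixed 8 KB buffer keeps memory flat
            int read;
            while ((read = in.read(chunk)) != -1) {
                sha256.update(chunk, 0, read); // feed the digest ...
                zip.write(chunk, 0, read);     // ... and the archive from the same chunk
            }
            zip.closeEntry();
            return HexFormat.of().formatHex(sha256.digest());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    /** Demo helper: pushes a string through an in-memory ZIP and returns its digest. */
    static String hashOfString(String payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(sink)) {
            byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
            return writeEntryHashed(zip, "demo.txt", new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hashOfString("abc"));
    }
}
```

The digest covers the uncompressed bytes, which is what a receiver hashes after unzipping, so the value can go straight into a `sha256sum`-style manifest line.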

shared/runbooks/release.md

Lines changed: 36 additions & 0 deletions
@@ -80,6 +80,42 @@ Within 30 minutes of the release workflow finishing:

 If any of (1)–(4) fails, [`rollback.md`](rollback.md) applies.

+### 4a. Consumer-side bundle integrity (`codeiq bundle` artifacts)
+
+When operators receive a `*-bundle.zip` produced by `codeiq bundle`, they
+**must** verify integrity before launching the bundled `serve.sh` /
+`serve.bat`. The bundle ships a `checksums.sha256` entry in standard GNU
+coreutils format, generated as the last step of bundling
+(`BundleCommand#writeChecksumsManifest`).
+
+```bash
+# 1. Unzip into a clean directory.
+unzip myrepo-v1.0-bundle.zip -d myrepo-bundle/
+cd myrepo-bundle
+
+# 2. Verify every file. Exits non-zero if any entry is missing or modified;
+# `checksums.sha256` itself is intentionally not listed (would be circular).
+sha256sum -c --quiet checksums.sha256
+
+# 3. (Optional) Skip via env var only when the bundle is trusted source-internal:
+# CODEIQ_SKIP_VERIFY=1 ./serve.sh
+./serve.sh
+```
+
+`serve.sh` runs the same `sha256sum -c` automatically when the binary is
+on `PATH`. **Do not set `CODEIQ_SKIP_VERIFY=1` in production**: it
+disables the only consumer-side integrity gate when the bundle was
+delivered out-of-band (USB, internal mirror, AKS sidecar artifact). For
+verifying `checksums.sha256` itself against tampering, sign the
+bundle.zip out-of-band (Sigstore, GPG, or compare to the GitHub Release
+SHA-256 if the bundle was published to a release).
+
+If the consumer environment does not provide `sha256sum` (Windows without
+WSL, locked-down build agents), distribute the bundle via Sigstore-signed
+release and rely on the Sigstore client for integrity. `serve.bat`
+intentionally does **not** include a Windows-native verification step
+yet — tracked under follow-up.
+
 ---

 ## 5. Hot-fix patch release (`X.Y.Z+1`)
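
For hosts without `sha256sum`, the check in §4a could in principle be reproduced with the JDK alone. The following is a sketch under assumptions and not something this commit ships: the class `BundleVerifierSketch` and its methods are hypothetical, and the parser accepts either one or two separator characters after the digest because the exact separator is not visible in the rendered commit text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

/** Hypothetical JDK-only stand-in for `sha256sum -c checksums.sha256`. */
final class BundleVerifierSketch {

    /** Streams a file through SHA-256 in 8 KB chunks and returns lowercase hex. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                digest.update(chunk, 0, read);
            }
        }
        return HexFormat.of().formatHex(digest.digest());
    }

    /** Returns manifest entries that are malformed, missing, or modified. */
    static List<String> failures(Path bundleDir) {
        Path root = bundleDir.toAbsolutePath().normalize();
        List<String> failed = new ArrayList<>();
        try {
            Path manifest = root.resolve("checksums.sha256");
            for (String line : Files.readAllLines(manifest, StandardCharsets.UTF_8)) {
                if (line.isBlank()) {
                    continue;
                }
                if (line.length() < 66) { // 64 hex chars + separator + at least one path char
                    failed.add(line);
                    continue;
                }
                String expected = line.substring(0, 64);
                String relative = line.substring(64).replaceFirst("^ [ *]?", "");
                Path target = root.resolve(relative).normalize();
                boolean ok = target.startsWith(root)          // reject paths escaping the bundle
                        && Files.isRegularFile(target)
                        && expected.equalsIgnoreCase(sha256Hex(target));
                if (!ok) {
                    failed.add(relative);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return failed;
    }

    /** Builds a throwaway bundle, verifies it, tampers with it, and verifies again. */
    static boolean selfCheck() {
        try {
            Path dir = Files.createTempDirectory("bundle-verify-demo");
            Path script = dir.resolve("serve.sh");
            Files.writeString(script, "#!/bin/sh\n");
            Files.writeString(dir.resolve("checksums.sha256"), sha256Hex(script) + "  serve.sh\n");
            boolean cleanPasses = failures(dir).isEmpty();
            Files.writeString(script, "#!/bin/sh\necho tampered\n");
            boolean tamperDetected = failures(dir).equals(List.of("serve.sh"));
            return cleanPasses && tamperDetected;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("self-check passed: " + selfCheck());
            return;
        }
        List<String> failed = failures(Path.of(args[0]));
        failed.forEach(entry -> System.err.println("FAILED: " + entry));
        System.exit(failed.isEmpty() ? 0 : 1);
    }
}
```

As with `sha256sum -c`, this only proves the files match the manifest; the manifest itself still needs the out-of-band signature that the runbook section calls for.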
