Third of 5 production-readiness PRs. Closes the air-gap drift, missing
bundle integrity, and unpinned scanner-version audit findings.
Why
---
The bundle deployment model assumes an air-gapped target — but pre-PR-3
the launcher scripts fell back to `curl -fL https://repo1.maven.org/...`
when the CLI JAR wasn't bundled, and bundles shipped without any
integrity manifest. The `.gitignore` had narrow secret patterns, the
`.dockerignore` had no secret patterns at all (and does NOT inherit
`.gitignore`), and Semgrep ran unpinned (Scorecard Pinned-Dependencies
flag).
Changes
-------
* **`codeiq bundle` SHA-256 manifest** (`BundleCommand`). Every entry
is hashed via streaming `MessageDigest` as it writes through
`ZipOutputStream` — no double-read for hundred-MB graph DBs. A
final `checksums.sha256` entry in standard GNU coreutils format
(`<64-hex> <path>` per line) lets receivers verify with
`sha256sum -c checksums.sha256`. The manifest does not list its own
digest (that would be circular); receivers verify the integrity of
`checksums.sha256` out-of-band (Sigstore / GPG / GitHub Release
SHA-256).
* **No public-internet calls in `serve.sh` / `serve.bat`**. The Maven
Central download fallback is removed; both scripts fail fast with
a "place the JAR in this directory or re-bundle with --include-jar"
message. `serve.sh` automatically runs `sha256sum -c --quiet
checksums.sha256` before launch (skip with CODEIQ_SKIP_VERIFY=1
for trusted internal flows). `serve.bat` does not yet have a
Windows-native equivalent — tracked.
* **Pinned Semgrep version** in `.github/workflows/security.yml`:
`pip install semgrep` → `pip install 'semgrep==1.161.0'` (latest
stable as of 2026-04-28). Version bumps go through the Dependabot pip ecosystem.
* **Tightened secret-pattern exclusions**.
- `.gitignore`: `.env` / `.env.local` → `.env.*` (catches
`.env.prod`, `.env.test`, ...) plus explicit globs for `*.jks`,
`*.p12`, `*.pfx`, `*.keystore`, `id_{rsa,ecdsa,ed25519,dsa}`,
`credentials.{json,yaml}`, `secrets.{json,yaml}`,
`*.serviceaccount.json`.
- `.dockerignore`: mirrors the same rules. Docker resolves COPY
against the build context which includes untracked working-tree
files; .dockerignore does not inherit .gitignore.
* **Bundle verification runbook** in
`shared/runbooks/release.md` §4a. Documents consumer-side
`sha256sum -c` workflow with CODEIQ_SKIP_VERIFY semantics and the
out-of-band signing pattern.
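
The single-pass hashing described for `BundleCommand` can be sketched roughly as follows. This is a minimal illustration and not the actual implementation: the class name `HashedZipSketch` and the methods `writeEntryHashed` and `writeManifest` are hypothetical, and the two-space separator is an assumption based on what `sha256sum` itself prints.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class HashedZipSketch {
    // Digests in insertion order, keyed by entry path (hypothetical field).
    private final Map<String, String> checksums = new LinkedHashMap<>();

    public Map<String, String> checksums() {
        return checksums;
    }

    /** Copies one stream into the ZIP and hashes the same bytes in the same pass. */
    public void writeEntryHashed(ZipOutputStream zip, String path, InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        zip.putNextEntry(new ZipEntry(path));
        byte[] buf = new byte[8192]; // small fixed buffer keeps memory flat
        int n;
        while ((n = in.read(buf)) != -1) {
            sha256.update(buf, 0, n); // feed the digest
            zip.write(buf, 0, n);     // and the archive, from one read
        }
        zip.closeEntry();
        checksums.put(path, HexFormat.of().formatHex(sha256.digest()));
    }

    /** Writes the manifest as the final entry; it does not list itself. */
    public void writeManifest(ZipOutputStream zip) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Assumed separator: two spaces, as printed by sha256sum in text mode.
        checksums.forEach((path, hex) -> sb.append(hex).append("  ").append(path).append('\n'));
        zip.putNextEntry(new ZipEntry("checksums.sha256"));
        zip.write(sb.toString().getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) throws Exception {
        HashedZipSketch bundle = new HashedZipSketch();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            byte[] payload = "abc".getBytes(StandardCharsets.UTF_8);
            bundle.writeEntryHashed(zip, "graph/data.bin", new ByteArrayInputStream(payload));
            bundle.writeManifest(zip);
        }
        bundle.checksums().forEach((path, hex) -> System.out.println(hex + "  " + path));
    }
}
```

Because the digest is updated inside the copy loop, each source file is read exactly once regardless of size.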
Test coverage
-------------
* `BundleCommandTest#bundleCreatesZipWithCorrectStructure`: 4 new
asserts — `serve.sh` contains no `curl` / `maven.org` (defense
against re-introduction), `checksums.sha256` exists,
format-conforms to `<64-hex> <path>`, excludes itself.
* Full suite: 3672 tests / 0 failures / 0 errors / 32 skipped.
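
The manifest-format assertions listed above could look roughly like the check below. It is a sketch under assumptions, not the code in `BundleCommandTest`: the class `ManifestFormatCheck` and the method `isWellFormed` are hypothetical, and the pattern accepts one or two separator spaces because the exact separator width is assumed rather than confirmed.

```java
import java.util.List;
import java.util.regex.Pattern;

public class ManifestFormatCheck {
    // 64 lowercase hex characters, one or two spaces, then a non-empty path.
    private static final Pattern LINE = Pattern.compile("[0-9a-f]{64} {1,2}\\S.*");

    /** Returns true when every line is well formed and none names the manifest itself. */
    public static boolean isWellFormed(List<String> lines) {
        if (lines.isEmpty()) {
            return false; // an empty manifest verifies nothing
        }
        for (String line : lines) {
            if (!LINE.matcher(line).matches()) {
                return false;
            }
            String path = line.substring(64).stripLeading();
            if (path.equals("checksums.sha256")) {
                return false; // self-reference would be circular
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String hex = "a".repeat(64);
        System.out.println(isWellFormed(List.of(hex + "  serve.sh")));
        System.out.println(isWellFormed(List.of(hex + "  checksums.sha256")));
    }
}
```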
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md: 5 additions & 0 deletions
@@ -447,6 +447,11 @@ bean for code paths that haven't been ported yet.
 - **`Files.probeContentType` is best-effort**: JDK 25 on Linux uses `/etc/mime.types` + magic-byte fallback. It returns `null` if the type can't be determined; treat that as "let it through" (the byte cap in `SafeFileReader` still bounds size). The allowlist for `/api/file` is `text/*` + `application/{json,xml,x-yaml,javascript}`; extending it requires adding to the explicit list in `GraphController.readFile`.
 - **Sanitize user-controlled values before logging.** `BearerAuthFilter.sanitizeForLog(String)` strips `\p{Cntrl}` and truncates at 256 chars. Use it on anything tainted by `request.getRequestURI()`, `request.getMethod()`, headers, etc. before passing to a logger. CodeQL `java/log-injection` will flag direct `log.warn("... {} ...", request.getRequestURI())` as a vuln.
 - **`mcp.limits.max_depth` is a NEW field on `McpLimitsConfig`** (default 10). Audit #10 / C3: the original audit assumed the field existed, but it didn't. When adding new MCP traversal tools, cap depth via `Math.min(callerSupplied, maxDepth)` before passing to Cypher. The REST endpoint already had this guard via `config.getMaxDepth()` from `CodeIqConfig`; the MCP path now mirrors it via `McpLimitsConfig.maxDepth()`.
+- **`codeiq bundle` writes `checksums.sha256` LAST and excludes itself.** `BundleCommand#writeChecksumsManifest` runs after every other entry has been written, then the digests collected in `LinkedHashMap<String,String> checksums` are emitted as `<sha256> <path>\n` per line. That is GNU coreutils `sha256sum` format, so receivers verify with `sha256sum -c checksums.sha256`. The manifest itself is intentionally NOT in the digest list (would be circular); to verify `checksums.sha256` against tampering, sign the bundle.zip out-of-band (Sigstore, GPG, or compare to the GitHub Release SHA-256). Don't try to "fix" the circular omission by hashing checksums.sha256 into the manifest: a file cannot contain its own digest, so that only moves the problem.
+- **`writeFileHashed` reads each file once, feeding both the SHA-256 and the ZIP stream.** Graph DBs and CLI JARs can run to hundreds of MB, so they should not be read a second time for a separate hash pass. The 8KB chunk size in `BundleCommand` is small enough to keep memory flat regardless of file size; do NOT collect bytes into a `byte[]` and then split for "convenience".
+- **`serve.sh` and `serve.bat` MUST NOT contain network calls.** Audit RAN-46 §3: air-gapped deploy model. Pre-PR-3 these scripts had `curl -fL https://repo1.maven.org/...` to download the CLI JAR on first run; that's gone. Whoever builds the bundle must pass `--include-jar`, or the receiver must stage the JAR from an internal mirror. There's a regression test in `BundleCommandTest#bundleCreatesZipWithCorrectStructure` that asserts `serve.sh` contains neither `curl` nor `maven.org`; keep that test green.
+- **`.dockerignore` does NOT inherit `.gitignore`.** Docker resolves COPY against the build context, which includes uncommitted/untracked working-tree files. `.gitignore` only stops things being staged; it has no effect on what `docker build` sees. Mirror the secret-pattern globs explicitly in `.dockerignore` (`.env*`, `*.jks`, `id_rsa`, `credentials.{json,yaml}`, etc.). Pre-PR-3 the `.dockerignore` was 9 lines and would have shipped a `.env.prod` straight into a published image.
+- **Semgrep is pinned to `semgrep==1.161.0`** in `.github/workflows/security.yml`. Bumps go through Dependabot's pip ecosystem on a documented cadence; `pip install --upgrade semgrep` (floating) was previously flagged by Scorecard `Pinned-Dependencies`. Don't unpin to "always get latest"; a CI-time auto-bump on a security-scanner can break the build silently when the new release adds rules.
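
The receiver-side check that `sha256sum -c checksums.sha256` performs on an extracted bundle can be approximated in Java, which may help on hosts where `sha256sum` is unavailable. The sketch below is illustrative only: `VerifyBundleSketch`, `verify`, and `sha256Hex` are hypothetical names, the manifest is assumed to hold `<64-hex>`, whitespace, then a path relative to the bundle directory, and nothing here is taken from the project's launcher scripts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

public class VerifyBundleSketch {
    /** Returns the manifest entries that fail verification; an empty list means all matched. */
    public static List<String> verify(Path bundleDir) throws IOException, NoSuchAlgorithmException {
        Path root = bundleDir.toAbsolutePath().normalize();
        List<String> failures = new ArrayList<>();
        for (String line : Files.readAllLines(root.resolve("checksums.sha256"))) {
            if (line.isBlank()) {
                continue;
            }
            if (line.length() < 66) { // 64 hex chars + separator + at least one path char
                failures.add(line);
                continue;
            }
            String expected = line.substring(0, 64);
            String rel = line.substring(64).stripLeading();
            Path file = root.resolve(rel).normalize();
            // Reject paths that escape the bundle directory, missing files, and mismatches.
            if (!file.startsWith(root)
                    || !Files.isRegularFile(file)
                    || !expected.equals(sha256Hex(file))) {
                failures.add(rel);
            }
        }
        return failures;
    }

    /** Streams the file through SHA-256 in 8KB chunks. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                sha256.update(buf, 0, n);
            }
        }
        return HexFormat.of().formatHex(sha256.digest());
    }

    public static void main(String[] args) throws Exception {
        List<String> failures = verify(Path.of(args.length > 0 ? args[0] : "."));
        failures.forEach(f -> System.err.println("FAILED: " + f));
        System.exit(failures.isEmpty() ? 0 : 1);
    }
}
```

As with `sha256sum -c`, this only proves the files match the manifest; the manifest itself still needs the out-of-band signature check.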