Skip to content

Commit b1872f4

Browse files
committed
checkpoint: pre-yolo 2026-04-28T07:25:13
1 parent 62afaa8 commit b1872f4

3 files changed

Lines changed: 334 additions & 1 deletion

File tree

AGENTS.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,7 @@ If you hit something requiring GitHub App / PAT / OAuth that the runtime cannot
4646
<claude-mem-context>
4747
# Memory Context
4848

49-
# [codeiq] recent context, 2026-04-28 1:14am UTC
49+
# [codeiq] recent context, 2026-04-28 6:43am UTC
5050

5151
No previous sessions found.
5252
</claude-mem-context>
Lines changed: 156 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,156 @@
1+
# Counter-Audit: serve-path Production Readiness
2+
**Original audit:** `docs/audits/2026-04-28-serve-path-prod-readiness.md` (15 findings: 6 HIGH / 7 MEDIUM / 2 LOW)
3+
**Counter-audit date:** 2026-04-28
4+
**Method:** Every finding verified against actual source files. Net-new findings added from independent inspection of GraphController, McpTools, ServeCommand, GraphHealthIndicator, CorsConfig, BundleCommand, SafeFileReader, CodeIqConfig, McpLimitsConfig, McpAuthConfig, GraphStore, application.yml, pom.xml, security.yml, ci-java.yml, and frontend/package.json.
5+
6+
---
7+
8+
## Section A — Corrections to Original Audit
9+
10+
### A1. Finding #6 overstated — `markReady()` fires AFTER `bootstrapNeo4jFromCache()`, not before
11+
12+
**Original claim:** "`markReady()` fires before the graph is loaded" — traffic is routed during bootstrap.
13+
14+
**What the code actually does:** `ServeCommand.java` lines 83–126: `graphBootstrapper.bootstrapNeo4jFromCache()` is called at line 84 and returns only when bootstrap completes. `markReady()` is called at line 126, after that return and after the node/edge count is printed to stdout. The auto-bootstrap race window the audit describes does not exist in the current code.
15+
16+
**What IS real:** `GraphHealthIndicator` is not wired into the readiness group in `application.yml` (no `management.endpoint.health.group.readiness.include: readinessState,graph`). So the Kubernetes readiness probe does not gate on graph health. If a future code change moves `markReady()` earlier, or if bootstrap is made async, this becomes acute. The config gap is the real finding.
17+
18+
**Correction:** Downgrade finding #6 from HIGH to MEDIUM. The fix (adding `management.endpoint.health.group.readiness.include: readinessState,graph` to the serving profile in `application.yml`) is still valid and still needed, but the acute rollout-during-bootstrap race does not currently exist.
19+
20+
---
21+
22+
### A2. Finding #10 is half-wrong — REST `traceImpact` IS depth-capped; MCP is not
23+
24+
**Original claim:** "the API endpoint `/api/triage/impact/{id}` (`GraphController:188`) doesn't appear to bound it."
25+
26+
**What the code actually does:** `GraphController.java` line 192:
27+
```java
28+
int cappedDepth = Math.min(depth, config.getMaxDepth());
29+
```
30+
`config.getMaxDepth()` defaults to 10 (`CodeIqConfig.java`). The REST endpoint is bounded.
31+
32+
**What IS true — in the other direction:** `McpTools.traceImpact` passes `depth != null ? depth : 3` directly to `queryService.traceImpact` with no `Math.min` guard. The MCP path is unbounded.
33+
34+
**Correction:** The unbounded-depth defect exists on the MCP tool, not the REST controller. The fix targets `McpTools.traceImpact`, not `GraphController`. Severity (MEDIUM) is unchanged; affected surface is corrected. See also C3 below.
35+
36+
---
37+
38+
### A3. Finding #11 is partially wrong — `SafeFileReader` does enforce a byte cap
39+
40+
**Original claim:** "no `Content-Length` cap matches `getMaxFileBytes`" — implies size is unbounded.
41+
42+
**What the code actually does:** `GraphController.readFile` calls `SafeFileReader.read(resolved, startLine, endLine, config.getMaxFileBytes())`. `SafeFileReader` enforces the byte cap for both full-file and line-range reads. The cap is real and working.
43+
44+
**What IS real:** Content-type is not sniffed. Binary files (`.jks`, `.so`, `.png`) are served as `text/plain` with no early reject. The slow-client connection exhaustion concern is valid. The size-cap concern is not.
45+
46+
**Correction:** Remove "no Content-Length cap" from finding #11. The binary content-type concern stands; severity (MEDIUM) is unchanged.
47+
48+
---
49+
50+
## Section B — Severity Adjustments
51+
52+
### B1. Finding #6 — downgrade from HIGH to MEDIUM (per A1)
53+
54+
The bootstrap-before-markReady ordering eliminates the acute readiness race. The residual gap — readiness group not configured — is MEDIUM: probes do not gate on graph health, which can cause transient 503s after a pod restart if Neo4j is slow to open, but the bootstrap path completes before traffic is accepted under normal conditions.
55+
56+
---
57+
58+
### B2. Finding #4 — scope extended; severity remains HIGH
59+
60+
Finding #4 correctly flags null checksums in `BundleCommand.createManifest` (line 241: `null` passed as the checksums argument). This is confirmed. However the finding misses a co-equal defect: `generateServeShell` (lines 265–274 in `BundleCommand.java`) emits a `serve.sh` that unconditionally downloads the JAR at runtime:
61+
62+
```bash
63+
curl -fL -o "$JAR" \
64+
"https://repo1.maven.org/maven2/io/github/randomcodespace/iq/code-iq/${VERSION}/code-iq-${VERSION}-cli.jar"
65+
```
66+
67+
An equivalent `serve.bat` exists for Windows. This is a direct violation of `build.md` ("No runtime network calls to the public internet"). Bundles deployed in air-gapped environments silently fail to start when the JAR is absent. A compromised Maven Central namespace could substitute a malicious JAR, and even correct checksums in `manifest.json` would not protect against it because the downloaded JAR is not verified against them. The fix for finding #4 must address both the null checksums **and** the runtime download.
68+
69+
---
70+
71+
## Section C — Missed Findings
72+
73+
### C1. HIGH — `getCachedData()` loads the full graph into heap on every topology MCP tool call
74+
75+
**Symptom in prod:** Seven MCP tools — `serviceDetail`, `blastRadius`, `findPath`, `findBottlenecks`, `findCircularDeps`, `findDeadServices`, `findNode` — all begin by calling `getCachedData()` (`McpTools.java` lines 83–92). `getCachedData()` calls `graphStore.findAll()`, which executes two full graph scans (`findAllNodes` + `findAllEdges`) and materialises every node and every edge into a `GraphData` record on the Java heap. On a 5M-node enriched graph this is multiple gigabytes per call. No result is cached between invocations — each call pays the full allocation cost. Two concurrent `blast_radius` invocations double-allocate. This is an OOM vector independent of the `run_cypher` issue in finding #2, and it is triggered by normal topology tool usage, not by adversarial queries.
76+
77+
**File / location:** `src/main/java/io/github/randomcodespace/iq/mcp/McpTools.java:83–92` (`getCachedData`); `src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java` (`findAll`, `findAllNodes`, `findAllEdges`).
78+
79+
**Severity:** HIGH
80+
81+
**Fix proposal:** Replace `getCachedData()` with targeted Cypher queries per tool (e.g. `blastRadius` needs only the subgraph reachable from the seed node, not all 5M nodes). If a full snapshot is required for correctness, populate it once at serve startup into a size-bounded `SoftReference` and invalidate on graph reload. Add a Caffeine cache with a 60-second TTL and a max-weight bound in bytes. Effort: M.
82+
83+
---
84+
85+
### C2. MEDIUM — Swagger UI exposed unauthenticated; full API schema readable by any cluster workload
86+
87+
**Symptom in prod:** `pom.xml` includes `springdoc-openapi-starter-webmvc-ui:3.0.3`. SpringDoc auto-registers `/swagger-ui/index.html`, `/swagger-ui.html`, and `/v3/api-docs` at startup with no authentication guard. Because there is no Spring Security on the classpath (finding #1), no filter intercepts these paths. Any actor who can reach the pod gets the complete OpenAPI schema: every endpoint path, parameter name, response shape, and the full enumeration of `NodeKind` / `EdgeKind` values. This is reconnaissance-in-depth that lowers the cost of exploiting finding #1. Neither `springdoc.swagger-ui.enabled` nor `springdoc.api-docs.enabled` is set to `false` in the serving profile of `application.yml`.
88+
89+
**File / location:** `pom.xml` (`springdoc-openapi-starter-webmvc-ui:3.0.3`); `src/main/resources/application.yml` (no `springdoc.*` keys in serving profile).
90+
91+
**Severity:** MEDIUM
92+
93+
**Fix proposal:** In `application.yml` serving profile: `springdoc.swagger-ui.enabled: false` and `springdoc.api-docs.enabled: false`. Provide opt-in `codeiq.serving.swagger-ui.enabled: true` for local development. When auth (finding #1) is implemented, gate `/swagger-ui/**` and `/v3/api-docs/**` behind the same bearer check. Effort: XS.
94+
95+
---
96+
97+
### C3. MEDIUM — `McpTools.traceImpact` has no depth cap on the MCP path
98+
99+
**Symptom in prod:** As established in A2, `McpTools.traceImpact` forwards caller-supplied `depth` to `queryService.traceImpact` without any `Math.min` guard. A malicious or runaway MCP client sends `depth=1000` on a hub node; the resulting `RELATES_TO*1..1000` variable-length Cypher match runs until the transaction timeout fires — which is also not configured (finding #2). The REST endpoint at `GraphController:192` is safe; the MCP surface is not. `McpLimitsConfig` already defines a `maxDepth` field that is never consumed here.
100+
101+
**File / location:** `src/main/java/io/github/randomcodespace/iq/mcp/McpTools.java` (`traceImpact` method, depth forwarding).
102+
103+
**Severity:** MEDIUM
104+
105+
**Fix proposal:** `int safedDepth = depth != null ? Math.min(depth, mcpLimitsConfig.maxDepth()) : 3;` — wire the already-parsed `maxDepth` from `McpLimitsConfig`. Effort: XS.
106+
107+
---
108+
109+
### C4. MEDIUM — `semgrep` installed from PyPI without a pinned version in `security.yml`
110+
111+
**Symptom in prod:** `.github/workflows/security.yml` line 94 runs `python -m pip install --quiet --upgrade pip semgrep` with no version pin. Every workflow run fetches the latest `semgrep` release from PyPI at the moment of execution. Every other tool in the same workflow is pinned: `osv-scanner` uses `OSV_SCANNER_VERSION: 2.3.5` with a named release download; `gitleaks` uses `GITLEAKS_VERSION: 8.30.1`; all GitHub Actions are SHA-pinned. A compromised PyPI release of `semgrep` (or a transitive dependency) would execute arbitrary code inside the SAST job, which runs with `contents: read` permission and access to `GITHUB_TOKEN`. This directly contradicts the workflow's header comment: "All actions SHA-pinned per Scorecard `Pinned-Dependencies`."
112+
113+
**File / location:** `.github/workflows/security.yml:94`.
114+
115+
**Severity:** MEDIUM
116+
117+
**Fix proposal:** Pin to a specific version: `pip install semgrep==<current-stable>` (resolve via PyPI). Better: use `returntocorp/semgrep-action` pinned by commit SHA (free for OSS), which eliminates the PyPI install entirely and aligns with the workflow's existing SHA-pinning posture. Effort: XS.
118+
119+
---
120+
121+
### C5. LOW — CLAUDE.md tech-stack version table is stale on three components
122+
123+
**Symptom:** CLAUDE.md "Tech Stack" section states Spring Boot `4.0.5`, Spring AI `2.0.0-M3`, Neo4j Embedded `2026.02.3`. Actual values in `pom.xml`: Spring Boot `4.0.6`, Spring AI `2.0.0-M4`, Neo4j `2026.04.0`. Stale docs cause reviewers and automated tooling to reference wrong versions when checking CVE databases or compatibility matrices.
124+
125+
**File / location:** `CLAUDE.md` (Tech Stack section); `pom.xml` (ground truth).
126+
127+
**Severity:** LOW
128+
129+
**Fix proposal:** Update three lines in CLAUDE.md. Note that `pom.xml` is the SSoT; CLAUDE.md is informational. Effort: XS.
130+
131+
---
132+
133+
## Summary Table
134+
135+
| # | Original Finding | Verdict | Adjustment |
136+
|---|-----------------|---------|------------|
137+
| 1 | No auth on API/MCP | Confirmed ||
138+
| 2 | `run_cypher` no cap / timeout / READ mode | Confirmed ||
139+
| 3 | No rate limiting | Confirmed ||
140+
| 4 | Unsigned bundle + null checksums | Confirmed + extended | serve.sh/bat runtime Maven Central download is co-equal defect; fix must cover both |
141+
| 5 | `/api/file` ships secrets in bundle | Confirmed ||
142+
| 6 | Readiness fires before graph load | **Partially wrong** | Downgrade HIGH → MEDIUM; bootstrap-before-markReady ordering is correct; readiness group config gap is the real issue |
143+
| 7 | No `@RestControllerAdvice`; stack trace leak | Confirmed ||
144+
| 8 | MCP errors return HTTP 200 | Confirmed ||
145+
| 9 | No structured logs / request ID / MDC | Confirmed ||
146+
| 10 | `findShortestPath` + `traceImpact` unbounded | **Partially wrong** | REST `traceImpact` IS capped via `Math.min`; MCP `traceImpact` is NOT; fix target corrected |
147+
| 11 | `/api/file` no size cap; binary served as text | **Partially wrong** | Size cap IS enforced by SafeFileReader; binary content-type issue stands; remove size-cap claim |
148+
| 12 | `GraphHealthIndicator` uncached count on every probe | Confirmed ||
149+
| 13 | CORS defaults wrong; no CSP / security headers | Confirmed ||
150+
| 14 | Bad YAML silently uses defaults; no fail-fast | Confirmed ||
151+
| 15 | Zero integration tests for auth / rate-limit path | Confirmed ||
152+
| C1 | `getCachedData()` full graph load per topology call | **NET NEW** | HIGH |
153+
| C2 | Swagger UI unauthenticated | **NET NEW** | MEDIUM |
154+
| C3 | MCP `traceImpact` no depth cap | **NET NEW** | MEDIUM |
155+
| C4 | `semgrep` unpinned in `security.yml` | **NET NEW** | MEDIUM |
156+
| C5 | CLAUDE.md version table stale | **NET NEW** | LOW |

0 commit comments

Comments
 (0)