
Commit 6e95c5e

aksOps and claude committed
feat(deploy): AKS read-only deploy hardening (sub-project 2)
Enables `codeiq serve` inside an AKS pod with `securityContext.readOnlyRootFilesystem=true` and a writable `/tmp`, without source-code changes to the serve profile or Neo4j wiring. The deploy contract is solved at the deployment layer plus a JVM-flag-preset launch wrapper.

Deploy shape:

    Build CI: index → enrich → bundle → upload to Nexus
    AKS pod:
      init-container: pull bundle.zip from Nexus → unzip into /tmp/codeiq-data
      main container: scripts/aks-launch.sh /tmp/codeiq-data
                      → java [JVM flag preset] -jar code-iq.jar serve /tmp/codeiq-data

Why init-container copy + flag preset over alternatives:

- vs. Neo4j read-only mode + tmp redirects: embedded Neo4j 2026.04.0 still acquires a `store_lock` file at open; per-version fragility isn't worth fighting when /tmp is writable.
- vs. baking the bundle into the container image: the container's writable upper layer is also read-only when mounted --read-only, so Neo4j still fails. It also means a large image and couples the image release cadence to the bundle.
- vs. swapping Neo4j for a static snapshot at serve time: throws away the entire read API surface (Cypher, indexes, full-text search). Reserved as the fallback if init-container copy proves operationally insufficient; out of scope here.

JVM flag preset (encoded in scripts/aks-launch.sh):

    -Dorg.springframework.boot.loader.tmpDir=/tmp/spring-boot-loader
        Spring Boot fat JAR extracts nested JARs to ~/.m2/spring-boot-loader-tmp
        by default; that is outside /tmp and fails under a read-only HOME.

    -Djava.io.tmpdir=/tmp
        Explicit even though /tmp is the Linux default. Multipart upload temps
        and JNA / Netty native lib extraction all use this; making it explicit
        means base-image-default drift can't break us.

    -XX:ErrorFile=/tmp/hs_err_pid%p.log
    -XX:HeapDumpPath=/tmp
    -XX:+HeapDumpOnOutOfMemoryError
        JVM crash and heap-dump files default to cwd, and cwd under a read-only
        root is unwritable. These redirect to /tmp so dumps survive for kubectl cp.
Files in this commit:

- docs/specs/2026-04-28-aks-read-only-deploy-design.md: architecture spec covering problem, approach, audit table (Neo4j store_lock, spring-boot-loader, JVM crash files, logback, H2 cache, SPA static), test approach (sentinel + docker smoke), risks, and acceptance criteria. Logback was verified console-only via src/main/resources/logback-spring.xml, so no file appender redirect is needed.
- docs/plans/2026-04-28-sub-project-2-aks-read-only-deploy.md: task list (5 tasks, single PR), file map, acceptance gates, deliberate out-of-scope items.
- shared/runbooks/aks-read-only-deploy.md: canonical operational runbook with deploy shape, full Kubernetes Pod manifest snippet (init container + main container with SecurityContext, volume mounts, probes, resource limits), reference Dockerfile, JVM flag preset table, three verification gates (local docker smoke, sentinel test, in-cluster smoke), rollback, and troubleshooting matrix.
- scripts/aks-launch.sh: the launch wrapper. set -euo pipefail, arg validation, JAR location resolution (/app/code-iq.jar default, $CODEIQ_JAR override), 1 GB /tmp pre-flight, mkdir /tmp/spring-boot-loader, exec java to PID 1.
- src/test/java/.../deploy/AksLaunchScriptSentinelTest.java: 11 sentinel tests asserting that every required flag, the strict-bash mode, arg-count validation, the exec-to-pid-1 contract, and the /tmp pre-flight floor are present in scripts/aks-launch.sh. Catches drift on refactor.
- CHANGELOG.md: [Unreleased] / Added entry.
- shared/runbooks/engineering-standards.md §7.1.1: new subsection cross-linking the runbook + script for downstream consumers running on hardened container runtimes.

Tests: mvn test → 3427 tests / 0 failures / 31 skipped (full suite). The delta from a sub-project-1 branch run (3618) is the ~190 sub-project-1 tests that haven't merged yet; it is independent and expected.
Out of scope for this PR (deliberate, listed in spec §3 + plan):

- Heavyweight JVM-level filesystem-write detector (Java has no clean chroot/unshare API; environment-fragile in CI). The runbook docker smoke is the SSoT for "did this actually work in a RO root."
- /api/diagnostics endpoint surfacing JVM flag preset values.
- Static-snapshot storage layer rewrite (Approach D in the spec).
- Helm / OCI artifact packaging. The runbook ships a vanilla Kubernetes manifest; productionizing into Helm is the deployer's call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 54be162 commit 6e95c5e

7 files changed

Lines changed: 767 additions & 0 deletions

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -41,6 +41,18 @@ for that specific tag for the per-commit details.
   summary of the Best Practices state, Scorecard baseline + target (≥ 8.0/10
   stretch with eight checks at max), known floor reductions, and the OSS-CLI
   stack reference. (RAN-52 AC #7)
+- **AKS read-only deploy hardening** (sub-project 2): runbook at
+  [`shared/runbooks/aks-read-only-deploy.md`](shared/runbooks/aks-read-only-deploy.md),
+  JVM-flag-preset launcher at [`scripts/aks-launch.sh`](scripts/aks-launch.sh),
+  and a sentinel test asserting the script contains every required flag.
+  Enables `codeiq serve` inside an AKS pod with
+  `securityContext.readOnlyRootFilesystem=true` and a writable `/tmp`
+  emptyDir: an init-container copies the graph bundle from Nexus into
+  `/tmp/codeiq-data`; the main container runs `aks-launch.sh /tmp/codeiq-data`.
+  Zero source-code changes to the serve profile or Neo4j wiring; solved at
+  the deployment layer plus Spring-Boot-loader / `java.io.tmpdir` /
+  `-XX:ErrorFile` / `-XX:HeapDumpPath` overrides. Spec at
+  [`docs/specs/2026-04-28-aks-read-only-deploy-design.md`](docs/specs/2026-04-28-aks-read-only-deploy-design.md).
 
 ### Changed

docs/plans/2026-04-28-sub-project-2-aks-read-only-deploy.md

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
# Sub-project 2 implementation plan: AKS read-only deploy hardening

> **Spec:** [`docs/specs/2026-04-28-aks-read-only-deploy-design.md`](../specs/2026-04-28-aks-read-only-deploy-design.md)
>
> **Goal:** ship a runbook + JVM-flag-preset launch script + a sentinel test, so `codeiq serve` runs cleanly inside an AKS pod with read-only root filesystem and writable `/tmp`. No source-code changes to the serve profile or Neo4j wiring.
>
> **Scope:** small. Seven files changed (five created, two modified), single PR off `main`. Independent of sub-project 1.

## File map

| Action | Path | Purpose |
|---|---|---|
| **CREATE** | `docs/specs/2026-04-28-aks-read-only-deploy-design.md` | Architecture spec (✅ done with this plan). |
| **CREATE** | `docs/plans/2026-04-28-sub-project-2-aks-read-only-deploy.md` | This file. |
| **CREATE** | `shared/runbooks/aks-read-only-deploy.md` | Canonical deploy runbook. |
| **CREATE** | `scripts/aks-launch.sh` | JVM-flag-preset launch wrapper. |
| **CREATE** | `src/test/java/io/github/randomcodespace/iq/deploy/AksLaunchScriptSentinelTest.java` | Asserts the launch script contains the required flags. Catches drift. |
| **MODIFY** | `CHANGELOG.md` | New `[Unreleased] / Added` bullet. |
| **MODIFY** | `shared/runbooks/engineering-standards.md` | §7.1 cross-link to the new runbook. |

## Tasks

### Task 1: Runbook

**File:** `shared/runbooks/aks-read-only-deploy.md`.

**Sections:** Overview · Deploy shape · Init-container pattern (Kubernetes manifest snippet) · JVM flag preset · Local docker smoke · Rollback · Cross-references.

**Hard requirement:** every command in the runbook must be runnable as-is. No placeholder URLs. Where a Nexus URL is needed, parameterize it via the `$NEXUS_URL` environment variable and document it once.
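
As an illustration of the `$NEXUS_URL` parameterization, the init-container step (pull `bundle.zip`, unzip into `/tmp/codeiq-data`) might look roughly like the sketch below. `fetch_bundle` is a hypothetical helper name, and the sketch assumes that `NEXUS_URL` is the full URL of `bundle.zip` and that `curl` and `unzip` are available in the init-container image; none of this is confirmed by the plan or the runbook.

```bash
# Hypothetical init-container step: fetch the graph bundle and unpack it.
# Assumptions (not from the plan): NEXUS_URL is the full URL of bundle.zip,
# and the init-container image ships curl and unzip.

fetch_bundle() {
  local dest
  local zip
  dest="${1:-/tmp/codeiq-data}"
  zip="${dest%/}.zip"          # e.g. /tmp/codeiq-data.zip, still under /tmp

  # Fail fast with a usage-style exit code when the URL is not provided.
  if [ -z "${NEXUS_URL:-}" ]; then
    echo "fatal: NEXUS_URL is not set" >&2
    return 64
  fi

  mkdir -p "$dest" || return 70
  # -f: fail on HTTP errors, -sS: quiet but still print errors, -L: follow redirects.
  curl -fsSL "$NEXUS_URL" -o "$zip" || return 70
  # -q: quiet, -o: overwrite without prompting, -d: extraction directory.
  unzip -q -o "$zip" -d "$dest" || return 70
  rm -f "$zip"
}

# Usage inside the init container, with NEXUS_URL exported by the pod spec:
#   fetch_bundle /tmp/codeiq-data
```

Because the URL check comes before any filesystem write, a missing `NEXUS_URL` fails without touching `/tmp`.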

### Task 2: Launch script

**File:** `scripts/aks-launch.sh`.

**Skeleton:**

```bash
#!/usr/bin/env bash
# AKS read-only deploy launcher for codeiq serve.
# Usage: aks-launch.sh /tmp/codeiq-data
set -euo pipefail

if [[ $# -ne 1 ]]; then
  echo "usage: $(basename "$0") <data-dir>" >&2
  exit 64
fi
DATA_DIR="$1"

# Resolve the codeiq JAR location. Container image installs it at /app.
JAR="${CODEIQ_JAR:-/app/code-iq.jar}"

# Pre-flight: ensure /tmp has enough headroom (1 GB minimum).
TMP_FREE_KB="$(df -Pk /tmp | awk 'NR==2 {print $4}')"
if [[ "$TMP_FREE_KB" -lt 1048576 ]]; then
  echo "fatal: /tmp has < 1 GB free ($TMP_FREE_KB KB)" >&2
  exit 70
fi

# JVM flag preset: every entry has a non-default behavior that without it
# would write outside /tmp. Order is intentional: system properties first,
# then -XX flags, so any -XX value referencing a system property resolves.
JAVA_OPTS=(
  -Dorg.springframework.boot.loader.tmpDir=/tmp/spring-boot-loader
  -Djava.io.tmpdir=/tmp
  -XX:ErrorFile=/tmp/hs_err_pid%p.log
  -XX:HeapDumpPath=/tmp
  -XX:+HeapDumpOnOutOfMemoryError
)

mkdir -p /tmp/spring-boot-loader

exec java "${JAVA_OPTS[@]}" -jar "$JAR" serve "$DATA_DIR"
```

**Permissions:** `chmod +x scripts/aks-launch.sh` after create. Must be executable (the sentinel test asserts this).
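
To try the skeleton's `/tmp` pre-flight outside a pod, the same `df -Pk` parsing can be wrapped in a function that takes the directory and the floor as arguments. This is only a sketch: `free_kb`, `require_free_kb`, and `ONE_GB_KB` are hypothetical names and are not part of `aks-launch.sh`.

```bash
# Sketch of the launcher's /tmp pre-flight as reusable functions.
# free_kb, require_free_kb and ONE_GB_KB are hypothetical names.

# Print the free space of the filesystem holding $1, in 1 KB blocks.
free_kb() {
  # -P: POSIX output, one line per filesystem; -k: sizes in 1024-byte units.
  # Row 2, column 4 of that output is the "Available" figure.
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Return 0 if directory $1 has at least $2 KB free, 70 otherwise.
require_free_kb() {
  local dir
  local min_kb
  local have_kb
  dir="$1"
  min_kb="$2"
  have_kb="$(free_kb "$dir")"
  if [ "$have_kb" -lt "$min_kb" ]; then
    echo "fatal: $dir has < $min_kb KB free ($have_kb KB)" >&2
    return 70
  fi
}

# 1 GB expressed in KB: 1024 * 1024 = 1048576, the constant in the skeleton.
ONE_GB_KB=$((1024 * 1024))

# Usage: require_free_kb /tmp "$ONE_GB_KB" || exit 70
```

Keeping the floor as an argument makes it easy to exercise both the pass and fail paths on a workstation, where `/tmp` is rarely close to full.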

### Task 3: Sentinel test

**File:** `src/test/java/io/github/randomcodespace/iq/deploy/AksLaunchScriptSentinelTest.java`.

**Assertions** (one per required flag, plus structural checks):

```java
@Test void scriptIsExecutable() { ... }
@Test void scriptUsesStrictBashMode() { ... }     // set -euo pipefail
@Test void scriptValidatesArgCount() { ... }
@Test void scriptSetsSpringBootLoaderTmpDir() { ... }
@Test void scriptSetsJavaIoTmpdir() { ... }
@Test void scriptSetsJvmErrorFile() { ... }
@Test void scriptSetsHeapDumpPath() { ... }
@Test void scriptEnablesHeapDumpOnOom() { ... }
@Test void scriptExecsJava() { ... }              // exec java to PID 1
```

The test reads the script as a `String` and grep-matches each required substring. Cheap, deterministic, drift-proof.
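
The same substring-matching idea can be sketched in shell, for example as a quick local check before running the Java suite. This is not the Java sentinel test itself: `missing_flags` and `REQUIRED_FLAGS` are hypothetical names, and the list below is copied from the launch script skeleton rather than from the real test.

```bash
# Shell sketch of the sentinel idea: report every required substring that a
# launch script lacks. missing_flags and REQUIRED_FLAGS are hypothetical names.

# Required substrings, one per line (taken from the launch script skeleton).
REQUIRED_FLAGS='set -euo pipefail
-Dorg.springframework.boot.loader.tmpDir=/tmp/spring-boot-loader
-Djava.io.tmpdir=/tmp
-XX:ErrorFile=/tmp/hs_err_pid%p.log
-XX:HeapDumpPath=/tmp
-XX:+HeapDumpOnOutOfMemoryError
exec java'

# Print each required substring absent from script $1; return 1 if any is missing.
missing_flags() {
  local script
  local flag
  local status
  script="$1"
  status=0
  while IFS= read -r flag; do
    # -F: fixed-string match, -q: no output, --: end of options, needed
    # because most patterns start with '-'.
    if ! grep -Fq -- "$flag" "$script"; then
      printf '%s\n' "$flag"
      status=1
    fi
  done <<EOF
$REQUIRED_FLAGS
EOF
  return "$status"
}

# Usage: missing_flags scripts/aks-launch.sh || echo "launcher drifted" >&2
```

Like the Java test, this only proves the substrings are present in the file; it does not prove the flags take effect at runtime, which is what the runbook's docker smoke covers.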

### Task 4: CHANGELOG entry

**File:** `CHANGELOG.md`.

**Add to `[Unreleased] / ### Added`:**

```markdown
- AKS read-only deploy hardening (sub-project 2): runbook at
  `shared/runbooks/aks-read-only-deploy.md`, JVM-flag-preset launcher at
  `scripts/aks-launch.sh`, and a sentinel test asserting the script
  contains every required flag. Enables `codeiq serve` inside an AKS pod
  with read-only root filesystem + writable `/tmp` (init-container
  copies bundle from Nexus → `/tmp/codeiq-data`; main container runs
  `aks-launch.sh /tmp/codeiq-data`). Zero source-code changes to the
  serve profile or Neo4j wiring; solved at the deployment layer plus
  Spring-Boot-loader / JVM crash-file path overrides. Spec at
  `docs/specs/2026-04-28-aks-read-only-deploy-design.md`.
```

### Task 5: engineering-standards cross-link

**File:** `shared/runbooks/engineering-standards.md` §7.1.

Add a one-line bullet right under the existing "deploy surface" sentence:

```markdown
- AKS read-only deploy is supported via `shared/runbooks/aks-read-only-deploy.md`
  and `scripts/aks-launch.sh` (sub-project 2). The Maven Central artifact + the
  launch script + an init-container that copies the graph bundle from Nexus
  into `/tmp/codeiq-data` is the full surface; no separate hosted backend.
```

### Task 6: Test loop + commit

```bash
mvn test -Dtest=AksLaunchScriptSentinelTest
mvn test   # full suite: confirm nothing else regressed
git add docs/specs/ docs/plans/ shared/runbooks/ scripts/aks-launch.sh \
  src/test/java/io/github/randomcodespace/iq/deploy/ CHANGELOG.md
git commit -m "feat(deploy): AKS read-only deploy hardening (sub-project 2)"
git push -u origin feat/sub-project-2-aks-read-only-deploy
gh pr create --base main \
  --title "feat: AKS read-only deploy hardening (sub-project 2)" \
  --body "..."
```

## Acceptance gates

- [ ] All seven files in the file map exist and are non-empty.
- [ ] Sentinel test green.
- [ ] Full `mvn test` green.
- [ ] Runbook commands are copy-pasteable; no placeholder URLs that the operator can't substitute.
- [ ] PR open against `main`.
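
The first gate (every file in the file map exists and is non-empty) lends itself to a mechanical check. A minimal sketch follows; `check_nonempty` is a hypothetical helper name, and the caller is assumed to pass it the seven paths from the file map, run from the repository root.

```bash
# Sketch of the first acceptance gate: every given path exists and is non-empty.
# check_nonempty is a hypothetical helper; pass it the paths from the file map.

check_nonempty() {
  local path
  local status
  status=0
  for path in "$@"; do
    # -s is true only when the path exists and has a size greater than zero.
    if [ ! -s "$path" ]; then
      printf 'missing or empty: %s\n' "$path" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, from the repository root:
#   check_nonempty CHANGELOG.md scripts/aks-launch.sh \
#     shared/runbooks/aks-read-only-deploy.md
```

The function reports every failing path before returning non-zero, so a single run shows the full list rather than stopping at the first gap.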

## Out of scope (deliberate)

- A heavyweight JVM-level filesystem-write detector (Java has no clean `chroot` / `unshare` API; environment-fragile in CI). The runbook docker smoke is the single source of truth (SSoT) for "did this actually work in a RO root."
- A `/api/diagnostics` endpoint surfacing JVM flag preset values. Tracked separately if ops need it.
- Switching the storage layer to a static snapshot (Approach D in the spec). Reserved as the fallback if init-container copy proves operationally insufficient.
- Helm chart / OCI artifact packaging. The runbook ships a vanilla Kubernetes manifest snippet; productionizing into Helm is the deployer's call.
