feat(server): phase 4 polish and observability (SYM-24) by Swiftyos · Pull Request #32 · Significant-Gravitas/AgentProbe

Swiftyos · 2026-04-17T17:40:30Z

Summary

Ships the Phase 4 hardening pass for agentprobe start-server: in-process metrics + spans + structured logger, SSE heartbeat/retry/terminal-event guarantees, dashboard keyboard navigation and empty/error/loading polish, plus deterministic latency-budget and soak harnesses with the supporting ops docs.
Every behaviour required by the SYM-24 acceptance criteria is backed by a new or updated test or a new repo entrypoint. The change makes no major refactors to the Phase 1–3 layout.

Key changes

Observability adapters under src/runtime/server/observability/ (metrics registry, span recorder, structured logger, redaction helpers, server.startup log). Wired into startAgentProbeServer (server.http.requests, server.runs.active, server.runs.started_total, server.runs.finished_total, server.sse.connections) and the RunController (run.started, run.finished, run.error plus span scopes for runStartValidation, runControllerExecute, and runSuiteBoot).
SSE hardening in src/runtime/server/routes/sse.ts and streams/events.ts: heartbeat comments, retry: 2000 on connect, Last-Event-ID honored from header and last_event_id query, explicit terminal kinds (run_finished / run_cancelled / run_failed), and terminal replay for historical runs whose ring buffer is gone.
Dashboard polish (dashboard/src/**): useKeyboardShortcuts dispatcher with j/k, /, g r, g p, and g s, filter input on the Runs page with its own empty state, expanded empty/error/loading affordances on Suites/Scenarios, visible focus rings.
Latency budget + soak harness (scripts/latency-budget.ts, scripts/soak.ts): deterministic local checks that exit non-zero when p95 exceeds budgets; the soak harness runs a fast soak by default and extends to ~1h with --manual. Exposed as bun run latency-budget and bun run soak.
Docs: docs/RELIABILITY.md now lists shipped metric/span names and budgets; docs/playbooks/agent-probe-server.md adds proxy SSE + nginx buffering, backup/restore, migration recovery, dashboard cache notes, request-id troubleshooting, and the keyboard reference; platform product specs gain Phase 4 scenarios.
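For reviewers less familiar with the SSE wire format, the reconnect behaviour described above can be sketched roughly as follows. This is an illustration only, not the code in src/runtime/server/routes/sse.ts: the helper names (resolveLastEventId, sseRetry, sseHeartbeat, sseEvent, isTerminalKind) are hypothetical, and the header-before-query precedence is an assumption, since the PR only says both sources are honored.

```typescript
// Hypothetical helpers illustrating the SSE contract; names are not taken from the PR.
type TerminalKind = "run_finished" | "run_cancelled" | "run_failed";

const TERMINAL_KINDS: ReadonlySet<string> = new Set([
  "run_finished",
  "run_cancelled",
  "run_failed",
]);

// True for the three explicit terminal event kinds listed in the PR.
function isTerminalKind(kind: string): kind is TerminalKind {
  return TERMINAL_KINDS.has(kind);
}

// Assumption: the Last-Event-ID header wins over the last_event_id query parameter.
function resolveLastEventId(headers: Headers, url: URL): string | null {
  const fromHeader = headers.get("last-event-id");
  if (fromHeader !== null && fromHeader !== "") return fromHeader;
  const fromQuery = url.searchParams.get("last_event_id");
  return fromQuery !== null && fromQuery !== "" ? fromQuery : null;
}

// Reconnect delay hint for the client, in milliseconds (the PR uses 2000 on connect).
function sseRetry(ms: number): string {
  return `retry: ${ms}\n\n`;
}

// A line starting with ":" is an SSE comment; clients ignore it, proxies see traffic.
function sseHeartbeat(): string {
  return ": heartbeat\n\n";
}

// One event frame; JSON.stringify keeps the payload on a single data line.
function sseEvent(id: string, kind: string, data: Record<string, unknown>): string {
  return `id: ${id}\nevent: ${kind}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

A stream handler built on these would write sseRetry(2000) first, replay events after the resolved id, and close once it has written an event whose kind satisfies isTerminalKind.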

Test plan

Notes for reviewers

Targets Significant-Gravitas/AgentProbe:dev per the SYM-24 PR-target directive.
Metric and span names become part of the operational contract per the issue scope.
The metrics adapter is in-process only; no external collector is required or added.
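To make "in-process only" concrete, here is a minimal sketch of what such a registry could look like. It is not the adapter under src/runtime/server/observability/: the class InProcessMetrics, its methods, and the onRunStarted/onRunFinished hooks are hypothetical, and treating server.runs.active as a gauge alongside the two counters is an assumption based on the metric names.

```typescript
// Hypothetical in-process registry; class, method, and hook names are illustrative.
class InProcessMetrics {
  private readonly counters = new Map<string, number>();
  private readonly gauges = new Map<string, number>();

  // Counters only ever go up.
  increment(name: string, by: number = 1): void {
    this.counters.set(name, (this.counters.get(name) ?? 0) + by);
  }

  // Gauges move in both directions (for example, currently active runs).
  adjustGauge(name: string, delta: number): void {
    this.gauges.set(name, (this.gauges.get(name) ?? 0) + delta);
  }

  // Point-in-time copy of all values; nothing is exported to a collector.
  snapshot(): Record<string, number> {
    const out: Record<string, number> = {};
    for (const [name, value] of this.counters) out[name] = value;
    for (const [name, value] of this.gauges) out[name] = value;
    return out;
  }
}

const metrics = new InProcessMetrics();

// Assumed wiring: a run start bumps the started counter and the active gauge.
function onRunStarted(): void {
  metrics.increment("server.runs.started_total");
  metrics.adjustGauge("server.runs.active", 1);
}

// Assumed wiring: a run finish bumps the finished counter and lowers the active gauge.
function onRunFinished(): void {
  metrics.increment("server.runs.finished_total");
  metrics.adjustGauge("server.runs.active", -1);
}
```

Under this sketch, a check such as "active runs at shutdown == 0" reduces to reading server.runs.active from snapshot() once the server has stopped.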

🤖 Generated with Claude Code

## Intent

Harden the AgentProbe server surfaces delivered in Phases 1–3 for long-running daily operation by adding in-process observability adapters, SSE reconnect guarantees, dashboard keyboard navigation and empty/error/loading polish, deterministic latency-budget and soak harnesses, and the supporting operational docs.

## Behavior changes

- `src/runtime/server/observability/` now ships a narrow metrics registry (`server.http.requests`, `server.runs.active`, `server.runs.started_total`, `server.runs.finished_total`, `server.sse.connections`), a span recorder (`server.run.start.validation`, `server.run.controller.execute`, `server.run.suite.boot`), a structured logger, and redaction helpers for config/startup output.
- `startAgentProbeServer` emits a single redacted `server.startup` log line, attaches per-request logs with `method`, `route`, `status`, `duration_ms`, and `request_id`, tags run-controller logs with `runId` and preset id, and exposes the observability handle on `StartedServer`.
- SSE responses emit `retry:` directives, honor `Last-Event-ID` from both header and `last_event_id` query parameter, keep proxy-friendly headers (`no-store`, `no-transform`, `x-accel-buffering: no`, `keep-alive`), and guarantee exactly one terminal event (`run_finished`, `run_cancelled`, or `run_failed`) per run, including replays for historical runs whose ring buffer has been evicted.
- Dashboard adds global keyboard shortcuts (`/`, `j`/`k`, `g r`, `g p`, `g s`), a filter input on the Runs page with its own empty state, and dedicated empty/error/loading affordances on the suites and filtered-runs surfaces. Shortcuts are suppressed in form inputs, textareas, selects, contenteditable targets, and when modifier keys are held. Focus rings are visible on list rows and the filter input.
- New repo entrypoints: `bun run latency-budget` prints p50/p95/p99 for `GET /`, `GET /api/runs`, `POST /api/runs`, and SSE first-event on seeded local data; `bun run soak` runs a fast CI soak by default (`--manual` extends to ~1h) and emits the operator summary documented in `docs/RELIABILITY.md`.

## Documentation

- `docs/RELIABILITY.md` lists the shipped metrics, spans, and latency budgets plus the SSE hardening contract and soak-harness modes.
- `docs/playbooks/agent-probe-server.md` adds proxy SSE + nginx buffering guidance, backup/restore, migration recovery, dashboard cache behaviour, request-id troubleshooting, and the keyboard shortcut reference.
- `docs/product-specs/platform.md`, `current-state.md`, and `e2e-checklist.md` gain Phase 4 acceptance scenarios and their coverage mapping.

## Validation

- [x] `bun run docs:validate`
- [x] `bun run typecheck`
- [x] `bun run test`
- [x] `bun run test:e2e`
- [x] `bun run dashboard:build`
- [x] `bun run fast-feedback`
- [x] `bun run soak --duration-ms 5000 --runs 30 --sse-connections 2` (active runs at shutdown == 0, no failures)
- [x] `bun run latency-budget --samples 5 --report-only` (all surfaces well under budget on loopback)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Swiftyos wants to merge 1 commit into dev from symphony/SYM-24
