feat(server): phase 4 polish and observability (SYM-24)#32

Open
Swiftyos wants to merge 1 commit into dev from symphony/SYM-24

@Swiftyos

Summary

  • Ships the Phase 4 hardening pass for agentprobe start-server: in-process metrics + spans + structured logger, SSE heartbeat/retry/terminal-event guarantees, dashboard keyboard navigation and empty/error/loading polish, plus deterministic latency-budget and soak harnesses with the supporting ops docs.
  • Every behaviour required by the SYM-24 acceptance criteria is backed by a new or updated test or a new repo entrypoint; no major refactors of the Phase 1–3 layout.

Key changes

  • Observability adapters under src/runtime/server/observability/ (metrics registry, span recorder, structured logger, redaction helpers, server.startup log). Wired into startAgentProbeServer (server.http.requests, server.runs.active, server.runs.started_total, server.runs.finished_total, server.sse.connections) and the RunController (run.started, run.finished, run.error plus span scopes for runStartValidation, runControllerExecute, and runSuiteBoot).
  • SSE hardening in src/runtime/server/routes/sse.ts and streams/events.ts: heartbeat comments, retry: 2000 on connect, Last-Event-ID honored from header and last_event_id query, explicit terminal kinds (run_finished / run_cancelled / run_failed), and terminal replay for historical runs whose ring buffer is gone.
  • Dashboard polish (dashboard/src/**): useKeyboardShortcuts dispatcher with /, j/k, g r, g p, and g s; filter input on the Runs page with its own empty state; expanded empty/error/loading affordances on Suites/Scenarios; visible focus rings.
  • Latency budget + soak harness (scripts/latency-budget.ts, scripts/soak.ts): deterministic local checks that exit non-zero when p95 exceeds budgets; the soak harness runs a fast pass by default and extends to ~1h with --manual. Exposed as bun run latency-budget and bun run soak.
  • Docs: docs/RELIABILITY.md now lists shipped metric/span names and budgets; docs/playbooks/agent-probe-server.md adds proxy SSE + nginx buffering, backup/restore, migration recovery, dashboard cache notes, request-id troubleshooting, and the keyboard reference; platform product specs gain Phase 4 scenarios.
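
For reviewers less familiar with the SSE wire format, the heartbeat, retry, and Last-Event-ID behaviour described above can be sketched roughly as follows. This is an illustrative sketch, not the code in routes/sse.ts: every function and type name here is hypothetical, and the choice to let the header win over the `last_event_id` query parameter is an assumption.

```typescript
// Hypothetical names throughout; only the wire format follows the SSE spec.
type SseEvent = { id: number; kind: string; data: unknown };

const RETRY_MS = 2000; // mirrors the `retry: 2000` sent on connect

// Sent once on connect: tells the client how long to wait before reconnecting.
function ssePreamble(): string {
  return `retry: ${RETRY_MS}\n\n`;
}

// A comment line (leading colon) is ignored by clients but keeps proxies from idling out.
function sseHeartbeat(): string {
  return ": heartbeat\n\n";
}

// One event frame. JSON.stringify never emits raw newlines, so a single data line suffices.
function sseFrame(event: SseEvent): string {
  return `id: ${event.id}\nevent: ${event.kind}\ndata: ${JSON.stringify(event.data)}\n\n`;
}

// Resolve the resume cursor. Assumption: the header takes precedence over the query parameter.
function resolveLastEventId(
  headers: Record<string, string | undefined>,
  query: Record<string, string | undefined>,
): number | null {
  const raw = headers["last-event-id"] ?? query["last_event_id"];
  if (raw === undefined || !/^\d+$/.test(raw)) return null;
  return Number(raw);
}

// Replay only the events the client has not seen yet.
function replayAfter(buffer: readonly SseEvent[], lastEventId: number | null): SseEvent[] {
  return lastEventId === null ? buffer.slice() : buffer.filter((e) => e.id > lastEventId);
}
```

The query-parameter fallback matters because the browser `EventSource` API does not let callers set custom headers on the first connection, so a page that wants to resume from a known cursor has to pass it in the URL.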

Test plan

  • bun run docs:validate
  • bun run typecheck
  • bun run test (173 passing, 36 files)
  • bun run test:e2e
  • bun run dashboard:build
  • bun run fast-feedback
  • bun run soak --duration-ms 5000 --runs 30 --sse-connections 2 — 0 active runs at shutdown, 0 failures
  • bun run latency-budget --samples 5 --report-only — all surfaces well under documented budgets on loopback
  • Manual ~1h soak (bun run soak --manual) on the reviewer's machine if the CI soak is not enough evidence
  • Manual dashboard pass at desktop + narrow mobile widths for Runs (search + empty-filter), Start, Presets, Compare, Settings (auth state), empty states, and SSE-driven live run detail
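
To make the latency-budget check concrete, here is a minimal sketch of a p95 gate. It is not the contents of scripts/latency-budget.ts: the function names are hypothetical, the nearest-rank percentile method is an assumption, and so is the reading that `--report-only` prints results without failing the process.

```typescript
type BudgetReport = { surface: string; p95: number; budgetMs: number; ok: boolean };

// Nearest-rank percentile: sort ascending, take the value at ceil(p/100 * n).
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples recorded");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error("percentile out of range");
  return value;
}

// Compare each surface's p95 against its budget (both in milliseconds).
// Throws if a budgeted surface has no samples.
function checkBudgets(
  samples: Map<string, number[]>,
  budgets: Map<string, number>,
): BudgetReport[] {
  const reports: BudgetReport[] = [];
  budgets.forEach((budgetMs, surface) => {
    const p95 = percentile(samples.get(surface) ?? [], 95);
    reports.push({ surface, p95, budgetMs, ok: p95 <= budgetMs });
  });
  return reports;
}

// Assumed semantics: report-only mode never fails; otherwise any breach exits non-zero.
function exitCodeFor(reports: readonly BudgetReport[], reportOnly: boolean): number {
  if (reportOnly) return 0;
  return reports.every((r) => r.ok) ? 0 : 1;
}
```

With only five samples, as in the report-only run above, nearest-rank p95 is simply the slowest sample, which is why a larger `--samples` value gives a more meaningful gate.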

Notes for reviewers

  • Targets Significant-Gravitas/AgentProbe:dev per the SYM-24 PR-target directive.
  • Metric and span names become part of the operational contract per the issue scope.
  • The metrics adapter is in-process only; no external collector is required or added.
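
As a rough illustration of what "in-process only" means here, a registry of this kind can be as small as two maps. The class and helper names below are hypothetical and do not reflect the actual API under src/runtime/server/observability/; only the metric names come from this PR.

```typescript
// Minimal in-process registry: counters only go up, gauges move both ways.
class MetricsRegistry {
  private counters = new Map<string, number>();
  private gauges = new Map<string, number>();

  increment(name: string, by: number = 1): void {
    this.counters.set(name, (this.counters.get(name) ?? 0) + by);
  }

  adjustGauge(name: string, delta: number): void {
    this.gauges.set(name, (this.gauges.get(name) ?? 0) + delta);
  }

  // Plain-object copy, suitable for logging or asserting in tests.
  snapshot(): Record<string, number> {
    const out: Record<string, number> = {};
    this.counters.forEach((value, name) => { out[name] = value; });
    this.gauges.forEach((value, name) => { out[name] = value; });
    return out;
  }
}

// Hypothetical hooks showing how the run lifecycle could drive the shipped metric names.
function onRunStarted(metrics: MetricsRegistry): void {
  metrics.increment("server.runs.started_total");
  metrics.adjustGauge("server.runs.active", 1);
}

function onRunFinished(metrics: MetricsRegistry): void {
  metrics.increment("server.runs.finished_total");
  metrics.adjustGauge("server.runs.active", -1);
}
```

Because the state lives in the server process, a soak check such as "0 active runs at shutdown" reduces to reading the gauge from a snapshot.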

🤖 Generated with Claude Code

## Intent

Harden the AgentProbe server surfaces delivered in Phases 1–3 for long-running daily operation by adding in-process observability adapters, SSE reconnect guarantees, dashboard keyboard navigation and empty/error/loading polish, deterministic latency-budget and soak harnesses, and the supporting operational docs.

## Behavior changes

- `src/runtime/server/observability/` now ships a narrow metrics registry (`server.http.requests`, `server.runs.active`, `server.runs.started_total`, `server.runs.finished_total`, `server.sse.connections`), a span recorder (`server.run.start.validation`, `server.run.controller.execute`, `server.run.suite.boot`), a structured logger, and redaction helpers for config/startup output.
- `startAgentProbeServer` emits a single redacted `server.startup` log line, attaches per-request logs with `method`, `route`, `status`, `duration_ms`, and `request_id`, tags run-controller logs with `runId` and preset id, and exposes the observability handle on `StartedServer`.
- SSE responses emit `retry:` directives, honor `Last-Event-ID` from both header and `last_event_id` query parameter, keep proxy-friendly headers (`no-store`, `no-transform`, `x-accel-buffering: no`, `keep-alive`), and guarantee exactly one terminal event (`run_finished`, `run_cancelled`, or `run_failed`) per run — including replays for historical runs whose ring buffer has been evicted.
- Dashboard adds global keyboard shortcuts (`/`, `j`/`k`, `g r`, `g p`, `g s`), a filter input on the Runs page with its own empty state, and dedicated empty/error/loading affordances on the suites and filtered-runs surfaces. Shortcuts are suppressed in form inputs, textareas, selects, contenteditable targets, and when modifier keys are held. Focus rings are visible on list rows and the filter input.
- New repo entrypoints: `bun run latency-budget` prints p50/p95/p99 for `GET /`, `GET /api/runs`, `POST /api/runs`, and SSE first-event on seeded local data; `bun run soak` runs a fast CI soak by default (`--manual` extends to ~1h) and emits the operator summary documented in `docs/RELIABILITY.md`.
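
The shortcut suppression and two-key chord behaviour described above can be sketched without any DOM dependency. This is an assumption-laden illustration rather than the `useKeyboardShortcuts` implementation: the `KeyInput` shape, the one-second chord timeout, and treating only Ctrl/Meta/Alt as modifiers are all hypothetical choices.

```typescript
// Hypothetical, DOM-free view of a keydown event.
type KeyInput = {
  key: string;
  ctrlKey?: boolean;
  metaKey?: boolean;
  altKey?: boolean;
  targetTag?: string; // e.g. "INPUT"
  targetEditable?: boolean; // contenteditable target
};

const FORM_TAGS = ["INPUT", "TEXTAREA", "SELECT"];

// Shortcuts must not fire while the user is typing or holding a modifier.
function shouldIgnoreKey(input: KeyInput): boolean {
  if (input.ctrlKey || input.metaKey || input.altKey) return true;
  if (input.targetEditable) return true;
  return FORM_TAGS.indexOf((input.targetTag ?? "").toUpperCase()) !== -1;
}

// Bindings are keyed by "j" for single keys or "g r" for two-key chords.
function createShortcutDispatcher(
  bindings: Map<string, () => void>,
  chordTimeoutMs: number = 1000,
) {
  const prefixes: string[] = [];
  bindings.forEach((_action, combo) => {
    const space = combo.indexOf(" ");
    if (space > 0) prefixes.push(combo.slice(0, space));
  });

  let pendingPrefix: string | null = null;
  let pendingAt = 0;

  // Returns true only when a binding actually fired.
  return function handleKey(input: KeyInput, nowMs: number): boolean {
    if (shouldIgnoreKey(input)) {
      pendingPrefix = null;
      return false;
    }
    if (pendingPrefix !== null && nowMs - pendingAt <= chordTimeoutMs) {
      const chord = bindings.get(`${pendingPrefix} ${input.key}`);
      pendingPrefix = null;
      if (chord) {
        chord();
        return true;
      }
    }
    pendingPrefix = null; // expired or unmatched chord: treat this key as a fresh press
    if (prefixes.indexOf(input.key) !== -1) {
      pendingPrefix = input.key;
      pendingAt = nowMs;
      return false;
    }
    const single = bindings.get(input.key);
    if (single) {
      single();
      return true;
    }
    return false;
  };
}
```

Passing the timestamp in explicitly keeps the dispatcher deterministic under test; a real hook would presumably feed it `event.timeStamp` or a clock reading.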

## Documentation

- `docs/RELIABILITY.md` lists the shipped metrics, spans, and latency budgets plus the SSE hardening contract and soak-harness modes.
- `docs/playbooks/agent-probe-server.md` adds proxy SSE + nginx buffering guidance, backup/restore, migration recovery, dashboard cache behaviour, request-id troubleshooting, and the keyboard shortcut reference.
- `docs/product-specs/platform.md`, `current-state.md`, and `e2e-checklist.md` gain Phase 4 acceptance scenarios and their coverage mapping.

## Validation

- [x] `bun run docs:validate`
- [x] `bun run typecheck`
- [x] `bun run test`
- [x] `bun run test:e2e`
- [x] `bun run dashboard:build`
- [x] `bun run fast-feedback`
- [x] `bun run soak --duration-ms 5000 --runs 30 --sse-connections 2` (active runs at shutdown == 0, no failures)
- [x] `bun run latency-budget --samples 5 --report-only` (all surfaces well under budget on loopback)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>