feat(server): phase 4 polish and observability (SYM-24)#32
Open
## Intent

Harden the AgentProbe server surfaces delivered in Phases 1–3 for long-running daily operation by adding in-process observability adapters, SSE reconnect guarantees, dashboard keyboard navigation and empty/error/loading polish, deterministic latency-budget and soak harnesses, and the supporting operational docs.

## Behavior changes

- `src/runtime/server/observability/` now ships a narrow metrics registry (`server.http.requests`, `server.runs.active`, `server.runs.started_total`, `server.runs.finished_total`, `server.sse.connections`), a span recorder (`server.run.start.validation`, `server.run.controller.execute`, `server.run.suite.boot`), a structured logger, and redaction helpers for config/startup output.
- `startAgentProbeServer` emits a single redacted `server.startup` log line, attaches per-request logs with `method`, `route`, `status`, `duration_ms`, and `request_id`, tags run-controller logs with `runId` and preset id, and exposes the observability handle on `StartedServer`.
- SSE responses emit `retry:` directives, honor `Last-Event-ID` from both the header and the `last_event_id` query parameter, keep proxy-friendly headers (`no-store`, `no-transform`, `x-accel-buffering: no`, `keep-alive`), and guarantee exactly one terminal event (`run_finished`, `run_cancelled`, or `run_failed`) per run, including replays for historical runs whose ring buffer has been evicted.
- Dashboard adds global keyboard shortcuts (`/`, `j`/`k`, `g r`, `g p`, `g s`), a filter input on the Runs page with its own empty state, and dedicated empty/error/loading affordances on the suites and filtered-runs surfaces. Shortcuts are suppressed in form inputs, textareas, selects, contenteditable targets, and when modifier keys are held. Focus rings are visible on list rows and the filter input.
- New repo entrypoints: `bun run latency-budget` prints p50/p95/p99 for `GET /`, `GET /api/runs`, `POST /api/runs`, and SSE first-event on seeded local data; `bun run soak` runs a fast CI soak by default (`--manual` extends to ~1h) and emits the operator summary documented in `docs/RELIABILITY.md`.

## Documentation

- `docs/RELIABILITY.md` lists the shipped metrics, spans, and latency budgets plus the SSE hardening contract and soak-harness modes.
- `docs/playbooks/agent-probe-server.md` adds proxy SSE + nginx buffering guidance, backup/restore, migration recovery, dashboard cache behaviour, request-id troubleshooting, and the keyboard shortcut reference.
- `docs/product-specs/platform.md`, `current-state.md`, and `e2e-checklist.md` gain Phase 4 acceptance scenarios and their coverage mapping.

## Validation

- [x] `bun run docs:validate`
- [x] `bun run typecheck`
- [x] `bun run test`
- [x] `bun run test:e2e`
- [x] `bun run dashboard:build`
- [x] `bun run fast-feedback`
- [x] `bun run soak --duration-ms 5000 --runs 30 --sse-connections 2` (active runs at shutdown == 0, no failures)
- [x] `bun run latency-budget --samples 5 --report-only` (all surfaces well under budget on loopback)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
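For readers unfamiliar with the SSE reconnect contract described under Behavior changes, the following is a minimal sketch of the pieces involved, not the code in `routes/sse.ts`. All function and class names are hypothetical, and the header-over-query precedence is an assumption (the description only says both sources are honored).

```typescript
// Hypothetical helpers illustrating the SSE contract; not the shipped implementation.

/**
 * Resolve the resume cursor from the Last-Event-ID header value or the
 * last_event_id query parameter. Assumption: the header wins when both are set.
 */
function resolveLastEventId(headerValue: string | null, requestUrl: string): string | null {
  const fromHeader = headerValue?.trim();
  if (fromHeader) return fromHeader;
  const fromQuery = new URL(requestUrl).searchParams.get("last_event_id")?.trim();
  return fromQuery ? fromQuery : null;
}

/** Frame for the reconnect-delay directive sent when a stream opens. */
function formatRetry(ms: number): string {
  return `retry: ${ms}\n\n`;
}

/** Frame for one event; JSON.stringify keeps the data payload on a single line. */
function formatEvent(id: string, kind: string, data: unknown): string {
  return `id: ${id}\nevent: ${kind}\ndata: ${JSON.stringify(data)}\n\n`;
}

const TERMINAL_KINDS = new Set(["run_finished", "run_cancelled", "run_failed"]);

/**
 * Admits events until the first terminal kind has passed, then rejects
 * everything, so a stream never carries a second terminal event.
 */
class TerminalGate {
  private closed = false;

  admit(kind: string): boolean {
    if (this.closed) return false;
    if (TERMINAL_KINDS.has(kind)) this.closed = true;
    return true;
  }
}
```

The gate only enforces "at most one" terminal event; guaranteeing "at least one" for evicted historical runs would additionally require synthesizing the terminal frame from stored run state, which this sketch does not attempt.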
## Summary

`agentprobe start-server`: in-process metrics + spans + structured logger, SSE heartbeat/retry/terminal-event guarantees, dashboard keyboard navigation and empty/error/loading polish, plus deterministic latency-budget and soak harnesses with the supporting ops docs.

## Key changes

- `src/runtime/server/observability/` (metrics registry, span recorder, structured logger, redaction helpers, `server.startup` log). Wired into `startAgentProbeServer` (`server.http.requests`, `server.runs.active`, `server.runs.started_total`, `server.runs.finished_total`, `server.sse.connections`) and the `RunController` (`run.started`, `run.finished`, `run.error` plus span scopes for `runStartValidation`, `runControllerExecute`, and `runSuiteBoot`).
- `src/runtime/server/routes/sse.ts` and `streams/events.ts`: heartbeat comments, `retry: 2000` on connect, `Last-Event-ID` honored from header and `last_event_id` query, explicit terminal kinds (`run_finished` / `run_cancelled` / `run_failed`), and terminal replay for historical runs whose ring buffer is gone.
- `dashboard/src/**`: `useKeyboardShortcuts` dispatcher with `j` / `k` / `/` / `g r` / `g p` / `g s`, filter input on the Runs page with its own empty state, expanded empty/error/loading affordances on Suites/Scenarios, visible focus rings.
- `scripts/latency-budget.ts`, `scripts/soak.ts`: deterministic local checks that exit non-zero when p95 exceeds budgets; soak harness runs fast by default and flips to ~1h manual mode. Exposed as `bun run latency-budget` and `bun run soak`.
- `docs/RELIABILITY.md` now lists shipped metric/span names and budgets; `docs/playbooks/agent-probe-server.md` adds proxy SSE + nginx buffering, backup/restore, migration recovery, dashboard cache notes, request-id troubleshooting, and the keyboard reference; platform product specs gain Phase 4 scenarios.

## Test plan

- `bun run docs:validate`
- `bun run typecheck`
- `bun run test` (173 passing, 36 files)
- `bun run test:e2e`
- `bun run dashboard:build`
- `bun run fast-feedback`
- `bun run soak --duration-ms 5000 --runs 30 --sse-connections 2`: 0 active runs at shutdown, 0 failures
- `bun run latency-budget --samples 5 --report-only`: all surfaces well under documented budgets on loopback
- `bun run soak --manual` on the reviewer's machine if the CI soak is not enough evidence

## Notes for reviewers

Target branch: `Significant-Gravitas/AgentProbe:dev`, per the SYM-24 PR-target directive.

🤖 Generated with Claude Code
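To make the "exit non-zero when p95 exceeds budgets" behavior of the latency-budget harness concrete, here is a minimal sketch of one way such a check could work. It is not the code in `scripts/latency-budget.ts`: the function names, the nearest-rank percentile method, the budget values, and the reading of `--report-only` as "always exit 0" are all assumptions.

```typescript
// Hypothetical latency-budget check; names and semantics are illustrative assumptions.

/** Nearest-rank percentile of latency samples (ms); p is in the range 0..100. */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile requires at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: smallest value with at least p% of samples at or below it.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1]!;
}

interface SurfaceReport {
  surface: string;
  p50: number;
  p95: number;
  p99: number;
  withinBudget: boolean;
}

/** Summarize one surface and compare its p95 against a budget in milliseconds. */
function checkBudget(surface: string, samples: number[], p95BudgetMs: number): SurfaceReport {
  const p95 = percentile(samples, 95);
  return {
    surface,
    p50: percentile(samples, 50),
    p95,
    p99: percentile(samples, 99),
    withinBudget: p95 <= p95BudgetMs,
  };
}

/**
 * Exit code for a set of reports: 1 if any surface is over budget, else 0.
 * Assumption: report-only mode prints results but never fails the process.
 */
function exitCodeFor(reports: SurfaceReport[], reportOnly: boolean): number {
  if (reportOnly) return 0;
  return reports.every((r) => r.withinBudget) ? 0 : 1;
}
```

Nearest-rank is chosen here because it always returns an observed sample, which keeps small-sample runs (such as `--samples 5`) easy to reason about; an interpolating percentile would be an equally valid choice.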