AgentProbe is an agent-first repository for a Bun + TypeScript CLI that runs repeatable evaluations against HTTP, WebSocket, and local harness endpoints. The implementation, tests, and repo workflows are Bun-first and TypeScript-only.
- Humans steer by describing intent, reviewing outcomes, and tightening rules.
- Agents execute against repository-local context, not tribal knowledge.
AGENTS.mdis the router, not the manual.docs/is the source of truth for architecture, plans, product specs, reliability rules, and quality standards.- Mechanical checks keep the knowledge base fresh enough for future agent runs.
Use Bun entrypoints for the repository workflow:
# Validate docs, links, indexes, and generated artifacts
bun run docs:validate
# Refresh generated documentation surfaces
bun run docs:index
bun run docs:workspace
bun run docs:quality
# Run the Bun-owned tests
bun run test
bun run test:e2e
# Run the repo-wide PR loop
bun run fast-feedback
# Run the local CI-equivalent gate
bun run cidocs/README.md: best starting point for the knowledge basedocs/ARCHITECTURE.md: layered domain architecture and dependency rulesdocs/DESIGN.md: agent-first principles and repository legibility standardsdocs/PRODUCT_SENSE.md: product goals, CLI UX priorities, and scopedocs/RELIABILITY.md: logging, metrics, traces, and latency budgetsdocs/SECURITY.md: boundary validation, secret handling, and SDK rulesdocs/product-specs/platform.md: canonical product behavior contractdocs/PLANS.md: how active plans, completed plans, and debt are managed
- Bun + TypeScript is the canonical runtime contract.
- The target quality gate stack is strict
tsc --noEmit, Biome, repo-specific structural checks, and deterministic test evidence. - External systems must be accessed through typed SDK/provider interfaces.
- Boundary parsing is required for YAML, config, network payloads, and SDK
responses. No unchecked
any, no guessed shapes. - UTF-8 is the default for committed text files and explicit
"utf8"is required in Bun/Node text I/O. - Structured logs, metrics, and spans are required on critical paths so agents can debug failures and enforce response-time budgets.
-
Fast validation loops for suite authors and agent developers
-
Repeatable evaluation runs with inspectable artifacts
-
Clear run/report output that supports automated diagnosis
-
Strong separation between CLI orchestration, domain logic, persistence, and external endpoint integrations
-
data/: sample endpoint, scenario, persona, and rubric YAML -
tests/: test suite -
agentprobe-spec-v0.2.md: working spec/reference document