A Rust runtime that routes coding-agent tasks to the cheapest model that can pass a compile gate, enforces hexagonal boundaries at commit time, and reconciles agent work against git evidence.
hex is a local-first runtime for AI coding agents. It sits between an agent (Claude Code, or a local Ollama model in standalone mode) and your codebase, and does three concrete things:
- **Classifies every task into a tier** and routes it to a model of matching size. A prompt's `strategy_hint` (scaffold / codegen / inference) and structural heuristics pick T1 (4B), T2 (32B), T2.5 (24B), or T3 (frontier). See `hex-cli/src/commands/hook.rs::classify_work_intent`.
- **Wraps model output in a compile gate.** For T1/T2/T2.5 it generates best-of-N completions and only accepts a candidate that passes `cargo check` or `tsc --noEmit`. Compiler errors from failed candidates are fed back into the next attempt. See `hex-nexus/src/remote/transport.rs`.
- **Validates hexagonal architecture on every commit.** `hex analyze` parses TypeScript and Rust with tree-sitter, classifies each file into a layer (domain/ports/adapters/usecases), and fails if a cross-layer import violates the dependency direction. See `hex-cli/src/commands/analyze.rs`.
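The routing decision can be sketched as follows. This is a hypothetical reduction — the enum name, function signature, and heuristics here are illustrative, not the actual code in `hex-cli/src/commands/hook.rs::classify_work_intent`:

```rust
/// Illustrative sketch of tier routing; names and heuristics are assumed,
/// not copied from hex-cli/src/commands/hook.rs.
#[derive(Debug, PartialEq)]
enum Tier { T1, T2, T2_5, T3 }

fn classify(strategy_hint: &str, touches_many_files: bool) -> Tier {
    match strategy_hint {
        // Mechanical scaffolding/transforms go to the smallest model (4B).
        "scaffold" => Tier::T1,
        // Codegen escalates to the 24B reasoning model when the task is structurally complex.
        "codegen" if touches_many_files => Tier::T2_5,
        "codegen" => Tier::T2,
        // Open-ended inference falls through to a frontier model.
        _ => Tier::T3,
    }
}

fn main() {
    assert_eq!(classify("scaffold", false), Tier::T1);
    assert_eq!(classify("codegen", true), Tier::T2_5);
    assert_eq!(classify("inference", false), Tier::T3);
    println!("routing ok");
}
```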
Work is tracked as a workplan JSON (phases, tasks, adapter boundaries, gates). Completion is not self-reported — the reconciler walks `git log` and requires non-empty `evidence.commits[]` before a task is considered done (`hex plan reconcile`).
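The evidence rule reduces to a single predicate. A minimal sketch, assuming field names derived from the workplan JSON described above (the real check is what `hex plan reconcile` runs):

```rust
/// Sketch of the reconciler's evidence rule; field names are assumed from
/// the workplan JSON shape, not taken from hex source.
struct Task {
    status: &'static str,          // what the agent reported
    evidence_commits: Vec<String>, // SHAs the reconciler found in `git log`
}

fn is_done(task: &Task) -> bool {
    // A self-reported "done" with no commits touching the task's files
    // stays open: completion requires git evidence.
    task.status == "done" && !task.evidence_commits.is_empty()
}

fn main() {
    let claimed = Task { status: "done", evidence_commits: vec![] };
    let proven = Task { status: "done", evidence_commits: vec!["a1b2c3d".into()] };
    assert!(!is_done(&claimed)); // self-report alone is not enough
    assert!(is_done(&proven));
    println!("reconcile ok");
}
```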
AI agents write code that compiles locally and fails at integration. They violate layering boundaries that aren't enforced by the build. They report "done" on work they didn't do. And cloud inference pricing doesn't scale when you run many agents.
hex addresses these with mechanical checks, not promises:
| Failure mode | hex response |
|---|---|
| Agent writes non-compiling code | Best-of-N + compile gate (cannot be accepted without passing cargo check/tsc --noEmit) |
| Agent violates layer boundaries | Tree-sitter import scan blocks commits with cross-adapter imports |
| Agent self-reports false completion | Reconciler requires git commits touching the task's files |
| Frontier API cost per task | Tier classifier routes boilerplate to local 4B/32B models; escalates only on failure |
| Two agents edit the same file | HexFlo worktree-per-adapter + CAS task claims |
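The best-of-N compile gate in the first row can be sketched generically. This is an illustrative shape only — the function name and signatures are hypothetical, and the real transport lives in `hex-nexus/src/remote/transport.rs`:

```rust
/// Hypothetical sketch of the best-of-N compile gate: generate up to `n`
/// candidates, accept the first that passes the check, and feed the
/// compiler error back into the next attempt. Names are illustrative.
fn best_of_n<F, C>(n: usize, mut generate: F, check: C) -> Option<String>
where
    F: FnMut(Option<&str>) -> String,  // receives the previous error, if any
    C: Fn(&str) -> Result<(), String>, // e.g. run `cargo check` / `tsc --noEmit`
{
    let mut last_err: Option<String> = None;
    for _ in 0..n {
        let candidate = generate(last_err.as_deref());
        match check(&candidate) {
            Ok(()) => return Some(candidate), // passed the gate: accepted
            Err(e) => last_err = Some(e),     // retry with compiler-error context
        }
    }
    None // nothing compiled within the budget: escalate or fail
}

fn main() {
    let mut attempt = 0;
    // Fake generator/checker standing in for model inference + `cargo check`.
    let accepted = best_of_n(
        3,
        |_prev| { attempt += 1; format!("candidate {attempt}") },
        |c| if c.ends_with('2') { Ok(()) } else { Err("E0308: mismatched types".into()) },
    );
    assert_eq!(accepted.as_deref(), Some("candidate 2"));
    println!("gate ok");
}
```

The key property is that a candidate can only leave the loop through `Ok(())` — there is no path where unverified output is accepted.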
hex is designed to drop into existing projects with zero breaking changes:
# 1. Add hex-core as a dependency
cargo add hex-core
# 2. Bootstrap the runtime (one command)
hex bootstrap --profile dev
# 3. Start using hex commands
hex analyze . # Check architecture boundaries
hex plan draft "add auth" # Create workplan stub
hex plan execute <plan>       # Run autonomous feature work

No configuration required. hex reads your workspace structure and starts enforcing rules immediately. The bootstrap command handles all infrastructure (SpacetimeDB, Ollama models, GPU setup).
Tested on: macOS (Intel/ARM), Linux (x86_64, GPU), Docker. Setup time: ~2 minutes start-to-ready (vs. 45 minutes manual setup).
docker run -d --name hex \
-p 5555:5555 -p 3033:3033 \
-v $(pwd):/workspace \
  ghcr.io/gaberger/hex-nexus:latest

curl -L https://github.com/gaberger/hex/releases/latest/download/hex-darwin-arm64 -o /usr/local/bin/hex
chmod +x /usr/local/bin/hex
hex                           # status + next-step suggestions

Dashboard: http://localhost:5555.
# Automated setup for local development (handles everything):
hex bootstrap --profile dev
# What it does:
# • Starts SpacetimeDB (coordination layer)
# • Starts Ollama with GPU support (if available)
# • Loads all 3 inference models (T1, T2, T2.5)
# • Creates .hex/project.json with tier configuration
# • Validates GPU acceleration if present
# • Reports diagnostic status
# Takes ~2 minutes. No manual steps. No build tools needed.

Before bootstrap, hex required 45 minutes of manual setup (downloading models, configuring ports, managing processes). Now it's one command. See Bootstrap Guide for details.
user prompt ──► classify_work_intent ──► tier
│
T1: answered in-session (TodoWrite)
T2: one-line suggestion in hook output
T3: auto-drafts docs/workplans/drafts/draft-*.json
│
/hex-feature-dev
│
▼
behavioral-spec-writer ──► docs/specs/<feature>.json
│
planner ──► docs/workplans/feat-<feature>.json
│
hex plan execute <workplan>
│
HexFlo swarm dispatches tasks per adapter
│
worktree: feat/<feature>/<layer>
│
best-of-N ──► cargo check / tsc --noEmit (blocking gate)
│
validation-judge ──► PASS / FAIL (blocking)
│
integrator merges worktrees in dependency order
│
hex plan reconcile ──► verify evidence.commits[]
Each artifact is auditable: the spec is a JSON file, the workplan is a JSON file, every task carries the commits it produced, and every merge runs the analyzer.
# project status
hex # next-step suggestions
hex status # overview
hex analyze . # architecture violations + dead code
# workplans
hex plan draft <prompt> # create a workplan stub (auto-invoked on T3 prompts)
hex plan execute <wp.json> # dispatch to HexFlo
hex plan reconcile --update # sync task status with git evidence
# daemon (autonomous tick loop)
hex sched daemon --background --interval 30
hex sched enqueue workplan <wp.json>
hex sched queue list
# swarm
hex swarm init <name>
hex task list

Natural-language dispatch (`hex hey "rebuild nexus and validate"`) routes through the classifier; explicit commands are equivalent.
Problem: Workplan tasks could hang indefinitely during inference, blocking autonomous execution. Processes would accumulate at 0% CPU with no feedback, making diagnosis impossible.
Solution: Implemented tier-specific timeout guards + heartbeat mechanism (P2-P3 from ADR-2604180001):
| Tier | Timeout | Use Case |
|---|---|---|
| T1 | 30s | Scaffold/transform (qwen3:4b) |
| T2 | 120s | Codegen (qwen2.5-coder:32b) |
| T2.5 | 300s | Complex reasoning (devstral-small-2:24b) |
| T3 | 600s | Frontier tasks (Claude) |
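The table maps directly to a per-tier timeout lookup. A minimal sketch, with the timeout values taken from the table above (the function name is assumed; the real calculation lives in `workplan_executor.rs`):

```rust
use std::time::Duration;

/// Per-tier inference timeouts mirroring the README table; this helper
/// is illustrative, not the actual code in workplan_executor.rs.
fn tier_timeout(tier: &str) -> Duration {
    Duration::from_secs(match tier {
        "T1" => 30,    // scaffold/transform (qwen3:4b)
        "T2" => 120,   // codegen (qwen2.5-coder:32b)
        "T2.5" => 300, // complex reasoning (devstral-small-2:24b)
        _ => 600,      // T3 / frontier (Claude)
    })
}

fn main() {
    assert_eq!(tier_timeout("T1"), Duration::from_secs(30));
    assert_eq!(tier_timeout("T3"), Duration::from_secs(600));
    println!("timeouts ok");
}
```

In an async runtime the returned `Duration` would bound the inference future (e.g. via a timeout wrapper), so a hung model call fails within its tier budget instead of blocking the executor.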
Proof of Fix (2026-04-17 Testing):
E2E Validation on Bazzite GPU — Task Execution Times:
P1-1: ✅ 60s (first attempt) → 44s (retry) — NO HANG
P1-2: ✅ 35s (retry) — NO HANG
P1-3: Started execution (file path issue unrelated to timeouts)
Before fix: Tasks would hang for hours at 0% CPU
After fix: Tasks complete within tier timeout or fail with clear error
Implementation Details:
- `hex-nexus/src/orchestration/workplan_executor.rs`: task-level timeout calculation based on inferred tier
- Heartbeat logging every 30s during long-running inference
- Error reasons captured and reported (not silent failures)
- Proper state sync to prevent zombie processes
Verification:
# Review timeout configuration
grep -A 10 "timeout_secs = match task_tier" hex-nexus/src/orchestration/workplan_executor.rs
# Check heartbeat logging
hex plan execute <workplan> 2>&1 | grep "heartbeat\|timeout"

This fix enables autonomous workplan execution without indefinite hangs.
hex-cli/ CLI binary, MCP server, tier classifier
hex-nexus/ Daemon (REST API, dashboard, filesystem bridge, inference adapters)
hex-core/ Port traits + domain types (zero external deps)
hex-agent/ Agent runtime (skills, hooks, boundary enforcement)
hex-parser/ Tree-sitter wrappers
spacetime-modules/ 7 WASM modules (coordination state, only in AIOS-linked mode)
docs/adrs/ 182 Architecture Decision Records
docs/specs/ Behavioral specs (written before code)
docs/workplans/ Active and archived workplans
hex runs in two modes:
- **Claude-integrated:** `CLAUDE_SESSION_ID` set. Dispatches through Claude Code.
- **Standalone:** `CLAUDE_SESSION_ID` unset. Dispatches through an Ollama adapter (ADR-2604112000).

The same workplan executes either way. Run `hex doctor composition` to see which is active.
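The mode switch reduces to a presence check on one environment variable. A sketch under that assumption — the helper takes the value directly so the logic is visible and testable; the real runtime would read it via `std::env::var("CLAUDE_SESSION_ID")`:

```rust
/// Illustrative mode selection: the dispatch path is picked by whether
/// CLAUDE_SESSION_ID is set. Function name and return strings are assumed.
fn dispatch_mode(claude_session_id: Option<&str>) -> &'static str {
    match claude_session_id {
        Some(_) => "claude-integrated", // dispatch through Claude Code
        None => "standalone",           // dispatch through the Ollama adapter
    }
}

fn main() {
    // In the runtime: dispatch_mode(std::env::var("CLAUDE_SESSION_ID").ok().as_deref())
    assert_eq!(dispatch_mode(Some("sess-123")), "claude-integrated");
    assert_eq!(dispatch_mode(None), "standalone");
    println!("mode ok");
}
```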
Alpha. 182 ADRs document the design trail; core paths (classifier, compile gate, reconciler, HexFlo, tree-sitter analyzer) are in place and exercised by tests and examples. Every mechanical claim above has a reproducer — see EVIDENCE.md for the exact command per claim, prerequisites, and expected output. Benchmark numbers in INFERENCE.md were measured on a single Strix Halo + Vulkan-Ollama box and will differ on other hardware; the evidence page includes a script you can run to get numbers for your own environment.
Formal specs of the coordination, scheduling, and feature-pipeline state machines live in docs/algebra/ (TLA+, model-checked with TLC).
| Doc | Contents |
|---|---|
| Evidence | Reproducer for every claim in this README — commands, tests, expected output |
| Architecture | Crates, layers, analyzer rules, SpacetimeDB modules |
| Getting Started | Install, standalone mode, remote agents |
| Inference | Tier routing, GBNF grammar constraints, RL model selection |
| Comparison | hex vs. SpecKit, BAML, Claude Agent SDK, LangChain |
| Developer Experience | Pulse / Brief / Console / Override layers |
| Formal Verification | TLA+ models and TLC workflow |
| ADRs | 182 decision records — the why behind each mechanism |
hex builds on hexagonal architecture (Alistair Cockburn, 2005), tree-sitter (Max Brunsfeld et al.), and SpacetimeDB. HexFlo was informed by claude-flow (Reuven Cohen).
| Contributor | Role |
|---|---|
| Gary (@gaberger) | Creator, architect |
| Claude (Anthropic) | Pair programmer |