feat: Add llms.txt catalog and execute backstop with hard read/write/delete gating#80
Conversation
…delete gating
Closes AIF-359 and AIF-377 in one progressive change.
Phase 1 — catalog resource + grep tool:
- octopus://api/llms.txt resource (5-minute in-memory TTL cache, shared with
the grep tool so a search doesn't trigger a re-fetch).
- octopus://api/capabilities resource composing server version, enabled
toolsets, available tools, and feature flags.
- grep_llms_txt tool with the GNU-grep parameter shape already used by
grep_task_log. SERVER_INSTRUCTIONS steers agents to grep the catalog
rather than reading the ~360 KB body.
- grepLines pure function lifted into src/helpers/grepLines.ts; both grep
tools share one implementation. Tests migrated alongside.
Phase 2 — execute backstop with hard read/write/delete classification:
- HTTP method enum is the authoritative classifier; classifyMethod in
src/helpers/methodTier.ts maps GET → read, POST/PUT/PATCH → write,
DELETE → delete. Three-tier semantics, not two.
- New --allow-deletes CLI flag plus ToolsetConfig.allowDeletes; DELETE
requires both --no-read-only and --allow-deletes (deliberate two-flag
opt-in for irreversible operations).
- Six gates in execute.ts, in order: sensitive denylist (always-on),
tier-based mode gate, path allowlist by enabled toolset, elicitation
on every non-GET (stronger 'IRREVERSIBLE' message for DELETE),
dispatch, audit-on-stderr.
- Sensitive denylist covers API-key endpoints (any method) and
catastrophic deletes (DELETE on /api/users/{id} and /api/spaces/{id}),
enforced even with both flags set.
- Path allowlist is a static toolset → glob map. Conservative initial
coverage; expand based on real usage. Glob engine in pathGlob.ts is
shared between allowlist and denylist; only `*` and `**` wildcards,
every other character escaped so denylist patterns can't be turned
into regex injection.
60+ new tests added; build, lint, and the full unit suite pass. The 11
pre-existing integration test failures (live testoctopus space lookup) are
unchanged — not caused by this work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s flag - Add an "API Catalog & Backstop" subsection covering grep_llms_txt, the execute tool's tier gating, and the catalog resources. - Update Read-Only Mode with the three-tier read/write/delete classification and the new --allow-deletes opt-in for DELETE requests. - Add --no-read-only and --allow-deletes to "Other command line arguments". - Update Security Considerations to spell out method-driven gating and the sensitive denylist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
akirayamamoto
left a comment
There was a problem hiding this comment.
PR Review Notes
Nice direction overall. The split between read, write, and delete modes makes
sense, and I like that DELETE has a separate opt-in rather than being bundled
into general write access.
A couple of UX suggestions:
Consider renaming --no-read-only
--no-read-only works, but it asks users to reason through a double negative.
Since the flag enables write operations across the MCP server, I think
--allow-writes would be easier to understand at a glance.
You could keep --no-read-only as an alias for compatibility, but document
--allow-writes as the preferred flag:
--allow-writes
--allow-deletesThat reads more naturally: writes are off by default, and deletes require an
additional opt-in.
Make --allow-deletes clearer when read-only mode is still active
Since --allow-deletes only takes effect when write mode is also enabled, I’d
consider either rejecting this combination at startup:
--allow-deleteswithout:
--no-read-onlyor printing a clear startup warning.
Otherwise a user may think they have enabled deletes when read-only mode still
blocks them. That is safe behavior, but it could be confusing during setup or
debugging.
Something like this would help:
--allow-deleteswas provided, but read-only mode is still enabled. DELETE
requests remain blocked unless write mode is enabled.
Not a blocker from my side, but I think these tweaks would make the safety model
easier to understand.
Additional Codex Findings
Codex found a few issues worth looking at before merge:
1. Path checks should run on a canonical path
execute appears to run the allowlist and denylist checks against the raw path
argument, then passes that path to the API client. Codex found that paths with
dot segments, such as:
/api/spaces/Spaces-1/../../users/me/apikeys
can pass the /api/spaces/** allowlist check while resolving to a different API
path. I’d suggest normalizing or rejecting paths with .., encoded slashes,
backslashes, query strings, or fragments before any safety checks run.
2. /api/spaces/** makes the toolset allowlist too broad
Codex also found that core includes /api/spaces/**. Since core is always
enabled and checked first, this appears to allow all space-scoped endpoints
through execute, even when the owning toolset is disabled.
That makes the documented “toolset filtering as a kill switch” model weaker than
it looks. I’d suggest narrowing core to space discovery paths only, then adding
explicit /api/spaces/{space}/... patterns to each owning toolset.
3. octopus://api/capabilities may describe execute as read-only
Codex found that execute is registered as readOnly: true so it remains
available in read-only mode for GET requests. That makes sense internally, but
the capabilities resource may lead agents to think execute is fully read-only,
even when write mode allows POST/PUT/PATCH.
It may be clearer to expose execute as method-gated, or include effective
tiers such as read, write, and delete.
4. Minor docs issue: audit toolset
Codex noticed a couple of references to an audit toolset in comments/docs, but
that does not seem to be a current toolset. This is minor, but it could confuse
users reading the safety model.
…te capabilities Addresses the Codex findings and the operator-warning UX nit from #80 review. Path canonicalization (Codex #1): - New validateExecutePath helper rejects '..' segments, backslashes, query strings, fragments, double slashes, and percent-encoded slashes (%2F/%5C) before any allowlist/denylist runs. Without this, a path like /api/spaces/Spaces-1/../../users/me/apikeys could pass the legacy /api/spaces/** core allowlist while resolving to a different endpoint server-side. - Wired in as Gate 0 in execute.ts (runs before sensitive denylist) and surfaces a structured 'invalidPath' reason on rejection. Narrower core toolset (Codex #2): - Dropped /api/spaces/** from core. core now covers /api/spaces (list) and /api/spaces/* (single-space metadata) only — every per-resource path beneath a space must register against its owning toolset. - Each toolset now declares BOTH /api/<single>/X and /api/spaces/<single>/X prefix forms. Introduced a small spaceScoped() helper to avoid hand-typing four patterns per resource family. - This restores the "toolset filtering as kill switch" promise: disabling 'certificates' now actually makes /api/spaces/Spaces-1/certificates unreachable. Accurate execute capabilities (Codex #3): - The CapabilityToolEntry for execute now carries methodGated:true and a tiersAvailable array reflecting the current session — ['read'] in read-only mode, ['read','write'] with --no-read-only, and the full ['read','write','delete'] with both flags. Static read-only tools keep the existing readOnly:true and omit the new fields. Stray 'audit' references (Codex #4): - Removed the non-existent 'audit' toolset from the pathAllowlist JSDoc, the execute tool description, and the README. The example now references 'certificates', a real toolset. Operator warning when --allow-deletes is inert (review UX #2): - src/index.ts emits a clear stderr warning at startup if --allow-deletes is passed without --no-read-only, so an operator doesn't think they've opened up DELETE while the read-only gate still blocks everything. Tests: 13 new tests (validateExecutePath unit, execute Gate 0 coverage, core-narrowing regression, methodGated/tiersAvailable behaviour). 150 unit tests pass; build and lint clean. Not addressed (intentionally — out of scope, called out as non-blocking in the review): rename --no-read-only -> --allow-writes alias. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /api/experimental/llms.txt endpoint shipped in 2026.2.3916. Set the minimum on grep_llms_txt's tool registration so --list-tools-by-version reports it correctly, and note the requirement on the catalog resource description and in the README so older-Octopus users see the version constraint before they hit a 404 from the upstream API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes AIF-359 and AIF-377. Two phases shipped together as one progressive change.
Phase 1 — catalog resource + grep tool
octopus://api/llms.txtresource with a 5-min in-memory TTL cache, shared with the grep tool so a search doesn't trigger a re-fetch.octopus://api/capabilitiesresource: server version, enabled toolsets, available tools, feature flags.grep_llms_txttool — same GNU-grep parameter shape asgrep_task_log. The catalog body is ~360 KB; reading it directly is what we're avoiding.grepLinesintosrc/helpers/grepLines.tsso both grep tools share one implementation. Old test migrated alongside.Phase 2 — execute backstop with hard read/write/delete tiering
isWriteflag the LLM sets.classifyMethod(method)maps GET →read, POST/PUT/PATCH →write, DELETE →delete.--allow-deletesCLI flag +ToolsetConfig.allowDeletes. DELETE requires both--no-read-onlyAND--allow-deletes— a deliberate two-flag opt-in for irreversible operations.execute.ts, in order: sensitive denylist → tier mode gate → path allowlist by enabled toolset → elicitation (DELETE gets a stronger "IRREVERSIBLE" message) → dispatch → audit-on-stderr./api/users/{id}and/api/spaces/{id}); enforced even with both flags on.pathGlob.tsis shared between allowlist and denylist. Only*and**are wildcards; every other character is escaped, so denylist patterns can't be turned into regex injection.Refactors landed in the same change
compilePathGlobbetween allow/deny (was duplicated).activeToolsetConfiglives undersrc/helpers/(was misplaced underresources/catalog/since execute also reads it).grepLinestest migrated tosrc/helpers/__tests__/;grepTaskLog.tsno longer needs to re-export it.execute.tsuses a closure that captures the start time, so each gate's emit is one line (audit(\"blocked\", \"readOnlyMode\")) instead of six.Out of scope (intentionally — Phase 3)
AIF-360 (JSON schema resources for write payloads) and AIF-367 (
describetool) are deferred. Once this lands, agents havegrep_llms_txtto discover endpoint shapes — adequate as a stopgap.Test plan
npm run build— cleannpm run lint— cleannpx vitest run --exclude '**/*.integration.test.ts'— 137 unit tests pass (60+ new)https://main.testoctopus.app: readoctopus://api/capabilities, rungrep_llms_txtfor'POST /releases', exercise the read tier ofexecutewithGET /api/spaces, verify the readOnlyMode gate blocks a POST, verify the deletesNotAllowed gate blocks a DELETE under--no-read-onlyonly.findInterruptions/listTenantsare unchanged — they fail on a missing 'Octopus Server' test space, unrelated to this work.Coordination
The Linear tickets AIF-359 and AIF-377 describe the work as written before the three-tier DELETE design was confirmed. Worth updating the ticket bodies post-merge to reflect:
grep_llms_txtand thegrepLineshelper extraction (~250 LOC, was ~150).--allow-deletesflag, plus catastrophic-delete entries on the sensitive denylist.🤖 Generated with Claude Code