Skip to content

feat: Add llms.txt catalog and execute backstop with hard read/write/delete gating#80

Merged
egorpavlikhin merged 4 commits intomainfrom
egorpavlikhin/aif-359-llms-txt-catalog-and-grep
May 8, 2026
Merged

feat: Add llms.txt catalog and execute backstop with hard read/write/delete gating#80
egorpavlikhin merged 4 commits intomainfrom
egorpavlikhin/aif-359-llms-txt-catalog-and-grep

Conversation

@egorpavlikhin
Copy link
Copy Markdown
Contributor

Summary

Closes AIF-359 and AIF-377. Two phases shipped together as one progressive change.

Phase 1 — catalog resource + grep tool

  • octopus://api/llms.txt resource with a 5-min in-memory TTL cache, shared with the grep tool so a search doesn't trigger a re-fetch.
  • octopus://api/capabilities resource: server version, enabled toolsets, available tools, feature flags.
  • grep_llms_txt tool — same GNU-grep parameter shape as grep_task_log. The catalog body is ~360 KB; reading it directly is what we're avoiding.
  • Lifted grepLines into src/helpers/grepLines.ts so both grep tools share one implementation. Old test migrated alongside.

Phase 2 — execute backstop with hard read/write/delete tiering

  • HTTP method is the authoritative classifier — never an isWrite flag the LLM sets. classifyMethod(method) maps GET → read, POST/PUT/PATCH → write, DELETE → delete.
  • New --allow-deletes CLI flag + ToolsetConfig.allowDeletes. DELETE requires both --no-read-only AND --allow-deletes — a deliberate two-flag opt-in for irreversible operations.
  • Six gates inside execute.ts, in order: sensitive denylist → tier mode gate → path allowlist by enabled toolset → elicitation (DELETE gets a stronger "IRREVERSIBLE" message) → dispatch → audit-on-stderr.
  • Sensitive denylist covers API-key endpoints and catastrophic deletes (DELETE on /api/users/{id} and /api/spaces/{id}); enforced even with both flags on.
  • Glob engine in pathGlob.ts is shared between allowlist and denylist. Only * and ** are wildcards; every other character is escaped, so denylist patterns can't be turned into regex injection.

Refactors landed in the same change

  • Shared compilePathGlob between allow/deny (was duplicated).
  • activeToolsetConfig lives under src/helpers/ (was misplaced under resources/catalog/ since execute also reads it).
  • grepLines test migrated to src/helpers/__tests__/; grepTaskLog.ts no longer needs to re-export it.
  • Audit emission in execute.ts uses a closure that captures the start time, so each gate's emit is one line (audit(\"blocked\", \"readOnlyMode\")) instead of six.

Out of scope (intentionally — Phase 3)

AIF-360 (JSON schema resources for write payloads) and AIF-367 (describe tool) are deferred. Once this lands, agents have grep_llms_txt to discover endpoint shapes — adequate as a stopgap.

Test plan

  • npm run build — clean
  • npm run lint — clean
  • npx vitest run --exclude '**/*.integration.test.ts' — 137 unit tests pass (60+ new)
  • Live smoke test against https://main.testoctopus.app: read octopus://api/capabilities, run grep_llms_txt for 'POST /releases', exercise the read tier of execute with GET /api/spaces, verify the readOnlyMode gate blocks a POST, verify the deletesNotAllowed gate blocks a DELETE under --no-read-only only.
  • Pre-existing integration test failures (11) for findInterruptions/listTenants are unchanged — they fail on a missing 'Octopus Server' test space, unrelated to this work.

Coordination

The Linear tickets AIF-359 and AIF-377 describe the work as written before the three-tier DELETE design was confirmed. Worth updating the ticket bodies post-merge to reflect:

  • AIF-359: scope grew to include grep_llms_txt and the grepLines helper extraction (~250 LOC, was ~150).
  • AIF-377: three-tier read/write/delete classification with the new --allow-deletes flag, plus catastrophic-delete entries on the sensitive denylist.

🤖 Generated with Claude Code

egorpavlikhin and others added 2 commits May 7, 2026 17:35
…delete gating

Closes AIF-359 and AIF-377 in one progressive change.

Phase 1 — catalog resource + grep tool:
- octopus://api/llms.txt resource (5-minute in-memory TTL cache, shared with
  the grep tool so a search doesn't trigger a re-fetch).
- octopus://api/capabilities resource composing server version, enabled
  toolsets, available tools, and feature flags.
- grep_llms_txt tool with the GNU-grep parameter shape already used by
  grep_task_log. SERVER_INSTRUCTIONS steers agents to grep the catalog
  rather than reading the ~360 KB body.
- grepLines pure function lifted into src/helpers/grepLines.ts; both grep
  tools share one implementation. Tests migrated alongside.

Phase 2 — execute backstop with hard read/write/delete classification:
- HTTP method enum is the authoritative classifier; classifyMethod in
  src/helpers/methodTier.ts maps GET → read, POST/PUT/PATCH → write,
  DELETE → delete. Three-tier semantics, not two.
- New --allow-deletes CLI flag plus ToolsetConfig.allowDeletes; DELETE
  requires both --no-read-only and --allow-deletes (deliberate two-flag
  opt-in for irreversible operations).
- Six gates in execute.ts, in order: sensitive denylist (always-on),
  tier-based mode gate, path allowlist by enabled toolset, elicitation
  on every non-GET (stronger 'IRREVERSIBLE' message for DELETE),
  dispatch, audit-on-stderr.
- Sensitive denylist covers API-key endpoints (any method) and
  catastrophic deletes (DELETE on /api/users/{id} and /api/spaces/{id}),
  enforced even with both flags set.
- Path allowlist is a static toolset → glob map. Conservative initial
  coverage; expand based on real usage. Glob engine in pathGlob.ts is
  shared between allowlist and denylist; only `*` and `**` wildcards,
  every other character escaped so denylist patterns can't be turned
  into regex injection.

60+ new tests added; build, lint, and the full unit suite pass. The 11
pre-existing integration test failures (live testoctopus space lookup) are
unchanged — not caused by this work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s flag

- Add an "API Catalog & Backstop" subsection covering grep_llms_txt, the
  execute tool's tier gating, and the catalog resources.
- Update Read-Only Mode with the three-tier read/write/delete classification
  and the new --allow-deletes opt-in for DELETE requests.
- Add --no-read-only and --allow-deletes to "Other command line arguments".
- Update Security Considerations to spell out method-driven gating and the
  sensitive denylist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@akirayamamoto akirayamamoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Notes

Nice direction overall. The split between read, write, and delete modes makes
sense, and I like that DELETE has a separate opt-in rather than being bundled
into general write access.

A couple of UX suggestions:

Consider renaming --no-read-only

--no-read-only works, but it asks users to reason through a double negative.
Since the flag enables write operations across the MCP server, I think
--allow-writes would be easier to understand at a glance.

You could keep --no-read-only as an alias for compatibility, but document
--allow-writes as the preferred flag:

--allow-writes
--allow-deletes

That reads more naturally: writes are off by default, and deletes require an
additional opt-in.

Make --allow-deletes clearer when read-only mode is still active

Since --allow-deletes only takes effect when write mode is also enabled, I’d
consider either rejecting this combination at startup:

--allow-deletes

without:

--no-read-only

or printing a clear startup warning.

Otherwise a user may think they have enabled deletes when read-only mode still
blocks them. That is safe behavior, but it could be confusing during setup or
debugging.

Something like this would help:

--allow-deletes was provided, but read-only mode is still enabled. DELETE
requests remain blocked unless write mode is enabled.

Not a blocker from my side, but I think these tweaks would make the safety model
easier to understand.

Additional Codex Findings

Codex found a few issues worth looking at before merge:

1. Path checks should run on a canonical path

execute appears to run the allowlist and denylist checks against the raw path
argument, then passes that path to the API client. Codex found that paths with
dot segments, such as:

/api/spaces/Spaces-1/../../users/me/apikeys

can pass the /api/spaces/** allowlist check while resolving to a different API
path. I’d suggest normalizing or rejecting paths with .., encoded slashes,
backslashes, query strings, or fragments before any safety checks run.

2. /api/spaces/** makes the toolset allowlist too broad

Codex also found that core includes /api/spaces/**. Since core is always
enabled and checked first, this appears to allow all space-scoped endpoints
through execute, even when the owning toolset is disabled.

That makes the documented “toolset filtering as a kill switch” model weaker than
it looks. I’d suggest narrowing core to space discovery paths only, then adding
explicit /api/spaces/{space}/... patterns to each owning toolset.

3. octopus://api/capabilities may describe execute as read-only

Codex found that execute is registered as readOnly: true so it remains
available in read-only mode for GET requests. That makes sense internally, but
the capabilities resource may lead agents to think execute is fully read-only,
even when write mode allows POST/PUT/PATCH.

It may be clearer to expose execute as method-gated, or include effective
tiers such as read, write, and delete.

4. Minor docs issue: audit toolset

Codex noticed a couple of references to an audit toolset in comments/docs, but
that does not seem to be a current toolset. This is minor, but it could confuse
users reading the safety model.

egorpavlikhin and others added 2 commits May 8, 2026 11:36
…te capabilities

Addresses the Codex findings and the operator-warning UX nit from #80 review.

Path canonicalization (Codex #1):
- New validateExecutePath helper rejects '..' segments, backslashes, query
  strings, fragments, double slashes, and percent-encoded slashes (%2F/%5C)
  before any allowlist/denylist runs. Without this, a path like
  /api/spaces/Spaces-1/../../users/me/apikeys could pass the legacy
  /api/spaces/** core allowlist while resolving to a different endpoint
  server-side.
- Wired in as Gate 0 in execute.ts (runs before sensitive denylist) and
  surfaces a structured 'invalidPath' reason on rejection.

Narrower core toolset (Codex #2):
- Dropped /api/spaces/** from core. core now covers /api/spaces (list) and
  /api/spaces/* (single-space metadata) only — every per-resource path
  beneath a space must register against its owning toolset.
- Each toolset now declares BOTH /api/<single>/X and /api/spaces/<single>/X
  prefix forms. Introduced a small spaceScoped() helper to avoid hand-typing
  four patterns per resource family.
- This restores the "toolset filtering as kill switch" promise: disabling
  'certificates' now actually makes /api/spaces/Spaces-1/certificates
  unreachable.

Accurate execute capabilities (Codex #3):
- The CapabilityToolEntry for execute now carries methodGated:true and a
  tiersAvailable array reflecting the current session — ['read'] in
  read-only mode, ['read','write'] with --no-read-only, and the full
  ['read','write','delete'] with both flags. Static read-only tools keep
  the existing readOnly:true and omit the new fields.

Stray 'audit' references (Codex #4):
- Removed the non-existent 'audit' toolset from the pathAllowlist JSDoc,
  the execute tool description, and the README. The example now references
  'certificates', a real toolset.

Operator warning when --allow-deletes is inert (review UX #2):
- src/index.ts emits a clear stderr warning at startup if --allow-deletes
  is passed without --no-read-only, so an operator doesn't think they've
  opened up DELETE while the read-only gate still blocks everything.

Tests: 13 new tests (validateExecutePath unit, execute Gate 0 coverage,
core-narrowing regression, methodGated/tiersAvailable behaviour). 150
unit tests pass; build and lint clean.

Not addressed (intentionally — out of scope, called out as non-blocking
in the review): rename --no-read-only -> --allow-writes alias.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /api/experimental/llms.txt endpoint shipped in 2026.2.3916. Set the
minimum on grep_llms_txt's tool registration so --list-tools-by-version
reports it correctly, and note the requirement on the catalog resource
description and in the README so older-Octopus users see the version
constraint before they hit a 404 from the upstream API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@egorpavlikhin egorpavlikhin merged commit 868e4cf into main May 8, 2026
1 check passed
@egorpavlikhin egorpavlikhin deleted the egorpavlikhin/aif-359-llms-txt-catalog-and-grep branch May 8, 2026 01:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants