fix: escalating circuit breaker for doom loops in headless mode#658

Open
anandgupta42 wants to merge 4 commits into main from fix/doom-loop-escalation

Conversation

anandgupta42 (Contributor) commented Apr 6, 2026

What does this PR do?

Adds an escalating circuit breaker to the doom loop detection system. When doom_loop permission is auto-accepted (headless mode or doom_loop: "allow" config), the per-tool repeat counter previously reset every 30 calls, allowing loops to run indefinitely. Observed in production: 10,943 apply_patch calls and 1,791 todowrite calls in a single 4-hour session.

Escalation levels:

  • 30 calls (hit 1): ask permission (existing behavior, unchanged)
  • 60 calls (hit 2): inject non-synthetic warning the LLM sees + ask permission
  • 90 calls (hit 3): force-stop the session
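For illustration, the tiered thresholds above can be sketched as a small pure function. The names `escalationFor` and `EscalationAction` are hypothetical and are not the PR's actual API in `processor.ts`:

```typescript
// Hypothetical sketch of the escalation tiers described above.
type EscalationAction = "none" | "ask" | "warn_and_ask" | "force_stop";

const DOOM_LOOP_THRESHOLD = 30; // per-tool calls between escalation hits

function escalationFor(totalCalls: number): EscalationAction {
  // Only act exactly at threshold multiples; everything else passes through.
  if (totalCalls === 0 || totalCalls % DOOM_LOOP_THRESHOLD !== 0) return "none";
  const hit = totalCalls / DOOM_LOOP_THRESHOLD;
  if (hit === 1) return "ask"; // 30 calls: ask permission (existing behavior)
  if (hit === 2) return "warn_and_ask"; // 60 calls: model-visible warning + ask
  return "force_stop"; // 90+ calls: stop the session
}
```

In headless mode the "ask" is auto-accepted, which is why only the third tier reliably halts a stuck session.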

Additional fixes from 9-model code review:

  • Level 2 warning uses synthetic: false so the LLM actually sees the "stop looping" instruction
  • if (blocked) break after the switch to exit the stream loop immediately on force-stop
  • escalation_level added to doom_loop_detected telemetry event

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Issue for this PR

Closes #657

How did you verify your code works?

  • All 46 existing processor tests pass
  • Marker Guard check passes (bun run script/upstream/analyze.ts --markers --base main --strict)
  • Turbo typecheck passes (5/5 packages)
  • 9-model consensus code review (Claude, GPT 5.4, Gemini 3.1 Pro, Kimi K2.5, MiniMax M2.7, GLM-5, Qwen 3.6 Plus, DeepSeek V3.2, MiMo-V2-Pro)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • New and existing unit tests pass locally with my changes

Summary by cubic

Adds an escalating circuit breaker to stop doom loops when doom_loop is auto-accepted. Prevents unbounded tool calls in headless mode and force-stops stuck sessions.

  • Bug Fixes
    • 30 calls: ask permission; 60: ask + inject in-session warning; 90: force-stop.
    • Counter resets before the ask; force-stop exits the stream loop immediately.
    • Telemetry: doom_loop_detected now records escalation_level and cumulative repeat_count.

Written for commit ac01755. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes

    • Improved detection and handling of recurring tool-call loops with graduated escalation.
    • Added in-session warning messages as repeats increase.
    • Sessions now automatically halt when repetitive tool calls persist to prevent runaway behavior.
  • Telemetry

    • Telemetry now records escalation level and repeat metrics for detected loop events.

When `doom_loop` permission is auto-accepted (headless mode or config
`"allow"`), the per-tool repeat counter resets every 30 calls and loops
run indefinitely. Observed: 10,943 `apply_patch` calls in one session.

Add escalating circuit breaker:
- 1st threshold (30 calls): ask permission (existing behavior)
- 2nd threshold (60 calls): inject non-synthetic warning the LLM sees,
  telling it to change approach
- 3rd threshold (90 calls): force-stop the session via `blocked = true`

Also:
- Add `if (blocked) break` after the switch to exit the stream loop
  immediately on force-stop (not just the switch)
- Add `escalation_level` to `doom_loop_detected` telemetry event for
  distinguishing ask/warn/stop in analytics

Closes #657

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@claude claude bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.



coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

Added an optional escalation_level to the doom_loop_detected telemetry event and implemented per-tool doom-loop escalation in the session processor with three tiers: ask, inject warning, and forced session stop that halts further event handling.

Changes

  • Telemetry Schema (`packages/opencode/src/altimate/telemetry/index.ts`): extended the `Telemetry.Event` union so the `type: "doom_loop_detected"` variant now includes an optional `escalation_level?: number`.
  • Session Escalation Logic (`packages/opencode/src/session/processor.ts`): added a per-tool `toolLoopHits` counter, computed `escalation_level` at each threshold, included `escalation_level` and the updated `repeat_count` in telemetry, and implemented three escalation behaviors: 1) reset counters and ask permission, 2) inject an extra warning text part, 3) create a synthetic warning, set `blocked = true`, reset counters, and exit the inner stream loop immediately to stop further handling.
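The two per-tool counters mentioned in the walkthrough can be illustrated with a minimal sketch. The names `toolCallCounts`, `toolLoopHits`, and `recordCall` follow the summary above, and the reset-before-ask behavior matches the PR summary, but the actual implementation in `processor.ts` may differ:

```typescript
// Illustrative sketch: toolCallCounts tracks calls since the last threshold;
// toolLoopHits tracks how many thresholds each tool has crossed.
const toolCallCounts: Record<string, number> = {};
const toolLoopHits: Record<string, number> = {};
const THRESHOLD = 30;

// Returns 0 below the threshold; otherwise the hit number
// (1 = ask, 2 = warn + ask, 3+ = force-stop).
function recordCall(toolName: string): number {
  toolCallCounts[toolName] = (toolCallCounts[toolName] ?? 0) + 1;
  if (toolCallCounts[toolName] < THRESHOLD) return 0;
  toolCallCounts[toolName] = 0; // counter resets before the ask, per the summary
  toolLoopHits[toolName] = (toolLoopHits[toolName] ?? 0) + 1;
  return toolLoopHits[toolName];
}
```

Because the call counter resets at each hit while the hit counter accumulates, the escalation level survives auto-accepted permission prompts.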

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Agent as Agent
  participant SP as SessionProcessor
  participant Permission as PermissionNext
  participant Telemetry as Telemetry
  participant Stream as FullStream

  Agent->>SP: tool call (e.g., apply_patch)
  SP->>SP: increment toolCallCounts & toolLoopHits, compute escalation_level
  SP->>Telemetry: emit doom_loop_detected (repeat_count, escalation_level)
  SP->>Permission: PermissionNext.ask(...)
  alt escalation_level == 1
    Permission-->>SP: allow
    SP->>Stream: continue handling (reset counters)
  else escalation_level == 2
    Permission-->>SP: allow
    SP->>Stream: inject additional warning text part, then continue
  else escalation_level >= 3
    Permission-->>SP: allow/auto
    SP->>Stream: write synthetic warning text part
    SP->>SP: set blocked = true, reset counters
    SP-->>Stream: break / stop processing further events
  end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • mdesmet

Poem

🐰 I counted calls with twitching nose,
A warning hop when danger grows,
A louder thump, then final stop—
No more loops where bytes do hop. 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check: ⚠️ Warning. The PR description is comprehensive, covering what changed, why, verification steps, and checklist items. However, it lacks the required 'PINEAPPLE' identifier for AI-generated contributions at the top. Resolution: add the word 'PINEAPPLE' at the very top of the PR description before any other content, as required by the template for AI-generated contributions.
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Title check: ✅ Passed. The title clearly and specifically summarizes the main change: adding an escalating circuit breaker to prevent doom loops in headless mode, which directly addresses the core problem and solution.
  • Linked Issues check: ✅ Passed. The PR directly addresses all coding requirements from issue #657: implementing an escalating circuit breaker with three levels (30/60/90 calls), tracking per-tool escalation, and preventing unbounded loops in headless mode.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to the doom loop circuit breaker implementation: telemetry type updates, escalation tracking via the `toolLoopHits` counter, and multi-level force-stop logic align with issue #657 requirements.




@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/session/processor.ts">

<violation number="1" location="packages/opencode/src/session/processor.ts:273">
P2: The comment says `synthetic: false` but the property is omitted rather than explicitly set. This works today because `undefined` is falsy, but it's fragile — if `Session.updatePart` or the text-part schema ever defaults `synthetic` to `true`, this warning would silently become invisible to the LLM. Explicitly set `synthetic: false` to match the stated intent and be consistent with the level 3 message (which explicitly sets `synthetic: true`).</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Auto-fixed verified findings from centralized code review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@dev-punia-altimate dev-punia-altimate left a comment


Code Review — 2 finding(s)

```typescript
  time: { start: Date.now(), end: Date.now() },
})
blocked = true
toolCallCounts[value.toolName] = 0
```

🟡 finding (warning)

At escalation level 3 (force-stop), toolCallCounts[value.toolName] is correctly reset to 0 at line 256, but toolLoopHits[value.toolName] is left at 3. In current code this is harmless because blocked = true causes the session to return "stop" and the processor is not reused. However, the two maps are now permanently out of sync: if any future code path ever calls process() again on the same processor instance after a force-stop (e.g., an error-recovery refactor), the very next threshold hit would read hits=4, immediately triggering another force-stop instead of cycling through the warn phase first. The force-stop branch should also include toolLoopHits[value.toolName] = 0 to keep both maps consistent.
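A minimal sketch of the fix this finding proposes, using the two map names from the review; the helper `forceStopReset` is hypothetical:

```typescript
// Sketch: on force-stop, reset BOTH per-tool maps together so a reused
// processor would restart the ask -> warn -> stop cycle cleanly instead of
// jumping straight to another force-stop.
function forceStopReset(
  toolName: string,
  toolCallCounts: Record<string, number>,
  toolLoopHits: Record<string, number>,
): void {
  toolCallCounts[toolName] = 0;
  toolLoopHits[toolName] = 0; // previously left at 3, desyncing the two maps
}
```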

```typescript
  hits,
  sessionID: input.sessionID,
})
await Session.updatePart({
```

🟡 finding (warning)

At escalation level 2, a warning text part is written to input.assistantMessage.id with synthetic: false. In message-v2.ts toModelMessages(), all type === 'text' parts on assistant messages are included in the LLM context WITHOUT filtering for synthetic (unlike user messages which are filtered at prompt.ts line 648). This means the warning string is permanently stored in the assistant message record and will be re-sent verbatim to the LLM on every future turn for the life of the session, not just the looping turn where it was injected. If the model course-corrects and the session continues for many more turns, every subsequent LLM request includes this warning in the conversation history. The force-stop message at level 3 uses synthetic: true which does not help here since toModelMessages does not filter synthetic on assistant messages. Both messages are affected, but the level-3 message is moot since the session ends. The level-2 case is the real concern.
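To illustrate the gap this finding describes, here is a hedged sketch of what filtering synthetic assistant text parts could look like. The function `visibleAssistantText` and the `TextPart` shape are assumptions for illustration, not the actual `toModelMessages` code:

```typescript
// Sketch: if assistant text parts were filtered on the synthetic flag when
// building model messages (as the review says user messages already are),
// an injected warning marked synthetic would not persist into future turns.
interface TextPart {
  type: "text";
  text: string;
  synthetic?: boolean;
}

function visibleAssistantText(parts: TextPart[]): string[] {
  return parts
    .filter((p) => p.type === "text" && p.synthetic !== true)
    .map((p) => p.text);
}
```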

@dev-punia-altimate

🤖 Behavioral Analysis — 2 Finding(s)

🟡 Warnings (2)

  • ⚠️ packages/opencode/src/session/processor.ts:256
    At escalation level 3 (force-stop), toolCallCounts[value.toolName] is correctly reset to 0 at line 256, but toolLoopHits[value.toolName] is left at 3. In current code this is harmless because blocked = true causes the session to return "stop" and the processor is not reused. However, the two maps are now permanently out of sync: if any future code path ever calls process() again on the same processor instance after a force-stop (e.g., an error-recovery refactor), the very next threshold hit would read hits=4, immediately triggering another force-stop instead of cycling through the warn phase first. The force-stop branch should also include toolLoopHits[value.toolName] = 0 to keep both maps consistent.

  • ⚠️ packages/opencode/src/session/processor.ts:268
    At escalation level 2, a warning text part is written to input.assistantMessage.id with synthetic: false. In message-v2.ts toModelMessages(), all type === 'text' parts on assistant messages are included in the LLM context WITHOUT filtering for synthetic (unlike user messages which are filtered at prompt.ts line 648). This means the warning string is permanently stored in the assistant message record and will be re-sent verbatim to the LLM on every future turn for the life of the session, not just the looping turn where it was injected. If the model course-corrects and the session continues for many more turns, every subsequent LLM request includes this warning in the conversation history. The force-stop message at level 3 uses synthetic: true which does not help here since toModelMessages does not filter synthetic on assistant messages. Both messages are affected, but the level-3 message is moot since the session ends. The level-2 case is the real concern.

Auto-fixed verified findings from centralized code review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dev-punia-altimate

Auto-fixed 2 finding(s)

  • toolLoopHits is never reset after force-stop, breaking escalation order on any future reuse
  • Level-2 warn message written with synthetic:false to assistant message persists into all future LLM context windows


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 1 file (changes from recent commits).



<file name="packages/opencode/src/session/processor.ts">

<violation number="1" location="packages/opencode/src/session/processor.ts:274">
P1: The level 2 escalation warning should use `synthetic: false` so the LLM actually sees the "stop looping" instruction. With `synthetic: true`, `prompt.ts` filters this part out before building the LLM prompt (lines 648, 795), so the model never sees the warning and cannot change its behavior. This contradicts the PR's stated intent and makes level 2 functionally identical to level 1.</violation>
</file>


```typescript
  messageID: input.assistantMessage.id,
  sessionID: input.assistantMessage.sessionID,
  type: "text",
  synthetic: true,
```

@cubic-dev-ai cubic-dev-ai bot Apr 6, 2026


P1: The level 2 escalation warning should use synthetic: false so the LLM actually sees the "stop looping" instruction. With synthetic: true, prompt.ts filters this part out before building the LLM prompt (lines 648, 795), so the model never sees the warning and cannot change its behavior. This contradicts the PR's stated intent and makes level 2 functionally identical to level 1.


File context:

```diff
@@ -270,7 +271,7 @@ export namespace SessionProcessor {
             sessionID: input.assistantMessage.sessionID,
             type: "text",
-            synthetic: false,
+            synthetic: true,
             text:
               `⚠️ altimate-code: \`${value.toolName}\` has been called ${totalCalls} times this session. ` +
```
Suggested change:

```diff
- synthetic: true,
+ synthetic: false,
```


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2


📥 Commits

Reviewing files that changed from the base of the PR and between 558b9fa and ac01755.

📒 Files selected for processing (1)
  • packages/opencode/src/session/processor.ts

Comment on lines +269 to +280
```typescript
await Session.updatePart({
  id: PartID.ascending(),
  messageID: input.assistantMessage.id,
  sessionID: input.assistantMessage.sessionID,
  type: "text",
  synthetic: true,
  text:
    `⚠️ altimate-code: \`${value.toolName}\` has been called ${totalCalls} times this session. ` +
    `You appear to be stuck in a loop. Stop repeating the same approach. ` +
    `Either try a fundamentally different strategy or explain to the user what is blocking you. ` +
    `The session will be force-stopped if this continues.`,
  time: { start: Date.now(), end: Date.now() },
```

⚠️ Potential issue | 🟠 Major

Make the 60-call warning model-visible.

Line 274 marks this warning as synthetic: true, but Lines 433-435 in the same file describe synthetic text as TUI-only and excluded from replay to the LLM. That makes the warn tier ineffective in the exact headless/auto-accept path this circuit breaker is trying to correct.


Comment on lines +549 to +550

```typescript
// altimate_change start — exit stream loop immediately on doom loop force-stop
if (blocked) break
```

⚠️ Potential issue | 🟠 Major

Scope the early stream break to the new hard-stop path only.

blocked is also set on Lines 345-350 for normal permission/question rejections. With this unconditional break, those pre-existing denial paths now short-circuit the rest of the stream too, which can skip the finish-step bookkeeping on Lines 372-460. A dedicated forceStopped flag would preserve the old rejection flow while still halting doom-loop stops immediately.

💡 Suggested change

```diff
-    let blocked = false
+    let blocked = false
+    let forceStopped = false
 ...
-                        blocked = true
+                        blocked = true
+                        forceStopped = true
 ...
-              if (blocked) break
+              if (forceStopped) break
```


Successfully merging this pull request may close these issues.

Doom loop circuit breaker allows infinite loops in headless mode
