fix: escalating circuit breaker for doom loops in headless mode #658
anandgupta42 wants to merge 4 commits into main from
Conversation
When `doom_loop` permission is auto-accepted (headless mode or config `"allow"`), the per-tool repeat counter resets every 30 calls and loops run indefinitely. Observed: 10,943 `apply_patch` calls in one session. Add an escalating circuit breaker:

- 1st threshold (30 calls): ask permission (existing behavior)
- 2nd threshold (60 calls): inject a non-synthetic warning the LLM sees, telling it to change approach
- 3rd threshold (90 calls): force-stop the session via `blocked = true`

Also:

- Add `if (blocked) break` after the switch to exit the stream loop immediately on force-stop (not just the switch)
- Add `escalation_level` to the `doom_loop_detected` telemetry event to distinguish ask/warn/stop in analytics

Closes #657

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
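The three thresholds described above can be sketched roughly as follows. This is a minimal illustration of the escalation policy, not the PR's actual code: the constant `DOOM_LOOP_THRESHOLD` and the helper `escalationFor` are assumed names.

```typescript
// Illustrative sketch of the escalating circuit breaker (names are assumptions).
const DOOM_LOOP_THRESHOLD = 30

type Escalation = { level: 1 | 2 | 3; action: "ask" | "warn" | "stop" } | null

function escalationFor(totalCalls: number): Escalation {
  // Only act on exact multiples of the threshold (30, 60, 90, ...).
  if (totalCalls > 0 && totalCalls % DOOM_LOOP_THRESHOLD === 0) {
    const hits = totalCalls / DOOM_LOOP_THRESHOLD
    if (hits >= 3) return { level: 3, action: "stop" } // 90+ calls: force-stop
    if (hits === 2) return { level: 2, action: "warn" } // 60 calls: model-visible warning
    return { level: 1, action: "ask" } // 30 calls: ask permission (existing behavior)
  }
  return null
}
```

A counter that never fully resets is what distinguishes this from the previous behavior, where the per-tool count restarted every 30 calls and the "ask" tier repeated forever.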
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
📝 Walkthrough
Sequence Diagram(s)
sequenceDiagram
participant Agent as Agent
participant SP as SessionProcessor
participant Permission as PermissionNext
participant Telemetry as Telemetry
participant Stream as FullStream
Agent->>SP: tool call (e.g., apply_patch)
SP->>SP: increment toolCallCounts & toolLoopHits, compute escalation_level
SP->>Telemetry: emit doom_loop_detected (repeat_count, escalation_level)
SP->>Permission: PermissionNext.ask(...)
alt escalation_level == 1
Permission-->>SP: allow
SP->>Stream: continue handling (reset counters)
else escalation_level == 2
Permission-->>SP: allow
SP->>Stream: inject additional warning text part, then continue
else escalation_level >= 3
Permission-->>SP: allow/auto
SP->>Stream: write synthetic warning text part
SP->>SP: set blocked = true, reset counters
SP-->>Stream: break / stop processing further events
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/session/processor.ts">
<violation number="1" location="packages/opencode/src/session/processor.ts:273">
P2: The comment says `synthetic: false` but the property is omitted rather than explicitly set. This works today because `undefined` is falsy, but it's fragile — if `Session.updatePart` or the text-part schema ever defaults `synthetic` to `true`, this warning would silently become invisible to the LLM. Explicitly set `synthetic: false` to match the stated intent and be consistent with the level 3 message (which explicitly sets `synthetic: true`).</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
Auto-fixed verified findings from centralized code review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dev-punia-altimate left a comment
Code Review — 2 finding(s)
time: { start: Date.now(), end: Date.now() },
})
blocked = true
toolCallCounts[value.toolName] = 0
🟡 finding (warning)
At escalation level 3 (force-stop), toolCallCounts[value.toolName] is correctly reset to 0 at line 256, but toolLoopHits[value.toolName] is left at 3. In current code this is harmless because blocked = true causes the session to return "stop" and the processor is not reused. However, the two maps are now permanently out of sync: if any future code path ever calls process() again on the same processor instance after a force-stop (e.g., an error-recovery refactor), the very next threshold hit would read hits=4, immediately triggering another force-stop instead of cycling through the warn phase first. The force-stop branch should also include toolLoopHits[value.toolName] = 0 to keep both maps consistent.
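The consistency fix suggested above can be sketched as follows. This is a simplified model of the two maps under discussion, assuming plain `Record<string, number>` shapes; the `forceStop` helper is hypothetical, not the processor's real API.

```typescript
// Simplified model of the two per-tool counters (shapes are assumptions).
const toolCallCounts: Record<string, number> = {}
const toolLoopHits: Record<string, number> = {}

function forceStop(toolName: string): void {
  // Reset BOTH maps together. If only toolCallCounts were cleared, a reused
  // processor would read stale hits and jump straight to another force-stop
  // instead of cycling through ask -> warn -> stop again.
  toolCallCounts[toolName] = 0
  toolLoopHits[toolName] = 0
}
```

Keeping the reset in one place means a future error-recovery refactor that calls `process()` again cannot observe the maps out of sync.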
hits,
sessionID: input.sessionID,
})
await Session.updatePart({
🟡 finding (warning)
At escalation level 2, a warning text part is written to input.assistantMessage.id with synthetic: false. In message-v2.ts toModelMessages(), all type === 'text' parts on assistant messages are included in the LLM context WITHOUT filtering for synthetic (unlike user messages which are filtered at prompt.ts line 648). This means the warning string is permanently stored in the assistant message record and will be re-sent verbatim to the LLM on every future turn for the life of the session, not just the looping turn where it was injected. If the model course-corrects and the session continues for many more turns, every subsequent LLM request includes this warning in the conversation history. The force-stop message at level 3 uses synthetic: true which does not help here since toModelMessages does not filter synthetic on assistant messages. Both messages are affected, but the level-3 message is moot since the session ends. The level-2 case is the real concern.
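The filtering gap described in this finding can be illustrated with a small sketch. The `TextPart` shape below is a simplified assumption, not the real `message-v2.ts` types, and `modelVisibleParts` is a hypothetical helper showing the kind of filter assistant-message replay would need to avoid re-sending a one-turn warning forever.

```typescript
// Simplified part shape (an assumption, not the real schema).
interface TextPart {
  type: "text"
  text: string
  synthetic?: boolean
}

function modelVisibleParts(parts: TextPart[]): TextPart[] {
  // Drop synthetic parts before replaying assistant history to the LLM.
  // Without a filter like this, a warning injected during one looping turn
  // is re-sent verbatim on every later request for the life of the session.
  return parts.filter((p) => p.synthetic !== true)
}
```

Note the tension the reviewers identify: the level 2 warning must be model-visible on the turn it is injected, yet ideally not replayed on every turn afterward; the synthetic flag alone cannot express both.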
🤖 Behavioral Analysis — 2 finding(s) (🟡 2 warnings)
Auto-fixed verified findings from centralized code review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Auto-fixed 2 finding(s)
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/session/processor.ts">
<violation number="1" location="packages/opencode/src/session/processor.ts:274">
P1: The level 2 escalation warning should use `synthetic: false` so the LLM actually sees the "stop looping" instruction. With `synthetic: true`, `prompt.ts` filters this part out before building the LLM prompt (lines 648, 795), so the model never sees the warning and cannot change its behavior. This contradicts the PR's stated intent and makes level 2 functionally identical to level 1.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
messageID: input.assistantMessage.id,
sessionID: input.assistantMessage.sessionID,
type: "text",
synthetic: true,
P1: The level 2 escalation warning should use synthetic: false so the LLM actually sees the "stop looping" instruction. With synthetic: true, prompt.ts filters this part out before building the LLM prompt (lines 648, 795), so the model never sees the warning and cannot change its behavior. This contradicts the PR's stated intent and makes level 2 functionally identical to level 1.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/session/processor.ts, line 274:
<comment>The level 2 escalation warning should use `synthetic: false` so the LLM actually sees the "stop looping" instruction. With `synthetic: true`, `prompt.ts` filters this part out before building the LLM prompt (lines 648, 795), so the model never sees the warning and cannot change its behavior. This contradicts the PR's stated intent and makes level 2 functionally identical to level 1.</comment>
<file context>
@@ -270,7 +271,7 @@ export namespace SessionProcessor {
sessionID: input.assistantMessage.sessionID,
type: "text",
- synthetic: false,
+ synthetic: true,
text:
`⚠️ altimate-code: \`${value.toolName}\` has been called ${totalCalls} times this session. ` +
</file context>
- synthetic: true,
+ synthetic: false,
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/session/processor.ts`:
- Around line 269-280: The warning inserted via Session.updatePart (using
PartID.ascending(), input.assistantMessage.id, input.assistantMessage.sessionID,
value.toolName and totalCalls) is marked synthetic:true which makes it TUI-only
and excluded from LLM replay; change the part creation so the warning is not
synthetic (remove or set synthetic to false) so the message is visible to the
model/auto-accept path, keeping the same text, type:"text" and timestamps.
- Around line 549-550: The early stream break currently uses the shared variable
"blocked" which also represents normal permission/question rejections; introduce
a new boolean "forceStopped" scoped alongside "blocked" and set it only in the
doom-loop hard-stop path (where the code currently sets "blocked" for
force-stop), then change the immediate break condition to test "forceStopped"
(if (forceStopped) break) so normal rejection flows still run the finish-step
bookkeeping in the rest of the stream; ensure "forceStopped" is initialized in
the same scope as "blocked" and is not used elsewhere.
📒 Files selected for processing (1)
packages/opencode/src/session/processor.ts
await Session.updatePart({
  id: PartID.ascending(),
  messageID: input.assistantMessage.id,
  sessionID: input.assistantMessage.sessionID,
  type: "text",
  synthetic: true,
  text:
    `⚠️ altimate-code: \`${value.toolName}\` has been called ${totalCalls} times this session. ` +
    `You appear to be stuck in a loop. Stop repeating the same approach. ` +
    `Either try a fundamentally different strategy or explain to the user what is blocking you. ` +
    `The session will be force-stopped if this continues.`,
  time: { start: Date.now(), end: Date.now() },
Make the 60-call warning model-visible.
Line 274 marks this warning as synthetic: true, but Lines 433-435 in the same file describe synthetic text as TUI-only and excluded from replay to the LLM. That makes the warn tier ineffective in the exact headless/auto-accept path this circuit breaker is trying to correct.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opencode/src/session/processor.ts` around lines 269 - 280, The
warning inserted via Session.updatePart (using PartID.ascending(),
input.assistantMessage.id, input.assistantMessage.sessionID, value.toolName and
totalCalls) is marked synthetic:true which makes it TUI-only and excluded from
LLM replay; change the part creation so the warning is not synthetic (remove or
set synthetic to false) so the message is visible to the model/auto-accept path,
keeping the same text, type:"text" and timestamps.
// altimate_change start — exit stream loop immediately on doom loop force-stop
if (blocked) break
Scope the early stream break to the new hard-stop path only.
blocked is also set on Lines 345-350 for normal permission/question rejections. With this unconditional break, those pre-existing denial paths now short-circuit the rest of the stream too, which can skip the finish-step bookkeeping on Lines 372-460. A dedicated forceStopped flag would preserve the old rejection flow while still halting doom-loop stops immediately.
💡 Suggested change

- let blocked = false
+ let blocked = false
+ let forceStopped = false
...
- blocked = true
+ blocked = true
+ forceStopped = true
...
- if (blocked) break
+ if (forceStopped) break

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opencode/src/session/processor.ts` around lines 549 - 550, The early
stream break currently uses the shared variable "blocked" which also represents
normal permission/question rejections; introduce a new boolean "forceStopped"
scoped alongside "blocked" and set it only in the doom-loop hard-stop path
(where the code currently sets "blocked" for force-stop), then change the
immediate break condition to test "forceStopped" (if (forceStopped) break) so
normal rejection flows still run the finish-step bookkeeping in the rest of the
stream; ensure "forceStopped" is initialized in the same scope as "blocked" and
is not used elsewhere.
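The flag separation suggested above can be sketched with a simplified event loop. Everything here is illustrative under stated assumptions: the `Event` shape, `runStream`, and the finish-step bookkeeping are stand-ins for the real processor, showing only why a dedicated `forceStopped` flag preserves normal rejection flow.

```typescript
// Simplified stream loop (all names and shapes are assumptions).
type Event = { kind: "tool" | "finish"; deny?: boolean; doomStop?: boolean }

function runStream(events: Event[]): { finished: boolean; processed: number } {
  let blocked = false // normal permission/question rejection
  let forceStopped = false // doom-loop hard stop only
  let finished = false
  let processed = 0
  for (const e of events) {
    processed++
    if (e.kind === "tool") {
      if (e.deny) blocked = true // rejection: keep streaming to finish-step
      if (e.doomStop) {
        blocked = true
        forceStopped = true // only the hard-stop path sets this
      }
    }
    if (e.kind === "finish") finished = true // finish-step bookkeeping
    if (forceStopped) break // break on hard stop, not on ordinary denial
  }
  return { finished, processed }
}
```

Breaking on `blocked` alone would skip the finish-step bookkeeping for ordinary denials; breaking on `forceStopped` halts only the doom-loop case.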
What does this PR do?
Adds an escalating circuit breaker to the doom loop detection system. When `doom_loop` permission is auto-accepted (headless mode or `doom_loop: "allow"` config), the per-tool repeat counter previously reset every 30 calls, allowing loops to run indefinitely. Observed in production: 10,943 `apply_patch` calls and 1,791 `todowrite` calls in a single 4-hour session.

Escalation levels:
- 30 calls: ask permission (existing behavior)
- 60 calls: inject a warning the LLM sees, telling it to change approach
- 90 calls: force-stop the session via `blocked = true`

Additional fixes from 9-model code review:
- `synthetic: false` on the level 2 warning so the LLM actually sees the "stop looping" instruction
- `if (blocked) break` after the switch to exit the stream loop immediately on force-stop
- `escalation_level` added to the `doom_loop_detected` telemetry event

Type of change
Issue for this PR
Closes #657
How did you verify your code works?
(`bun run script/upstream/analyze.ts --markers --base main --strict`)

Checklist
Summary by cubic
Adds an escalating circuit breaker to stop doom loops when `doom_loop` is auto-accepted. Prevents unbounded tool calls in headless mode and force-stops stuck sessions. `doom_loop_detected` now records `escalation_level` and cumulative `repeat_count`.

Written for commit ac01755. Summary will update on new commits.
Summary by CodeRabbit
Bug Fixes
Telemetry