fix: escalating circuit breaker for doom loops in headless mode #658
anandgupta42 wants to merge 4 commits into main from
Conversation
When `doom_loop` permission is auto-accepted (headless mode or config `"allow"`), the per-tool repeat counter resets every 30 calls and loops run indefinitely. Observed: 10,943 `apply_patch` calls in one session. Add an escalating circuit breaker:

- 1st threshold (30 calls): ask permission (existing behavior)
- 2nd threshold (60 calls): inject a non-synthetic warning the LLM sees, telling it to change approach
- 3rd threshold (90 calls): force-stop the session via `blocked = true`

Also:

- Add `if (blocked) break` after the switch to exit the stream loop immediately on force-stop (not just the switch)
- Add `escalation_level` to the `doom_loop_detected` telemetry event to distinguish ask/warn/stop in analytics

Closes #657

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
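The three thresholds described above can be sketched roughly as follows. This is a minimal illustration of the escalation policy, not the PR's actual code: the constant `DOOM_LOOP_THRESHOLD` and the helper `escalationFor` are assumed names.

```typescript
// Illustrative sketch of the escalating circuit breaker (names are assumptions).
const DOOM_LOOP_THRESHOLD = 30

type Escalation = { level: 1 | 2 | 3; action: "ask" | "warn" | "stop" } | null

function escalationFor(totalCalls: number): Escalation {
  // Only act on exact multiples of the threshold (30, 60, 90, ...).
  if (totalCalls > 0 && totalCalls % DOOM_LOOP_THRESHOLD === 0) {
    const hits = totalCalls / DOOM_LOOP_THRESHOLD
    if (hits >= 3) return { level: 3, action: "stop" } // 90+ calls: force-stop
    if (hits === 2) return { level: 2, action: "warn" } // 60 calls: model-visible warning
    return { level: 1, action: "ask" } // 30 calls: ask permission (existing behavior)
  }
  return null
}
```

A counter that never fully resets is what distinguishes this from the previous behavior, where the per-tool count restarted every 30 calls and the "ask" tier repeated forever.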
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
📝 Walkthrough
Sequence Diagram(s)
sequenceDiagram
participant Agent as Agent
participant SP as SessionProcessor
participant Permission as PermissionNext
participant Telemetry as Telemetry
participant Stream as FullStream
Agent->>SP: tool call (e.g., apply_patch)
SP->>SP: increment toolCallCounts & toolLoopHits, compute escalation_level
SP->>Telemetry: emit doom_loop_detected (repeat_count, escalation_level)
SP->>Permission: PermissionNext.ask(...)
alt escalation_level == 1
Permission-->>SP: allow
SP->>Stream: continue handling (reset counters)
else escalation_level == 2
Permission-->>SP: allow
SP->>Stream: inject additional warning text part, then continue
else escalation_level >= 3
Permission-->>SP: allow/auto
SP->>Stream: write synthetic warning text part
SP->>SP: set blocked = true, reset counters
SP-->>Stream: break / stop processing further events
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/session/processor.ts">
<violation number="1" location="packages/opencode/src/session/processor.ts:273">
P2: The comment says `synthetic: false` but the property is omitted rather than explicitly set. This works today because `undefined` is falsy, but it's fragile — if `Session.updatePart` or the text-part schema ever defaults `synthetic` to `true`, this warning would silently become invisible to the LLM. Explicitly set `synthetic: false` to match the stated intent and be consistent with the level 3 message (which explicitly sets `synthetic: true`).</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
Auto-fixed verified findings from centralized code review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dev-punia-altimate left a comment
Code Review — 2 finding(s)
time: { start: Date.now(), end: Date.now() },
})
blocked = true
toolCallCounts[value.toolName] = 0
🟡 finding (warning)
At escalation level 3 (force-stop), toolCallCounts[value.toolName] is correctly reset to 0 at line 256, but toolLoopHits[value.toolName] is left at 3. In current code this is harmless because blocked = true causes the session to return "stop" and the processor is not reused. However, the two maps are now permanently out of sync: if any future code path ever calls process() again on the same processor instance after a force-stop (e.g., an error-recovery refactor), the very next threshold hit would read hits=4, immediately triggering another force-stop instead of cycling through the warn phase first. The force-stop branch should also include toolLoopHits[value.toolName] = 0 to keep both maps consistent.
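The consistency fix suggested above can be sketched as follows. This is a simplified model of the two maps under discussion, assuming plain `Record<string, number>` shapes; the `forceStop` helper is hypothetical, not the processor's real API.

```typescript
// Simplified model of the two per-tool counters (shapes are assumptions).
const toolCallCounts: Record<string, number> = {}
const toolLoopHits: Record<string, number> = {}

function forceStop(toolName: string): void {
  // Reset BOTH maps together. If only toolCallCounts were cleared, a reused
  // processor would read stale hits and jump straight to another force-stop
  // instead of cycling through ask -> warn -> stop again.
  toolCallCounts[toolName] = 0
  toolLoopHits[toolName] = 0
}
```

Keeping the reset in one place means a future error-recovery refactor that calls `process()` again cannot observe the maps out of sync.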
hits,
sessionID: input.sessionID,
})
await Session.updatePart({
🟡 finding (warning)
At escalation level 2, a warning text part is written to input.assistantMessage.id with synthetic: false. In message-v2.ts toModelMessages(), all type === 'text' parts on assistant messages are included in the LLM context WITHOUT filtering for synthetic (unlike user messages which are filtered at prompt.ts line 648). This means the warning string is permanently stored in the assistant message record and will be re-sent verbatim to the LLM on every future turn for the life of the session, not just the looping turn where it was injected. If the model course-corrects and the session continues for many more turns, every subsequent LLM request includes this warning in the conversation history. The force-stop message at level 3 uses synthetic: true which does not help here since toModelMessages does not filter synthetic on assistant messages. Both messages are affected, but the level-3 message is moot since the session ends. The level-2 case is the real concern.
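The filtering gap described in this finding can be illustrated with a small sketch. The `TextPart` shape below is a simplified assumption, not the real `message-v2.ts` types, and `modelVisibleParts` is a hypothetical helper showing the kind of filter assistant-message replay would need to avoid re-sending a one-turn warning forever.

```typescript
// Simplified part shape (an assumption, not the real schema).
interface TextPart {
  type: "text"
  text: string
  synthetic?: boolean
}

function modelVisibleParts(parts: TextPart[]): TextPart[] {
  // Drop synthetic parts before replaying assistant history to the LLM.
  // Without a filter like this, a warning injected during one looping turn
  // is re-sent verbatim on every later request for the life of the session.
  return parts.filter((p) => p.synthetic !== true)
}
```

Note the tension the reviewers identify: the level 2 warning must be model-visible on the turn it is injected, yet ideally not replayed on every turn afterward; the synthetic flag alone cannot express both.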
🤖 Behavioral Analysis — 2 finding(s) (🟡 2 warnings)
Auto-fixed verified findings from centralized code review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Auto-fixed 2 finding(s)
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/session/processor.ts">
<violation number="1" location="packages/opencode/src/session/processor.ts:274">
P1: The level 2 escalation warning should use `synthetic: false` so the LLM actually sees the "stop looping" instruction. With `synthetic: true`, `prompt.ts` filters this part out before building the LLM prompt (lines 648, 795), so the model never sees the warning and cannot change its behavior. This contradicts the PR's stated intent and makes level 2 functionally identical to level 1.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
messageID: input.assistantMessage.id,
sessionID: input.assistantMessage.sessionID,
type: "text",
synthetic: true,
P1: The level 2 escalation warning should use synthetic: false so the LLM actually sees the "stop looping" instruction. With synthetic: true, prompt.ts filters this part out before building the LLM prompt (lines 648, 795), so the model never sees the warning and cannot change its behavior. This contradicts the PR's stated intent and makes level 2 functionally identical to level 1.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/session/processor.ts, line 274:
<comment>The level 2 escalation warning should use `synthetic: false` so the LLM actually sees the "stop looping" instruction. With `synthetic: true`, `prompt.ts` filters this part out before building the LLM prompt (lines 648, 795), so the model never sees the warning and cannot change its behavior. This contradicts the PR's stated intent and makes level 2 functionally identical to level 1.</comment>
<file context>
@@ -270,7 +271,7 @@ export namespace SessionProcessor {
sessionID: input.assistantMessage.sessionID,
type: "text",
- synthetic: false,
+ synthetic: true,
text:
`⚠️ altimate-code: \`${value.toolName}\` has been called ${totalCalls} times this session. ` +
</file context>
- synthetic: true,
+ synthetic: false,
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/session/processor.ts`:
- Around line 269-280: The warning inserted via Session.updatePart (using
PartID.ascending(), input.assistantMessage.id, input.assistantMessage.sessionID,
value.toolName and totalCalls) is marked synthetic:true which makes it TUI-only
and excluded from LLM replay; change the part creation so the warning is not
synthetic (remove or set synthetic to false) so the message is visible to the
model/auto-accept path, keeping the same text, type:"text" and timestamps.
- Around line 549-550: The early stream break currently uses the shared variable
"blocked" which also represents normal permission/question rejections; introduce
a new boolean "forceStopped" scoped alongside "blocked" and set it only in the
doom-loop hard-stop path (where the code currently sets "blocked" for
force-stop), then change the immediate break condition to test "forceStopped"
(if (forceStopped) break) so normal rejection flows still run the finish-step
bookkeeping in the rest of the stream; ensure "forceStopped" is initialized in
the same scope as "blocked" and is not used elsewhere.
📒 Files selected for processing (1)
packages/opencode/src/session/processor.ts
await Session.updatePart({
  id: PartID.ascending(),
  messageID: input.assistantMessage.id,
  sessionID: input.assistantMessage.sessionID,
  type: "text",
  synthetic: true,
  text:
    `⚠️ altimate-code: \`${value.toolName}\` has been called ${totalCalls} times this session. ` +
    `You appear to be stuck in a loop. Stop repeating the same approach. ` +
    `Either try a fundamentally different strategy or explain to the user what is blocking you. ` +
    `The session will be force-stopped if this continues.`,
  time: { start: Date.now(), end: Date.now() },
Make the 60-call warning model-visible.
Line 274 marks this warning as synthetic: true, but Lines 433-435 in the same file describe synthetic text as TUI-only and excluded from replay to the LLM. That makes the warn tier ineffective in the exact headless/auto-accept path this circuit breaker is trying to correct.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opencode/src/session/processor.ts` around lines 269 - 280, The
warning inserted via Session.updatePart (using PartID.ascending(),
input.assistantMessage.id, input.assistantMessage.sessionID, value.toolName and
totalCalls) is marked synthetic:true which makes it TUI-only and excluded from
LLM replay; change the part creation so the warning is not synthetic (remove or
set synthetic to false) so the message is visible to the model/auto-accept path,
keeping the same text, type:"text" and timestamps.
// altimate_change start — exit stream loop immediately on doom loop force-stop
if (blocked) break
Scope the early stream break to the new hard-stop path only.
blocked is also set on Lines 345-350 for normal permission/question rejections. With this unconditional break, those pre-existing denial paths now short-circuit the rest of the stream too, which can skip the finish-step bookkeeping on Lines 372-460. A dedicated forceStopped flag would preserve the old rejection flow while still halting doom-loop stops immediately.
💡 Suggested change

- let blocked = false
+ let blocked = false
+ let forceStopped = false
...
- blocked = true
+ blocked = true
+ forceStopped = true
...
- if (blocked) break
+ if (forceStopped) break

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opencode/src/session/processor.ts` around lines 549 - 550, The early
stream break currently uses the shared variable "blocked" which also represents
normal permission/question rejections; introduce a new boolean "forceStopped"
scoped alongside "blocked" and set it only in the doom-loop hard-stop path
(where the code currently sets "blocked" for force-stop), then change the
immediate break condition to test "forceStopped" (if (forceStopped) break) so
normal rejection flows still run the finish-step bookkeeping in the rest of the
stream; ensure "forceStopped" is initialized in the same scope as "blocked" and
is not used elsewhere.
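The flag separation suggested above can be sketched with a simplified event loop. Everything here is illustrative under stated assumptions: the `Event` shape, `runStream`, and the finish-step bookkeeping are stand-ins for the real processor, showing only why a dedicated `forceStopped` flag preserves normal rejection flow.

```typescript
// Simplified stream loop (all names and shapes are assumptions).
type Event = { kind: "tool" | "finish"; deny?: boolean; doomStop?: boolean }

function runStream(events: Event[]): { finished: boolean; processed: number } {
  let blocked = false // normal permission/question rejection
  let forceStopped = false // doom-loop hard stop only
  let finished = false
  let processed = 0
  for (const e of events) {
    processed++
    if (e.kind === "tool") {
      if (e.deny) blocked = true // rejection: keep streaming to finish-step
      if (e.doomStop) {
        blocked = true
        forceStopped = true // only the hard-stop path sets this
      }
    }
    if (e.kind === "finish") finished = true // finish-step bookkeeping
    if (forceStopped) break // break on hard stop, not on ordinary denial
  }
  return { finished, processed }
}
```

Breaking on `blocked` alone would skip the finish-step bookkeeping for ordinary denials; breaking on `forceStopped` halts only the doom-loop case.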
What does this PR do?
Adds an escalating circuit breaker to the doom loop detection system. When `doom_loop` permission is auto-accepted (headless mode or `doom_loop: "allow"` config), the per-tool repeat counter previously reset every 30 calls, allowing loops to run indefinitely. Observed in production: 10,943 `apply_patch` calls and 1,791 `todowrite` calls in a single 4-hour session.

Escalation levels:
- 30 calls: ask permission (existing behavior)
- 60 calls: inject a warning the LLM sees, telling it to change approach
- 90 calls: force-stop the session via `blocked = true`

Additional fixes from 9-model code review:
- `synthetic: false` on the level 2 warning so the LLM actually sees the "stop looping" instruction
- `if (blocked) break` after the switch to exit the stream loop immediately on force-stop
- `escalation_level` added to the `doom_loop_detected` telemetry event

Type of change
Issue for this PR
Closes #657
How did you verify your code works?
(`bun run script/upstream/analyze.ts --markers --base main --strict`)

Checklist
Summary by cubic
Adds an escalating circuit breaker to stop doom loops when `doom_loop` is auto-accepted. Prevents unbounded tool calls in headless mode and force-stops stuck sessions. `doom_loop_detected` now records `escalation_level` and cumulative `repeat_count`.

Written for commit ac01755. Summary will update on new commits.
Summary by CodeRabbit
Bug Fixes
Telemetry