fix: Isolate thread reactor workers to prevent head-of-line blocking #1945

Open
ashvinnihalani wants to merge 1 commit into pingdotgg:main from ashvinnihalani:t3code/per-thread-workers

Conversation

@ashvinnihalani
Contributor

@ashvinnihalani ashvinnihalani commented Apr 12, 2026

What Changed

Users can keep working in healthy threads even if another thread gets stuck starting a session. This replaces the single provider command reactor worker with per-thread drainable workers, so provider intents are queued independently by threadId while per-thread ordering is preserved. It also keeps regression coverage for the hanging-thread case: a test verifies that a blocked thread-1 start does not prevent thread-2 from reaching sendTurn.

Why

A hung or slow thread start could block the shared reactor queue and make unrelated threads look broken. The failure scenario here is one thread stalling during session startup or resume, followed by another thread trying to start a turn and getting stuck behind the same worker. Isolating reactor workers by thread removes that head-of-line blocking, keeps unaffected threads responsive, and preserves serialized processing within each individual thread.
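To illustrate the failure mode and the fix, here is a minimal sketch using plain promises rather than the project's actual worker implementation. `KeyedSerialQueue`, `enqueue`, and `demo` are hypothetical names invented for this example; the task bodies only stand in for startSession and sendTurn.

```typescript
// Illustrative only: hypothetical names, not the code from this PR.
type Task = () => Promise<void>;

// Serializes tasks per key by chaining each new task onto that key's tail promise.
class KeyedSerialQueue {
  private readonly tails = new Map<string, Promise<void>>();

  enqueue(key: string, task: Task): Promise<void> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    // Runs only after the previous task for the same key has settled.
    const next = previous.then(task);
    // The stored tail swallows errors so one failure does not poison later tasks.
    this.tails.set(key, next.catch(() => {}));
    return next;
  }
}

async function demo(): Promise<string[]> {
  const log: string[] = [];
  const queue = new KeyedSerialQueue();

  // thread-1 hangs forever while "starting a session".
  void queue.enqueue("thread-1", () => new Promise<void>(() => {}));
  // This thread-1 task stays queued behind the hung start (per-thread ordering).
  void queue.enqueue("thread-1", async () => {
    log.push("thread-1:sendTurn");
  });

  // thread-2 has its own chain, so it is not stuck behind thread-1.
  await queue.enqueue("thread-2", async () => {
    log.push("thread-2:startSession");
  });
  await queue.enqueue("thread-2", async () => {
    log.push("thread-2:sendTurn");
  });
  return log;
}
```

With a single shared chain, the two thread-2 tasks would wait forever behind the hung thread-1 task; with one chain per key they complete, while the second thread-1 task remains queued.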

UI Changes

Checklist

  • This PR is small and focused
  • I explained what changed and why
  • I included before/after screenshots for any UI changes
  • I included a video for animation/interaction changes

Note

Medium Risk
Moderate risk: changes the reactor’s event-queueing/concurrency model, which could affect ordering, draining behavior, and resource usage across threads if worker lifecycle isn’t handled correctly.

Overview
Prevents head-of-line blocking in ProviderCommandReactor by replacing the single DrainableWorker with a per-thread worker map keyed by threadId, so provider-intent events are queued/serialized per thread but processed independently across threads.

Updates drain to wait on all active thread workers, and extends the test harness to allow custom startSession behavior plus a new regression test proving a hung session start in thread-1 does not stop thread-2 from reaching sendTurn.

Reviewed by Cursor Bugbot for commit bde0e2a. Bugbot is set up for automated code reviews on this repo.

Note

Isolate ProviderCommandReactor thread workers to prevent head-of-line blocking

  • Replaces a single global DrainableWorker with a per-thread worker map in ProviderCommandReactor.ts, keyed by ThreadId.
  • Workers are lazily created and memoized via a workerForThread effect; the drain method now drains all per-thread workers concurrently.
  • Adds a test in ProviderCommandReactor.test.ts that verifies a hung session start on one thread does not block turn processing on another.
  • Behavioral Change: events that previously queued behind a blocked thread will now process independently.
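The lazy creation, memoization, and concurrent drain described in the bullets above could look roughly like the following. This is a sketch under stated assumptions: it uses plain promises instead of the effect-based `workerForThread` in the PR, and `SerialWorker` and `PerThreadWorkers` are hypothetical stand-ins, not the real `DrainableWorker` API.

```typescript
// Hypothetical stand-in for a drainable worker: a serial queue that can report when it is idle.
class SerialWorker<E> {
  private tail: Promise<void> = Promise.resolve();
  private readonly handle: (event: E) => Promise<void>;

  constructor(handle: (event: E) => Promise<void>) {
    this.handle = handle;
  }

  enqueue(event: E): void {
    // Errors are swallowed here so the chain keeps going; real code would report them.
    this.tail = this.tail.then(() => this.handle(event)).catch(() => {});
  }

  // Resolves once everything enqueued before this call has been processed.
  drain(): Promise<void> {
    return this.tail;
  }
}

class PerThreadWorkers<E extends { threadId: string }> {
  private readonly workers = new Map<string, SerialWorker<E>>();
  private readonly handle: (event: E) => Promise<void>;

  constructor(handle: (event: E) => Promise<void>) {
    this.handle = handle;
  }

  // Lazily create and memoize one worker per thread id.
  workerForThread(threadId: string): SerialWorker<E> {
    let worker = this.workers.get(threadId);
    if (worker === undefined) {
      worker = new SerialWorker<E>(this.handle);
      this.workers.set(threadId, worker);
    }
    return worker;
  }

  enqueue(event: E): void {
    this.workerForThread(event.threadId).enqueue(event);
  }

  // Drain every worker concurrently rather than one after another.
  async drain(): Promise<void> {
    await Promise.all(Array.from(this.workers.values()).map((w) => w.drain()));
  }
}
```

Note that in this sketch `drain` never resolves while any thread is hung, and nothing ever removes entries from the map.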

Macroscope summarized bde0e2a.

- Route provider intents through per-thread drainable workers
- Add a regression test covering a hung thread start alongside a healthy thread

(cherry picked from commit 8683d94fb6299754a81e9caff9c39dab7ed474ca)
@coderabbitai

coderabbitai bot commented Apr 12, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@github-actions github-actions bot added the size:S (10-29 changed lines, additions + deletions) and vouch:unvouched (PR author is not yet trusted in the VOUCHED list) labels Apr 12, 2026
@macroscopeapp
Contributor

macroscopeapp bot commented Apr 12, 2026

Approvability

Verdict: Needs human review

This PR changes the concurrency model from a single shared worker to per-thread workers, which is a significant runtime behavior change. While the fix is well-tested and the scope is contained, the modification to event processing orchestration warrants human review to validate the approach and check for potential issues like unbounded worker map growth.

);

const worker = yield* makeDrainableWorker(processDomainEventSafely);
const workersByThreadId = new Map<ThreadId, DrainableWorker<ProviderIntentEvent>>();
Member

this map is ever growing, workers are never torn down?

Contributor Author

@ashvinnihalani ashvinnihalani Apr 12, 2026

Yeah, good point. I have never reached a point where the number of workers has grown that much, since I regularly rebuild and clear the thread history. Let me make sure that when we send a session_close id or something similar, we delete the thread from the map.
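A minimal sketch of that kind of cleanup, assuming a promise-chain queue rather than the real `DrainableWorker`: `ThreadQueues`, `close`, and `size` are hypothetical names, and `close` stands in for whatever session-close signal ends up being used.

```typescript
// Illustrative only: hypothetical names, not the code from this PR.
class ThreadQueues {
  private readonly tails = new Map<string, Promise<void>>();

  enqueue(threadId: string, task: () => Promise<void>): Promise<void> {
    const previous = this.tails.get(threadId) ?? Promise.resolve();
    // Errors are swallowed so the chain keeps going; real code would report them.
    const tail = previous.then(task).catch(() => {});
    this.tails.set(threadId, tail);
    return tail;
  }

  // Assumed to be called when a thread's session is closed.
  close(threadId: string): Promise<void> {
    const tail = this.tails.get(threadId);
    if (tail === undefined) return Promise.resolve();
    return tail.then(() => {
      // Delete only if nothing new was queued for this thread in the meantime.
      if (this.tails.get(threadId) === tail) this.tails.delete(threadId);
    });
  }

  get size(): number {
    return this.tails.size;
  }
}
```

The identity check keeps per-thread ordering intact if new work arrives while the close is pending. A thread whose task never settles would still pin its entry, so a hung start would also need a timeout or interruption to be reclaimed.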

Contributor Author

@ashvinnihalani ashvinnihalani Apr 12, 2026

Maybe this isn't the right abstraction here. I made this because I ran into a very specific use case where I wanted to start a new thread while the existing thread was still starting/hanging. Given that we have a DrainableWorker for ProviderCommandReactor, ProviderRuntimeIngestion, and CheckpointReactor, it may make more sense to instead make DrainableWorker non-blocking for events from separate threads.

Member

Honestly, I'm not sure how much weight the workers are pulling anymore. They did a good job guaranteeing order back when things were more async. Now that we're more bought into the Effect runtime, I think some of the issues the workers aimed to solve have gone away, so I don't think we need them everywhere we currently have them.

I have some branches with some performance work that I got sidetracked from finishing. Will see if I can dig those up
