fix: Isolate thread reactor workers to prevent head-of-line blocking #1945
ashvinnihalani wants to merge 1 commit into pingdotgg:main
Conversation
- Route provider intents through per-thread drainable workers
- Add a regression test covering a hung thread start alongside a healthy thread

(cherry picked from commit 8683d94fb6299754a81e9caff9c39dab7ed474ca)
Approvability
Verdict: Needs human review. This PR changes the concurrency model from a single shared worker to per-thread workers, which is a significant runtime behavior change. While the fix is well-tested and the scope is contained, the modification to event-processing orchestration warrants human review to validate the approach and check for potential issues like unbounded worker map growth.
```ts
const worker = yield* makeDrainableWorker(processDomainEventSafely);
const workersByThreadId = new Map<ThreadId, DrainableWorker<ProviderIntentEvent>>();
```
this map is ever growing, workers are never torn down?
Yeah, good point. I've never reached a point where the worker map has grown that much, since I regularly rebuild and clear the thread history. Let me make sure that when we send a session_close (or something similar) we delete the thread's worker from the map.
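A minimal sketch of that teardown idea, assuming a hypothetical worker API with `offer`/`drain` (the names here are illustrative, not the actual codebase API):

```typescript
// Hypothetical sketch (illustrative names, not the actual codebase API):
// tear down a thread's worker once its session closes so the map stays bounded.
type ThreadId = string;

interface DrainableWorker<E> {
  offer(event: E): void;
  drain(): Promise<void>;
}

// A toy drainable worker: events run one at a time by chaining onto a promise tail.
function makeWorker<E>(handle: (e: E) => Promise<void>): DrainableWorker<E> {
  let tail: Promise<void> = Promise.resolve();
  return {
    offer(event: E): void {
      tail = tail.then(() => handle(event));
    },
    drain(): Promise<void> {
      return tail;
    },
  };
}

const workersByThreadId = new Map<ThreadId, DrainableWorker<string>>();

// On session close, finish in-flight events, then drop the entry.
async function onSessionClose(threadId: ThreadId): Promise<void> {
  const worker = workersByThreadId.get(threadId);
  if (worker === undefined) return;
  await worker.drain();
  workersByThreadId.delete(threadId);
}
```

Draining before deleting keeps per-thread ordering intact right up to teardown, while deletion keeps the map from growing without bound.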
Maybe this isn't the right abstraction here. I made this because I ran into a very specific use case where I wanted to start a new thread while the existing thread was still starting/hanging. Given that we have a DurableWorker for the ProviderCommandReactor, ProviderRuntimeIngestion, and CheckpointReactor, it may make more sense to instead make DurableWorker non-blocking for separate thread events.
Honestly, I'm not sure how much punch the workers are pulling anymore. They did a good job of guaranteeing order back when things were more async. Now that we're more bought into the Effect runtime, I think some of the issues the worker aimed to solve have gone away, so I don't think we need these workers in every place we currently have them.
I have some branches with some performance work that I got sidetracked from finishing. Will see if I can dig those up.
What Changed
Users can keep working in healthy threads even if another thread gets stuck starting a session. This change replaces the single provider command reactor worker with per-thread drainable workers, so provider intents are queued independently by threadId while preserving per-thread ordering. It also keeps the hanging-thread regression coverage: a test verifies that a blocked thread-1 start does not prevent thread-2 from reaching sendTurn.
Why
A hung or slow thread start could block the shared reactor queue and make unrelated threads look broken. The failure scenario here is one thread stalling during session startup or resume, followed by another thread trying to start a turn and getting stuck behind the same worker. Isolating reactor workers by thread removes that head-of-line blocking, keeps unaffected threads responsive, and preserves serialized processing within each individual thread.
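The head-of-line fix described above can be illustrated with a stripped-down sketch using plain promises (the real implementation uses Effect-based drainable workers; `dispatch` and `queues` are illustrative names, not the codebase API):

```typescript
// Stripped-down sketch of per-thread serialization with plain promises
// (the real implementation uses Effect-based drainable workers; `dispatch`
// and `queues` are illustrative names, not the codebase API).
type ThreadId = string;

interface ReactorEvent {
  threadId: ThreadId;
  run: () => Promise<void>;
}

// One promise chain per thread: events on the same thread run in order,
// while different threads never wait on each other.
const queues = new Map<ThreadId, Promise<void>>();

function dispatch(ev: ReactorEvent): Promise<void> {
  const tail = queues.get(ev.threadId) ?? Promise.resolve();
  const next = tail.then(() => ev.run());
  // Swallow errors in the stored tail so one failure does not poison the queue.
  queues.set(ev.threadId, next.catch(() => undefined));
  return next;
}
```

With this shape, a never-resolving start on thread-1 leaves thread-2's queue untouched, which mirrors the regression scenario described above.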
UI Changes
Checklist
Note
Medium Risk
Moderate risk: changes the reactor’s event-queueing/concurrency model, which could affect ordering, draining behavior, and resource usage across threads if worker lifecycle isn’t handled correctly.
Overview
Prevents head-of-line blocking in ProviderCommandReactor by replacing the single DrainableWorker with a per-thread worker map keyed by threadId, so provider-intent events are queued and serialized per thread but processed independently across threads. Updates drain to wait on all active thread workers, and extends the test harness to allow custom startSession behavior, plus a new regression test proving that a hung session start in thread-1 does not stop thread-2 from reaching sendTurn.
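The concurrent drain described above might look like this sketch (assuming each per-thread worker exposes a `drain(): Promise<void>`; the function name is illustrative):

```typescript
// Sketch: drain all per-thread workers concurrently rather than sequentially,
// so one slow thread's backlog does not delay draining the others.
async function drainAll(
  workers: Map<string, { drain(): Promise<void> }>,
): Promise<void> {
  await Promise.all([...workers.values()].map((w) => w.drain()));
}
```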
Note
Isolate ProviderCommandReactor thread workers to prevent head-of-line blocking. Replaces the single DrainableWorker with a per-thread worker map in ProviderCommandReactor.ts, keyed by ThreadId and populated via a workerForThread effect; the drain method now drains all per-thread workers concurrently.