chore(rivetkit): rewrite work registry + fix waituntil not preventing sleep#4990

Merged: NathanFlurry merged 1 commit into main from 05-06-chore_rivetkit_rewrite_work_registry_fix_waituntil_not_preventing_sleep on May 9, 2026

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app

railway-app Bot commented May 7, 2026

🚅 Deployed to the rivet-pr-4990 environment in rivet-frontend

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| kitchen-sink | ❌ Build Failed (View Logs) | Web | May 9, 2026 at 7:33 am |
| frontend-cloud | ❌ Build Failed (View Logs) | Web | May 8, 2026 at 11:36 pm |
| website | 😴 Sleeping (View Logs) | Web | May 7, 2026 at 8:17 am |
| frontend-inspector | ❌ Build Failed (View Logs) | Web | May 7, 2026 at 4:18 am |
| ladle | ❌ Build Failed (View Logs) | Web | May 7, 2026 at 12:24 am |
| mcp-hub | ✅ Success (View Logs) | Web | May 7, 2026 at 12:24 am |

Member Author

NathanFlurry commented May 7, 2026

@claude
Contributor

claude Bot commented May 7, 2026

Code Review: Work Registry Rewrite + waitUntil Sleep Fix

Overview

This PR does two things: (1) replaces multiple ad-hoc guard types (KeepAwakeGuard, WebSocketCallbackGuard, DisconnectCallbackGuard) with a unified ActorWorkKind + ActorWorkPolicy + ActorWorkGuard system, and (2) fixes a production crash where c.vars became undefined after the sleep grace deadline because JS-side keep-awake tracking could race with NAPI runtime-state teardown.

The core fix is moving the "wait for in-flight work before cleanup" gate from a JS-side keepAwakeCount counter into a Rust-side wait_for_tracked_shutdown_work() call, consistent with the layering rule that all lifecycle logic must live in rivetkit-core.
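
To make the "wait for in-flight work before cleanup" ordering concrete, here is a minimal std-only sketch. It is not the rivetkit-core implementation: `ShutdownGate`, `begin`, `end`, `wait_for_drained`, and `teardown` are hypothetical names, and a `Mutex` plus `Condvar` stands in for whatever async primitives the real `wait_for_tracked_shutdown_work()` uses.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical std-only stand-in for a Rust-side shutdown gate.
#[derive(Default)]
pub struct ShutdownGate {
    in_flight: Mutex<usize>,
    drained: Condvar,
}

impl ShutdownGate {
    /// Record one unit of in-flight work.
    pub fn begin(&self) {
        *self.in_flight.lock().unwrap() += 1;
    }

    /// Finish one unit of work and wake waiters when the count hits zero.
    pub fn end(&self) {
        let mut n = self.in_flight.lock().unwrap();
        *n -= 1;
        if *n == 0 {
            self.drained.notify_all();
        }
    }

    /// Block until all work has ended or `grace` elapses.
    /// Returns true only if the gate is fully drained.
    pub fn wait_for_drained(&self, grace: Duration) -> bool {
        let guard = self.in_flight.lock().unwrap();
        let (guard, _timeout) = self
            .drained
            .wait_timeout_while(guard, grace, |n| *n > 0)
            .unwrap();
        *guard == 0
    }
}

/// Teardown order: wait for tracked work first, then run cleanup.
/// Cleanup still runs after the deadline; the return value reports
/// whether the work actually drained in time.
pub fn teardown(gate: &ShutdownGate, grace: Duration, cleanup: impl FnOnce()) -> bool {
    let drained = gate.wait_for_drained(grace);
    cleanup();
    drained
}
```

Because the wait and the cleanup live in one place, no second counter on the JS side needs to agree with it, which is the layering point made above.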


Issues

Dead code after JoinSet::shutdown()

sleep.rs — teardown loop:

abortable_shutdown_tasks.0.shutdown().await;
while let Some(result) = abortable_shutdown_tasks.0.join_next().await { // dead: shutdown() already drained
    ...
}
while let Some(result) = abortable_shutdown_tasks.1.join_next().await {
    ...
}

JoinSet::shutdown() aborts all tasks and waits for them to finish, leaving the set empty. The first join_next() loop therefore gets None on its first call and its body never runs. Remove it.

Redundant websocket_callback check in wait_for_tracked_shutdown_work_drained

Every WebSocketCallbackRegion goes through begin_work_region(ActorWorkKind::WebSocketCallback) → ActorWorkGuard::new(), which increments both the websocket_callback idle counter and shutdown_counter (because drains_shutdown_grace: true). The two are always decremented together. So shutdown_counter == 0 already implies websocket_callback == 0. The explicit websocket wait arm and count check are redundant. Consider removing or documenting why both are checked.
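
The implication can be illustrated with a small sketch of a policy-driven guard. This is an assumption-laden model, not the PR's code: the type names come from the review, but the fields, the `blocks_idle_sleep` flag, the policies for kinds other than WebSocketCallback, and the accessor methods are invented for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Subset of work kinds named in the review; the real enum may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ActorWorkKind {
    KeepAwake,
    WebSocketCallback,
    DisconnectCallback,
}

/// Only `drains_shutdown_grace` is named in the review; the other flag is assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ActorWorkPolicy {
    pub blocks_idle_sleep: bool,
    pub drains_shutdown_grace: bool,
}

impl ActorWorkKind {
    /// Single policy table, matched exhaustively with no `_` arm.
    /// Values other than WebSocketCallback's drains_shutdown_grace are guesses.
    pub fn policy(self) -> ActorWorkPolicy {
        match self {
            ActorWorkKind::KeepAwake => ActorWorkPolicy { blocks_idle_sleep: true, drains_shutdown_grace: true },
            ActorWorkKind::WebSocketCallback => ActorWorkPolicy { blocks_idle_sleep: true, drains_shutdown_grace: true },
            ActorWorkKind::DisconnectCallback => ActorWorkPolicy { blocks_idle_sleep: true, drains_shutdown_grace: false },
        }
    }
}

#[derive(Default)]
pub struct WorkRegistry {
    keep_awake: AtomicUsize,
    websocket_callback: AtomicUsize,
    disconnect_callback: AtomicUsize,
    shutdown_counter: AtomicUsize,
}

impl WorkRegistry {
    fn idle_counter(&self, kind: ActorWorkKind) -> &AtomicUsize {
        match kind {
            ActorWorkKind::KeepAwake => &self.keep_awake,
            ActorWorkKind::WebSocketCallback => &self.websocket_callback,
            ActorWorkKind::DisconnectCallback => &self.disconnect_callback,
        }
    }
    pub fn count(&self, kind: ActorWorkKind) -> usize {
        self.idle_counter(kind).load(Ordering::SeqCst)
    }
    pub fn shutdown_count(&self) -> usize {
        self.shutdown_counter.load(Ordering::SeqCst)
    }
}

#[must_use = "dropping the guard ends the tracked work region"]
pub struct ActorWorkGuard {
    registry: Arc<WorkRegistry>,
    kind: ActorWorkKind,
}

impl ActorWorkGuard {
    pub fn new(registry: Arc<WorkRegistry>, kind: ActorWorkKind) -> Self {
        // Bump shutdown_counter first so it never reads lower than the idle counter.
        if kind.policy().drains_shutdown_grace {
            registry.shutdown_counter.fetch_add(1, Ordering::SeqCst);
        }
        registry.idle_counter(kind).fetch_add(1, Ordering::SeqCst);
        Self { registry, kind }
    }
}

impl Drop for ActorWorkGuard {
    fn drop(&mut self) {
        // Reverse order: release the idle counter before shutdown_counter.
        self.registry.idle_counter(self.kind).fetch_sub(1, Ordering::SeqCst);
        if self.kind.policy().drains_shutdown_grace {
            self.registry.shutdown_counter.fetch_sub(1, Ordering::SeqCst);
        }
    }
}
```

In this model every guard whose kind drains the grace window holds a unit of `shutdown_counter` for at least as long as it holds its idle counter, so a zero `shutdown_counter` leaves no such guard alive. Whether the real code orders the two updates this way is not stated in the review.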

Behavior change in teardown: abortable tasks always aborted

Old code:

if abort_remaining {
    shutdown_tasks.shutdown().await;  // abort
} else {
    while let Some(result) = shutdown_tasks.join_next().await { ... }  // drain normally
}

New code unconditionally calls shutdown() on the abortable JoinSet. In practice this should be correct, because by the time teardown runs, wait_for_tracked_shutdown_work() has already given the grace window. It is still a subtle semantic shift that should be confirmed intentional. A comment explaining why the abort_remaining branch was removed would help future readers.

Known wasm test failure

The driver-test-progress notes acknowledge an intentionally unresolved wasm failure (waitUntil can broadcast before sleep disconnect). If this is a known gap that will land as a follow-up, consider a .agent/todo/ entry or a tracking comment, per repo convention.


Style Notes

Several doc comments added to WorkRegistry fields describe what the field does rather than why it exists or what non-obvious invariant it carries (e.g., /// Counts user keep-awake regions that block idle sleep. on field keep_awake). Per CLAUDE.md, well-named identifiers should be self-documenting; comments should explain the non-obvious WHY. The existing activity_notify comment is a good model since it explains the externally-owned counter constraint that isn't derivable from the name.


What's Good

  • Correct layering: removing keepAwakeCount / deferSleepCleanupUntilKeepAwakeIdle from JS and replacing with a Rust-side drain call is the right move per the core-only lifecycle rule.
  • Policy table: ActorWorkKind::policy() is a clean single source of truth. All match arms are fully enumerated without _ fallthrough, consistent with CLAUDE.md.
  • wait_zero_unbounded correctness: arms the Notify before loading the counter value, so no decrement-to-zero can race past the waiter. Correct.
  • Test coverage: the three new Rust unit tests (shutdown_counter_zero, websocket_callback_zero, keep_awake_deadline_cancels) and the two new driver tests (waitUntil in onSleep keeps c.vars, c.vars access in ws handler after grace deadline) directly exercise the fixed race conditions.
  • #[must_use] on ActorWorkRegion: correct — dropping it immediately would silently cancel work.
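
The arm-before-load ordering praised for wait_zero_unbounded can be sketched without an async runtime. The following is a std-only analogue, not the PR's code: `ZeroGate` and its methods are hypothetical, a parked thread handle plays the role of the armed Notify, and only one waiter at a time is assumed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread::{self, Thread};

/// Hypothetical single-waiter gate that opens when the counter reaches zero.
#[derive(Default)]
pub struct ZeroGate {
    count: AtomicUsize,
    waiter: Mutex<Option<Thread>>,
}

impl ZeroGate {
    pub fn increment(&self) {
        self.count.fetch_add(1, Ordering::SeqCst);
    }

    pub fn decrement(&self) {
        // fetch_sub returns the previous value, so 1 means we just reached zero.
        if self.count.fetch_sub(1, Ordering::SeqCst) == 1 {
            if let Some(t) = self.waiter.lock().unwrap().take() {
                t.unpark();
            }
        }
    }

    pub fn count(&self) -> usize {
        self.count.load(Ordering::SeqCst)
    }

    pub fn wait_zero(&self) {
        loop {
            // Step 1: arm. Register as the waiter BEFORE reading the counter.
            *self.waiter.lock().unwrap() = Some(thread::current());
            // Step 2: load. A decrement-to-zero that ran before this load is
            // observed here; one that runs after it finds the registered
            // waiter and unparks it.
            if self.count.load(Ordering::SeqCst) == 0 {
                self.waiter.lock().unwrap().take();
                return;
            }
            // park() may wake spuriously, so loop and re-check.
            thread::park();
        }
    }
}
```

Swapping the two steps would reopen the race: the counter could drop to zero between the load and the registration, the decrementer would find no waiter to wake, and the waiter would park with nothing left to unpark it.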

@abcxff abcxff mentioned this pull request May 7, 2026
11 tasks
@NathanFlurry NathanFlurry changed the base branch from 05-05-fix_rivetkit_use_keepawake_for_websocket_callback_tracking_to_prevent_c.vars_crash_after_grace_deadline to graphite-base/4990 May 7, 2026 04:14
@NathanFlurry NathanFlurry force-pushed the graphite-base/4990 branch from 0499832 to 395aa83 Compare May 7, 2026 04:14
@NathanFlurry NathanFlurry force-pushed the 05-06-chore_rivetkit_rewrite_work_registry_fix_waituntil_not_preventing_sleep branch from 11f2601 to 61f01e4 Compare May 7, 2026 04:14
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4990 May 7, 2026 04:14 Destroyed
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4990 to main May 7, 2026 04:14
@NathanFlurry NathanFlurry force-pushed the 05-06-chore_rivetkit_rewrite_work_registry_fix_waituntil_not_preventing_sleep branch from 61f01e4 to c8f49d4 Compare May 8, 2026 23:34
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4990 May 8, 2026 23:34 Destroyed
@NathanFlurry NathanFlurry force-pushed the 05-06-chore_rivetkit_rewrite_work_registry_fix_waituntil_not_preventing_sleep branch from c8f49d4 to 6a62381 Compare May 9, 2026 02:41
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4990 May 9, 2026 02:41 Destroyed
@NathanFlurry NathanFlurry marked this pull request as ready for review May 9, 2026 05:57
@NathanFlurry NathanFlurry force-pushed the 05-06-chore_rivetkit_rewrite_work_registry_fix_waituntil_not_preventing_sleep branch from 6a62381 to 6b5aa7b Compare May 9, 2026 07:33
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4990 May 9, 2026 07:33 Destroyed
@NathanFlurry NathanFlurry merged commit 6b5aa7b into main May 9, 2026
11 of 18 checks passed
@NathanFlurry NathanFlurry deleted the 05-06-chore_rivetkit_rewrite_work_registry_fix_waituntil_not_preventing_sleep branch May 9, 2026 07:50