Skip to content

fix(sqlite): fence remote requests by generation#5014

Draft
NathanFlurry wants to merge 1 commit into
counter-latency/core-metrics-endpointfrom
counter-latency/sqlite-generation-fence
Draft

fix(sqlite): fence remote requests by generation#5014
NathanFlurry wants to merge 1 commit into
counter-latency/core-metrics-endpointfrom
counter-latency/sqlite-generation-fence

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app
Copy link
Copy Markdown

railway-app Bot commented May 11, 2026

🚅 Deployed to the rivet-pr-5014 environment in rivet-frontend

Service Status Web Updated (UTC)
website 😴 Sleeping (View Logs) Web May 11, 2026 at 12:18 pm
frontend-cloud ❌ Build Failed (View Logs) Web May 11, 2026 at 12:07 pm
frontend-inspector ❌ Build Failed (View Logs) Web May 11, 2026 at 12:07 pm
ladle ❌ Build Failed (View Logs) Web May 11, 2026 at 12:07 pm
mcp-hub ✅ Success (View Logs) Web May 11, 2026 at 12:06 pm
kitchen-sink ❌ Build Failed (View Logs) Web May 11, 2026 at 12:06 pm

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 11, 2026

Code Review: fix(sqlite): fence remote requests by generation

Overview

This PR adds generation-based fencing to SQLite remote requests in two places:

  1. Client side (depot-client): A new GenerationFencedTransport wrapper automatically injects the actor generation into every get_pages/commit request made through the VFS.
  2. Server side (pegboard-envoy): When a request carries an expected_generation, it now routes to the richer validate_remote_sqlite_generation check instead of the plain actor existence check.

The fix also relaxes two guards that were previously blocking legitimate startup flows.


Issues

Fragile cross-package string matching (medium)

The new "actor does not exist" string in is_initial_main_page_missing (depot-client/src/database.rs) is produced by a bail! in pegboard-envoy, travels over the wire as a SqliteErrorResponse.message, and is compared with exact equality across packages. This is a cross-package string contract with no compile-time enforcement. If the error message changes on the server side, the client silently stops treating the error as a startup miss. A structured SqliteStorageError variant (e.g. ActorNotFound) would close this gap, consistent with how DatabaseNotFound is handled in is_startup_database_miss.

Dead parameter in is_startup_database_miss (low)

The _expected_generation: Option<u64> parameter was removed from the logic but kept in the signature. The call site still passes expected_generation. Since the check was intentionally dropped, both the parameter and the call-site argument should be removed entirely to avoid confusion.

validate_sqlite_actor_for_request skips get_for_kv (worth noting)

When expected_generation is Some, the new path skips validate_sqlite_actor (which calls get_for_kv to confirm the actor exists and belongs to the correct namespace) and goes directly to validate_remote_sqlite_generation. Namespace isolation is preserved since the generation check queries data scoped by conn.namespace_id. However, the two validation paths have different semantics if the envoy key state and the actor table diverge. Worth a brief comment explaining why get_for_kv is not needed when a generation is present.


Correctness

The core mechanics look correct:

  • get_or_insert is the right choice in GenerationFencedTransport. It preserves any explicit generation already on the request while filling in the generation for VFS-internal requests that set expected_generation: None.
  • Removing the expected_generation.is_none() guard from is_startup_database_miss is safe because generation fencing is now enforced by the transport wrapper before the request reaches the envoy, so a DatabaseNotFound from Depot during initial page load is still a legitimate startup-empty-db signal regardless of whether a generation was specified.
  • The "actor does not exist" to is_initial_main_page_missing path is needed: a freshly scheduled actor may not yet have its generation recorded in the DB when fetch_initial_pages_for_registration first runs, so this should yield an empty database rather than a hard error.

Suggestions

  1. Replace the string match with a structured SqliteStorageError variant so the contract between pegboard-envoy and depot-client is enforced at the type level.
  2. Remove the dead _expected_generation parameter from is_startup_database_miss and its call site.
  3. Add a brief comment to validate_sqlite_actor_for_request explaining that validate_remote_sqlite_generation subsumes the actor existence check when a generation is provided.
  4. Add a regression test for the generation mismatch path (startup with an actor not yet recorded in the DB should produce an empty database, not an error).

Summary

The approach is sound and namespace isolation holds. The two items to fix before merge are the fragile string match (latent correctness bug) and the dead _expected_generation parameter. Everything else is advisory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant