Instrument jsrpc #6750

Open
danlapid wants to merge 1 commit into main from dlapid/jsrpcTracing

Conversation

@danlapid
Collaborator

Adds a jsRpcCall user trace span on every JS-RPC dispatch, on both sides of the wire, so that each method invocation is visible as its own span in tail traces. Spans are tagged with jsrpc.method, jsrpc.target_kind (fetcher / entrypoint / stub / transient / promise), and jsrpc.operation (call / getProperty).
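For illustration, the tag set described above could be sketched as follows. This is a minimal stand-alone model, not workerd code: `SketchSpan`, `TargetKind`, `Operation`, and `makeJsRpcCallSpan` are hypothetical names, and only the span name and the three tag keys and values come from the description.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a trace span; this is not workerd's span API.
struct SketchSpan {
  std::string name;
  std::map<std::string, std::string> tags;

  void setTag(const std::string& key, const std::string& value) {
    tags[key] = value;
  }
};

// The five target kinds and two operations listed in the description.
enum class TargetKind { Fetcher, Entrypoint, Stub, Transient, Promise };
enum class Operation { Call, GetProperty };

inline std::string toString(TargetKind kind) {
  switch (kind) {
    case TargetKind::Fetcher: return "fetcher";
    case TargetKind::Entrypoint: return "entrypoint";
    case TargetKind::Stub: return "stub";
    case TargetKind::Transient: return "transient";
    case TargetKind::Promise: return "promise";
  }
  return "unknown";
}

inline std::string toString(Operation op) {
  return op == Operation::Call ? "call" : "getProperty";
}

// Builds a span with the constant name and the three tags from the description.
inline SketchSpan makeJsRpcCallSpan(const std::string& method,
                                    TargetKind kind,
                                    Operation op) {
  SketchSpan span{"jsRpcCall", {}};
  span.setTag("jsrpc.method", method);
  span.setTag("jsrpc.target_kind", toString(kind));
  span.setTag("jsrpc.operation", toString(op));
  return span;
}
```

Under these assumptions, a call to a method named `getUser` on a returned stub would produce a span named `jsRpcCall` tagged `jsrpc.method=getUser`, `jsrpc.target_kind=stub`, `jsrpc.operation=call`.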

Stub origin propagation: stubs returned via RPC now carry the jsRpcCall of the call that produced them. Follow-up calls on those stubs nest under the originating call rather than under the request root, restoring trace continuity across pipelined RPC chains. Plumbed through RpcDeserializerExternalHandler, JsRpcStub, and a new JsRpcClientProvider::ClientForOneCall { client, callSpanParents } return shape from getClientForOneCall.
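The parent-selection rule for stub origin propagation can be modelled roughly like this. It is a simplified sketch using only the standard library: `SketchParent`, `SketchStub`, and `startCall` are hypothetical, and a span's ancestry is represented as a plain path string rather than real span objects.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical span handle: the path string records the span's ancestry.
struct SketchParent {
  std::string path;

  SketchParent newChild(const std::string& name) const {
    return {path + "/" + name};
  }
};

// Hypothetical stub: remembers the span of the call that returned it, if any.
struct SketchStub {
  std::optional<SketchParent> origin;
};

// Opens a jsRpcCall span under the stub's origin when one exists,
// otherwise under the current request root.
inline SketchParent startCall(const SketchStub& stub,
                              const SketchParent& requestRoot) {
  const SketchParent& parent = stub.origin.has_value() ? *stub.origin : requestRoot;
  return parent.newChild("jsRpcCall");
}
```

In this model, a call on a top-level stub yields `request/jsRpcCall`, while a follow-up call on a stub returned by that call yields `request/jsRpcCall/jsRpcCall`, which is the nesting the description calls trace continuity.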

OutgoingFactory result redesign: Fetcher::OutgoingFactory::Result now returns { client, spanParents } so Fetcher::buildClient can nest its inner operation span under the factory's outer dispatch span (e.g. durable_object_subrequest). This is what gives durable_object_subrequest correct attribution as a parent of the per-call spans inside a DO stub invocation. Cascades through actor.{h,c++}, actor-state.c++, container.c++, sockets.c++ (the latter two return spanParents = kj::none since they don't open an outer dispatch span). New TraceContextParent helper in trace.h carries a borrowed (internalSpan, userSpan) pair without owning SpanBuilders.
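A rough model of the `{ client, spanParents }` shape, assuming plain standard-library types in place of the real ones. `SketchParents`, `SketchFactoryResult`, the two factory functions, and `buildClientSpanPath` are hypothetical; the only details taken from the description are that the durable-object factory opens an outer `durable_object_subrequest` span and that the sockets factory returns no span parents.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical (internalSpan, userSpan) pair, modelled as ancestry paths.
struct SketchParents {
  std::string internalPath;
  std::string userPath;
};

// Hypothetical factory result: a client plus optional span parents.
struct SketchFactoryResult {
  std::string client;                        // stand-in for the real client object
  std::optional<SketchParents> spanParents;  // empty if no outer dispatch span was opened
};

// A factory that opens an outer dispatch span and hands it back as the parent.
inline SketchFactoryResult durableObjectFactory(const SketchParents& current) {
  return {"do-client",
          SketchParents{current.internalPath + "/durable_object_subrequest",
                        current.userPath + "/durable_object_subrequest"}};
}

// A factory that opens no outer span (as container.c++ and sockets.c++ do).
inline SketchFactoryResult socketFactory() {
  return {"socket-client", std::nullopt};
}

// Nests the inner operation span under the factory's span when present,
// otherwise under the caller's current span.
inline std::string buildClientSpanPath(const SketchFactoryResult& result,
                                       const SketchParents& current,
                                       const std::string& operation) {
  const SketchParents& parents = result.spanParents ? *result.spanParents : current;
  return parents.userPath + "/" + operation;
}
```

With a current span of `root`, the durable-object case gives `root/durable_object_subrequest/jsRpcSession`, whereas the socket case falls back to `root/connect` (the `connect` operation name is made up for the example).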

Server side: JsRpcSessionCustomEvent::run emits an internal-only jsRpcSession span for the legacy buffered tail. No user-visible jsRpcSession is emitted on the server because the jsrpc-typed onset already represents the session (delivered() to outcome).

Test fixtures updated: tail-worker-test gains jsRpcCall open/close events with the new tags on each callee, jsrpcDoSubrequest reflects the nested jsRpcSession -> jsRpcCall -> durable_object_subrequest -> jsRpcSession -> jsRpcCall shape, sql-test-tail and actor-kv-test-tail adjust span counts, and runInstrumentationTest filters jsRpcCall alongside jsRpcSession by default.

@danlapid danlapid requested review from a team as code owners May 11, 2026 00:37
Contributor

@ask-bonk ask-bonk Bot left a comment

I'm Bonk, and I've done a quick review of your PR.

Adds per-call jsRpcCall trace spans on both client and server sides of JS-RPC, with origin propagation through returned stubs, and nests inner operation spans under the factory's outer dispatch span.

  1. [LOW] Misleading comment: SpanParent holds kj::Own<SpanObserver> (refcounted), so these are owned references, not borrowed. The "valid for the lifetime of client" constraint is inaccurate and could cause confusion for future maintainers.

This review was generated by an AI assistant and may contain inaccuracies.

Comment thread src/workerd/api/http.h Outdated
@ask-bonk
Contributor

ask-bonk Bot commented May 11, 2026

Review posted on PR #6750. Summary of findings:

  • Lifetime safety: TraceContextParent holds SpanParent objects which own refcounted kj::Own<SpanObserver> references. All captured/stored instances are independently valid. No dangling reference issues found.
  • Backward compatibility: The changes are purely additive tracing/observability. No user-facing API changes. No compat flag needed since trace output has never been contractually stable. The new spans are added to existing trace streams; existing spans are preserved.
  • Test coverage: Tests are thorough -- tail-worker-test.js validates exact expected trace sequences for all affected RPC patterns, sql-test-tail.js and actor-kv-test-tail.js update span counts, and the instrumentation helper properly filters the new spans by default.
  • One low-severity finding: A misleading "Borrowed" comment in OutgoingFactory::Result -- posted as a suggestion.

github run

@danlapid danlapid force-pushed the dlapid/jsrpcTracing branch 2 times, most recently from 58edf06 to 53debf7 Compare May 11, 2026 03:43
@github-actions

github-actions Bot commented May 11, 2026

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉

@danlapid danlapid force-pushed the dlapid/jsrpcTracing branch from 53debf7 to 69596a1 Compare May 11, 2026 03:54

// Server-side jsRpcCall, attached to the dispatch promise below so it stays
// open through JS invocation and result serialization.
auto jsRpcCallSpan = ctx.makeUserTraceSpan("jsRpcCall"_kjc);
Contributor

The callee's per-call span should be parented to the caller's per-call client span, not to the session. This detaches per-call spans from the invocation onset, but it matches convention and the user's mental model. It is achievable by carrying span context on CallParams.

Collaborator Author

@danlapid danlapid May 11, 2026

Yeah, sounds like a good idea, but let me think about it more.
It would be the first time we move away from having all spans be hierarchical children of the onset.
I don't know the ramifications.
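To make the suggestion in this thread concrete, here is one way it might look. This is purely a sketch of the reviewer's idea, not something in the PR: `SketchSpanContext`, `SketchCallParams`, its `callerSpan` field, and `chooseCalleeParent` are all hypothetical, and the real `CallParams` has no such field as far as this thread shows.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical span context: just enough to identify a parent span.
struct SketchSpanContext {
  std::uint64_t traceId;
  std::uint64_t spanId;
};

// Hypothetical call parameters carrying the caller's per-call span context.
struct SketchCallParams {
  std::string methodName;
  std::optional<SketchSpanContext> callerSpan;  // absent for callers that send none
};

// The callee parents its per-call span to the caller's span when one was sent,
// and falls back to the session (onset) span otherwise.
inline SketchSpanContext chooseCalleeParent(const SketchCallParams& params,
                                            const SketchSpanContext& sessionSpan) {
  return params.callerSpan.value_or(sessionSpan);
}
```

The fallback keeps today's behaviour (children of the onset) for callers that send no context, which is one way to limit the ramifications raised in the reply.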

TraceContextParent parents = kj::mv(callSpanParents).orDefault([&] {
  return TraceContextParent(ioContext.getCurrentTraceSpan(), ioContext.getCurrentUserTraceSpan());
});
TraceContext span = parents.newChild("jsRpcCall"_kjc);
Contributor

Use the method name as the span name, not a fixed jsRpcCall. This applies to both sides.

Collaborator Author

We don't do this elsewhere; we use constant span names and let the OTEL ingestor visualize them however it wants.

Contributor

The span name should be the RPC method, per the convention. Better to break an incorrect pattern.
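The two positions in this thread differ only in where the method name lives. A small sketch of both, with hypothetical names throughout (`SketchSpanRecord`, `constantNamed`, `methodNamed`, and `displayName` are not part of workerd or of any OTEL library):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical exported span record.
struct SketchSpanRecord {
  std::string name;
  std::map<std::string, std::string> tags;
};

// Author's approach: constant span name, method carried as a tag.
inline SketchSpanRecord constantNamed(const std::string& method) {
  return {"jsRpcCall", {{"jsrpc.method", method}}};
}

// Reviewer's approach: the method itself is the span name.
inline SketchSpanRecord methodNamed(const std::string& method) {
  return {method, {{"jsrpc.method", method}}};
}

// What an ingestor could do with either form: prefer the method tag for display.
inline std::string displayName(const SketchSpanRecord& span) {
  auto it = span.tags.find("jsrpc.method");
  return it != span.tags.end() ? it->second : span.name;
}
```

Both forms carry the same information as long as the `jsrpc.method` tag is present; the disagreement is about which one tools should see as the name by default.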

});
TraceContext span = parents.newChild("jsRpcCall"_kjc);

span.setTag("jsrpc.target_kind"_kjc, parent.getRpcTargetKind());
Contributor

Replace with span kind.

Collaborator Author

  1. We're not yet emitting span kind, and refactoring the repo to use it is well out of scope.
  2. Target kind is more than just client/server; we will want this regardless of span kind.

Contributor

I don't think customers actually care about target kind. When would they think in those terms? They only care about the side, and the convention specifies that as span kind.

It's a small change comparable to trace flags.


3 participants