feat: support tracing option to honor parent sampling#8485
feat: support tracing option to honor parent sampling#8485verdie-g wants to merge 2 commits intotriton-inference-server:mainfrom
Conversation
|
@GuanLuo could you tell me what you think about this solution? Then I'll work on the testing part. |
There was a problem hiding this comment.
Pull request overview
Adds a new tracing configuration option intended to honor the parent sampling decision (from propagated OpenTelemetry context) instead of always using only a rate-based sampler, addressing issue #8474.
Changes:
- Extends
TraceManager/TraceSettingto carry anhonor_parent_samplingboolean and threads it through creation and settings APIs. - Updates trace sampling logic to optionally use the propagated parent span’s sampled flag (OTel mode) with fallback to rate-based sampling.
- Adds CLI parsing for
--trace-config honor-parent-sampling=<bool>and passes the value into tracing initialization.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/tracer.h | Adds honor_parent_sampling to trace settings/config plumbing and updates method signatures. |
| src/tracer.cc | Implements parent-based sampling behavior in SampleTrace and threads the new setting through updates/getters. |
| src/main.cc | Passes the new CLI-configured flag into TraceManager::Create. |
| src/http_server.cc | Updates trace settings retrieval signature to include the new flag. |
| src/grpc/grpc_server.cc | Updates trace settings retrieval signature to include the new flag. |
| src/command_line_parser.h | Adds trace_honor_parent_sampling_ to server parameters. |
| src/command_line_parser.cc | Parses honor-parent-sampling from --trace-config into server parameters. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #ifndef _WIN32 | ||
| bool has_parent = false; | ||
| bool parent_is_sampled = false; | ||
| if (mode_ == TRACE_MODE_OPENTELEMETRY) { | ||
| auto active_span = otel_trace_api::GetSpan(start_options.propagated_context); | ||
| if (active_span->GetContext().IsValid()) { | ||
| has_parent = true; | ||
| parent_is_sampled = active_span->GetContext().IsSampled(); | ||
| } | ||
| } | ||
| #endif | ||
|
|
||
| // Imitate OTEL's ParentBased sampler (i.e. use parent's sampling flag, fallback on a root sampler, in our case the rate option). | ||
| if (honor_parent_sampling_ && has_parent) { | ||
| // ParentBased sampler: respect parent's sampling decision | ||
| should_sample = parent_is_sampled; |
There was a problem hiding this comment.
has_parent / parent_is_sampled are declared only under #ifndef _WIN32 but are referenced unconditionally later (honor_parent_sampling_ && has_parent). This will fail to compile on Windows builds. Define these variables outside the preprocessor guard with safe defaults, or guard the later has_parent usage with the same #ifndef _WIN32 block.
| // Imitate OTEL's ParentBased sampler (i.e. use parent's sampling flag, fallback on a root sampler, in our case the rate option). | ||
| if (honor_parent_sampling_ && has_parent) { | ||
| // ParentBased sampler: respect parent's sampling decision | ||
| should_sample = parent_is_sampled; | ||
| } else { |
There was a problem hiding this comment.
When honor_parent_sampling_ && has_parent is true, sampling bypasses the mu_-protected path, so count_/Valid() limits are not enforced and sample_/created_ counters are not updated. This can lead to unlimited trace creation even when trace_count is 0 and makes sampling stats inconsistent. Consider applying the same Valid()/count_ gating and counter updates under mu_ even for the parent-based decision (e.g., decide should_sample from parent first, then under lock enforce count_ and update counters if sampling).
What does the PR do?
OpenTelemetry tracing in Triton only supports one sampling option which is the rate. OpenTelemetry defines some standard samplers. Here I would like to support something close to the popular ParentBased. The sampling decision from the tracing header is used, otherwise a root sampler is used, in our case the rate option.
Checklist
Agreement
<commit_type>: <Title>pre-commit install, pre-commit run --all)Commit Type:
Check the conventional commit type
box here and add the label to the github PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
I'm trying to have complete traces in my infrastructure but Triton completely ignores the sampling decision of the parent.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)