Skip to content

feat: support tracing option to honor parent sampling#8485

Draft
verdie-g wants to merge 2 commits intotriton-inference-server:mainfrom
verdie-g:honor-parent-sampling
Draft

feat: support tracing option to honor parent sampling#8485
verdie-g wants to merge 2 commits intotriton-inference-server:mainfrom
verdie-g:honor-parent-sampling

Conversation

@verdie-g
Copy link
Copy Markdown

What does the PR do?

OpenTelemetry tracing in Triton only supports one sampling option which is the rate. OpenTelemetry defines some standard samplers. Here I would like to support something close to the popular ParentBased. The sampling decision from the tracing header is used, otherwise a root sampler is used, in our case the rate option.

Checklist

  • I have read the Contribution guidelines and signed the Contributor License
    Agreement
  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • I ran pre-commit locally (pre-commit install, pre-commit run --all)
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

Caveats:

Background

I'm trying to have complete traces in my infrastructure but Triton completely ignores the sampling decision of the parent.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

@verdie-g
Copy link
Copy Markdown
Author

@GuanLuo could you tell me what you think about this solution? Then I'll work on the testing part.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new tracing configuration option intended to honor the parent sampling decision (from propagated OpenTelemetry context) instead of always using only a rate-based sampler, addressing issue #8474.

Changes:

  • Extends TraceManager / TraceSetting to carry an honor_parent_sampling boolean and threads it through creation and settings APIs.
  • Updates trace sampling logic to optionally use the propagated parent span’s sampled flag (OTel mode) with fallback to rate-based sampling.
  • Adds CLI parsing for --trace-config honor-parent-sampling=<bool> and passes the value into tracing initialization.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/tracer.h Adds honor_parent_sampling to trace settings/config plumbing and updates method signatures.
src/tracer.cc Implements parent-based sampling behavior in SampleTrace and threads the new setting through updates/getters.
src/main.cc Passes the new CLI-configured flag into TraceManager::Create.
src/http_server.cc Updates trace settings retrieval signature to include the new flag.
src/grpc/grpc_server.cc Updates trace settings retrieval signature to include the new flag.
src/command_line_parser.h Adds trace_honor_parent_sampling_ to server parameters.
src/command_line_parser.cc Parses honor-parent-sampling from --trace-config into server parameters.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/tracer.cc
Comment on lines +1157 to +1172
#ifndef _WIN32
bool has_parent = false;
bool parent_is_sampled = false;
if (mode_ == TRACE_MODE_OPENTELEMETRY) {
auto active_span = otel_trace_api::GetSpan(start_options.propagated_context);
if (active_span->GetContext().IsValid()) {
has_parent = true;
parent_is_sampled = active_span->GetContext().IsSampled();
}
}
#endif

// Imitate OTEL's ParentBased sampler (i.e. use parent's sampling flag, fallback on a root sampler, in our case the rate option).
if (honor_parent_sampling_ && has_parent) {
// ParentBased sampler: respect parent's sampling decision
should_sample = parent_is_sampled;
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_parent / parent_is_sampled are declared only under #ifndef _WIN32 but are referenced unconditionally later (honor_parent_sampling_ && has_parent). This will fail to compile on Windows builds. Define these variables outside the preprocessor guard with safe defaults, or guard the later has_parent usage with the same #ifndef _WIN32 block.

Copilot uses AI. Check for mistakes.
Comment thread src/tracer.cc
Comment on lines +1169 to +1173
// Imitate OTEL's ParentBased sampler (i.e. use parent's sampling flag, fallback on a root sampler, in our case the rate option).
if (honor_parent_sampling_ && has_parent) {
// ParentBased sampler: respect parent's sampling decision
should_sample = parent_is_sampled;
} else {
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When honor_parent_sampling_ && has_parent is true, sampling bypasses the mu_-protected path, so count_/Valid() limits are not enforced and sample_/created_ counters are not updated. This can lead to unlimited trace creation even when trace_count is 0 and makes sampling stats inconsistent. Consider applying the same Valid()/count_ gating and counter updates under mu_ even for the parent-based decision (e.g., decide should_sample from parent first, then under lock enforce count_ and update counters if sampling).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

3 participants