fix: account for cached prompt tokens in OTEL spans #452

Draft
simonvdk-mistral wants to merge 5 commits into main from hydra/OBS-1423/session-a63fceb33e32

Conversation

@simonvdk-mistral
Contributor

Summary

  • emit gen_ai.usage.cache_read.input_tokens when cached prompt token counts are present in usage payloads
  • support the current payload shapes exposed by the SDK and API (prompt_tokens_details.cached_tokens, prompt_token_details.cached_tokens, and num_cached_tokens)
  • add non-streaming and streaming OTEL regression tests for cached token usage
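The three payload shapes listed above can be handled by a single extraction helper. The sketch below is illustrative only: the helper name and the plain-dict usage payload are assumptions, not the SDK's actual code.

```python
from typing import Optional


def extract_cached_prompt_tokens(usage: dict) -> Optional[int]:
    """Return the cached prompt token count from a raw usage payload, if any."""
    # Nested detail objects: both spellings have been observed in payloads.
    for key in ("prompt_tokens_details", "prompt_token_details"):
        details = usage.get(key)
        if isinstance(details, dict):
            cached = details.get("cached_tokens")
            if isinstance(cached, int):
                return cached
    # Legacy top-level field, used only as a fallback.
    num_cached = usage.get("num_cached_tokens")
    if isinstance(num_cached, int):
        return num_cached
    return None
```

Returning None when no cache-token field is present lets the caller skip emitting the span attribute entirely for endpoints that do not report cached tokens.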

Testing

  • uv run pytest src/mistralai/extra/tests/test_otel_tracing.py -k 'cached_prompt_tokens or num_cached_tokens or streaming_chat_completion_enriches_span or test_chat_completion_basic'
  • uv run ruff check src/mistralai/extra/observability/otel.py src/mistralai/extra/tests/test_otel_tracing.py

Linear: OBS-1423

Co-authored-by: mistral-hydra[bot] <mistral-hydra[bot]@users.noreply.github.com>
Comment on lines +275 to +282
if isinstance(prompt_token_details, dict):
    cached_tokens = prompt_token_details.get("cached_tokens")
    if isinstance(cached_tokens, int):
        return cached_tokens

num_cached_tokens = usage.get("num_cached_tokens")
if isinstance(num_cached_tokens, int):
    return num_cached_tokens
Contributor Author
How did you arbitrate the priority between the two (prompt token details and number of cached tokens)?

Contributor Author

I made the priority explicit in code in b6cce3d: prefer prompt_tokens_details.cached_tokens when present, and only fall back to top-level num_cached_tokens for payloads that expose the legacy field instead.

Contributor Author

Ok, maybe the spec is not settled yet. The UsageInfo model I linked in the Linear ticket is specific to a voice endpoint, and all the other endpoints (chat completion, conversation, etc.) do not define the cached-token attributes in their models yet.

Let's put this PR on hold for a bit; I will come back to it later.

Contributor Author

Understood. I am not making a follow-up code change from this comment. The current branch only records gen_ai.usage.cache_read.input_tokens when the raw usage payload actually contains one of the cache-token fields, so endpoints whose generated models do not expose those fields today remain unaffected. I will leave the PR here and wait for your follow-up on whether you want to keep or revert this behavior once the models/spec settle.
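The conditional behavior described above can be sketched as follows. This is a hedged illustration, not the branch's actual code: the helper name is hypothetical, and span stands for any object with an OTEL-style set_attribute method.

```python
def record_cache_read_tokens(span, usage: dict) -> None:
    """Set gen_ai.usage.cache_read.input_tokens only when the payload has it."""
    cached = None
    details = usage.get("prompt_tokens_details") or usage.get("prompt_token_details")
    if isinstance(details, dict) and isinstance(details.get("cached_tokens"), int):
        cached = details["cached_tokens"]
    elif isinstance(usage.get("num_cached_tokens"), int):
        cached = usage["num_cached_tokens"]
    if cached is not None:
        # Payloads without any cache-token field never reach this line,
        # so spans from endpoints lacking the field remain unaffected.
        span.set_attribute("gen_ai.usage.cache_read.input_tokens", cached)
```

This is why endpoints whose generated models do not expose the fields today see no change in their emitted spans.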

mistral-hydra bot added 4 commits March 27, 2026 16:50