fix: account for cached prompt tokens in OTEL spans #452

simonvdk-mistral wants to merge 5 commits into `main`.
Conversation
Co-authored-by: mistral-hydra[bot] <mistral-hydra[bot]@users.noreply.github.com>
```python
if isinstance(prompt_token_details, dict):
    cached_tokens = prompt_token_details.get("cached_tokens")
    if isinstance(cached_tokens, int):
        return cached_tokens

num_cached_tokens = usage.get("num_cached_tokens")
if isinstance(num_cached_tokens, int):
    return num_cached_tokens
```
How did you arbitrate the priority between the two (prompt token details and number of cached tokens)?
I made the priority explicit in code in b6cce3d: prefer `prompt_tokens_details.cached_tokens` when present, and only fall back to the top-level `num_cached_tokens` for payloads that expose the legacy field instead.
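For context, that priority order could be sketched as a standalone helper like the one below. The function name and exact shape are illustrative, not the PR's actual code; it assumes the raw usage payload is a plain `dict` and covers both the `prompt_tokens_details` and `prompt_token_details` spellings mentioned in the summary:

```python
from typing import Any, Optional


def extract_cached_prompt_tokens(usage: dict[str, Any]) -> Optional[int]:
    """Return the cached prompt token count from a raw usage payload, or None.

    Priority: nested ``cached_tokens`` under ``prompt_tokens_details`` (or the
    ``prompt_token_details`` spelling) first, then the legacy top-level
    ``num_cached_tokens`` field.
    """
    for details_key in ("prompt_tokens_details", "prompt_token_details"):
        details = usage.get(details_key)
        if isinstance(details, dict):
            cached_tokens = details.get("cached_tokens")
            if isinstance(cached_tokens, int):
                return cached_tokens

    num_cached_tokens = usage.get("num_cached_tokens")
    if isinstance(num_cached_tokens, int):
        return num_cached_tokens
    return None
```

With this ordering, a payload carrying both fields resolves to the nested value, and the legacy field is used only when the nested details are absent or malformed.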
OK, the spec may not be settled yet. The UsageInfo model I linked in the Linear ticket is specific to a voice endpoint, and all other endpoints (chat completion, conversation, etc.) do not have the cache-token attributes defined in their models yet.
Let's put this PR on hold; I will come back to it later.
Understood. I am not making a follow-up code change from this comment. The current branch only records `gen_ai.usage.cache_read.input_tokens` when the raw usage payload actually contains one of the cache-token fields, so endpoints whose generated models do not expose those fields today remain unaffected. I will leave the PR here and wait for your follow-up on whether you want to keep or revert this behavior once the models/spec settle.
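To make the "unaffected endpoints" point concrete, here is a minimal sketch of that guard. The function name is hypothetical and a plain dict stands in for the OTEL span attributes; only the `gen_ai.usage.cache_read.input_tokens` attribute name comes from the PR itself:

```python
from typing import Any


def record_cache_read_tokens(span_attributes: dict[str, Any], usage: dict[str, Any]) -> None:
    """Set the cache-read attribute only when the raw usage payload carries
    a cache-token field; payloads without one leave the span untouched."""
    cached = None
    for details_key in ("prompt_tokens_details", "prompt_token_details"):
        details = usage.get(details_key)
        if isinstance(details, dict) and isinstance(details.get("cached_tokens"), int):
            cached = details["cached_tokens"]
            break
    if cached is None and isinstance(usage.get("num_cached_tokens"), int):
        cached = usage["num_cached_tokens"]

    if cached is not None:
        span_attributes["gen_ai.usage.cache_read.input_tokens"] = cached
```

A usage payload such as `{"prompt_tokens": 10, "completion_tokens": 4}` (no cache fields) would add nothing to the span, which is the behavior described above.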
Summary

- Record `gen_ai.usage.cache_read.input_tokens` when cached prompt token counts are present in usage payloads (supports `prompt_tokens_details.cached_tokens`, `prompt_token_details.cached_tokens`, and `num_cached_tokens`)

Testing

- `uv run pytest src/mistralai/extra/tests/test_otel_tracing.py -k 'cached_prompt_tokens or num_cached_tokens or streaming_chat_completion_enriches_span or test_chat_completion_basic'`
- `uv run ruff check src/mistralai/extra/observability/otel.py src/mistralai/extra/tests/test_otel_tracing.py`

Linear: OBS-1423