feat: add prompt cache token support to cost telemetry (#936)
* feat: add prompt cache token support to cost and token telemetry (#890)
Record cache token costs accurately for Anthropic and OpenAI models.
Previously, cache reads and writes were excluded from cost estimates and
Anthropic cache_creation_input_tokens were excluded from the input token
counter.
- TokenMetricsPlugin: adds cache_creation_input_tokens to prompt_tokens
for Anthropic (additive; not included in prompt_tokens by the API)
- CostMetricsPlugin: extracts cached_tokens from prompt_tokens_details
  and prices cache reads and writes separately using the correct formula:
    (prompt_tokens - cached_tokens) * full_rate
    + cached_tokens * cache_read_rate
    + cache_creation_tokens * cache_write_rate
- builtin_pricing.json: adds cache_write_per_1m and cache_read_per_1m
for all current Anthropic and OpenAI models
- pricing.py: extends compute_cost() with cached_tokens and
cache_creation_tokens params
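
The formula in this commit can be sketched as below. This is an illustration
only, not the code in pricing.py: the function name and the input_per_1m
parameter are assumptions, while cache_read_per_1m and cache_write_per_1m
follow the key names added to builtin_pricing.json. It assumes
cache_creation_tokens is not already included in prompt_tokens.

```python
def estimate_input_cost(
    prompt_tokens: int,
    cached_tokens: int,
    cache_creation_tokens: int,
    input_per_1m: float,        # assumed name for the full input rate (USD per 1M tokens)
    cache_read_per_1m: float,   # cache read rate (USD per 1M tokens)
    cache_write_per_1m: float,  # cache write rate (USD per 1M tokens)
) -> float:
    """Hypothetical sketch: price cache reads and writes separately."""
    # Tokens billed at the full input rate exclude cache reads.
    full_rate_tokens = prompt_tokens - cached_tokens
    total = (
        full_rate_tokens * input_per_1m
        + cached_tokens * cache_read_per_1m
        + cache_creation_tokens * cache_write_per_1m
    )
    # Rates are quoted per 1M tokens, so scale down once at the end.
    return total / 1_000_000


# Illustrative rates only; these are not values from builtin_pricing.json.
example_cost = estimate_input_cost(1000, 400, 200, 3.0, 0.3, 3.75)
```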
Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
* fix: correct LiteLLM cache token double-counting in metrics plugins
LiteLLM normalises Anthropic usage so that prompt_tokens already includes
cache_creation_input_tokens and cache_read_input_tokens. Both plugins were
treating prompt_tokens as raw base input and adding cache fields on top,
causing double-counting.
- TokenMetricsPlugin: drop the + cache_creation addition
- CostMetricsPlugin: subtract both cached_tokens and cache_creation from
prompt_tokens so write tokens are not billed at both the full rate and the
write rate
- Update test_cost_plugin_cache_tokens_forwarded to use a realistic
LiteLLM-normalised shape with the correct expected input_tokens value
- Remove the now-redundant with-cache-creation token metrics parametrize case
- Clarify pricing.py docs and validation warning around the replace-not-merge
behaviour of custom pricing file entries
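
A minimal sketch of the corrected accounting, assuming a LiteLLM-normalised
usage shape in which prompt_tokens already includes both cache reads and cache
writes. The function name and the input_per_1m parameter are hypothetical and
not taken from the plugins; the cache rate names follow the pricing keys.

```python
def estimate_normalised_input_cost(
    prompt_tokens: int,          # assumed to include cache reads and cache writes
    cached_tokens: int,          # cache reads
    cache_creation_tokens: int,  # cache writes
    input_per_1m: float,         # assumed name for the full input rate (USD per 1M tokens)
    cache_read_per_1m: float,
    cache_write_per_1m: float,
) -> float:
    """Hypothetical sketch: bill each token class exactly once."""
    # Remove both cache classes from the total so that only the remaining
    # base input tokens are charged at the full rate.
    base_tokens = prompt_tokens - cached_tokens - cache_creation_tokens
    total = (
        base_tokens * input_per_1m
        + cached_tokens * cache_read_per_1m
        + cache_creation_tokens * cache_write_per_1m
    )
    return total / 1_000_000


# 1000 base + 400 cache-read + 200 cache-write tokens, reported as 1600 in total.
# Illustrative rates only; these are not values from builtin_pricing.json.
normalised_cost = estimate_normalised_input_cost(1600, 400, 200, 3.0, 0.3, 3.75)
```

Subtracting only cached_tokens here would leave the 200 write tokens inside
the full-rate term, so they would be charged twice: once at the full rate and
once at the write rate.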
Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
* fix: restore TokenMetricsPlugin and clarify custom pricing override scope in docs
Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
* docs: revert usage docstring to provider-agnostic wording
Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
---------
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>