Widen resolve_max_new_tokens parameters to int64_t and rename for clarity (#18917)#18917
Widen resolve_max_new_tokens parameters to int64_t and rename for clarity (#18917)#18917kirklandsign wants to merge 1 commit intomainfrom
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18917
Note: Links to docs will display an error until the docs builds have been completed. ❌ 2 New Failures, 2 Unrelated FailuresAs of commit a1b069a with merge base 2f339f0 ( NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99769848. |
larryliu0820
left a comment
There was a problem hiding this comment.
Review automatically exported from Phabricator review in Meta.
This PR needs a
|
There was a problem hiding this comment.
Pull request overview
Updates GenerationConfig::resolve_max_new_tokens to better match how runners track context usage by renaming the second parameter to reflect “occupied tokens” semantics and widening its parameters to int64_t, with corresponding updates across C++/Python bindings, tests, and docs.
Changes:
- Rename
num_prompt_tokens→num_tokens_occupiedand update docstrings/comments to match actual semantics (occupied context positions). - Widen
resolve_max_new_tokensparameters (and internal arithmetic) toint64_t. - Update Python binding arg name,
.pyistub, unit tests, and C++ docs to reflect the new API.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| extension/llm/runner/irunner.h | Renames and widens resolve_max_new_tokens parameters; switches intermediate arithmetic to int64_t. |
| extension/llm/runner/pybindings.cpp | Updates pybind keyword argument name for resolve_max_new_tokens. |
| extension/llm/runner/_llm_runner.pyi | Updates type stub signature and docstring to the new parameter name/meaning. |
| extension/llm/runner/test/test_generation_config.cpp | Updates test comments to match the new parameter name/meaning. |
| extension/llm/runner/test/test_runner_pybindings.py | Adds coverage for calling resolve_max_new_tokens using the new keyword argument name. |
| docs/source/llm/run-with-c-plus-plus.md | Updates docs to the new signature and clarified semantics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| result = std::min( | ||
| static_cast<int64_t>(max_new_tokens), | ||
| max_context_len - num_tokens_occupied); | ||
| } else if (seq_len != -1 && max_new_tokens == -1) { | ||
| // Only seq_len is specified | ||
| result = std::min(seq_len, max_context_len) - num_prompt_tokens; | ||
| result = std::min(static_cast<int64_t>(seq_len), max_context_len) - | ||
| num_tokens_occupied; | ||
| } else { | ||
| // Both are specified | ||
| result = std::min( | ||
| std::min(seq_len, max_context_len) - num_prompt_tokens, | ||
| max_new_tokens); | ||
| std::min(static_cast<int64_t>(seq_len), max_context_len) - | ||
| num_tokens_occupied, | ||
| static_cast<int64_t>(max_new_tokens)); |
| // Ensure result is not negative | ||
| return std::max(0, result); | ||
| return static_cast<int32_t>(std::max(static_cast<int64_t>(0), result)); | ||
| } |
| .def( | ||
| "resolve_max_new_tokens", | ||
| &GenerationConfig::resolve_max_new_tokens, | ||
| py::arg("max_context_len"), | ||
| py::arg("num_prompt_tokens"), | ||
| py::arg("num_tokens_occupied"), | ||
| "Resolve the maximum number of new tokens to generate based on constraints") |
…rity (#18917) Summary: The second parameter was named `num_prompt_tokens` (int32_t) but all callers (TextLLMRunner, MultimodalRunner) actually pass `pos_` (int64_t), which represents the total number of occupied positions in the context window — not just the current prompt's tokens. - Rename `num_prompt_tokens` → `num_tokens_occupied` to match actual semantics - Widen both parameters from int32_t to int64_t to eliminate implicit narrowing conversions from int64_t callers - Use int64_t internally to avoid truncation during intermediate arithmetic - Update pybinding arg name, .pyi type stub, tests, and docs Reviewed By: larryliu0820 Differential Revision: D99769848
7626fa9 to
d7b5e21
Compare
…rity (#18917) Summary: The second parameter was named `num_prompt_tokens` (int32_t) but all callers (TextLLMRunner, MultimodalRunner) actually pass `pos_` (int64_t), which represents the total number of occupied positions in the context window — not just the current prompt's tokens. - Rename `num_prompt_tokens` → `num_tokens_occupied` to match actual semantics - Widen both parameters from int32_t to int64_t to eliminate implicit narrowing conversions from int64_t callers - Use int64_t internally to avoid truncation during intermediate arithmetic - Update pybinding arg name, .pyi type stub, tests, and docs Reviewed By: larryliu0820 Differential Revision: D99769848
d7b5e21 to
7a57e53
Compare
…rity (#18917) Summary: Pull Request resolved: #18917 The second parameter was named `num_prompt_tokens` (int32_t) but all callers (TextLLMRunner, MultimodalRunner) actually pass `pos_` (int64_t), which represents the total number of occupied positions in the context window — not just the current prompt's tokens. - Rename `num_prompt_tokens` → `num_tokens_occupied` to match actual semantics - Widen both parameters from int32_t to int64_t to eliminate implicit narrowing conversions from int64_t callers - Use int64_t internally to avoid truncation during intermediate arithmetic - Update pybinding arg name, .pyi type stub, tests, and docs Reviewed By: larryliu0820 Differential Revision: D99769848
7a57e53 to
7f31bde
Compare
…rity (#18917) Summary: Pull Request resolved: #18917 The second parameter was named `num_prompt_tokens` (int32_t) but all callers (TextLLMRunner, MultimodalRunner) actually pass `pos_` (int64_t), which represents the total number of occupied positions in the context window — not just the current prompt's tokens. - Rename `num_prompt_tokens` → `num_tokens_occupied` to match actual semantics - Widen both parameters from int32_t to int64_t to eliminate implicit narrowing conversions from int64_t callers - Use int64_t internally to avoid truncation during intermediate arithmetic - Update pybinding arg name, .pyi type stub, tests, and docs Reviewed By: larryliu0820 Differential Revision: D99769848
7f31bde to
a1b069a
Compare
Summary:
The second parameter was named
num_prompt_tokens(int32_t) but allcallers (TextLLMRunner, MultimodalRunner) actually pass
pos_(int64_t), which represents the total number of occupied positions in
the context window — not just the current prompt's tokens.
num_prompt_tokens→num_tokens_occupiedto match actualsemantics
narrowing conversions from int64_t callers
arithmetic
Reviewed By: larryliu0820
Differential Revision: D99769848