Widen resolve_max_new_tokens parameters to int64_t and rename for clarity (#18917)

Open
kirklandsign wants to merge 1 commit into main from export-D99769848
Conversation

Contributor

@kirklandsign kirklandsign commented Apr 15, 2026

Summary:

The second parameter was named `num_prompt_tokens` (int32_t) but all
callers (TextLLMRunner, MultimodalRunner) actually pass `pos_`
(int64_t), which represents the total number of occupied positions in
the context window — not just the current prompt's tokens.

  • Rename `num_prompt_tokens` → `num_tokens_occupied` to match actual
    semantics
  • Widen both parameters from int32_t to int64_t to eliminate implicit
    narrowing conversions from int64_t callers
  • Use int64_t internally to avoid truncation during intermediate
    arithmetic
  • Update pybinding arg name, .pyi type stub, tests, and docs
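The narrowing hazard the second and third bullets remove can be made concrete with a small standalone sketch. These are hypothetical helper functions, not ExecuTorch code, and the extreme position value exists only to make the truncation observable:

```cpp
#include <cassert>
#include <cstdint>

// Old shape: int32_t parameters force int64_t callers through an
// implicit narrowing conversion, and the subtraction runs in 32 bits.
int32_t budget_i32(int32_t max_context_len, int32_t num_tokens_occupied) {
  int32_t r = max_context_len - num_tokens_occupied;
  return r > 0 ? r : 0;  // clamp at zero, as resolve_max_new_tokens does
}

// New shape: int64_t parameters match the int64_t pos_ the runners pass,
// so nothing is narrowed at the call site or inside the arithmetic.
int64_t budget_i64(int64_t max_context_len, int64_t num_tokens_occupied) {
  int64_t r = max_context_len - num_tokens_occupied;
  return r > 0 ? r : 0;
}
```

With an occupied-position count of 2^32, the 64-bit version correctly reports an exhausted context (budget 0), while truncating that count to int32_t first yields 0 occupied tokens and a spurious budget equal to the whole context length.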

Reviewed By: larryliu0820

Differential Revision: D99769848

Copilot AI review requested due to automatic review settings April 15, 2026 23:19

pytorch-bot bot commented Apr 15, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18917

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit a1b069a with merge base 2f339f0:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 15, 2026
Contributor

meta-codesync bot commented Apr 15, 2026

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99769848.

Contributor

@larryliu0820 larryliu0820 left a comment


Review automatically exported from Phabricator review in Meta.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Contributor

Copilot AI left a comment


Pull request overview

Updates GenerationConfig::resolve_max_new_tokens to better match how runners track context usage by renaming the second parameter to reflect “occupied tokens” semantics and widening its parameters to int64_t, with corresponding updates across C++/Python bindings, tests, and docs.

Changes:

  • Rename `num_prompt_tokens` → `num_tokens_occupied` and update docstrings/comments to match actual semantics (occupied context positions).
  • Widen resolve_max_new_tokens parameters (and internal arithmetic) to int64_t.
  • Update Python binding arg name, .pyi stub, unit tests, and C++ docs to reflect the new API.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Summary per file:

  • extension/llm/runner/irunner.h: Renames and widens resolve_max_new_tokens parameters; switches intermediate arithmetic to int64_t.
  • extension/llm/runner/pybindings.cpp: Updates pybind keyword argument name for resolve_max_new_tokens.
  • extension/llm/runner/_llm_runner.pyi: Updates type stub signature and docstring to the new parameter name/meaning.
  • extension/llm/runner/test/test_generation_config.cpp: Updates test comments to match the new parameter name/meaning.
  • extension/llm/runner/test/test_runner_pybindings.py: Adds coverage for calling resolve_max_new_tokens using the new keyword argument name.
  • docs/source/llm/run-with-c-plus-plus.md: Updates docs to the new signature and clarified semantics.


Comment on lines +86 to +98 (extension/llm/runner/irunner.h):

```diff
     result = std::min(
         static_cast<int64_t>(max_new_tokens),
         max_context_len - num_tokens_occupied);
   } else if (seq_len != -1 && max_new_tokens == -1) {
     // Only seq_len is specified
-    result = std::min(seq_len, max_context_len) - num_prompt_tokens;
+    result = std::min(static_cast<int64_t>(seq_len), max_context_len) -
+        num_tokens_occupied;
   } else {
     // Both are specified
     result = std::min(
-        std::min(seq_len, max_context_len) - num_prompt_tokens,
-        max_new_tokens);
+        std::min(static_cast<int64_t>(seq_len), max_context_len) -
+            num_tokens_occupied,
+        static_cast<int64_t>(max_new_tokens));
```

Comment on lines 101 to 103:

```diff
   // Ensure result is not negative
-  return std::max(0, result);
+  return static_cast<int32_t>(std::max(static_cast<int64_t>(0), result));
 }
```

Comment on lines 296 to 301 (extension/llm/runner/pybindings.cpp):

```diff
       .def(
           "resolve_max_new_tokens",
           &GenerationConfig::resolve_max_new_tokens,
           py::arg("max_context_len"),
-          py::arg("num_prompt_tokens"),
+          py::arg("num_tokens_occupied"),
           "Resolve the maximum number of new tokens to generate based on constraints")
```
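Taken together, the hunks above amount to the following self-contained sketch of the resolution logic. This is a hypothetical standalone function, not the actual GenerationConfig member (which takes seq_len and max_new_tokens from the config object rather than as arguments); -1 marks an unset field, and the first branch, which the diff does not show, is assumed here to fall back to the remaining context:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sketch of resolve_max_new_tokens after this PR: parameters are
// int64_t, intermediate arithmetic is int64_t, and only the final
// clamped value is narrowed back to int32_t.
int32_t resolve_max_new_tokens(
    int32_t seq_len,
    int32_t max_new_tokens,
    int64_t max_context_len,
    int64_t num_tokens_occupied) {
  int64_t result;
  if (seq_len == -1 && max_new_tokens == -1) {
    // Neither specified (branch not shown in the diff): assume the
    // whole remaining context is available.
    result = max_context_len - num_tokens_occupied;
  } else if (seq_len == -1) {
    // Only max_new_tokens is specified
    result = std::min(
        static_cast<int64_t>(max_new_tokens),
        max_context_len - num_tokens_occupied);
  } else if (max_new_tokens == -1) {
    // Only seq_len is specified
    result = std::min(static_cast<int64_t>(seq_len), max_context_len) -
        num_tokens_occupied;
  } else {
    // Both are specified
    result = std::min(
        std::min(static_cast<int64_t>(seq_len), max_context_len) -
            num_tokens_occupied,
        static_cast<int64_t>(max_new_tokens));
  }
  // Ensure result is not negative before narrowing
  return static_cast<int32_t>(std::max(static_cast<int64_t>(0), result));
}
```

For example, with seq_len=128, max_new_tokens=50, a 4096-token context, and 10 occupied positions, the budget is 50; with only seq_len=128 set and 100 occupied positions it is 28; and once the occupied count exceeds the context length the final clamp returns 0 rather than a negative count.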
@meta-codesync meta-codesync bot changed the title Widen resolve_max_new_tokens parameters to int64_t and rename for clarity Widen resolve_max_new_tokens parameters to int64_t and rename for clarity (#18917) Apr 15, 2026
meta-codesync bot pushed a commit that referenced this pull request Apr 15, 2026
…rity (#18917)

Summary:

The second parameter was named `num_prompt_tokens` (int32_t) but all
callers (TextLLMRunner, MultimodalRunner) actually pass `pos_`
(int64_t), which represents the total number of occupied positions in
the context window — not just the current prompt's tokens.

- Rename `num_prompt_tokens` → `num_tokens_occupied` to match actual
  semantics
- Widen both parameters from int32_t to int64_t to eliminate implicit
  narrowing conversions from int64_t callers
- Use int64_t internally to avoid truncation during intermediate
  arithmetic
- Update pybinding arg name, .pyi type stub, tests, and docs

Reviewed By: larryliu0820

Differential Revision: D99769848
meta-codesync bot pushed a commit that referenced this pull request Apr 15, 2026
…rity (#18917)
kirklandsign added a commit that referenced this pull request Apr 15, 2026
…rity (#18917)

Summary:
Pull Request resolved: #18917

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants