Widen resolve_max_new_tokens parameters to int64_t and rename for clarity (#18917) by kirklandsign · Pull Request #18917 · pytorch/executorch

kirklandsign · 2026-04-15T23:19:13Z

Summary:

The second parameter was named num_prompt_tokens (int32_t) but all
callers (TextLLMRunner, MultimodalRunner) actually pass pos_
(int64_t), which represents the total number of occupied positions in
the context window — not just the current prompt's tokens.

- Rename num_prompt_tokens → num_tokens_occupied to match actual
  semantics
- Widen both parameters from int32_t to int64_t to eliminate implicit
  narrowing conversions from int64_t callers
- Use int64_t internally to avoid truncation during intermediate
  arithmetic
- Update pybinding arg name, .pyi type stub, tests, and docs
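
To illustrate the narrowing concern described in the summary, here is a minimal sketch. The function names `fits_in_int32`, `remaining_narrow`, and `remaining_wide` are hypothetical and are not part of ExecuTorch; they only contrast an int32_t signature, where an int64_t position counter is implicitly narrowed at the call site, with an int64_t signature, where it passes through unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when `value` can be represented as int32_t,
// i.e. an implicit narrowing conversion would not change it.
inline bool fits_in_int32(int64_t value) {
  return value >= std::numeric_limits<int32_t>::min() &&
      value <= std::numeric_limits<int32_t>::max();
}

// Illustrative old-style signature: an int64_t argument such as pos_ is
// silently narrowed to int32_t before the subtraction runs.
inline int32_t remaining_narrow(
    int32_t max_context_len,
    int32_t num_prompt_tokens) {
  return max_context_len - num_prompt_tokens;
}

// Illustrative new-style signature: int64_t callers need no conversion and
// the subtraction is carried out in 64 bits.
inline int64_t remaining_wide(
    int64_t max_context_len,
    int64_t num_tokens_occupied) {
  assert(num_tokens_occupied >= 0);
  return max_context_len - num_tokens_occupied;
}
```

For everyday context sizes both versions agree. The difference only shows up when a value falls outside the int32_t range, which `fits_in_int32` can be used to detect in this sketch.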

Reviewed By: larryliu0820

Differential Revision: D99769848

pytorch-bot · 2026-04-15T23:19:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18917

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit a1b069a with merge base 2f339f0:

NEW FAILURES - The following jobs have failed:

pull / test-multimodal-linux (gemma3-4b) / linux-job (gh)
RuntimeError: Command docker exec -t a063a442b160dc1d5cd895fe677b8566ea306c82851c921addd9a189540be75f /exec failed with exit code 139
pull / unittest-editable / macos / macos-job (gh)
backends/xnnpack/test/ops/test_conv2d.py::TestConv2d::test_fp16_conv2d

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-04-15T23:19:23Z

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99769848.

larryliu0820

Review automatically exported from Phabricator review in Meta.

github-actions · 2026-04-15T23:20:05Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

Updates GenerationConfig::resolve_max_new_tokens to better match how runners track context usage by renaming the second parameter to reflect “occupied tokens” semantics and widening its parameters to int64_t, with corresponding updates across C++/Python bindings, tests, and docs.

Changes:

- Rename num_prompt_tokens → num_tokens_occupied and update docstrings/comments to match actual semantics (occupied context positions).
- Widen resolve_max_new_tokens parameters (and internal arithmetic) to int64_t.
- Update Python binding arg name, .pyi stub, unit tests, and C++ docs to reflect the new API.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

| File | Description |
| --- | --- |
| extension/llm/runner/irunner.h | Renames and widens `resolve_max_new_tokens` parameters; switches intermediate arithmetic to `int64_t`. |
| extension/llm/runner/pybindings.cpp | Updates pybind keyword argument name for `resolve_max_new_tokens`. |
| extension/llm/runner/_llm_runner.pyi | Updates type stub signature and docstring to the new parameter name/meaning. |
| extension/llm/runner/test/test_generation_config.cpp | Updates test comments to match the new parameter name/meaning. |
| extension/llm/runner/test/test_runner_pybindings.py | Adds coverage for calling `resolve_max_new_tokens` using the new keyword argument name. |
| docs/source/llm/run-with-c-plus-plus.md | Updates docs to the new signature and clarified semantics. |

+      result = std::min(
+          static_cast<int64_t>(max_new_tokens),
+          max_context_len - num_tokens_occupied);
    } else if (seq_len != -1 && max_new_tokens == -1) {
      // Only seq_len is specified
-      result = std::min(seq_len, max_context_len) - num_prompt_tokens;
+      result = std::min(static_cast<int64_t>(seq_len), max_context_len) -
+          num_tokens_occupied;
    } else {
      // Both are specified
      result = std::min(
-          std::min(seq_len, max_context_len) - num_prompt_tokens,
-          max_new_tokens);
+          std::min(static_cast<int64_t>(seq_len), max_context_len) -
+              num_tokens_occupied,
+          static_cast<int64_t>(max_new_tokens));


    // Ensure result is not negative
-    return std::max(0, result);
+    return static_cast<int32_t>(std::max(static_cast<int64_t>(0), result));
  }
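
The hunks above show only part of the function. As a rough, self-contained sketch of how the widened logic might fit together, the following uses a hypothetical `GenerationConfigSketch` struct rather than the real `GenerationConfig`. The field defaults and the first branch (neither limit specified) do not appear in the diff and are assumptions; the remaining branches mirror the added lines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for GenerationConfig with only the two fields the
// diff touches. The -1 defaults are an assumption; the diff only shows that
// -1 is treated as "not specified".
struct GenerationConfigSketch {
  int32_t seq_len = -1;
  int32_t max_new_tokens = -1;

  int32_t resolve_max_new_tokens(
      int64_t max_context_len,
      int64_t num_tokens_occupied) const {
    int64_t result;
    if (seq_len == -1 && max_new_tokens == -1) {
      // Assumed branch (not shown in the diff): neither limit is specified.
      result = max_context_len - num_tokens_occupied;
    } else if (seq_len == -1 && max_new_tokens != -1) {
      // Only max_new_tokens is specified
      result = std::min(
          static_cast<int64_t>(max_new_tokens),
          max_context_len - num_tokens_occupied);
    } else if (seq_len != -1 && max_new_tokens == -1) {
      // Only seq_len is specified
      result = std::min(static_cast<int64_t>(seq_len), max_context_len) -
          num_tokens_occupied;
    } else {
      // Both are specified
      result = std::min(
          std::min(static_cast<int64_t>(seq_len), max_context_len) -
              num_tokens_occupied,
          static_cast<int64_t>(max_new_tokens));
    }
    // Clamp at zero in 64 bits, then convert back to the int32_t return type.
    return static_cast<int32_t>(std::max(static_cast<int64_t>(0), result));
  }
};
```

With a context length of 128 and 28 occupied positions, this sketch yields 100 when neither limit is set, and the smaller of the applicable limits otherwise. Note that the final cast back to int32_t assumes the clamped result fits in 32 bits.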


      .def(
          "resolve_max_new_tokens",
          &GenerationConfig::resolve_max_new_tokens,
          py::arg("max_context_len"),
-          py::arg("num_prompt_tokens"),
+          py::arg("num_tokens_occupied"),
          "Resolve the maximum number of new tokens to generate based on constraints")


kirklandsign requested review from larryliu0820 and mergennachin as code owners April 15, 2026 23:19

Copilot AI review requested due to automatic review settings April 15, 2026 23:19

meta-cla bot added the CLA Signed label Apr 15, 2026

meta-codesync bot added fb-exported meta-exported labels Apr 15, 2026

larryliu0820 approved these changes Apr 15, 2026

Copilot started reviewing on behalf of kirklandsign April 15, 2026 23:19

Copilot AI reviewed Apr 15, 2026

meta-codesync bot changed the title ~~Widen resolve_max_new_tokens parameters to int64_t and rename for clarity~~ Widen resolve_max_new_tokens parameters to int64_t and rename for clarity (#18917) Apr 15, 2026

meta-codesync bot force-pushed the export-D99769848 branch from 7626fa9 to d7b5e21 Compare April 15, 2026 23:22

meta-codesync bot force-pushed the export-D99769848 branch from d7b5e21 to 7a57e53 Compare April 15, 2026 23:24

kirklandsign force-pushed the export-D99769848 branch from 7a57e53 to 7f31bde Compare April 15, 2026 23:26

kirklandsign force-pushed the export-D99769848 branch from 7f31bde to a1b069a Compare April 15, 2026 23:36

kirklandsign wants to merge 1 commit into main from export-D99769848
