
refactor: restrict the usage scope of the rollout_batch method#567

Merged
nuzant merged 2 commits into main from fw/rm-rollout-batch on Nov 14, 2025

Conversation

@garrett4wade (Collaborator)

Description

rollout_batch provides a synchronous rollout method that is convenient for debugging and writing tests. However, it is incompatible with dynamic filtering: since this method doesn't actively submit new rollouts while waiting, it will hang indefinitely if any requests are filtered out during the wait. Therefore, this method should only be used for debugging and testing, not in production experiments.
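The failure mode above can be made concrete with a small self-contained sketch. The function names below are illustrative stand-ins, not the AReaL API: the point is only that a fixed-wait collector stalls once any result is filtered, while a resubmitting collector does not.

```python
# Hypothetical sketch (names are illustrative, not the AReaL API): why a
# synchronous batch wait hangs under dynamic filtering, and why
# resubmission avoids it.
import itertools

def collect_sync(batch_size, in_flight, accept):
    """rollout_batch-style: wait on a fixed set of in-flight requests,
    never submitting replacements for filtered-out results."""
    accepted = [r for r in in_flight if accept(r)]
    # If anything was filtered, fewer than batch_size results can ever
    # arrive, so the real synchronous wait would block forever.
    would_hang = len(accepted) < batch_size
    return accepted, would_hang

def collect_with_resubmit(batch_size, request_stream, accept):
    """prepare_batch-style: keep pulling new rollouts until enough pass
    the filter, so filtered requests cannot stall the batch."""
    accepted = []
    for r in request_stream:
        if accept(r):
            accepted.append(r)
            if len(accepted) == batch_size:
                return accepted

keep_even = lambda r: r % 2 == 0
_, hangs = collect_sync(8, range(8), keep_even)
print(hangs)  # True: the synchronous wait can never complete
print(len(collect_with_resubmit(8, itertools.count(), keep_even)))  # 8
```

With a filter that rejects half the results, the fixed-set collector ends up short of a full batch with no way to recover, which is exactly the indefinite hang described above.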

This PR removes the async_training configuration option from TrainEngine and enforces the use of prepare_batch in scripts. Documentation added in #558 explains how to achieve synchronous training behavior by setting the CLI config rollout.max_head_offpolicyness.
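For reference, synchronous behavior is recovered through the CLI override rather than a code path. The entry-point script below is a placeholder; only the `rollout.max_head_offpolicyness` key comes from the AReaL docs referenced above.

```shell
# Illustrative invocation; the script name and any other flags are
# placeholders. Setting max_head_offpolicyness to 0 keeps rollout
# generation fully in step with training, recovering synchronous
# behavior without relying on rollout_batch.
python training_script.py rollout.max_head_offpolicyness=0
```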

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

The rollout_batch method no longer supports the should_accept_fn argument. Since passing it could cause an indefinite hang, this should be classified as a bug fix rather than a breaking change.
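A migration sketch for call sites that previously passed the filter to rollout_batch. RolloutStub and these signatures are stand-ins, not the real AReaL engine API; only the method names rollout_batch / prepare_batch and the should_accept_fn parameter come from this PR.

```python
# Illustrative migration sketch, not the AReaL implementation.

class RolloutStub:
    def rollout_batch(self, data):
        # Debug/test path: synchronous, no filtering support anymore.
        return list(data)

    def prepare_batch(self, data, should_accept_fn=None):
        # Training path: filtering lives here, where rejected samples can
        # be replaced by newly submitted rollouts instead of hanging.
        if should_accept_fn is None:
            return list(data)
        return [d for d in data if should_accept_fn(d)]

engine = RolloutStub()
# Before this PR: engine.rollout_batch(data, should_accept_fn=keep)
# After: pass the filter to prepare_batch instead.
batch = engine.prepare_batch(range(4), should_accept_fn=lambda x: x < 2)
print(batch)  # [0, 1]
```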

@gemini-code-assist (Contributor)

Summary of Changes

Hello @garrett4wade, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the rollout mechanism to restrict the usage of the synchronous rollout_batch method, which was prone to issues with dynamic filtering in production. The changes enforce the use of prepare_batch for all training scripts, ensuring compatibility with asynchronous operations and dynamic filtering. This also involved removing the now redundant async_training configuration option and updating all relevant documentation and examples to guide users towards the correct and robust approach for data collection.

Highlights

  • Deprecation of rollout_batch for Production: The rollout_batch method is now explicitly marked for debugging and testing only, as it is synchronous and incompatible with dynamic filtering, which can lead to indefinite hangs in production experiments.
  • Enforced prepare_batch Usage: The prepare_batch method is now the standard for asynchronous rollout and dynamic filtering, and its usage is enforced across various training scripts and documentation examples.
  • Removal of async_training Configuration: The async_training configuration option has been removed from TrainEngine and related YAML files, as the prepare_batch method inherently handles asynchronous behavior.
  • Updated Documentation and Examples: Extensive updates have been made to the documentation, including the README.md, debugging.md, cli_reference.md, and customization guides, to reflect the changes and provide clear guidance on using prepare_batch and configuring synchronous training via rollout.max_head_offpolicyness.
  • Refined AReaL Architecture Description: The README.md has been significantly refactored to clarify AReaL's 'algorithm-first' design philosophy, component overview, and the roles of various layers, including new backend engines and customization components.

@gemini-code-assist (Bot) left a comment
Code Review

This pull request is a significant and well-executed refactoring of the data rollout mechanism. It restricts the usage of the synchronous rollout_batch method to debugging and testing, and enforces the use of the asynchronous prepare_batch method in all training scripts. This is achieved by removing the async_training configuration option and updating all example scripts and documentation accordingly. The should_accept_fn parameter has been correctly removed from rollout_batch to prevent it from hanging when used with dynamic filtering. The changes are consistent and thorough across the codebase. The accompanying documentation updates, especially in areal/README.md and docs/lite/gsm8k_grpo.md, are excellent and greatly improve the clarity of the library's architecture and usage. I have a couple of minor suggestions to further improve the clarity of the debugging documentation.

Comment thread docs/best_practices/debugging.md Outdated
Comment thread docs/best_practices/debugging.md
…f rollout_batch in training scripts; remove the async_training option
@garrett4wade (Collaborator, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request effectively refactors the usage of rollout_batch and prepare_batch, restricting rollout_batch to debugging and testing scenarios and removing the async_training configuration. The changes are consistently applied across the codebase, including API definitions, training scripts, documentation, and tests. This simplification improves the API's clarity and enforces the recommended asynchronous training path.

I've included a couple of suggestions to further improve code clarity by removing a redundant argument in prepare_batch calls. These are minor but would make the example scripts cleaner.

Comment thread areal/experimental/trainer/rl.py
Comment thread examples/camel/train.py
@garrett4wade garrett4wade added the safe-to-test Ready to run unit-tests in a PR. label Nov 13, 2025
@garrett4wade garrett4wade changed the title from "[wip] refactor: restrict the usage scope of the rollout_batch method" to "refactor: restrict the usage scope of the rollout_batch method" Nov 13, 2025
@garrett4wade garrett4wade added safe-to-test Ready to run unit-tests in a PR. and removed safe-to-test Ready to run unit-tests in a PR. labels Nov 13, 2025
@nuzant nuzant merged commit 63b046c into main Nov 14, 2025
4 checks passed
@nuzant nuzant deleted the fw/rm-rollout-batch branch November 14, 2025 02:55
Bruce-rl-hw pushed a commit to Bruce-rl-hw/AReaL-vllm that referenced this pull request Dec 4, 2025
…lusionAI#567)

* remove should_accept_fn argument in rollout_batch; remove the usage of rollout_batch in training scripts; remove the async_training option

* fix test
leandermaben pushed a commit to leandermaben/AReaL that referenced this pull request Mar 24, 2026
…lusionAI#567)

* remove should_accept_fn argument in rollout_batch; remove the usage of rollout_batch in training scripts; remove the async_training option

* fix test

Labels

safe-to-test Ready to run unit-tests in a PR.
