refactor: restrict the usage scope of the rollout_batch method#567
refactor: restrict the usage scope of the rollout_batch method#567
rollout_batch method#567Conversation
Summary of ChangesHello @garrett4wade, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the rollout mechanism to restrict the usage of the synchronous Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request is a significant and well-executed refactoring of the data rollout mechanism. It restricts the usage of the synchronous rollout_batch method to debugging and testing, and enforces the use of the asynchronous prepare_batch method in all training scripts. This is achieved by removing the async_training configuration option and updating all example scripts and documentation accordingly. The should_accept_fn parameter has been correctly removed from rollout_batch to prevent it from hanging when used with dynamic filtering. The changes are consistent and thorough across the codebase. The accompanying documentation updates, especially in areal/README.md and docs/lite/gsm8k_grpo.md, are excellent and greatly improve the clarity of the library's architecture and usage. I have a couple of minor suggestions to further improve the clarity of the debugging documentation.
2d452ec to
cdd9b54
Compare
…f rollout_batch in training scripts; remove the async_training option
cdd9b54 to
789f549
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request effectively refactors the usage of rollout_batch and prepare_batch, restricting rollout_batch to debugging and testing scenarios and removing the async_training configuration. The changes are consistently applied across the codebase, including API definitions, training scripts, documentation, and tests. This simplification improves the API's clarity and enforces the recommended asynchronous training path.
I've included a couple of suggestions to further improve code clarity by removing a redundant argument in prepare_batch calls. These are minor but would make the example scripts cleaner.
rollout_batch methodrollout_batch method
…lusionAI#567) * remove should_accept_fn argument in rollout_batch; remove the usage of rollout_batch in training scripts; remove the async_training option * fix test
…lusionAI#567) * remove should_accept_fn argument in rollout_batch; remove the usage of rollout_batch in training scripts; remove the async_training option * fix test
Description
rollout_batchprovides a synchronous rollout method that is convenient for debugging and writing tests. However, it is incompatible with dynamic filtering: since this method doesn't actively submit new rollouts while waiting, it will hang indefinitely if any requests are filtered out during the wait. Therefore, this method should only be used for debugging and testing, not in production experiments.This PR removes the
async_trainingconfiguration option fromTrainEngineand enforces the use ofprepare_batchin scripts. Documentation added in #558 explains how to achieve synchronous training behavior by setting the CLI configrollout.max_head_offpolicyness.Type of Change
work as expected)
Checklist
jb build docs/gemini review)Breaking Change Details (if applicable):
The
rollout_batchmethod does not supportshould_accept_fnargument any more, but it should be classified as a bug fix instead of a breaking change.