feat: Fully Support MIS/TIS to stabilize rollout-training mismatch #930
rchardx merged 19 commits into inclusionAI:main
Conversation
Summary of Changes
Hello @ZiyiTsang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a mechanism for Training-Inference Matching Importance Sampling (TIS/MIS) to address rollout-training mismatch, which is crucial for stabilizing training, especially in scenarios involving Mixture-of-Experts (MoE) and off-policy reinforcement learning. It provides configurable options for applying this correction, implements the core logic for calculating and applying the importance sampling ratios, and ensures proper integration into the PPO actor's loss computation. The changes are supported by new unit tests and updated documentation.
Code Review
This pull request introduces support for Training-Inference Matching Importance Sampling (TIS/MIS) to address rollout-training mismatch, which is a valuable addition for off-policy training. The changes include new configurations in PPOActorConfig, validation logic, comprehensive unit tests for both the configuration and the functional implementation, and updates to the documentation. The implementation is well-structured and integrated into the existing PPO loss calculation.
However, I've identified a critical security issue: a hardcoded API key has been added to the documentation. This key must be removed to prevent any potential security breaches.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini review
Code Review
This pull request introduces MIS/TIS correction to stabilize training. The review identified critical issues such as hardcoded paths and leftover debugging code in examples that could break the build, along with bugs related to configuration defaults and inconsistent variable names preventing the new feature from working correctly. Specific suggestions have been provided to resolve these issues.
/gemini review
/gemini review
rchardx left a comment
(submitting pending review)
Unify naming conventions across the codebase:
- Rename behav_imp_weight_cap/mode to behave_imp_weight_cap/mode
- Change mode value from "disable" to "disabled" for consistency
- Add ValueError guard when calling compute_behave_imp_weight with disabled mode
- Add warning when behave_imp_weight settings are ignored (use_decoupled_loss=False)
- Update documentation strings for clarity
- Add comprehensive unit tests for compute_behave_imp_weight

Key changes:
- Rename all behav_imp_ prefixes to behave_imp_ in Python, YAML, and docs
- Change "disable" to "disabled" in choices, validation, and error messages
- Add validation in __post_init__ for behave_imp_weight_mode vs use_decoupled_loss
- Add TestComputeBehaveImpWeight test class with 5 test cases
- Update tests to use consistent spelling

Refs: PR feedback on naming consistency
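The validation behavior described in this commit message could look roughly like the sketch below. It is not the code from this PR: the class `ActorConfigSketch`, the function `guarded_weight`, and the mode name `"token_truncate"` are hypothetical, and only the field names, the `"disabled"` value, the warning, and the `ValueError` guard are taken from the commit message.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActorConfigSketch:
    """Hypothetical stand-in for the relevant fields of PPOActorConfig."""

    use_decoupled_loss: bool = False
    behave_imp_weight_mode: str = "disabled"  # "token_truncate" below is an assumed name
    behave_imp_weight_cap: Optional[float] = None

    def __post_init__(self) -> None:
        # Warn when the settings would be silently ignored.
        if self.behave_imp_weight_mode != "disabled" and not self.use_decoupled_loss:
            warnings.warn(
                "behave_imp_weight settings are ignored when use_decoupled_loss=False"
            )


def guarded_weight(ratio: float, config: ActorConfigSketch) -> float:
    """Apply the cap to one importance ratio, refusing to run in disabled mode."""
    if config.behave_imp_weight_mode == "disabled":
        raise ValueError("behave_imp_weight_mode is 'disabled'; nothing to compute")
    if config.behave_imp_weight_cap is None:
        raise ValueError("behave_imp_weight_cap must be set for this mode")
    # Assumed behavior for the hypothetical "token_truncate" mode: clip at the cap.
    return min(ratio, config.behave_imp_weight_cap)
```

Raising in the disabled mode, rather than returning the ratio unchanged, makes an accidental call visible instead of letting it pass as a no-op.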
…nclusionAI#930)

In this PR, I contribute the TIS & MIS to stabilizing rollout-training mismatch, which is important when training with MoE and off policy. In original code, only token-level MIS is applied. This PR offers more choices.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentai Zhang <zhangwentai.zwt@antgroup.com>
Description
This PR contributes TIS and MIS corrections to stabilize rollout-training mismatch, which matters when training with MoE and off-policy.
The original code applies only token-level MIS. This PR offers more choices.
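To make the two corrections concrete, here is a minimal pure-Python sketch of token-level weighting. It assumes the common reading in which the TIS variant truncates (clips) each importance ratio at a cap and the MIS variant masks out tokens whose ratio exceeds the cap. The function name, the mode strings, and the use of plain lists instead of tensors are illustrative assumptions, not the implementation in this PR.

```python
import math
from typing import List


def token_importance_weights(
    train_logp: List[float],
    rollout_logp: List[float],
    cap: float,
    mode: str,
) -> List[float]:
    """Per-token weights w_t = exp(train_logp_t - rollout_logp_t), corrected by `mode`.

    mode="truncate" (TIS-style, assumed): clip each weight at `cap`.
    mode="mask" (MIS-style, assumed): zero out tokens whose weight exceeds `cap`.
    """
    if len(train_logp) != len(rollout_logp):
        raise ValueError("log-prob lists must have equal length")
    # Ratio between the training engine's and the rollout engine's token probabilities.
    ratios = [math.exp(t - r) for t, r in zip(train_logp, rollout_logp)]
    if mode == "truncate":
        return [min(w, cap) for w in ratios]
    if mode == "mask":
        return [w if w <= cap else 0.0 for w in ratios]
    raise ValueError(f"unknown mode: {mode!r}")
```

With a cap of 2.0, a token whose ratio is about 4.5 contributes a weight of 2.0 under truncation and 0.0 under masking, while a token whose two log-probabilities agree keeps a weight of 1.0 in both modes.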