
feat: Fully Support MIS/TIS to stablizing rollout-training mismatch#930

Merged: rchardx merged 19 commits into inclusionAI:main from ZiyiTsang:TIS_MIS on Mar 3, 2026

@ZiyiTsang (Collaborator) commented Feb 19, 2026

Description

This PR adds TIS and MIS to stabilize rollout-training mismatch, which is important when training MoE models off-policy.

The original code applies only token-level MIS. This PR offers more choices.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)


@gemini-code-assist (Contributor)

Summary of Changes

Hello @ZiyiTsang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces Truncated and Masked Importance Sampling (TIS/MIS) to address rollout-training mismatch, which is crucial for stabilizing training, especially with Mixture-of-Experts (MoE) models and off-policy reinforcement learning. It adds configurable options for the correction, implements the core logic for computing and applying the importance sampling ratios, and integrates them into the PPO actor's loss computation. The changes are supported by new unit tests and updated documentation.

Highlights

  • TIS/MIS Configuration: New configuration parameters (engine_is_correction, engine_is_mode, engine_is_cap) were added to PPOActorConfig to enable and control Truncated/Masked Importance Sampling (TIS/MIS) correction, along with validation logic.
  • TIS/MIS Implementation: A new function, compute_is_ratio_with_engine_is, was implemented to calculate the importance sampling ratio for train-inference mismatch, supporting token-level and sequence-level truncation or masking.
  • PPO Loss Integration: The TIS/MIS correction was integrated into the ppo_actor_loss_fn to adjust the policy gradient loss, and relevant statistics are now logged.
  • Unit Tests: Comprehensive unit tests were added to validate the TIS/MIS configuration, the correctness of the importance sampling ratio computation, and its effect on the PPO actor loss.
  • Documentation Update: The CLI reference documentation was updated to reflect the newly added TIS/MIS configuration parameters.
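
To make the token-level and sequence-level truncation and masking options concrete, here is a minimal, framework-free sketch. It is not the code from this PR: the helper name `is_weights`, the mode strings, and the choice to define the sequence-level ratio as the product of the token ratios are all assumptions made for illustration.

```python
import math
from typing import List


def is_weights(
    train_logp: List[float],
    rollout_logp: List[float],
    mode: str,
    cap: float,
) -> List[float]:
    """Per-token importance weights for ONE sequence (hypothetical helper).

    mode is "<level>_<action>" with level in {"token", "sequence"} and
    action in {"truncate", "mask"}. These strings are illustrative only.
    """
    if len(train_logp) != len(rollout_logp):
        raise ValueError("log-prob lists must have the same length")

    # Log of the train/rollout probability ratio for each token.
    diffs = [t - r for t, r in zip(train_logp, rollout_logp)]
    level, _, action = mode.partition("_")

    if level == "token":
        ratios = [math.exp(d) for d in diffs]
    elif level == "sequence":
        # Assumption: one ratio per sequence (product of token ratios),
        # broadcast to every token of that sequence.
        ratios = [math.exp(sum(diffs))] * len(diffs)
    else:
        raise ValueError(f"unknown level in mode: {mode!r}")

    if action == "truncate":
        # TIS-style: clip large ratios down to the cap.
        return [min(r, cap) for r in ratios]
    if action == "mask":
        # MIS-style: drop (zero out) anything whose ratio exceeds the cap.
        return [r if r <= cap else 0.0 for r in ratios]
    raise ValueError(f"unknown action in mode: {mode!r}")


if __name__ == "__main__":
    train = [math.log(0.5), math.log(0.9)]
    rollout = [math.log(0.1), math.log(0.9)]
    for m in ("token_truncate", "token_mask", "sequence_truncate", "sequence_mask"):
        print(m, is_weights(train, rollout, m, cap=2.0))
```

In this toy input the first token has a ratio of about 5 and the second a ratio of 1, so with a cap of 2 the token-level modes only touch the first token, while the sequence-level modes treat both tokens the same way.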


Changelog
  • areal/api/cli_args.py
    • Added engine_is_correction, engine_is_mode, and engine_is_cap fields to PPOActorConfig.
    • Implemented validation in __post_init__ for PPOActorConfig to ensure engine_is_correction is used with use_decoupled_loss=True or prox_logp_method='recompute'.
  • areal/tests/test_cli_args.py
    • Added a new test class TestPPOActorConfigEngineIS to verify the validation logic and default values of the new TIS/MIS configuration parameters.
  • areal/tests/test_functional.py
    • Imported compute_is_ratio_with_engine_is.
    • Added TestEngineISCorrection class with fixtures and tests for token-level and sequence-level TIS/MIS modes (truncate, mask) for both 1D and 2D data.
    • Added TestPPOActorLossFnWithEngineIS class to test the integration of TIS/MIS correction into ppo_actor_loss_fn and verify its impact on loss and statistics.
  • areal/trainer/ppo/actor.py
    • Modified ppo_update to pass engine_is_correction, engine_is_mode, and engine_is_cap from self.config to the grpo_loss_fn.
    • Updated grpo_loss_fn signature to accept the new TIS/MIS parameters.
    • Added logic to log engine_is_ratio statistics when TIS/MIS correction is enabled.
  • areal/utils/functional/__init__.py
    • Exported the new compute_is_ratio_with_engine_is function.
  • areal/utils/functional/functional.py
    • Implemented compute_is_ratio_with_engine_is to calculate importance sampling ratios with TIS/MIS correction based on specified mode and cap.
    • Modified ppo_actor_loss_fn to accept engine_is_correction, engine_is_mode, and engine_is_cap.
    • Integrated the computation and application of is_ratio from compute_is_ratio_with_engine_is into ppo_actor_loss_fn to modify the policy gradient loss.
    • Added is_ratio to the returned statistics dictionary in ppo_actor_loss_fn when correction is enabled.
  • docs/cli_reference.md
    • Added documentation for engine_is_correction, engine_is_mode, and engine_is_cap parameters under PPOActorConfig.
    • Updated the default value for the api_key parameter in the SwanLab configuration section.
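
The `__post_init__` validation described in the changelog for areal/api/cli_args.py could look roughly like the sketch below. `ActorConfigSketch` is a hypothetical stand-in for PPOActorConfig, and every default value shown is a placeholder rather than the project's real default; only the field names and the rule itself come from the summary.

```python
from dataclasses import dataclass


@dataclass
class ActorConfigSketch:
    """Hypothetical stand-in for PPOActorConfig; defaults are placeholders."""

    use_decoupled_loss: bool = False
    prox_logp_method: str = "placeholder"  # only "recompute" matters here
    engine_is_correction: bool = False
    engine_is_mode: str = "token_mask"     # illustrative value
    engine_is_cap: float = 2.0             # illustrative value

    def __post_init__(self) -> None:
        # Rule from the summary: the correction needs either the decoupled
        # loss or recomputed proximal log-probs to be enabled.
        has_prox_logp = (
            self.use_decoupled_loss or self.prox_logp_method == "recompute"
        )
        if self.engine_is_correction and not has_prox_logp:
            raise ValueError(
                "engine_is_correction requires use_decoupled_loss=True "
                "or prox_logp_method='recompute'"
            )


if __name__ == "__main__":
    ActorConfigSketch(engine_is_correction=True, use_decoupled_loss=True)  # ok
    try:
        ActorConfigSketch(engine_is_correction=True)
    except ValueError as err:
        print("rejected:", err)
```
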
Activity
  • The pull request is currently marked as 'Work in Progress' (WIP).
  • The author, ZiyiTsang, has provided a description stating the contribution of TIS&MIS to stabilizing rollout-training mismatch, especially for MoE+ Off policy training.
  • The checklist indicates that several items are still pending, including marking as a new feature, documentation update, and test coverage improvement, consistent with a WIP status.

@gemini-code-assist (Contributor, bot) left a comment

Code Review

This pull request introduces support for Truncated and Masked Importance Sampling (TIS/MIS) to address rollout-training mismatch, which is a valuable addition for off-policy training. The changes include new configuration fields in PPOActorConfig, validation logic, comprehensive unit tests for both the configuration and the functional implementation, and updates to the documentation. The implementation is well-structured and integrated into the existing PPO loss calculation.

However, I've identified a critical security issue: a hardcoded API key has been added to the documentation. This key must be removed to prevent any potential security breaches.

Comment thread docs/cli_reference.md Outdated
ZiyiTsang and others added 2 commits February 20, 2026 19:28
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ZiyiTsang ZiyiTsang marked this pull request as ready for review February 20, 2026 11:28
@ZiyiTsang ZiyiTsang changed the title from "WIP: Support MIS/TIS to stablizing rollout-training mismatch" to "feat: Support MIS/TIS to stablizing rollout-training mismatch" on Feb 20, 2026
@ZiyiTsang (Collaborator, Author)

/gemini review

@gemini-code-assist (Contributor, bot) left a comment

Code Review

This pull request introduces MIS/TIS correction to stabilize training. The review identified critical issues such as hardcoded paths and leftover debugging code in examples that could break the build, along with bugs related to configuration defaults and inconsistent variable names preventing the new feature from working correctly. Specific suggestions have been provided to resolve these issues.

Comment thread areal/tests/test_cli_args.py Outdated
Comment thread examples/math/gsm8k_rl.py Outdated
Comment thread areal/api/cli_args.py Outdated
Comment thread areal/trainer/ppo/actor.py Outdated
Comment thread examples/math/gsm8k_grpo.yaml Outdated
Comment thread areal/api/cli_args.py Outdated
Comment thread areal/utils/functional/functional.py Outdated
Comment thread areal/utils/functional/functional.py Outdated
Comment thread areal/trainer/ppo/actor.py Outdated
Comment thread examples/math/gsm8k_rl.py Outdated
Comment thread areal/tests/test_cli_args.py Outdated
Comment thread examples/math/gsm8k_grpo.yaml Outdated
Comment thread areal/api/cli_args.py
Comment thread examples/math/gsm8k_grpo.yaml Outdated
Comment thread areal/api/cli_args.py Outdated
Comment thread areal/trainer/ppo/actor.py Outdated
Comment thread areal/api/cli_args.py Outdated
Comment thread areal/api/cli_args.py Outdated
Comment thread uv.lock Outdated
@ZiyiTsang (Collaborator, Author)

/gemini review

@ZiyiTsang ZiyiTsang requested a review from rchardx March 1, 2026 11:07
@ZiyiTsang (Collaborator, Author)

/gemini review

gemini-code-assist[bot]

This comment was marked as outdated.

@ZiyiTsang ZiyiTsang marked this pull request as draft March 1, 2026 11:27
@ZiyiTsang

This comment was marked as outdated.

@ZiyiTsang ZiyiTsang marked this pull request as ready for review March 1, 2026 13:28
@ZiyiTsang ZiyiTsang changed the title from "feat: Support MIS/TIS to stablizing rollout-training mismatch" to "feat: Fully Support MIS/TIS to stablizing rollout-training mismatch" on Mar 2, 2026
@rchardx (Collaborator) left a comment

(submitting pending review)

Comment thread areal/api/cli_args.py Outdated
Comment thread areal/utils/functional/functional.py Outdated
Comment thread areal/api/cli_args.py Outdated
Comment thread areal/utils/functional/functional.py Outdated
Comment thread areal/utils/functional/functional.py Outdated
ZiyiTsang and others added 5 commits March 2, 2026 10:48
Unify naming conventions across the codebase:
- Rename behav_imp_weight_cap/mode to behave_imp_weight_cap/mode
- Change mode value from "disable" to "disabled" for consistency
- Add ValueError guard when calling compute_behave_imp_weight with disabled mode
- Add warning when behave_imp_weight settings are ignored (use_decoupled_loss=False)
- Update documentation strings for clarity
- Add comprehensive unit tests for compute_behave_imp_weight

Key changes:
- Rename all behav_imp_ prefixes to behave_imp_ in Python, YAML, and docs
- Change "disable" to "disabled" in choices, validation, and error messages
- Add validation in __post_init__ for behave_imp_weight_mode vs use_decoupled_loss
- Add TestComputeBehaveImpWeight test class with 5 test cases
- Update tests to use consistent spelling

Refs: PR feedback on naming consistency
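
As a rough illustration of the two safety checks listed in this commit message (a warning when the settings are ignored, and a ValueError guard for the disabled mode), consider the sketch below. The function names `warn_if_ignored` and `require_enabled` are hypothetical, as are the message texts and the example mode string "token_mask"; only `behave_imp_weight_mode`, `use_decoupled_loss`, `compute_behave_imp_weight`, and the "disabled" value come from the commit message.

```python
import warnings


def warn_if_ignored(use_decoupled_loss: bool, behave_imp_weight_mode: str) -> bool:
    """Warn when a non-disabled mode is set but the decoupled loss is off.

    Returns True if the settings would be ignored (hypothetical helper).
    """
    ignored = (not use_decoupled_loss) and behave_imp_weight_mode != "disabled"
    if ignored:
        warnings.warn(
            "behave_imp_weight_mode is ignored because use_decoupled_loss=False",
            UserWarning,
            stacklevel=2,
        )
    return ignored


def require_enabled(behave_imp_weight_mode: str) -> None:
    """Guard to run at the top of a compute_behave_imp_weight-style function."""
    if behave_imp_weight_mode == "disabled":
        raise ValueError(
            "compute_behave_imp_weight should not be called when "
            "behave_imp_weight_mode='disabled'"
        )


if __name__ == "__main__":
    warn_if_ignored(use_decoupled_loss=False, behave_imp_weight_mode="token_mask")
    try:
        require_enabled("disabled")
    except ValueError as err:
        print("guard triggered:", err)
```

Warning on an ignored setting rather than raising keeps existing configs loadable, while the guard makes a programming error at the call site fail loudly.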
@rchardx (Collaborator) left a comment

LGTM

@rchardx rchardx merged commit 03d7115 into inclusionAI:main Mar 3, 2026
5 checks passed
@ZiyiTsang ZiyiTsang deleted the TIS_MIS branch March 7, 2026 09:13
@ZiyiTsang ZiyiTsang restored the TIS_MIS branch March 7, 2026 09:13
@ZiyiTsang ZiyiTsang deleted the TIS_MIS branch March 7, 2026 09:13
leandermaben pushed a commit to leandermaben/AReaL that referenced this pull request Mar 24, 2026
…nclusionAI#930)

In this PR, I contribute the TIS & MIS to stabilizing rollout-training mismatch, which is important when training with MoE and off policy.
In original code, only token-level MIS is applied. This PR offers more choices.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentai Zhang <zhangwentai.zwt@antgroup.com>