feat: Fully Support MIS/TIS to stablizing rollout-training mismatch by ZiyiTsang · Pull Request #930 · inclusionAI/AReaL

ZiyiTsang · 2026-02-19T15:10:38Z

Description

This PR adds TIS and MIS corrections to stabilize training under rollout-training mismatch, which matters when training MoE models off-policy.

The original code applies only token-level MIS. This PR offers more choices.

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Code refactoring (no functional changes)
Performance improvement
Test coverage improvement

Checklist

I have read the Contributing Guide
I have run formatting tools (pre-commit or manual)
I have run relevant unit tests and they pass
I have added tests for new functionality
I have updated documentation if needed
My branch is up to date with main
This PR introduces breaking changes (if yes, fill out details below)
If this PR changes documentation, I have built and previewed it locally with `jb build docs`
No critical issues raised by AI reviewers (/gemini review)


gemini-code-assist · 2026-02-19T15:10:58Z

Summary of Changes

Hello @ZiyiTsang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces truncated and masked importance sampling (TIS/MIS) to correct rollout-training mismatch, which helps stabilize training, especially with Mixture-of-Experts (MoE) models and off-policy reinforcement learning. It provides configurable options for applying this correction, implements the core logic for calculating and applying the importance sampling ratios, and integrates them into the PPO actor's loss computation. The changes are supported by new unit tests and updated documentation.

Highlights

TIS/MIS Configuration: New configuration parameters (engine_is_correction, engine_is_mode, engine_is_cap) were added to PPOActorConfig to enable and control TIS/MIS correction, along with validation logic.
TIS/MIS Implementation: A new function, compute_is_ratio_with_engine_is, was implemented to calculate the importance sampling ratio for train-inference mismatch, supporting token-level and sequence-level truncation or masking.
PPO Loss Integration: The TIS/MIS correction was integrated into the ppo_actor_loss_fn to adjust the policy gradient loss, and relevant statistics are now logged.
Unit Tests: Comprehensive unit tests were added to validate the TIS/MIS configuration, the correctness of the importance sampling ratio computation, and its effect on the PPO actor loss.
Documentation Update: The CLI reference documentation was updated to reflect the newly added TIS/MIS configuration parameters.
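
To make the token-level versus sequence-level and truncate versus mask distinction in the highlights concrete, here is a minimal pure-Python sketch. It is not the PR's `compute_is_ratio_with_engine_is`; the function name `importance_weights`, its parameters, and the choice to form the sequence-level ratio as the product of token ratios are all assumptions made for illustration.

```python
import math


def importance_weights(train_logp, rollout_logp, loss_mask,
                       level="token", mode="truncate", cap=2.0):
    """Hypothetical helper: per-token weights pi_train / pi_rollout.

    level: "token" uses each token's own ratio; "sequence" shares one
           ratio (assumed here: product of token ratios) across the sequence.
    mode:  "truncate" clamps ratios above `cap` down to `cap` (TIS);
           "mask" zeroes out ratios above `cap` (MIS).
    """
    if level not in ("token", "sequence"):
        raise ValueError(f"unknown level: {level}")
    if mode not in ("truncate", "mask"):
        raise ValueError(f"unknown mode: {mode}")

    # Log-ratio per token; masked-out positions contribute nothing.
    log_ratio = [(t - r) * m
                 for t, r, m in zip(train_logp, rollout_logp, loss_mask)]

    if level == "sequence":
        # Sum of log-ratios == log of the product of token ratios.
        seq_log_ratio = sum(log_ratio)
        log_ratio = [seq_log_ratio] * len(log_ratio)

    weights = []
    for lr, m in zip(log_ratio, loss_mask):
        ratio = math.exp(lr)
        if mode == "truncate":
            w = min(ratio, cap)
        else:
            w = ratio if ratio <= cap else 0.0
        weights.append(w * m)  # keep padding positions at zero
    return weights
```

With a cap of 2.0 and one token whose ratio is about 2.72, token-level truncation clamps only that token, token-level masking drops only that token, and the sequence-level modes clamp or drop every token in the sequence.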


Changelog

areal/api/cli_args.py
- Added engine_is_correction, engine_is_mode, and engine_is_cap fields to PPOActorConfig.
- Implemented validation in __post_init__ for PPOActorConfig to ensure engine_is_correction is used with use_decoupled_loss=True or prox_logp_method='recompute'.
areal/tests/test_cli_args.py
- Added a new test class TestPPOActorConfigEngineIS to verify the validation logic and default values of the new TIS/MIS configuration parameters.
areal/tests/test_functional.py
- Imported compute_is_ratio_with_engine_is.
- Added TestEngineISCorrection class with fixtures and tests for token-level and sequence-level TIS/MIS modes (truncate, mask) for both 1D and 2D data.
- Added TestPPOActorLossFnWithEngineIS class to test the integration of TIS/MIS correction into ppo_actor_loss_fn and verify its impact on loss and statistics.
areal/trainer/ppo/actor.py
- Modified ppo_update to pass engine_is_correction, engine_is_mode, and engine_is_cap from self.config to the grpo_loss_fn.
- Updated grpo_loss_fn signature to accept the new TIS/MIS parameters.
- Added logic to log engine_is_ratio statistics when TIS/MIS correction is enabled.
areal/utils/functional/__init__.py
- Exported the new compute_is_ratio_with_engine_is function.
areal/utils/functional/functional.py
- Implemented compute_is_ratio_with_engine_is to calculate importance sampling ratios with TIS/MIS correction based on specified mode and cap.
- Modified ppo_actor_loss_fn to accept engine_is_correction, engine_is_mode, and engine_is_cap.
- Integrated the computation and application of is_ratio from compute_is_ratio_with_engine_is into ppo_actor_loss_fn to modify the policy gradient loss.
- Added is_ratio to the returned statistics dictionary in ppo_actor_loss_fn when correction is enabled.
docs/cli_reference.md
- Added documentation for engine_is_correction, engine_is_mode, and engine_is_cap parameters under PPOActorConfig.
- Updated the default value for the api_key parameter in the SwanLab configuration section.
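
As a rough illustration of the `ppo_actor_loss_fn` integration listed above, the sketch below multiplies a per-token clipped PPO surrogate by an importance weight and reports the mean weight as a statistic. It is an assumption about the general shape of such a correction, not the code in this PR; `weighted_ppo_loss`, `eps_clip`, and the `is_ratio_mean` key are hypothetical names.

```python
import math


def weighted_ppo_loss(logp, prox_logp, advantages, is_weights, loss_mask,
                      eps_clip=0.2):
    """Hypothetical sketch: clipped PPO surrogate scaled by IS weights.

    logp / prox_logp: current and proximal-policy log-probs per token.
    is_weights:       precomputed (truncated or masked) weights per token.
    """
    total, weight_sum, count = 0.0, 0.0, 0
    for lp, plp, adv, w, m in zip(logp, prox_logp, advantages,
                                  is_weights, loss_mask):
        if not m:
            continue  # skip padding / prompt tokens
        ratio = math.exp(lp - plp)
        clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + eps_clip)
        # Standard pessimistic PPO objective, negated to form a loss.
        pg_loss = -min(ratio * adv, clipped * adv)
        total += w * pg_loss  # the correction rescales each token's loss
        weight_sum += w
        count += 1
    denom = max(count, 1)
    stats = {"is_ratio_mean": weight_sum / denom}
    return total / denom, stats
```

A token whose weight is zero (masked) contributes nothing to the loss but still counts toward the denominator in this sketch; whether to renormalize over unmasked tokens is a design choice the source does not specify.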

Activity

The pull request is currently marked as 'Work in Progress' (WIP).
The author, ZiyiTsang, has provided a description stating that the PR contributes TIS and MIS to stabilize rollout-training mismatch, especially for off-policy MoE training.
The checklist indicates that several items are still pending, including marking as a new feature, documentation update, and test coverage improvement, consistent with a WIP status.


gemini-code-assist

Code Review

This pull request introduces support for truncated and masked importance sampling (TIS/MIS) to address rollout-training mismatch, which is a valuable addition for off-policy training. The changes include new configurations in PPOActorConfig, validation logic, comprehensive unit tests for both the configuration and the functional implementation, and updates to the documentation. The implementation is well-structured and integrated into the existing PPO loss calculation.

However, I've identified a critical security issue: a hardcoded API key has been added to the documentation. This key must be removed to prevent any potential security breaches.


ZiyiTsang · 2026-02-20T11:29:43Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces MIS/TIS correction to stabilize training. The review identified critical issues such as hardcoded paths and leftover debugging code in examples that could break the build, along with bugs related to configuration defaults and inconsistent variable names preventing the new feature from working correctly. Specific suggestions have been provided to resolve these issues.

ZiyiTsang · 2026-03-01T10:45:43Z

/gemini review

ZiyiTsang · 2026-03-01T11:08:40Z

/gemini review

rchardx

(submitting pending review)

Unify naming conventions across the codebase:

- Rename behav_imp_weight_cap/mode to behave_imp_weight_cap/mode
- Change mode value from "disable" to "disabled" for consistency
- Add ValueError guard when calling compute_behave_imp_weight with disabled mode
- Add warning when behave_imp_weight settings are ignored (use_decoupled_loss=False)
- Update documentation strings for clarity
- Add comprehensive unit tests for compute_behave_imp_weight

Key changes:

- Rename all behav_imp_ prefixes to behave_imp_ in Python, YAML, and docs
- Change "disable" to "disabled" in choices, validation, and error messages
- Add validation in __post_init__ for behave_imp_weight_mode vs use_decoupled_loss
- Add TestComputeBehaveImpWeight test class with 5 test cases
- Update tests to use consistent spelling

Refs: PR feedback on naming consistency
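
The guard and warning described in this commit message could look roughly like the sketch below. Only the field names `behave_imp_weight_mode`, `behave_imp_weight_cap`, and `use_decoupled_loss` and the value "disabled" come from the message; the class `ActorConfigSketch`, the helper `require_enabled_mode`, the other mode names, and the default cap are hypothetical.

```python
import warnings
from dataclasses import dataclass

# "disabled" is from the commit message; the remaining names are assumed.
MODES = ("disabled", "token_truncate", "token_mask",
         "sequence_truncate", "sequence_mask")


@dataclass
class ActorConfigSketch:
    use_decoupled_loss: bool = False
    behave_imp_weight_mode: str = "disabled"
    behave_imp_weight_cap: float = 5.0  # assumed default, for illustration

    def __post_init__(self):
        if self.behave_imp_weight_mode not in MODES:
            raise ValueError(
                f"behave_imp_weight_mode must be one of {MODES}, "
                f"got {self.behave_imp_weight_mode!r}"
            )
        # Warn rather than fail when the settings would be silently ignored.
        if (self.behave_imp_weight_mode != "disabled"
                and not self.use_decoupled_loss):
            warnings.warn(
                "behave_imp_weight settings are ignored because "
                "use_decoupled_loss=False"
            )


def require_enabled_mode(mode):
    """Guard for a weight-computing function: reject the disabled mode."""
    if mode == "disabled":
        raise ValueError("cannot compute weights when mode is 'disabled'")
    return mode
```

Warning at configuration time and raising only at the call site keeps existing configs loadable while still catching a disabled mode that reaches the weight computation.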

rchardx

LGTM

…nclusionAI#930)

In this PR, I contribute the TIS & MIS to stabilizing rollout-training mismatch, which is important when training with MoE and off policy. In original code, only token-level MIS is applied. This PR offers more choices.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentai Zhang <zhangwentai.zwt@antgroup.com>

.

d88b123

gemini-code-assist Bot reviewed Feb 19, 2026


Comment thread docs/cli_reference.md Outdated

ZiyiTsang added 3 commits February 20, 2026 09:47

.

a4af3a5

.

1dd37ba

.

e0b5210

ZiyiTsang force-pushed the TIS_MIS branch from 959fc3d to 1dd37ba Compare February 20, 2026 11:28

ZiyiTsang and others added 2 commits February 20, 2026 19:28

Update docs/cli_reference.md

4a201c6

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Merge branch 'TIS_MIS' of https://github.com/ZiyiTsang/AReaL into TIS…

9b445c8

…_MIS

ZiyiTsang marked this pull request as ready for review February 20, 2026 11:28

ZiyiTsang changed the title ~~WIP: Support MIS/TIS to stablizing rollout-training mismatch~~ feat: Support MIS/TIS to stablizing rollout-training mismatch Feb 20, 2026

gemini-code-assist Bot reviewed Feb 20, 2026
