
fix: move pause/continue_generation operations into update_weights #446

Merged: garrett4wade merged 4 commits into main from rchardx/fix_engine on Oct 13, 2025

Conversation

@rchardx (Collaborator) commented Oct 13, 2025

This pull request introduces a clear distinction between suspending request submission and pausing the actual generation process in inference engines. It adds new pause_generation and continue_generation methods to the inference engine APIs and ensures these are properly invoked during distributed weight updates. Additionally, it standardizes terminology across the workflow API, replacing "pause/resume" with "suspend/resume" for request submission, and updates related event and method names for clarity and consistency.

API and Method Naming Improvements:

  • Added pause_generation and continue_generation methods to the base inference engine API (InferenceEngine) to explicitly control the pausing and resuming of the generation process, particularly during weight updates.
  • Updated remote engine implementations (sglang_remote.py, vllm_remote.py) to implement pause_generation and continue_generation methods, sending appropriate HTTP requests to pause/resume generation on remote servers.

Workflow Suspension Refactor:

  • Renamed the workflow API's pause/resume methods and state to suspend/resume and suspended (was paused), clarifying that these control request submission, not generation itself. Updated all related logic and documentation.

Weight Update Process Enhancements:

  • Modified distributed weight update logic in both FSDP and Megatron engines to pause generation before weight transfer and continue generation after, ensuring consistency and correctness during updates.
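The pause-before/continue-after bracketing described above can be sketched as follows. Every name other than pause_generation and continue_generation is an illustrative assumption, not the exact FSDP or Megatron engine code.

```python
class TrainEngine:
    """Hedged sketch of bracketing a weight transfer with the new calls.

    `weight_chunks` and `_transfer` are hypothetical stand-ins for the
    real distributed transfer machinery.
    """

    def __init__(self, inference_engine):
        self.inference_engine = inference_engine

    def update_weights(self, weight_chunks):
        # Pause generation so no request runs against half-updated weights.
        self.inference_engine.pause_generation()
        try:
            for chunk in weight_chunks:
                self._transfer(chunk)
        finally:
            # Always resume, even if the transfer raised, so the
            # inference engine is never left paused indefinitely.
            self.inference_engine.continue_generation()

    def _transfer(self, chunk):
        # Placeholder for the real weight transfer (e.g. NCCL broadcast).
        pass
```

Putting continue_generation in a finally block is one way to guard against exactly the failure mode flagged later in this review: a code path that pauses generation but never resumes it.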

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @rchardx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the control mechanisms within the inference engine by creating a clear distinction between pausing the actual generation process and suspending request submission. It introduces dedicated API methods for pausing and continuing generation, which are then strategically integrated into distributed weight update procedures to maintain system stability. Concurrently, the workflow API's terminology for request submission control has been updated for improved clarity and consistency.

Highlights

  • Generation Control API: Introduced new pause_generation and continue_generation methods to the base InferenceEngine API and implemented them in remote engines (sglang_remote.py, vllm_remote.py) to explicitly manage the generation process.
  • Workflow API Renaming: Renamed pause/resume methods and the paused state in the workflow API to suspend/resume and suspended respectively, to clearly differentiate between suspending request submission and pausing generation.
  • Distributed Weight Update Integration: Modified the distributed weight update logic in FSDP and Megatron engines to invoke pause_generation before weight transfers and continue_generation afterward, ensuring generation stability during updates.

Copilot AI (Contributor) left a comment
Pull Request Overview

This PR refactors the pause/resume functionality to clearly distinguish between suspending request submission and pausing generation processes. It introduces new generation control methods and standardizes terminology across the codebase.

  • Adds explicit pause_generation and continue_generation methods to inference engines for controlling generation during weight updates
  • Renames workflow API methods from pause/resume to suspend/resume to clarify they control request submission, not generation
  • Integrates generation pausing into distributed weight update processes in FSDP and Megatron engines

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Summary per file:

  • areal/api/engine_api.py — Adds abstract pause_generation and continue_generation methods to the base inference engine API
  • areal/api/workflow_api.py — Renames the paused state to suspended and updates related method names for clarity
  • areal/engine/sglang_remote.py — Implements generation control methods and separates them from workflow suspension methods
  • areal/engine/vllm_remote.py — Implements generation control methods and separates them from workflow suspension methods
  • areal/engine/fsdp_engine.py — Integrates generation pausing into the distributed weight update process
  • areal/experimental/megatron_engine.py — Integrates generation pausing into the distributed weight update process


@gemini-code-assist (bot, Contributor) left a comment
Code Review

This pull request does a good job of refactoring the API to create a clear distinction between suspending request submission and pausing the generation process. The introduction of pause_generation and continue_generation and renaming pause to suspend in the workflow API improves clarity.

However, I've found a few critical issues that need to be addressed:

  • In megatron_engine.py, pause_generation is called twice, and continue_generation is never called, which will leave the generation process paused indefinitely after a weight update.
  • In sglang_remote.py and vllm_remote.py, there are calls to workflow_executor.pause(), but this method has been renamed to suspend() in this same PR, which will lead to a runtime AttributeError.
  • There's also a minor documentation issue in workflow_api.py with a broken reference.

Please see the detailed comments for suggestions on how to fix these issues.
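To illustrate the second issue, here is a toy model of the rename hazard. The class and attribute names are assumptions for illustration, not AReaL's actual WorkflowExecutor code.

```python
class WorkflowExecutor:
    """Toy model of the rename: `paused`/`pause` became
    `suspended`/`suspend` in this PR."""

    def __init__(self):
        self.suspended = False  # formerly `paused`

    def suspend(self):  # formerly `pause`
        self.suspended = True

    def resume(self):
        self.suspended = False


executor = WorkflowExecutor()
try:
    executor.pause()  # stale call site left behind by the rename
except AttributeError as exc:
    print(exc)  # the runtime failure flagged in the review

executor.suspend()  # the corrected call
```

Because Python resolves attribute names at call time, a stale pause() call compiles and imports cleanly and only fails when the weight-update path actually runs, which is why a rename like this needs every call site updated in the same PR.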

@garrett4wade (Collaborator) left a comment

LGTM!

garrett4wade merged commit 6a06baf into main on Oct 13, 2025
1 of 4 checks passed
garrett4wade deleted the rchardx/fix_engine branch on October 13, 2025 at 09:49
3 participants