fix: move pause/continue_generation operations into update_weights #446
garrett4wade merged 4 commits into main from
Conversation
Summary of Changes

Hello @rchardx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request refines the control mechanisms within the inference engine by creating a clear distinction between pausing the actual generation process and suspending request submission. It introduces dedicated API methods for pausing and continuing generation, which are integrated into the distributed weight update procedures to maintain system stability. The workflow API's terminology for request-submission control has also been updated for improved clarity and consistency.
Pull Request Overview
This PR refactors the pause/resume functionality to clearly distinguish between suspending request submission and pausing generation processes. It introduces new generation control methods and standardizes terminology across the codebase.
- Adds explicit `pause_generation` and `continue_generation` methods to inference engines for controlling generation during weight updates
- Renames workflow API methods from `pause`/`resume` to `suspend`/`resume` to clarify that they control request submission, not generation
- Integrates generation pausing into the distributed weight update processes in the FSDP and Megatron engines
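As a rough illustration of the first point, the new abstract methods on the base inference engine might look like the sketch below. The signatures and the `DummyEngine` subclass are assumptions for illustration, not the exact code from `areal/api/engine_api.py`.

```python
# Hypothetical sketch of the new abstract methods on the base
# InferenceEngine API (method names from the PR; signatures assumed).
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    @abstractmethod
    def pause_generation(self):
        """Pause in-flight generation, e.g. before a weight update."""

    @abstractmethod
    def continue_generation(self):
        """Resume generation after a weight update completes."""


class DummyEngine(InferenceEngine):
    """Toy subclass used only to illustrate the contract."""

    def __init__(self):
        self.paused = False

    def pause_generation(self):
        self.paused = True

    def continue_generation(self):
        self.paused = False
```

Concrete engines (remote SGLang/vLLM clients, FSDP, Megatron) would each provide their own implementations of this pair.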
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| areal/api/engine_api.py | Adds abstract pause_generation and continue_generation methods to base inference engine API |
| areal/api/workflow_api.py | Renames paused state to suspended and updates related method names for clarity |
| areal/engine/sglang_remote.py | Implements generation control methods and separates them from workflow suspension methods |
| areal/engine/vllm_remote.py | Implements generation control methods and separates them from workflow suspension methods |
| areal/engine/fsdp_engine.py | Integrates generation pausing into distributed weight update process |
| areal/experimental/megatron_engine.py | Integrates generation pausing into distributed weight update process |
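For the remote engines listed above, one plausible shape for the generation-control methods is to POST to pause/continue endpoints on the serving process. The endpoint paths and the injected `post` callable below are assumptions for illustration, not the exact code from `sglang_remote.py` or `vllm_remote.py`.

```python
# Sketch of a remote engine forwarding pause/continue to the
# inference server over HTTP. `post` is injected (e.g. requests.post
# in real code) so the sketch stays testable without a network.
class RemoteEngineSketch:
    def __init__(self, base_url, post):
        self.base_url = base_url
        self.post = post

    def pause_generation(self):
        # Ask the server to stop scheduling new decode steps.
        return self.post(f"{self.base_url}/pause_generation")

    def continue_generation(self):
        # Resume decoding once new weights are in place.
        return self.post(f"{self.base_url}/continue_generation")
```

Separating these HTTP calls from the workflow executor's `suspend`/`resume` is what lets the PR pause generation during weight updates without also blocking request submission bookkeeping.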
Code Review
This pull request does a good job of refactoring the API to create a clear distinction between suspending request submission and pausing the generation process. The introduction of pause_generation and continue_generation and renaming pause to suspend in the workflow API improves clarity.
However, I've found a few critical issues that need to be addressed:
- In `megatron_engine.py`, `pause_generation` is called twice and `continue_generation` is never called, which will leave the generation process paused indefinitely after a weight update.
- In `sglang_remote.py` and `vllm_remote.py`, there are calls to `workflow_executor.pause()`, but this method has been renamed to `suspend()` in this same PR, which will lead to a runtime `AttributeError`.
- There is also a minor documentation issue in `workflow_api.py` with a broken reference.
Please see the detailed comments for suggestions on how to fix these issues.
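The indefinite-pause bug flagged above can be avoided with a symmetric pattern: pause exactly once before the update and always continue in a `finally` block, so generation resumes even if the update raises. This is an illustrative sketch with a hypothetical `do_update` callback, not the PR's actual code.

```python
# Symmetric pause/continue around a weight update. The finally block
# guarantees continue_generation runs exactly once, even on failure.
def update_weights(engine, do_update):
    engine.pause_generation()
    try:
        do_update()
    finally:
        engine.continue_generation()
```

Keeping the pair inside one helper also makes a double `pause_generation` call, as reported for `megatron_engine.py`, much harder to reintroduce.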
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request introduces a clear distinction between suspending request submission and pausing the actual generation process in inference engines. It adds new `pause_generation` and `continue_generation` methods to the inference engine APIs and ensures these are properly invoked during distributed weight updates. Additionally, it standardizes terminology across the workflow API, replacing "pause/resume" with "suspend/resume" for request submission, and updates related event and method names for clarity and consistency.

API and Method Naming Improvements:

- Added `pause_generation` and `continue_generation` methods to the base inference engine API (`InferenceEngine`) to explicitly control the pausing and resuming of the generation process, particularly during weight updates.
- Updated the remote engines (`sglang_remote.py`, `vllm_remote.py`) to implement `pause_generation` and `continue_generation`, sending appropriate HTTP requests to pause/resume generation on remote servers.

Workflow Suspension Refactor:

- Renamed the workflow API methods to `suspend`/`resume` and the state to `suspended` (was `paused`), clarifying that these control request submission, not generation itself. Updated all related logic and documentation.

Weight Update Process Enhancements: