refactor: add export_stats as engine's method#643
Conversation
Summary of ChangesHello @garrett4wade, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors how statistics are exported by integrating the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request effectively refactors the statistics exporting mechanism by encapsulating the logic within the engine classes. Moving export_stats into TrainEngine and InferenceEngine is a solid design choice that improves modularity and correctly assigns responsibility for handling parallelism-specific aggregations. The changes are consistently applied throughout the codebase, including the removal of the now-redundant /export_stats RPC endpoint, which simplifies the API.
However, I've noted one significant issue: the pull request, marked as a refactoring with no functional changes, also appears to disable LoRA support for the vLLM backend. This is a functional change that should be clarified and properly documented in the PR description.
Apart from this point, the refactoring is well-executed and improves the overall code structure.
a57767f to
bc6828d
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request is a well-executed refactoring that moves the export_stats functionality into an engine method. This is a solid design improvement, as it centralizes the statistics gathering logic within the engine, which has the necessary context about parallelism strategies. The changes are applied consistently across the API, engine implementations, RPC server, tests, and example files.
I have a couple of suggestions: one is to fix a minor typo in a docstring, and the other is to add unit tests for the new GPU detection logic to ensure its robustness and improve maintainability. Overall, this is a great change that improves the codebase's structure.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
FYI CI unit test is passing now. |
* add export stats as engine method and update examples * Update areal/api/engine_api.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Description
This refactor is required by #611 , and has the following benefits:
/export_statsendpoint in the RPC server. A single/callendpoint is sufficient.MegatronEngine.Type of Change
work as expected)
Checklist
jb build docs/gemini review)