Single-controller LoRA RL fine-tuning with vLLM #735

garrett4wade merged 4 commits into inclusionAI:main
Conversation
Summary of Changes

Hello @gursimar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request delivers a practical, verified example for fine-tuning models using LoRA with a single controller and the vLLM inference engine. It serves as a blueprint for users looking to implement GRPO workflows with these technologies, providing both the Python script and the YAML configuration needed to get started. The primary goal is to extend the existing LoRA + vLLM support with a concrete, runnable demonstration.

Highlights
Code Review
This pull request introduces a new example for single-controller LoRA fine-tuning with the vLLM backend. The changes include a Python script for the training workflow and a corresponding YAML configuration file. The code is well structured for an example script. My review includes two suggestions for the Python script: replace a magic number with a named constant to improve maintainability, and add a placeholder for the evaluation step that the configuration implies but the script currently lacks.
garrett4wade left a comment
While the implementation looks great, I'd still like to confirm the details about learning performance.
The previous SPMD LoRA code has an unresolved bug: if multiple inference engines submit rollout requests concurrently, learning performance drops significantly. As a workaround, we only submit requests on rank 0 (code). Only in this way does the learning curve roughly match full-parameter tuning.
I wonder whether the bug still exists in single-controller mode. Can you provide learning curves comparing this new script with the default SPMD, full-parameter tuning script? Hopefully there is no longer a performance drop.
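For readers unfamiliar with the workaround mentioned above, the rank-0 gating pattern can be sketched roughly as follows. This is an illustration of the pattern, not AReaL's actual code: `engine.submit` and the `broadcast` callable are hypothetical stand-ins for the real inference client and something like `torch.distributed.broadcast_object_list`.

```python
def gather_rollouts(rank, engine, prompts, broadcast):
    """Submit rollout requests on rank 0 only, then share the result.

    Sketch of the workaround: only one rank talks to the inference
    engines, avoiding the concurrent-submission bug that degrades
    learning performance; every other rank receives the same batch.
    """
    payload = [None]  # single-element list so the broadcast can fill it in place
    if rank == 0:
        # Only rank 0 submits requests to the inference engine.
        payload[0] = engine.submit(prompts)
    broadcast(payload, src=0)  # all ranks end up with rank 0's rollouts
    return payload[0]
```

In real code the `broadcast` argument would be a collective over the training process group, so every rank trains on an identical rollout batch.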
This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days. Please add a comment or push new commits to keep it active. Thank you for your contribution!
Due to the recent refactoring (e.g. PPOTrainer) and the new examples folder, we have made the following adjustments:
python -u examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo_lora.yaml
This aligns with the new design of a single runner that runs with different YAML files.
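The "single runner, many YAML files" design mentioned above can be sketched as follows: the same entry point selects all of its behavior from the `--config` file rather than from which script was invoked. The toy config loader below stands in for a real YAML parser, and the keys are illustrative, not the repository's actual schema.

```python
import argparse

def load_config(path):
    """Toy 'key: value' parser standing in for a full YAML loader."""
    cfg = {}
    with open(path) as f:
        for line in f:
            if ":" in line:
                key, _, value = line.partition(":")
                cfg[key.strip()] = value.strip()
    return cfg

def main(argv=None):
    # One runner: LoRA vs. full-parameter tuning is chosen by the config,
    # not by maintaining separate launch scripts.
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    args = parser.parse_args(argv)
    return load_config(args.config)
```

Under this pattern, swapping `gsm8k_grpo_lora.yaml` for a full-parameter config changes the training setup without touching the runner script.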
|
… required anymore


Description
This PR adds working, tested examples for running single-controller LoRA training with the vLLM backend.
It builds on the existing LoRA + vLLM support (RFC #609) and demonstrates how to configure and launch a single-controller GRPO workflow.
What’s included
Files changed
Kept files in the examples/lora folder on purpose since, IMO, all LoRA examples should live under this folder only.
examples/lora/gsm8k_grpo_vllm_single_controller.py — single-controller GRPO LoRA example
examples/lora/gsm8k_grpo_vllm_single_controller.yaml — config for vLLM backend
Running instructions
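For orientation, a config for this kind of setup typically wires LoRA adapter settings into the vLLM backend section. The fragment below is purely illustrative — the field names are hypothetical and are not the actual schema from the YAML file in this PR:

```yaml
# Hypothetical fragment -- field names are illustrative, not the PR's schema.
backend: vllm
lora:
  rank: 16      # low-rank adapter dimension
  alpha: 32     # LoRA scaling factor
```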
Testing
Type of Change
Checklist
Need help? Check the Contributing Guide or ask in
GitHub Discussions!