
Ray placement group refactor and preliminary architecture for multinode inference instances#966

Merged
garrett4wade merged 10 commits into inclusionAI:main from hlyli:ray-multinode-vllm
Mar 4, 2026

Conversation

@hlyli
Contributor

@hlyli hlyli commented Mar 2, 2026

Description

This PR addresses phase 1 of #963. We refactor the RayScheduler to support several distinct placement strategies, which will be a precursor to inference instances that span multiple nodes.

  1. Shared placements are similar to current training deployments where multiple training ranks share 1 placement group, each taking 1 bundle of the PG.
  2. Separate placements are for rollouts and will have 1 placement group per instance.
  3. Deferred placements are currently unused and will be for multinode instances.

We extract many of the placement group functions from areal/infra/scheduler/ray.py and move them into areal/infra/utils/ray_placement_group.py, where much of the logic of defining placement groups and bundles now resides.
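Concretely, the three strategies above can be sketched as a small strategy hierarchy. The class names follow this PR's changelog, but the method name and bundle-spec shape below are illustrative assumptions, not the actual areal API:

```python
from abc import ABC, abstractmethod


class RayPlacementStrategy(ABC):
    """Decides how bundles are grouped into placement groups (PGs)."""

    @abstractmethod
    def bundle_layout(self, n_ranks: int, gpus_per_rank: int) -> list[list[dict]]:
        """Return one inner list of bundle specs per placement group."""


class SharedRayPlacementStrategy(RayPlacementStrategy):
    # Training-style: all ranks share one PG, each taking one bundle.
    def bundle_layout(self, n_ranks, gpus_per_rank):
        return [[{"GPU": gpus_per_rank} for _ in range(n_ranks)]]


class SeparatedRayPlacementStrategy(RayPlacementStrategy):
    # Rollout-style: one PG per instance.
    def bundle_layout(self, n_ranks, gpus_per_rank):
        return [[{"GPU": gpus_per_rank}] for _ in range(n_ranks)]


class DeferredDeviceRayPlacementStrategy(RayPlacementStrategy):
    # Placeholder: PG creation is deferred until the multinode
    # instance topology is known.
    def bundle_layout(self, n_ranks, gpus_per_rank):
        raise NotImplementedError("reserved for multinode instances")
```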

Related Issue

#963

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality (I only updated existing tests)
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context



@hlyli
Contributor Author

hlyli commented Mar 2, 2026

/gemini review

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the handling of Ray placement groups within the scheduling system, addressing phase 1 of issue #963. It introduces a flexible, strategy-based architecture that allows for different approaches—shared, separate, and deferred—to manage how resources are allocated to workers. This foundational change is crucial for enabling more advanced deployment patterns, particularly for multinode inference instances, by centralizing and abstracting the complex logic of resource bundling and placement.

Highlights

  • Ray Placement Strategy Refactoring: The core logic for managing Ray placement groups has been refactored into a new module, areal/infra/utils/ray_placement_group.py, introducing an abstract base class RayPlacementStrategy and concrete implementations for 'shared', 'separate', and 'deferred' strategies.
  • New ray_placement_strategy Field: A new ray_placement_strategy field has been added to SchedulingSpec, allowing users to specify how Ray resources should be allocated for different roles (e.g., training, rollout).
  • Rollout Controller Adaptation: The RolloutController has been updated to automatically enforce the 'separate' placement strategy for rollout jobs, ensuring each rollout instance receives its own dedicated placement group.
  • Enhanced Actor Cleanup: The RayScheduler's actor cleanup mechanism has been improved to include a more robust termination attempt using __ray_terminate__.remote() before resorting to ray.kill().
  • Support for Multinode Inference Architecture: The introduction of 'deferred' placement strategy and the overall refactoring lay the groundwork for future multinode inference instances by providing a flexible and extensible architecture for resource management.
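As a rough illustration of the new field, a minimal SchedulingSpec with choice validation might look like the following. The real dataclass in areal/api/cli_args.py has many more fields; this shape is an assumption for illustration only:

```python
from dataclasses import dataclass

_VALID_STRATEGIES = ("shared", "separate", "deferred")


@dataclass
class SchedulingSpec:
    # Only the new field is shown here.
    ray_placement_strategy: str = "shared"

    def __post_init__(self):
        if self.ray_placement_strategy not in _VALID_STRATEGIES:
            raise ValueError(
                f"ray_placement_strategy must be one of {_VALID_STRATEGIES}, "
                f"got {self.ray_placement_strategy!r}"
            )
```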


Changelog
  • areal/api/cli_args.py
    • Added ray_placement_strategy field to SchedulingSpec with choices 'shared', 'separate', and 'deferred'.
  • areal/infra/controller/rollout_controller.py
    • Modified initialize function to force ray_placement_strategy to 'separate' for rollout jobs if 'shared' is initially set.
  • areal/infra/scheduler/ray.py
    • Removed several internal helper methods related to bundle and actor resource specification, and placement group creation.
    • Imported new placement strategy classes (DeferredDeviceRayPlacementStrategy, RayPlacementStrategy, SeparatedRayPlacementStrategy, SharedRayPlacementStrategy) and ray_resource_type from areal.infra.utils.ray_placement_group.
    • Introduced _get_placement_strategy method to dynamically select the appropriate RayPlacementStrategy.
    • Updated _create_ray_workers to utilize the new RayPlacementStrategy for creating placement groups and actor resources.
    • Added a check in _create_forked_workers_internal to prevent colocation of multi-GPU workers.
    • Improved actor termination logic in _cleanup_workers by attempting __ray_terminate__.remote() before ray.kill().
  • areal/infra/utils/ray.py
    • Added create_resource_spec utility function to standardize resource dictionary creation.
  • areal/infra/utils/ray_placement_group.py
    • Added a new file defining an abstract base class RayPlacementStrategy and its concrete implementations: SharedRayPlacementStrategy, SeparatedRayPlacementStrategy, and DeferredDeviceRayPlacementStrategy.
    • Included helper functions ray_resource_type, _create_bundle_specs_split, _bundle_spec, _actor_resource_spec, and _create_placement_group.
  • tests/test_ray_scheduler.py
    • Updated test_create_delete_workers to explicitly set ray_placement_strategy="shared" for SchedulingSpec.
    • Removed assertions that verified separate placement groups for shared strategy, aligning with the new shared PG behavior.
    • Modified test_fork_creates_workers_on_same_placement_group to assert that forked workers share the same bundle index as their parent.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a well-structured refactoring of Ray placement group handling by using different placement strategies. This is a great improvement for modularity and future extensions like multi-node inference.

I've found a few issues that need attention:

  • There are a couple of critical bugs: one in resource specification that would prevent actors from being scheduled with the correct memory, and a syntax error in type hints.
  • There are also a couple of medium-severity suggestions to improve maintainability and restore lost configurability for timeouts.

Overall, the direction is excellent, and with these fixes, the implementation will be solid.

hlyli and others added 2 commits March 2, 2026 15:52
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant and well-executed refactoring of the Ray placement group logic. By abstracting placement strategies into separate classes, the code becomes more modular, readable, and extensible, which is an excellent foundation for supporting multi-node inference. The logic has been cleanly moved from RayScheduler to the new areal/infra/utils/ray_placement_group.py file. My review includes a few suggestions to address a bug in resource specification, a hardcoded timeout, and a minor type hint correction.

Comment on lines +590 to +596
try:
    actor.__ray_terminate__.remote()
except Exception:
    logger.warning(
        f"Could not destroy remote actor {actor}, force killing actor"
    )
    ray.kill(actor, no_restart=True)
Contributor


medium

While actor.destroy.remote() is the standard way to clean up actors, it can sometimes hang. The addition of a fallback to actor.__ray_terminate__.remote() before resorting to ray.kill() is a good improvement for robustness, as __ray_terminate__ provides a more graceful termination.

Suggested change

Before:

try:
    actor.__ray_terminate__.remote()
except Exception:
    logger.warning(
        f"Could not destroy remote actor {actor}, force killing actor"
    )
    ray.kill(actor, no_restart=True)

After:

except Exception:
    try:
        # Attempt a more graceful termination before force killing.
        actor.__ray_terminate__.remote()
    except Exception:
        logger.warning(
            f"Could not destroy remote actor {actor}, force killing actor"
        )
        ray.kill(actor, no_restart=True)

Contributor Author


This is the same code?

hlyli and others added 3 commits March 2, 2026 15:54
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@hlyli
Contributor Author

hlyli commented Mar 3, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-designed refactoring of the Ray scheduler's placement group handling. By introducing a strategy pattern for placement groups (Shared, Separate, Deferred), the code becomes more modular and extensible, paving the way for future features like multi-node inference. The logic has been cleanly extracted into a new ray_placement_group.py utility file. My review focuses on a couple of minor improvements to enhance user feedback and code clarity. Specifically, I've suggested adding a warning when a user's configuration is silently changed and replacing a magic number with a named constant for better readability.

@hlyli hlyli marked this pull request as ready for review March 3, 2026 00:14
Collaborator

@garrett4wade garrett4wade left a comment


⚠️ Additional Issue: TestUtilityFunctions Will Break (CRITICAL, Confidence 95)

tests/test_ray_scheduler.py:175 calls scheduler._create_bundle_list_gpu(1, 24, 1024), but this method was removed from RayScheduler in this PR (along with _bundle_spec, _actor_resource_spec, and _sum_resource_spec). This will raise AttributeError at runtime.

While the test file is skipped by default via pytestmark, it's still broken code.

Fix: Update TestUtilityFunctions to call the equivalent from the new module:

from areal.infra.utils.ray_placement_group import _create_bundle_specs_split
bundle_list = _create_bundle_specs_split(16, 1, 24, 1024)

return self._placement_groups

def actor_resources(
    self, spec: SchedulingSpec, gpu_multiplier=1
Collaborator


🔴 CRITICAL: GPU Multiplier Removed — Will Break Forked Workers (Confidence 90)

The old _actor_resource_spec() always applied a 0.9× GPU multiplier to leave headroom for forked workers (ref/proxy actors colocated via fork_workers()). The new SeparatedRayPlacementStrategy requests the full 1.0 GPU (gpu_multiplier=1), consuming the entire bundle budget.

When fork_workers() later creates ref/proxy workers on the same placement group (each requesting 0.01 GPU via _create_forked_workers_internal), the total GPU demand becomes 1.0 + 0.01 = 1.01, which exceeds the 1.0 GPU in the bundle. Ray will fail to schedule the forked actor.

Additionally, the raise RuntimeError on line 203 blocks any caller from passing a multiplier at all.

Old behavior (all workers got 0.9×):

if device == "GPU":
    res["num_gpus"] = float(gpu) * 0.9  # Leave room for forked workers

Suggested fix: Either apply MAIN_WORKER_GPU_FRAC_FOR_COLOCATION here too, or remove the raise and let callers choose:

def actor_resources(self, spec, gpu_multiplier=MAIN_WORKER_GPU_FRAC_FOR_COLOCATION):
    # Remove the RuntimeError guard
    options = _actor_resource_spec(spec.cpu, spec.gpu * gpu_multiplier, spec.mem)

Contributor Author


For this one, I will leave it as gpu_multiplier=1 and let the caller decide. I loosened the RuntimeError so that it only raises if gpu_multiplier != 1 and spec.gpu > 1, since Ray will throw an error regardless if it tries to schedule a worker with a fractional GPU request greater than 1.
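The loosened guard described in this reply can be sketched as a small helper; the function name and signature are illustrative, not the actual areal code:

```python
def check_gpu_multiplier(gpu: int, gpu_multiplier: float) -> float:
    """Ray rejects fractional GPU requests above 1, so only forbid a
    non-unit multiplier when the spec asks for more than one GPU."""
    if gpu_multiplier != 1 and gpu > 1:
        raise RuntimeError(
            "fractional GPU requests > 1 are not schedulable by Ray"
        )
    return gpu * gpu_multiplier
```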

Collaborator

@garrett4wade garrett4wade left a comment


LGTM!

@garrett4wade garrett4wade merged commit b79b3ac into inclusionAI:main Mar 4, 2026
5 checks passed