torchcomms

torchcomms is a new experimental communications API for PyTorch. It provides both a high-level collectives API and several out-of-the-box backends.

Prerequisites

torchcomms requires the following software and hardware:

Python 3.10 or higher
PyTorch 2.8 or higher
CUDA-capable GPU (for NCCL/NCCLX or RCCL backends)
Intel XPU (for XCCL backend)
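
A quick way to check the two software requirements before installing is a small script like the one below. It is a sketch, not part of torchcomms: the helper names parse_version and meets_minimum are made up for illustration, and the parsing is deliberately simple (it reads only the leading numeric components, so 2.8.0+cu126 is treated as 2.8.0).

```python
import re
import sys
from importlib import metadata

# Minimums taken from the prerequisites list above.
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 8)


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """True if `version` (a string) is at least `minimum` (a tuple of ints)."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    print("Python OK:", sys.version_info[:2] >= MIN_PYTHON)
    try:
        torch_version = metadata.version("torch")
        print("PyTorch OK:", meets_minimum(torch_version, MIN_TORCH))
    except metadata.PackageNotFoundError:
        print("PyTorch is not installed")
```

The script only inspects installed package metadata; it does not check for a GPU or XPU.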

Installation

torchcomms is available on PyPI and can be installed using pip. Alternatively, you can build torchcomms from source.

Using pip (Stable)

You can install torchcomms and PyTorch (2.11+) using pip:

# CUDA 12.6
pip install torch torchcomms --index-url https://download.pytorch.org/whl/cu126

# CUDA 12.8
pip install torch torchcomms --index-url https://download.pytorch.org/whl/cu128

# CUDA 13.0
pip install torch torchcomms --index-url https://download.pytorch.org/whl/cu130

Using pip (Nightly Builds)

You can install torchcomms and PyTorch nightly builds using pip:

# CUDA 12.6
pip install --pre torch torchcomms --index-url https://download.pytorch.org/whl/nightly/cu126

# CUDA 12.8
pip install --pre torch torchcomms --index-url https://download.pytorch.org/whl/nightly/cu128

# CUDA 13.0
pip install --pre torch torchcomms --index-url https://download.pytorch.org/whl/nightly/cu130
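
The index URLs in the two pip sections follow one pattern: the CUDA version with the dot removed and prefixed with cu, plus a nightly/ path segment for nightly builds. The sketch below builds the command from that pattern. pip_command is a hypothetical helper, and it accepts only the three CUDA versions listed above because those are the only ones this README confirms.

```python
# CUDA versions that appear in the pip instructions above.
SUPPORTED_CUDA = ("12.6", "12.8", "13.0")
BASE_URL = "https://download.pytorch.org/whl"


def pip_command(cuda_version, nightly=False):
    """Build the pip install command for a CUDA version listed in this README."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(f"unlisted CUDA version: {cuda_version}")
    tag = "cu" + cuda_version.replace(".", "")
    channel = f"{BASE_URL}/nightly/{tag}" if nightly else f"{BASE_URL}/{tag}"
    pre = "--pre " if nightly else ""
    return f"pip install {pre}torch torchcomms --index-url {channel}"


print(pip_command("12.8"))
print(pip_command("13.0", nightly=True))
```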

Building from Source

Prerequisites

CMake 3.22 or higher
Ninja 1.10 or higher

If you want to build the NCCLX backend, we recommend building inside a conda virtual environment. Run the following commands to create the environment and clone the repository:

# Create a conda environment
conda create -n torchcomms python=3.10
conda activate torchcomms
# Clone the repository
git clone git@github.com:meta-pytorch/torchcomms.git
cd torchcomms

Build the backend (choose one based on your hardware):

Standard NCCL Backend

No build is needed; this backend uses the NCCL library provided by PyTorch.

NCCLX Backend

If you want to install the third-party dependencies directly from conda, run the following command:

USE_SYSTEM_LIBS=1 ./build_ncclx.sh

If you want to build and install the third-party dependencies from source, run the following command:

./build_ncclx.sh

RCCL Backend

Install the prerequisites:

conda install conda-forge::glog=0.4.0 conda-forge::gflags conda-forge::fmt -y

Set the environment variables used to find the ROCm/RCCL headers, then run the build script:

export ROCM_HOME=/opt/rocm
export RCCL_INCLUDE=$ROCM_HOME/include/rccl

./build_rccl.sh

RCCLX Backend

Install the prerequisites:

conda install conda-forge::glog=0.4.0 conda-forge::gflags conda-forge::fmt -y

Set the environment variables used to find the ROCm/RCCLX headers and libraries, then run the build script:

export BUILD_DIR=${PWD}/comms/rcclx/develop/build/release/build
export ROCM_HOME=/opt/rocm
export RCCLX_INCLUDE=${BUILD_DIR}/include/rccl
export RCCLX_LIB=${BUILD_DIR}/lib

./build_rcclx.sh

TIP: The default build targets both gfx942 and gfx950 and can take over an hour. To shorten it, build only for your GPU:

MI300X/MI325X (gfx942):

./build_rcclx.sh --amdgpu_targets gfx942

MI350X/MI355X (gfx950):

./build_rcclx.sh --amdgpu_targets gfx950

If you are unsure of your architecture, detect it with:

rocminfo | grep -m1 gfx
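
The detection step and the narrowed build commands above can be combined in a short script. This is a sketch under assumptions: it expects the architecture to appear in the rocminfo output as a token such as gfx942, it knows only the two targets named in the tip, and the function names first_gfx_target and build_command are hypothetical.

```python
import re
import subprocess

# GPU families named in the tip above, keyed by architecture.
KNOWN_TARGETS = {
    "gfx942": "MI300X/MI325X",
    "gfx950": "MI350X/MI355X",
}


def first_gfx_target(rocminfo_output):
    """Return the first gfx architecture name in rocminfo output, or None."""
    match = re.search(r"gfx[0-9a-f]+", rocminfo_output)
    return match.group(0) if match else None


def build_command(target):
    """Return the narrowed build command for a target listed in KNOWN_TARGETS."""
    if target not in KNOWN_TARGETS:
        raise ValueError(f"no narrowed build documented for {target!r}")
    return f"./build_rcclx.sh --amdgpu_targets {target}"


if __name__ == "__main__":
    try:
        output = subprocess.run(
            ["rocminfo"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # rocminfo is missing or failed; treat it as "nothing detected".
        output = ""
    target = first_gfx_target(output)
    if target in KNOWN_TARGETS:
        print(build_command(target))
    else:
        print("No known gfx target detected; detected:", target)
```

The script only prints the command so that you can review it before starting a long build.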

XCCL Backend

Source the Intel oneAPI environment (update the path to match your oneAPI installation):

export INTEL_ONEAPI=/path/to/intel/oneapi  # e.g., /opt/intel/oneapi or ~/intel/oneapi
source $INTEL_ONEAPI/compiler/latest/env/vars.sh
source $INTEL_ONEAPI/ccl/latest/env/vars.sh

Enable the XCCL backend and install:

export USE_XCCL=ON
export USE_NCCL=OFF
export USE_NCCLX=OFF
export USE_TRANSPORT=OFF
pip install --no-build-isolation -v .

Install torchcomms:

Set the backend environment variables before installing. For an RCCLX-only build:

export USE_NCCL=OFF
export USE_NCCLX=OFF
export USE_GLOO=OFF
export USE_RCCL=OFF
export USE_RCCLX=ON

(See Build Configuration below for defaults and other mixes.)

# Install PyTorch (if not already installed)
pip install -r requirements.txt
pip install --no-build-isolation -v .

# Note: when installing torchcomms with the RCCL or RCCLX backend, turn off the other backends. For example, for RCCLX only:
USE_NCCL=OFF USE_NCCLX=OFF USE_GLOO=OFF USE_RCCL=OFF USE_RCCLX=ON USE_TRANSPORT=OFF pip install --no-build-isolation -v .

Build Configuration

You can customize the build by setting environment variables before running pip install:

# Enable/disable specific backends (ON/OFF or 1/0)
export USE_NCCL=ON    # Default: ON
export USE_NCCLX=ON   # Default: ON
export USE_GLOO=ON    # Default: ON
export USE_RCCL=OFF   # Default: OFF
export USE_RCCLX=OFF  # Default: OFF
export USE_XCCL=OFF   # Default: OFF

Then run:

# Install PyTorch (if not already installed)
pip install -r requirements.txt
pip install --no-build-isolation -v .
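
When you switch between backend mixes often, it can help to generate the full set of flags instead of exporting them by hand. The sketch below does that with the defaults listed above. backend_env and install_env are hypothetical helpers, not part of the torchcomms build, and USE_TRANSPORT is left out because its default is not listed in this section.

```python
import os

# Defaults copied from the Build Configuration list above.
DEFAULTS = {
    "USE_NCCL": "ON",
    "USE_NCCLX": "ON",
    "USE_GLOO": "ON",
    "USE_RCCL": "OFF",
    "USE_RCCLX": "OFF",
    "USE_XCCL": "OFF",
}


def backend_env(enabled):
    """Return USE_* settings with exactly the backends in `enabled` turned ON."""
    wanted = {f"USE_{name.upper()}" for name in enabled}
    unknown = wanted - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown backend flags: {sorted(unknown)}")
    return {flag: ("ON" if flag in wanted else "OFF") for flag in DEFAULTS}


def install_env(enabled):
    """Merge the backend flags into a copy of the current environment."""
    env = dict(os.environ)
    env.update(backend_env(enabled))
    return env


rcclx_only = backend_env(["rcclx"])
print(" ".join(f"{k}={v}" for k, v in rcclx_only.items()))
```

The dictionary returned by install_env can be passed as the env argument of subprocess.run when invoking pip install --no-build-isolation -v . from a script.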

Quick Start Example

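Here's a simple example demonstrating synchronous AllReduce communication across multiple GPUs. Replace the <device> and <backend> placeholders with the values for your hardware, for example cuda with nccl, or xpu with xccl: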

#!/usr/bin/env python3
# example.py
import torch
from torchcomms import new_comm, ReduceOp

def main():
    # Initialize TorchComm with device-specific backend
    device = torch.device("<device>") # cuda, xpu, etc
    torchcomm = new_comm("<backend>", device, name="main_comm") # nccl, ncclx, rccl, xccl, etc

    # Get rank and world size
    rank = torchcomm.get_rank()
    world_size = torchcomm.get_size()

    # Calculate device ID
    num_devices = torch.<device>.device_count()
    device_id = rank % num_devices
    target_device = torch.device(f"<device>:{device_id}")

    print(f"Rank {rank}/{world_size}: Running on device {device_id}")

    # Create a tensor with rank-specific data
    tensor = torch.full(
        (1024,),
        float(rank + 1),
        dtype=torch.float32,
        device=target_device
    )

    print(f"Rank {rank}: Before AllReduce: {tensor[0].item()}")

    # Perform synchronous AllReduce (sum across all ranks)
    torchcomm.all_reduce(tensor, ReduceOp.SUM, async_op=False)

    # Synchronize device stream
    torch.<device>.current_stream().synchronize()

    print(f"Rank {rank}: After AllReduce: {tensor[0].item()}")

    # Cleanup
    torchcomm.finalize()

if __name__ == "__main__":
    main()

Running the Example

To run this example with multiple processes (one per GPU):

# Using torchrun (recommended)
torchrun --nproc_per_node=2 example.py

# Or using python -m torch.distributed.launch (deprecated in favor of torchrun)
python -m torch.distributed.launch --nproc_per_node=2 example.py

To run this example with multiple nodes:

Node 0

torchrun --nnodes=2 --nproc_per_node=8 --node_rank=0 --rdzv-endpoint="<master-node>:<master-port>" example.py

Node 1

torchrun --nnodes=2 --nproc_per_node=8 --node_rank=1 --rdzv-endpoint="<master-node>:<master-port>" example.py
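
To see why example.py can pick its device with rank % num_devices in the two-node launch, the sketch below works through the rank arithmetic. It assumes ranks are assigned contiguously per node in node_rank order and that each node has exactly eight devices; neither is stated by this README, and the function names are hypothetical.

```python
def global_rank(node_rank, nproc_per_node, local_rank):
    """Global rank under a contiguous, node-major assignment (an assumption)."""
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


def device_index(rank, num_devices):
    """Device selection used by the example script: rank modulo device count."""
    return rank % num_devices


# Two nodes with eight processes each, as in the commands above.
NNODES, NPROC = 2, 8
for node in range(NNODES):
    ranks = [global_rank(node, NPROC, local) for local in range(NPROC)]
    devices = [device_index(r, NPROC) for r in ranks]
    print(f"node {node}: ranks {ranks[0]}-{ranks[-1]} -> devices {devices}")
```

Under these assumptions, every process on a node lands on a distinct local device, which is what the modulo in the example relies on.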

In example.py, we perform the following steps:

new_comm() creates a communicator with the specified backend
get_rank() and get_size() give each process its unique rank and the total world size
Each rank creates a tensor filled with a rank-specific value (rank + 1)
all_reduce() with ReduceOp.SUM sums the tensors across all ranks, in place
finalize() cleans up communication resources
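
To know what value to expect in the "After AllReduce" line, the steps above can be modeled on plain Python lists without any GPU. This is only a model of the arithmetic, not of torchcomms itself; simulate_all_reduce_sum is a hypothetical name, and the short vector length is chosen for readability.

```python
def simulate_all_reduce_sum(world_size, length=4):
    """Model AllReduce(SUM) on plain lists: every rank ends with the same sums."""
    # Rank r starts with a vector filled with r + 1, as in the example script.
    inputs = [[float(rank + 1)] * length for rank in range(world_size)]
    # Element-wise sum across ranks.
    total = [sum(column) for column in zip(*inputs)]
    # After the collective, each rank holds its own copy of the total.
    return [list(total) for _ in range(world_size)]


for n in (2, 8, 16):
    result = simulate_all_reduce_sum(n)
    # The sum 1 + 2 + ... + n has the closed form n * (n + 1) / 2.
    assert result[0][0] == n * (n + 1) / 2
    print(f"world_size={n}: every element becomes {result[0][0]}")
```

For the two-process torchrun command, each rank should therefore report 3.0 after the AllReduce.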

Asynchronous Operations

torchcomms also supports asynchronous operations, which let communication overlap with other work. Here is the same example with an asynchronous AllReduce:

import torch
from torchcomms import new_comm, ReduceOp

# Use the correct device and backend for TorchComms initialization
device = torch.device("<device>") # cuda, xpu, etc
torchcomm = new_comm("<backend>", device, name="main_comm") # nccl, ncclx, rccl, xccl, etc

rank = torchcomm.get_rank()
device_id = rank % torch.<device>.device_count()
target_device = torch.device(f"<device>:{device_id}")

# Create tensor
tensor = torch.full((1024,), float(rank + 1), dtype=torch.float32, device=target_device)

# Start async AllReduce
work = torchcomm.all_reduce(tensor, ReduceOp.SUM, async_op=True)

# Do other work while communication happens
print(f"Rank {rank}: Doing other work while AllReduce is in progress...")

# Wait for completion
work.wait()
print(f"Rank {rank}: AllReduce completed")

torchcomm.finalize()
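
The start, overlap, wait pattern in the script above can be tried without any GPU using a standard-library analogy. The sketch below is not torchcomms code: a thread-pool future stands in for the work handle, slow_sum stands in for the collective, and both function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_sum(values, delay=0.05):
    """Stand-in for a collective: sleeps briefly, then returns the sum."""
    time.sleep(delay)
    return sum(values)


def overlapped(values):
    """Start the 'communication', do local work, then wait for the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        work = pool.submit(slow_sum, values)  # analogous to async_op=True
        local = [v * 2 for v in values]       # independent local work
        reduced = work.result()               # analogous to work.wait()
    return local, reduced


local, reduced = overlapped([1.0, 2.0])
print(local, reduced)
```

As in the analogy, the work done between starting the operation and waiting on it should not depend on the result of the operation.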

Contributing

See the CONTRIBUTING file for how to help out.

License

torchcomms License

Source code is made available under a BSD 3 license. However, you may have other legal obligations that govern your use of other content linked in this repository, such as the license or terms of service for third-party data and models.

Other Licenses

torchcomms backends include third-party source code that may be under other licenses. Please check the relevant directories and files to verify the license.

For convenience some of them are listed below:

NCCL License
