This roadmap outlines the planned features and improvements for AReaL in the next quarter. We welcome community feedback and contributions to help shape the future direction of the project.
Latest Release: Check releases for the most recent version.
This roadmap tracks major planned enhancements through April 30, 2026. Items are organized into two categories:
- Ongoing: Features currently under active development by the core AReaL team
- Planned but not in progress: Features with concrete implementation plans where we welcome community contributions
Backends
Ongoing
- ZBPP & ZBPP-V (zero-bubble pipeline parallelism) support for the Archon backend
- FP8 training for Archon
Planned but not in progress
- Support for agentic training with large VLM MoE models (Archon backend)
- Omni model RL support with FSDP/Archon backend
- Decoupling agent service from the inference service
- Online RL training with the proxy server
- LoRA support for the Archon backend
- Colocation mode with `awex` as the weight sync engine
- Multi-LLM training (different agents with different parameters)
- Auto-scaling inference engines in single-controller mode
- Elastic weight update setup and acceleration
- RL training with cross-node vLLM pipeline/context parallelism
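Several of the items above (colocation with a weight sync engine, elastic weight updates) revolve around streaming updated trainer weights into running inference engines. A minimal conceptual sketch of that idea, using hypothetical names (`InferenceStub`, `sync_weights`) that are not AReaL's actual API:

```python
# Conceptual sketch of a trainer-to-inference weight sync step.
# All names here are hypothetical; AReaL's real weight sync goes through
# its engine APIs (or awex, once integrated).

CHUNK_BYTES = 4  # tiny chunk size so the example is easy to follow


class InferenceStub:
    """Stands in for a remote inference engine receiving weight updates."""

    def __init__(self):
        self.params = {}

    def load_chunk(self, name, offset, data):
        buf = self.params.setdefault(name, bytearray())
        # Grow the buffer if needed, then write the chunk in place.
        if len(buf) < offset + len(data):
            buf.extend(b"\x00" * (offset + len(data) - len(buf)))
        buf[offset:offset + len(data)] = data


def sync_weights(trainer_state, engine):
    """Stream each parameter to the engine in fixed-size chunks."""
    for name, blob in trainer_state.items():
        for offset in range(0, len(blob), CHUNK_BYTES):
            engine.load_chunk(name, offset, blob[offset:offset + CHUNK_BYTES])


engine = InferenceStub()
sync_weights({"layer0.weight": b"abcdefgh"}, engine)
```

Chunked streaming is what makes updates elastic: an engine that joins mid-stream only needs the chunks it missed, not a full checkpoint.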
Usability
Ongoing
- Flatten the import structure of areal modules
Planned but not in progress
- Publish PyPI packages
- Support distributed training and debugging in Jupyter notebooks
- Example of using a generative or critic-like reward model
- Support directly constructing inference/training engines without config objects
- Add a router in the rollout controller for simpler proxy server usage
- Integrate `aenvironment` for environment handling
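The reward-model example planned above hinges on a simple interface: a function that scores (prompt, completion) pairs. A hedged sketch with an illustrative signature (not AReaL's actual API), using a rule-based exact-match reward as a stand-in for a generative or critic-like model:

```python
# Illustrative reward interface; the signature is a sketch, not AReaL's API.

def extract_final_answer(completion: str) -> str:
    """Take the text after the last 'Answer:' marker, if present."""
    marker = "Answer:"
    idx = completion.rfind(marker)
    return completion[idx + len(marker):].strip() if idx >= 0 else completion.strip()


def rule_based_reward(prompt: str, completion: str, reference: str) -> float:
    """Return 1.0 on an exact final-answer match, else 0.0.

    A generative or critic-like reward model would replace this rule with a
    scored forward pass over (prompt, completion), keeping the same interface.
    """
    return 1.0 if extract_final_answer(completion) == reference.strip() else 0.0
```

Keeping the interface identical is what lets rule-based rewards and learned reward models swap in and out of the same rollout workflow.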
Documentation
Planned but not in progress
- Use case guides: multi-agent training
- Guide for online proxy mode training
Q4 2025 Review
Backends
Completed:
- Single-controller mode
- Detailed profiling for optimal performance across different scales
- Low-precision RL training (Megatron FP8)
- Data transfer optimization in single-controller mode
- New PyTorch-native backend: Archon
Carried over to Q1 2026:
- Multi-LLM training (different agents with different parameters)
- Auto-scaling inference engines in single-controller mode
- Elastic weight update setup and acceleration
- RL training with cross-node vLLM pipeline/context parallelism
Usability
Completed:
- Add CI pipeline to build Docker images upon release
- Wrap training scripts into trainers
- Refactor FSDP/Megatron engine/controller APIs to finer granularity
- Fully respect allocation mode in trainers/training scripts
Carried over to Q1 2026:
- Flatten the import structure of areal modules
- Support distributed training and debugging in Jupyter notebooks
- Example of using a generative or critic-like reward model
Canceled:
- Rename `RemoteSGLang/vLLMEngine` as `SGLang/vLLMEngine`
Documentation
Completed:
- Tutorial on how to write efficient async rollout workflows
- Benchmarking and profiling guide
- Use case guides: offline inference, offline evaluation
- AReaL performance tuning guide
- Device allocation strategies for training and inference
- Parallelism strategy configuration for training and inference
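As a companion to the parallelism-strategy guide above, the configuration boils down to choosing per-dimension degrees whose product equals the world size. A toy parser for a compact spec string such as "d4p2t1" (data=4, pipeline=2, tensor=1); the string format here is illustrative only, not AReaL's actual allocation-mode syntax:

```python
# Toy parallelism-spec parser; the "dXpYtZ" format is illustrative only.
import re
from dataclasses import dataclass


@dataclass
class ParallelSpec:
    data: int
    pipeline: int
    tensor: int

    @property
    def world_size(self) -> int:
        # Total GPUs needed is the product of all parallelism degrees.
        return self.data * self.pipeline * self.tensor


def parse_spec(spec: str) -> ParallelSpec:
    m = re.fullmatch(r"d(\d+)p(\d+)t(\d+)", spec)
    if m is None:
        raise ValueError(f"bad parallelism spec: {spec!r}")
    return ParallelSpec(*(int(g) for g in m.groups()))
```

Validating that the product matches the available device count up front catches most misconfigured launches before any process group is created.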
Carried over to Q1 2026:
- Use case guides: multi-agent training
Earlier Quarters
Backends
Completed:
- Megatron training backend support
- SGLang large expert parallelism (EP) inference support
- Remote vLLM inference engine
- Ulysses context parallelism & tensor parallelism for FSDP backend
- End-to-end MoE RL training with large EP inference and Megatron expert parallelism
- Distributed weight resharder for Megatron training backend
Canceled:
- Local SGLang inference engine with inference/training colocation (hybrid engine)
- RL training with SGLang pipeline parallelism
Usability
Completed:
- OpenAI-compatible client support
- Support RLOO
- Provide benchmarking configuration examples:
- DAPO
- Bradley-Terry reward modeling
- PPO with critic models
- REINFORCE++
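Of the completed algorithm examples above, RLOO is compact enough to sketch: each of the k completions sampled for a prompt uses the mean reward of the other k-1 as its baseline, so no learned critic is needed. A minimal implementation of that leave-one-out advantage:

```python
def rloo_advantages(rewards: list[float]) -> list[float]:
    """Leave-one-out advantages for k rewards sampled from the same prompt.

    Each sample's baseline is the mean reward of the other k-1 samples,
    which is the core idea behind RLOO (REINFORCE Leave-One-Out).
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least 2 samples per prompt")
    total = sum(rewards)
    # advantage_i = r_i - mean(rewards excluding r_i)
    return [r - (total - r) / (k - 1) for r in rewards]


# e.g. rewards [1.0, 0.0] -> advantages [1.0, -1.0]
```

Because the baseline is computed from sibling samples, the estimator stays unbiased while avoiding the extra memory and tuning cost of a critic model.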
Documentation
Completed:
- OpenAI-compatible client documentation
- Out-of-memory (OOM) troubleshooting guide
- AReaL debugging best practices:
- LLM server-only debugging: How to launch LLM servers independently and debug agent workflows
- Mock data and torchrun debugging: Creating synthetic data and using `torchrun` for algorithm debugging
- Training-free evaluation experiments: Running evaluations without training or additional GPUs
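The mock-data workflow above typically reduces to a plain torchrun launch on a single node. A hypothetical invocation (the script name and flags are placeholders, not AReaL's actual entry point):

```shell
# Launch 2 local processes against synthetic data for algorithm debugging.
# train_grpo.py, --dataset, and --dry-run are illustrative placeholders.
torchrun --standalone --nproc_per_node=2 train_grpo.py \
    --dataset /tmp/mock_data.jsonl --dry-run
```

Running with `--standalone` avoids any rendezvous setup, so the loop can be iterated on without a cluster or extra GPUs.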
We value community input! Here's how you can help shape AReaL's future:
- Check Existing Issues: Search issues and discussions to see if your idea already exists
- Create a Feature Request: Use our feature request template
- Discuss in GitHub Discussions: Post in Ideas category for early feedback
- Vote on Features: Use 👍 reactions on issues to show support
Check our contribution guide.
Minor Releases: Bi-weekly - Bug fixes, small improvements, and new features
Major Releases: Quarterly - Important milestones and significant changes
Check our historical milestone summaries since open-source.
Our vision is for AReaL to become the go-to framework for training reasoning and agentic AI systems, a framework that is:
- Accessible: Easy to get started, whether you're a researcher or practitioner
- Scalable: Scales from laptop to 1000+ GPU clusters seamlessly
- Flexible: Supports diverse algorithms, models, and use cases
- Performant: Industry-leading training speed and efficiency
- Open: Fully open-source with transparent development
Last Updated: 2026-02-06
Questions about the roadmap? Open a discussion in GitHub Discussions or ask in our WeChat group.