
ROLL: Reinforcement Learning Optimization for Large-Scale Learning
🚀 An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models 🚀
ROLL is an efficient and user-friendly RL library designed for Large Language Models (LLMs) that utilizes large-scale GPU resources. It significantly enhances LLM performance in key areas such as human preference alignment, complex reasoning, and multi-turn agentic interaction scenarios.
Leveraging a multi-role distributed architecture built on Ray for flexible resource allocation and heterogeneous task scheduling, ROLL integrates cutting-edge technologies such as Megatron-Core, SGLang, and vLLM to accelerate model training and inference.
[08/11/2025] 🎉 Our paper is released; see Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning.
[06/09/2025] 🎉 The ROLL tech report is now available! Access the report here.
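To make the multi-role design concrete, here is a minimal sketch of how distinct roles can be expressed as Ray actors with their own GPU allocations. The `RolloutWorker` and `ActorTrainWorker` names, methods, and resource settings are illustrative assumptions for this sketch, not ROLL's actual API.

```python
# Minimal sketch of a multi-role distributed setup with Ray.
# Class and method names here are hypothetical, not ROLL's API.
import ray

ray.init()

@ray.remote(num_gpus=1)  # assumes at least 2 GPUs are available in the cluster
class RolloutWorker:
    """Generates responses with an inference engine (e.g., vLLM or SGLang)."""
    def generate(self, prompts):
        # Placeholder: a real worker would call the inference backend here.
        return [p + " -> generated response" for p in prompts]

@ray.remote(num_gpus=1)
class ActorTrainWorker:
    """Updates policy weights with a training backend (e.g., Megatron-Core)."""
    def train_step(self, rollouts):
        # Placeholder: a real worker would compute the RL loss and step the optimizer.
        return {"loss": 0.0, "num_samples": len(rollouts)}

# Each role is scheduled onto its own GPU(s), so inference-heavy and
# training-heavy work can be placed and scaled independently.
rollout = RolloutWorker.remote()
trainer = ActorTrainWorker.remote()

rollouts = ray.get(rollout.generate.remote(["What is 2 + 2?"]))
metrics = ray.get(trainer.train_step.remote(rollouts))
print(metrics)
```

Separating roles into independent actors is what enables the flexible resource allocation and heterogeneous task scheduling described above: each role can run on a different GPU pool, backend, or node.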
🚀 Quick Start
- Getting Started
- Installation Guide
- Quick Start: Single-Node Deployment Guide
- Quick Start: Multi-Node Deployment Guide
- ROLL Debugging Guide
- Frequently Asked Questions (Q&A)

User Guide
- Configuration
  - ROLL Configuration System Explained
  - ROLL Configuration Guide
  - ROLL Resource Configuration
  - Off-Policy Algorithm Configuration Guide
  - vLLM Inference Backend Configuration Guide
  - SGLang Inference Backend Configuration Guide
  - Megatron Inference and Training Backend Configuration Guide
  - LoRA Fine-Tuning Configuration Guide
  - FP8 Quantization Configuration Guide
  - DeepSpeed Training Backend Configuration Guide
- Pipelines
  - VLM RLVR Pipeline
  - RLVR Pipeline
  - DPO Pipeline
  - Distill Pipeline
  - Agentic Pipeline
  - Comprehensive Guide: Using the Agentic Part of ROLL
- Algorithms
  - TOPR (Tapered Off-Policy REINFORCE)
  - Reward Feedback Learning (Reward FL)
  - REINFORCE++
  - RAFT++ (Reward rAnked Fine-Tuning)
  - Proximal Policy Optimization (PPO)
  - Lite PPO
  - Group Sequence Policy Optimization (GSPO)
  - Group Relative Policy Optimization (GRPO)
- Agentic
  - Agentic Engineering Practices
  - Trajectory-wise Learning: StarPO (State-Thinking-Actions-Reward Policy Optimization)
  - Step-wise Learning: GiGPO (Group-in-Group Policy Optimization)
  - Tool Use Guide
- Advanced Features
  - Agentic Asynchronous Parallel Rollout
  - ROLL Asynchronous Training Guide
  - Checkpoint Saving and Restoring Guide
  - Converting MCoreAdapter Models to Hugging Face Format
  - GPU Time-Sharing Control Guide
  - Trackers and Metrics
- Hardware Support

Development
- Architecture
- Developer Guide
We welcome contributions from the community! 🤝