
ROLL: Reinforcement Learning Optimization for Large-Scale Learning
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
ROLL is an efficient and user-friendly RL library for Large Language Models (LLMs), built to make use of large-scale GPU resources. It is aimed at improving LLM performance in key areas such as human preference alignment, complex reasoning, and multi-turn agentic interaction.
Leveraging a multi-role distributed architecture with Ray for flexible resource allocation and heterogeneous task scheduling, ROLL integrates cutting-edge technologies like Megatron-Core, SGLang and vLLM to accelerate model training and inference.
[08/11/2025] Our paper has been released; see Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning.
[06/09/2025] The ROLL tech report is now available.
Get Started
Installation
Quick Start: Single-Node Deployment Guide
Quick Start: Multi-Node Deployment Guide
Debugging Guide
Frequently Asked Questions
User Guides
Configuration
Config System Explanation
Configuration Guide
Resource Configuration
Off-Policy Algorithms Configuration Guide
vLLM Inference Backend Configuration Guide
SGLang Inference Backend Configuration Guide
Megatron Inference and Training Backend Configuration Guide
LoRA Fine-tuning Configuration Guide
FP8 Quantization Configuration Guide
DeepSpeed Training Backend Configuration Guide
Pipeline
RLVR Pipeline for VLM
RLVR Pipeline
DPO Pipeline
Distill Pipeline
Agentic Pipeline
Comprehensive Guide: Using the Agentic Part of ROLL
Algorithms
TOPR (Tapered Off-Policy REINFORCE)
Reward Feedback Learning (Reward FL)
Reinforce++
RAFT++ (Reward rAnked Fine-Tuning)
Proximal Policy Optimization (PPO)
Lite PPO
Group Sequence Policy Optimization (GSPO)
Group Relative Policy Optimization (GRPO)
Agentic
Agentic Engineering Practice Documentation
TrajWiseLearning: StarPO (State-Thinking-Actions-Reward Policy Optimization)
StepWiseLearning: GiGPO (Group-in-Group Policy Optimization)
Tool Use Guide
Advanced Features
Agentic Asynchronous Parallel Rollout
ROLL Asynchronous Training User Guide
Checkpoint Saving and Resuming Guide
Converting MCoreAdapter Models to Hugging Face Format
GPU Time-Division Multiplexing Control Guide
Tracker & Metrics
Hardware Support
Development
Architecture
Developer Guide
How to Add Support for a New Model
Customer Env
Prompt Generation Guide
We welcome contributions from the community!