
DeepSpeed Training Backend Configuration Guide

DeepSpeed is Microsoft's deep learning optimization library, offering memory savings, distributed training, and performance optimizations. This document explains how to configure and use the DeepSpeed training backend in the ROLL framework.

DeepSpeed Introduction

DeepSpeed provides multiple optimization techniques, including:

  1. ZeRO Optimization: Reduces memory usage by partitioning optimizer states, gradients, and parameters
  2. Memory-Efficient Training: Supports training of large-scale models
  3. High-Performance Communication: Optimizes communication efficiency in distributed training
  4. Flexible Configuration: Supports configuration of multiple optimization levels
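
For reference, the sketch below shows a minimal DeepSpeed strategy configuration that selects a ZeRO stage. The keys (zero_optimization, bf16, gradient_clipping, train_micro_batch_size_per_gpu) follow the standard DeepSpeed configuration schema; the exact contents of ROLL's predefined configuration files may differ.

zero_optimization:
  stage: 2                           # 1 = partition optimizer states, 2 = + gradients, 3 = + parameters
  overlap_comm: true                 # overlap gradient communication with backward computation
bf16:
  enabled: true                      # train in bfloat16
gradient_clipping: 1.0
train_micro_batch_size_per_gpu: auto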

Configuring DeepSpeed Strategy

In the ROLL framework, DeepSpeed training strategy can be configured by setting strategy_args in the YAML configuration file.

Configuration Example

The following is a typical DeepSpeed configuration example (from examples/qwen2.5-7B-rlvr_megatron/rlvl_lora_zero3.yaml):

defaults:
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

actor_train:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
    model_type: ~
  training_args:
    learning_rate: 1.0e-5
    weight_decay: 0
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 32
    warmup_steps: 20
    num_train_epochs: 50
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero3}
  device_mapping: list(range(0,16))
  infer_batch_size: 4
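
With these values, the effective global batch size works out to per_device_train_batch_size × gradient_accumulation_steps × number of devices in device_mapping = 1 × 32 × 16 = 512 samples per optimizer step (assuming all 16 mapped GPUs participate in data-parallel training).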

Configuration Parameter Details

  1. strategy_name: Set to deepspeed_train to use the DeepSpeed training backend

  2. strategy_config: DeepSpeed-specific configuration parameters

    • Can reference predefined configuration files, such as ${deepspeed_zero3}
    • Several predefined configuration files are available in the ./examples/config/ directory (see DeepSpeed Configuration Files below)
  3. defaults section: Import predefined DeepSpeed configurations

    defaults:
      - ../config/deepspeed_zero@_here_
      - ../config/deepspeed_zero2@_here_
      - ../config/deepspeed_zero3@_here_
      - ../config/deepspeed_zero3_cpuoffload@_here_
  4. device_mapping: Specify the list of GPU device IDs to use
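
In the example above, device_mapping: list(range(0,16)) assigns GPU IDs 0 through 15 to actor_train. As an illustrative sketch (the actor_infer line below is an assumption, not taken from the original example), each worker can be given its own device list:

actor_train:
  device_mapping: list(range(0,16))   # GPUs 0-15 run DeepSpeed training
actor_infer:
  device_mapping: list(range(0,12))   # illustrative only: inference workers may use a different subset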

DeepSpeed Configuration Files

Multiple predefined DeepSpeed configuration files are provided in the ./examples/config/ directory:

  1. deepspeed_zero.yaml: Basic ZeRO configuration
  2. deepspeed_zero2.yaml: ZeRO-2 configuration with optimizer state partitioning
  3. deepspeed_zero3.yaml: ZeRO-3 configuration with optimizer state, gradient, and parameter partitioning
  4. deepspeed_zero3_cpuoffload.yaml: ZeRO-3 configuration with CPU offloading
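
To give a sense of what distinguishes these files, the sketch below expresses ZeRO-3 with CPU offloading using standard DeepSpeed keys (offload_optimizer and offload_param); the actual contents of deepspeed_zero3_cpuoffload.yaml in the repository may differ.

zero_optimization:
  stage: 3                    # partition optimizer states, gradients, and parameters
  offload_optimizer:
    device: cpu               # keep optimizer states in CPU memory
    pin_memory: true
  offload_param:
    device: cpu               # keep inactive parameter shards in CPU memory
    pin_memory: true
  overlap_comm: true
  contiguous_gradients: true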

Using Predefined Configurations

To use predefined DeepSpeed configurations, you can reference them in the YAML file like this:

defaults:
  - ../config/deepspeed_zero3@_here_

actor_train:
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero3}
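
Switching to a different predefined configuration only requires changing the imported file and the interpolation key. The sketch below assumes the CPU-offload file exposes its settings under a deepspeed_zero3_cpuoffload key, mirroring the naming pattern above:

defaults:
  - ../config/deepspeed_zero3_cpuoffload@_here_

actor_train:
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero3_cpuoffload}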

Integration with Other Components

In the configuration example, we can see:

  1. actor_train uses DeepSpeed for training
  2. actor_infer may use other inference backends (such as vLLM)
  3. reference uses the Hugging Face inference backend
  4. Reward models use different inference backends

This design allows different components to choose the most suitable backend according to their needs.
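
As a hedged sketch of how such a mixed setup might look (the strategy names vllm and hf_infer are assumptions used here for illustration; check the strategy names in the repository's example configs):

actor_train:
  strategy_args:
    strategy_name: deepspeed_train      # DeepSpeed handles training
    strategy_config: ${deepspeed_zero3}
actor_infer:
  strategy_args:
    strategy_name: vllm                 # assumed name for a vLLM inference backend
reference:
  strategy_args:
    strategy_name: hf_infer             # assumed name for a Hugging Face inference backend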

Notes

  1. DeepSpeed requires specific versions of its dependency libraries; make sure compatible versions are installed
  2. Different ZeRO stages trade memory savings against throughput; choose the stage that matches your model size and hardware
  3. When using LoRA fine-tuning, pay attention to compatibility with DeepSpeed, especially ZeRO-3 parameter partitioning