# ROLL Resource Configuration
In the ROLL framework, resource settings are specified through the `device_mapping` parameter in YAML configuration files, which determines the GPU devices each worker uses. This document explains how to configure resources, covering the colocated and disaggregated modes, multi-role resource configuration, and how worker counts are calculated.
## GPU Resource Configuration
In ROLL, GPU resources are configured by specifying the `device_mapping` parameter for each worker in the YAML configuration file. This parameter is a string that can be parsed by Python's `eval()` function into a list; the values in the list are global logical GPU ranks.
For example:

```yaml
actor_train:
  device_mapping: list(range(0,16))
actor_infer:
  device_mapping: list(range(16,24))
```
In this example, the system requires a total of 24 GPUs: `actor_train` is deployed on GPUs [0, 16) and `actor_infer` on GPUs [16, 24).
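As a quick illustration, the configured string is simply evaluated into a plain Python list of ranks (mirroring the `eval()` behavior described above):

```python
# The device_mapping string from the YAML above, parsed the way
# the framework parses it: a plain eval() into a list of GPU ranks.
device_mapping_str = "list(range(16,24))"
device_mapping = eval(device_mapping_str)
print(device_mapping)  # [16, 17, 18, 19, 20, 21, 22, 23] -> actor_infer's GPUs
```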
## CPU Resource Configuration
For workers that use only CPU resources, simply configure the `world_size` parameter, and the system will automatically deploy the corresponding number of workers (`ray.Actor`) on CPU resources.
For example:

```yaml
code_sandbox:
  world_size: 8
```
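Conceptually, this amounts to launching `world_size` CPU-only Ray actors. The sketch below illustrates the idea in plain Ray; `CodeSandboxWorker` is a hypothetical class name for illustration, not ROLL's actual API:

```python
import ray

ray.init()

@ray.remote(num_cpus=1, num_gpus=0)  # reserve 1 CPU and no GPUs per actor
class CodeSandboxWorker:  # hypothetical worker class, for illustration only
    def ping(self) -> str:
        return "ok"

world_size = 8  # matches the YAML setting above
workers = [CodeSandboxWorker.remote() for _ in range(world_size)]
print(ray.get([w.ping.remote() for w in workers]))  # ['ok', 'ok', ..., 'ok']
```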
## Colocated and Disaggregated Modes
### Colocated Mode
In colocated mode, multiple roles share the same GPU resources (time-division multiplexing). This improves resource utilization and reduces waste.
For example, in `examples/docs_examples/example_grpo.yaml`:

```yaml
actor_infer:
  device_mapping: list(range(0,8))  # shares GPUs [0,8) with actor_train via time-division multiplexing
  # ...
actor_train:
  device_mapping: list(range(0,8))
```
### Disaggregated Mode
In disaggregated mode, different roles use different GPU resources. This independent deployment approach is key to implementing asynchronous training.
ROLL implements disaggregated deployment simply by setting a different `device_mapping` for each worker.

For example, in `examples/qwen2.5-7B-agentic_megatron/agentic_val_webshop_async.yaml`:
```yaml
# actor_train uses GPUs [0, 1, 2, 3]; actor_infer uses GPUs [4, 5, 6, 7]
actor_train:
  device_mapping: list(range(0,4))
actor_infer:
  device_mapping: list(range(4,8))
```
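Whether two roles are colocated or disaggregated therefore follows entirely from their `device_mapping` values. The hypothetical helper below (`deployment_mode` is not a ROLL function) makes the distinction explicit:

```python
# Hypothetical helper, for illustration only: two roles are colocated
# when their GPU rank sets overlap, and disaggregated when disjoint.
def deployment_mode(mapping_a: str, mapping_b: str) -> str:
    shared = set(eval(mapping_a)) & set(eval(mapping_b))
    return "colocated" if shared else "disaggregated"

print(deployment_mode("list(range(0,8))", "list(range(0,8))"))  # colocated
print(deployment_mode("list(range(0,4))", "list(range(4,8))"))  # disaggregated
```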
## Flexibility in Multi-Role Resource Configuration
The ROLL framework supports configuring different resource strategies for different roles to meet the needs of various application scenarios:
- Different roles can use different numbers of GPUs
- Different roles can use different inference engines (such as vLLM, SGLang, etc.)
- Different roles can set different `num_gpus_per_worker` values:
  - For training roles, `num_gpus_per_worker` is always 1
  - For inference roles, `num_gpus_per_worker` is >= 1 depending on the required resources: with vLLM it is calculated automatically from the parallelism settings, while with SGLang it is specified directly
For example, when vLLM is used as the inference engine, `num_gpus_per_worker` is calculated automatically from the tensor parallel (`tensor_parallel_size`) and pipeline parallel (`pipeline_parallel_size`) settings:
```yaml
actor_infer:
  strategy_args:
    strategy_name: vllm
    strategy_config:
      tensor_parallel_size: 2
      pipeline_parallel_size: 1
  num_gpus_per_worker: 2  # automatically calculated as tensor_parallel_size * pipeline_parallel_size
```
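The rule itself is just the product of the two parallel degrees. A minimal sketch (not ROLL's actual code):

```python
# Sketch: one vLLM engine instance spans tp * pp GPUs, so that is
# how many GPUs each inference worker needs.
def vllm_num_gpus_per_worker(strategy_config: dict) -> int:
    tp = strategy_config.get("tensor_parallel_size", 1)
    pp = strategy_config.get("pipeline_parallel_size", 1)
    return tp * pp

print(vllm_num_gpus_per_worker(
    {"tensor_parallel_size": 2, "pipeline_parallel_size": 1}))  # 2
```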
## Worker Count Calculation
The number of workers (`world_size`) is calculated automatically from the `device_mapping` and `num_gpus_per_worker` parameters:

```python
world_size = len(device_mapping) // num_gpus_per_worker
```
In the `WorkerConfig.__post_init__()` method, if `device_mapping` is not `None`, the following logic is executed:

1. Parse the string into a list via `eval(device_mapping)`
2. Verify that `len(device_mapping)` is divisible by `num_gpus_per_worker`
3. Calculate `world_size = len(device_mapping) // num_gpus_per_worker`
For CPU-only workers, the worker count is specified directly through the `world_size` parameter, and `num_gpus_per_worker` is set to 0.
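Putting the two cases together, the logic can be sketched as follows. This is a simplified stand-in for `WorkerConfig.__post_init__()`, not the framework's actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkerConfig:  # simplified sketch, not ROLL's actual class
    device_mapping: Optional[str] = None  # e.g. "list(range(0,16))"
    num_gpus_per_worker: int = 1
    world_size: int = 0

    def __post_init__(self):
        if self.device_mapping is not None:
            # 1. Parse the string into a list of global GPU ranks
            ranks = eval(self.device_mapping)
            # 2. The rank count must divide evenly among the workers
            assert len(ranks) % self.num_gpus_per_worker == 0, \
                "len(device_mapping) must be divisible by num_gpus_per_worker"
            # 3. Derive the worker count
            self.world_size = len(ranks) // self.num_gpus_per_worker
        else:
            # CPU-only worker: world_size is taken directly from the config
            self.num_gpus_per_worker = 0

cfg = WorkerConfig(device_mapping="list(range(0,12))", num_gpus_per_worker=2)
print(cfg.world_size)  # 6
```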
## Configuration Example
The following is a complete resource configuration example:
```yaml
num_gpus_per_node: 8

actor_train:
  device_mapping: list(range(0,16))  # uses 16 GPUs (spans 2 nodes of 8 GPUs)
  # world_size is automatically calculated as 16 // 1 = 16

actor_infer:
  num_gpus_per_worker: 2             # each worker uses 2 GPUs
  device_mapping: list(range(0,12))  # uses 12 GPUs
  # world_size is automatically calculated as 12 // 2 = 6

rewards:
  code_sandbox:
    world_size: 8  # CPU-only; deploys 8 workers
```
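To sanity-check the arithmetic in this example, here is the formula from the previous section applied in plain Python:

```python
# Verify the world_size values annotated in the example above.
roles = {
    "actor_train": ("list(range(0,16))", 1),
    "actor_infer": ("list(range(0,12))", 2),
}
for name, (mapping, gpus_per_worker) in roles.items():
    ranks = eval(mapping)
    print(name, "world_size =", len(ranks) // gpus_per_worker)
# actor_train world_size = 16
# actor_infer world_size = 6
```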
By properly configuring these parameters, resources can be flexibly allocated to different roles to meet various training and inference requirements.