# LoRA Fine-tuning Configuration Guide

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that adds trainable low-rank matrices to a pre-trained model while keeping the original weights frozen. This document explains how to configure and use LoRA fine-tuning in the ROLL framework.
## LoRA Introduction
LoRA achieves parameter-efficient fine-tuning through the following mechanisms:

- Low-Rank Matrix Decomposition: decomposes the weight update matrix into the product of two low-rank matrices (see the formula after this list)
- Parameter Efficiency: trains only a small number of additional parameters instead of all model parameters
- Easy Deployment: the fine-tuned low-rank matrices can be merged back into the original model weights
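
In the standard LoRA formulation, a pre-trained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ stays frozen while the update is learned as the product of two low-rank matrices $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ with $r \ll \min(d, k)$:

$$
h = W_0 x + \frac{\alpha}{r} B A x
$$

Here $r$ corresponds to `lora_rank` and $\alpha$ to `lora_alpha` below; only $A$ and $B$ receive gradient updates, which is why the number of trainable parameters drops so sharply.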
## Configuring LoRA Fine-tuning
In the ROLL framework, LoRA fine-tuning can be configured by setting relevant parameters in the YAML configuration file.
### Configuration Example
The following is a typical LoRA configuration example (from `examples/qwen2.5-7B-rlvr_megatron/rlvl_lora_zero3.yaml`):
```yaml
# LoRA global configuration
lora_target: o_proj,q_proj,k_proj,v_proj
lora_rank: 32
lora_alpha: 32

actor_train:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
    lora_target: ${lora_target}
    lora_rank: ${lora_rank}
    lora_alpha: ${lora_alpha}
    model_type: ~
  training_args:
    learning_rate: 1.0e-5
    weight_decay: 0
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 32
    warmup_steps: 20
    num_train_epochs: 50
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero3}
  device_mapping: list(range(0,16))
  infer_batch_size: 4

actor_infer:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
    lora_target: ${lora_target}
    lora_rank: ${lora_rank}
    lora_alpha: ${lora_alpha}
  generating_args:
    max_new_tokens: ${response_length}
    top_p: 0.99
    top_k: 100
    num_beams: 1
    temperature: 0.99
    num_return_sequences: ${num_return_sequences_in_group}
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.6
      enforce_eager: false
      block_size: 16
      max_model_len: 8000
  device_mapping: list(range(0,12))
  infer_batch_size: 1
```
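
Note that values such as `${lora_target}` are variable references resolved from the global keys at the top of the file, so the LoRA settings are defined once and shared by `actor_train` and `actor_infer`. Assuming the usual OmegaConf-style interpolation that this `${...}` syntax suggests, the resolution works like this:

```yaml
lora_rank: 32               # global definition

actor_train:
  model_args:
    lora_rank: ${lora_rank} # resolves to 32 when the config is loaded
```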
### Configuration Parameter Details
**`lora_target`**: specifies the model layers to apply LoRA to

- For example, `o_proj,q_proj,k_proj,v_proj` applies LoRA to the output, query, key, and value projection layers in the attention mechanism
- Can be adjusted according to the specific model structure
**`lora_rank`**: rank of the LoRA matrices

- Controls the size of the low-rank matrices
- Smaller ranks reduce the number of trainable parameters but may affect performance
- Usually set to 8, 16, 32, or 64
**`lora_alpha`**: LoRA scaling factor

- Controls the magnitude of the LoRA updates (see the combinations after this list)
- Usually set equal to `lora_rank` or a multiple of it
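
Since the standard LoRA formulation scales the update by $\alpha / r$ (see the formula in the introduction), the ratio between `lora_alpha` and `lora_rank` determines the effective update strength. Illustrative combinations, not ROLL-specific recommendations:

```yaml
# Effective update scale = lora_alpha / lora_rank
lora_rank: 32
lora_alpha: 32    # scale 1.0 - the setting used in the example above

# Halving the rank while keeping alpha doubles the scale:
# lora_rank: 16
# lora_alpha: 32  # scale 2.0 - stronger updates from a smaller adapter
```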
LoRA parameters in `model_args`:

- `lora_target`: specifies the layers to apply LoRA to
- `lora_rank`: rank of the LoRA matrices
- `lora_alpha`: LoRA scaling factor
## LoRA Compatibility with Training Backends
Currently, LoRA fine-tuning only supports the DeepSpeed training backend:
```yaml
actor_train:
  strategy_args:
    strategy_name: deepspeed_train  # LoRA only supports deepspeed_train
```
This is because DeepSpeed provides optimization features that integrate well with LoRA.
## Performance Optimization Recommendations
**Selecting Appropriate LoRA Layers**:

- Applying LoRA to attention-related layers usually works well
- The best combination of LoRA layers can be determined through experimentation (see the sketch after this list)
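
For instance, on a Qwen2.5/LLaMA-style decoder (the module names below are architecture-specific assumptions; check your model's actual layer names), two common candidates to compare are attention-only versus attention-plus-MLP targets:

```yaml
# Attention projections only (the combination used in the example above)
lora_target: o_proj,q_proj,k_proj,v_proj

# Attention + MLP projections: more trainable parameters, often better quality
# lora_target: o_proj,q_proj,k_proj,v_proj,gate_proj,up_proj,down_proj
```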
**Adjusting LoRA Parameters**:

- `lora_rank`: adjust according to model size and task complexity
- `lora_alpha`: usually set to `lora_rank` or a multiple of it
**Learning Rate Setting**:

- LoRA fine-tuning usually benefits from a higher learning rate than full-parameter fine-tuning (see the snippet after this list)
- Set to `1.0e-5` in the example above
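
If training is stable but converges slowly, one common adjustment (an illustrative starting point, not a ROLL-specific recommendation) is to raise the learning rate toward the range often used for LoRA:

```yaml
actor_train:
  training_args:
    learning_rate: 1.0e-4  # hypothetical value, 10x the example; validate on your task
```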
## Notes
- LoRA fine-tuning currently only supports the DeepSpeed training backend
- Ensure the model supports LoRA fine-tuning
- Pay attention to compatibility between gradient checkpointing and LoRA (see the snippet after this list)
- LoRA results may differ from full-parameter fine-tuning and should be evaluated on the specific task
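
On the gradient checkpointing note, the example configuration sidesteps potential issues by disabling it; whether the two features interact badly depends on the underlying training libraries rather than on ROLL itself, so verify trainability if you turn checkpointing back on:

```yaml
actor_train:
  model_args:
    disable_gradient_checkpointing: true  # the setting used in the example configuration
```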
By properly configuring LoRA fine-tuning, you can significantly reduce the number of trainable parameters and the computational resources consumed while maintaining model performance.