Trackers and Metrics

The ROLL framework supports multiple experiment tracking tools to help you monitor and analyze training runs. This document explains how to configure and use these trackers.

Supported Trackers

The ROLL framework currently supports the following trackers:

  1. TensorBoard - Visualization tool developed by Google
  2. Weights & Biases (WandB) - Powerful machine learning experiment tracking platform
  3. SwanLab - Next-generation AI experiment tracking tool
  4. Stdout - Direct output to standard output

Configuring Trackers

In the YAML configuration file, trackers are configured through the track_with and tracker_kwargs parameters:

# Using TensorBoard
track_with: tensorboard
tracker_kwargs:
  log_dir: /path/to/tensorboard/logs

# Using Weights & Biases
track_with: wandb
tracker_kwargs:
  api_key: your_wandb_api_key
  project: your_project_name
  name: experiment_name
  notes: "Experiment description"
  tags:
    - tag1
    - tag2

# Using SwanLab
track_with: swanlab
tracker_kwargs:
  login_kwargs:
    api_key: your_swanlab_api_key
  project: your_project_name
  logdir: /path/to/swanlab/logs
  experiment_name: experiment_name
  tags:
    - tag1
    - tag2

# Using Stdout
track_with: stdout
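
The tracker block is plain YAML, so you can sanity-check it before launching a full run. The snippet below is a minimal sketch of how such a block could be parsed and dispatched on track_with; the make_tracker helper is hypothetical and only illustrates the mapping, it is not ROLL's internal API, and whether every tracker_kwargs key maps one-to-one onto the backend's init call is an assumption.

import yaml

def make_tracker(config_path: str):
    """Hypothetical helper: read the tracker block and build the matching backend."""
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    backend = cfg.get("track_with", "stdout")
    kwargs = cfg.get("tracker_kwargs") or {}

    if backend == "tensorboard":
        from torch.utils.tensorboard import SummaryWriter
        return SummaryWriter(log_dir=kwargs.get("log_dir", "./runs"))
    if backend == "wandb":
        import wandb
        wandb.login(key=kwargs.pop("api_key", None))
        return wandb.init(**kwargs)       # project, name, notes, tags, ...
    if backend == "swanlab":
        import swanlab
        swanlab.login(**kwargs.pop("login_kwargs", {}))
        return swanlab.init(**kwargs)     # project, logdir, experiment_name, tags, ...
    return None                           # "stdout": metrics are simply printed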

SwanLab Usage Details

Configuring SwanLab

To use SwanLab in ROLL, configure as follows:

track_with: swanlab
tracker_kwargs:
  login_kwargs:
    api_key: your_api_key       # Your SwanLab API key
  project: roll-experiments     # Project name
  logdir: ./swanlog             # Log storage directory
  experiment_name: ${exp_name}  # Experiment name, usually using the exp_name variable
  tags:                         # Experiment tags
    - roll
    - rl
    - experiment

Obtaining SwanLab API Key

  1. Visit the SwanLab website
  2. Register or log in to your account
  3. Go to the user settings page
  4. Find the API key and copy it
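
Once you have the key, you can verify it outside ROLL before writing it into the YAML. A minimal check, assuming the standard swanlab Python package is installed:

import swanlab

# Log in once with the key copied from the settings page; a successful call
# confirms the key before you place it under tracker_kwargs.login_kwargs.
swanlab.login(api_key="your_swanlab_api_key")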

Metric Monitoring

The ROLL framework automatically records the following types of metrics:

Algorithm Performance Metrics

Validation Phase

  • val/score/mean: Average score per episode during validation phase. Reflects the model's average performance on unseen environments.
  • val/score/max / val/score/min: Maximum / minimum score per episode during validation phase.
Training Phase

  • critic/lr: Learning rate of the value function (Critic). The learning rate is the step size for optimizer updates to model parameters.
  • critic/loss: Loss between value network predictions and actual returns.
  • critic/value: Mean of value network predictions for batch states at the beginning of data collection or training. These values are typically used as baselines when calculating advantage functions.
  • critic/vpred: Mean of value network predictions for batch states in the current optimization. This value updates with training iterations.
  • critic/clipfrac: Fraction of value predictions clipped when value clipping (value_clip) is enabled.
  • critic/error: Mean squared error between value network predictions and actual returns.
  • critic/score/mean: Mean of raw environment rewards.
  • critic/score/max / critic/score/min: Maximum / minimum of raw environment rewards.
  • critic/rewards/mean: Mean of normalized/clipped rewards.
  • critic/rewards/max / critic/rewards/min: Maximum / minimum of normalized/clipped rewards.
  • critic/advantages/mean: Mean of advantages. Reflects how much extra reward taking a specific action in a given state brings compared to the average level.
  • critic/advantages/max / critic/advantages/min: Maximum / minimum of advantages.
  • critic/returns/mean: Mean of returns. Expected cumulative rewards.
  • critic/returns/max / critic/returns/min: Maximum / minimum of returns.
  • critic/values/mean: Mean of the value function estimates. Reflects the model's estimate of the total future reward from a state.
  • critic/values/max / critic/values/min: Maximum / minimum of value function.
  • tokens/response_length/mean: Average length of generated responses.
  • tokens/response_length/max / tokens/response_length/min: Maximum / minimum length of generated responses.
  • tokens/prompt_length/mean: Average length of prompts.
  • tokens/prompt_length/max / tokens/prompt_length/min: Maximum / minimum length of prompts.
  • actor/lr: Learning rate of the current policy network (Actor). The learning rate is the step size for optimizer updates to model parameters.
  • actor/ppo_ratio_high_clipfrac: Fraction of samples clipped at the upper bound in PPO policy optimization.
  • actor/ppo_ratio_low_clipfrac: Fraction of samples clipped at the lower bound in PPO policy optimization.
  • actor/ppo_ratio_clipfrac: Overall fraction of samples clipped in PPO policy optimization.
  • actor/ratio_mean: Mean probability ratio of the policy network (Actor), i.e. the exponential of the difference between new and old policy log probabilities.
  • actor/ratio_max / actor/ratio_min: Maximum / minimum ratio of the policy network (Actor).
  • actor/clipfrac: Fraction of samples whose policy gradient objective was clipped.
  • actor/kl_loss: KL divergence penalty term between current policy and reference policy. Used to prevent the policy from deviating too far from the original model.
  • actor/total_loss: Weighted sum of policy gradient loss, KL divergence loss, and entropy loss (if present). This is the actual loss used for model backpropagation.
  • actor/approxkl: Approximate KL divergence between current policy and old policy. Measures the step size of each policy update.
  • actor/policykl: Exact KL divergence between current policy and old policy.
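
To make the ratio, clipping, and KL metrics above concrete, the sketch below shows how such quantities are commonly derived from new and old per-token log probabilities. The estimators are the usual PPO ones; whether ROLL masks padding tokens or reduces exactly this way is an assumption.

import torch

def ppo_ratio_stats(log_probs, old_log_probs, clip_eps=0.2):
    """Common estimators behind actor/ratio_*, actor/ppo_ratio_*_clipfrac, actor/approxkl."""
    ratio = torch.exp(log_probs - old_log_probs)            # new/old probability ratio
    return {
        "ratio_mean": ratio.mean(),
        "ppo_ratio_low_clipfrac": (ratio < 1.0 - clip_eps).float().mean(),
        "ppo_ratio_high_clipfrac": (ratio > 1.0 + clip_eps).float().mean(),
        # 0.5 * E[(log r)^2] is a standard low-variance approximation of the KL step size
        "approxkl": 0.5 * ((log_probs - old_log_probs) ** 2).mean(),
        # sampled estimate of KL(old || new)
        "policykl": (old_log_probs - log_probs).mean(),
    }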

Evaluation Metrics

  • critic/ref_log_prob/mean: Mean log probability output by the reference model. Serves as a baseline for measuring how far the current policy drifts from the reference policy.
  • critic/old_log_prob/mean: Mean log probability output by the old policy (Actor before training). Used to measure differences between new and old policies.
  • critic/entropy/mean: Mean entropy of the policy. Entropy measures the randomness or exploratory nature of the policy, with high entropy indicating stronger exploration.
  • critic/reward_clip_frac: Proportion of rewards that were clipped. If this is too high, consider adjusting the reward range or the clipping thresholds.
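
As an illustration of the entropy metric, policy entropy is usually estimated from the token-level distribution and averaged over the batch; the exact estimator ROLL uses is an assumption.

import torch
import torch.nn.functional as F

def mean_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Average token-distribution entropy (critic/entropy/mean); higher means more exploration."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1).mean()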

PPO Loss Metrics

  • actor/pg_loss: Policy gradient loss of the PPO algorithm. The goal is to minimize this loss to improve the policy.
  • actor/weighted_pg_loss: Weighted value of policy gradient loss.
  • actor/valid_samples: Number of valid samples in the current batch.
  • actor/total_samples: Total number of samples in the current batch (i.e., batch size).
  • actor/valid_sample_ratio: Proportion of valid samples in the current batch.
  • actor/sample_weights_mean: Mean of all sample weights in the batch.
  • actor/sample_weights_min / actor/sample_weights_max: Minimum / maximum of all sample weights in the batch.
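
For reference, actor/pg_loss typically comes from the PPO clipped surrogate objective; the sketch below shows the standard form, with sample weighting included only to illustrate how actor/weighted_pg_loss could relate to it (the exact weighting ROLL applies is an assumption).

import torch

def pg_loss(log_probs, old_log_probs, advantages, clip_eps=0.2, sample_weights=None):
    """Standard PPO clipped surrogate loss; minimizing it improves the policy."""
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = -advantages * ratio
    clipped = -advantages * torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    per_sample = torch.max(unclipped, clipped)
    loss = per_sample.mean()                                  # actor/pg_loss
    if sample_weights is not None:                            # assumed form of actor/weighted_pg_loss
        loss = (per_sample * sample_weights).mean()
    return loss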

SFT Loss Metrics

  • actor/sft_loss: Supervised fine-tuning loss.
  • actor/positive_sft_loss: Positive sample supervised fine-tuning loss.
  • actor/negative_sft_loss: Negative sample supervised fine-tuning loss.
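
A supervised fine-tuning loss of this kind is normally token-level cross-entropy over the response tokens; a minimal sketch (how ROLL separates positive from negative samples is not specified here):

import torch.nn.functional as F

def sft_loss(logits, labels, ignore_index=-100):
    """Token-level cross-entropy on response tokens (actor/sft_loss);
    prompt and padding positions are masked via ignore_index."""
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=ignore_index,
    )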

Framework Performance Metrics

Global System Metrics

  • system/tps: Tokens processed per second. This is a key metric for measuring overall system throughput.
  • system/samples: Total number of samples processed.
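
system/tps can be reproduced by hand from a step's token count and its wall-clock time; a trivial sketch (run_one_step is a placeholder for whatever work the step does):

import time

def run_one_step() -> int:
    """Placeholder for one training/generation step; returns tokens handled."""
    return 4096

start = time.perf_counter()
tokens = run_one_step()
elapsed = time.perf_counter() - start
print("system/tps ~", tokens / max(elapsed, 1e-9))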

Phase Duration Metrics

  • time/rollout: Duration of the data collection (Rollout) phase.
  • time/ref_log_probs_values_reward: Duration for computing reference-model log probabilities, values, and rewards.
  • time/old_log_probs_values: Duration for computing old policy log probabilities and values.
  • time/adv: Duration of the advantages calculation phase.

Execution Phases

In the following time and memory metrics, {metric_infix} will be replaced with specific execution phase identifiers, such as:

  • train_step: Training phase
  • generate: Text generation/inference phase
  • model_update: Model parameter update/synchronization phase
  • compute_log_probs: Log probability computation phase
  • do_checkpoint: Model saving/checkpoint phase
  • compute_values: Value computation phase
  • compute_rewards: Reward computation phase

Time Metrics

  • time/{metric_infix}/total: Total execution time for the entire operation (from entering state_offload_manager to exiting).
  • time/{metric_infix}/execute: Execution time for actual business logic (i.e., the yield part, such as model training, generation, etc.).
  • time/{metric_infix}/onload: Time to load model state (strategy.load_states()) to GPU or memory.
  • time/{metric_infix}/offload: Time to offload model state (strategy.offload_states()) from GPU or memory.
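
The total/execute/onload/offload breakdown can be pictured as timers wrapped around a load-execute-offload cycle. The context manager below is a simplified stand-in for state_offload_manager, not ROLL's implementation; the strategy argument stands for any object exposing load_states()/offload_states().

import time
from contextlib import contextmanager

@contextmanager
def timed_offload_cycle(strategy, metrics: dict, infix: str):
    """Simplified stand-in for state_offload_manager's timing behaviour."""
    total_start = time.perf_counter()

    t = time.perf_counter()
    strategy.load_states()                                    # bring model state onto GPU/memory
    metrics[f"time/{infix}/onload"] = time.perf_counter() - t

    t = time.perf_counter()
    yield                                                     # business logic: train, generate, ...
    metrics[f"time/{infix}/execute"] = time.perf_counter() - t

    t = time.perf_counter()
    strategy.offload_states()                                 # move model state back off the GPU
    metrics[f"time/{infix}/offload"] = time.perf_counter() - t

    metrics[f"time/{infix}/total"] = time.perf_counter() - total_start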

GPU Memory Metrics

  • Memory snapshot at the beginning (after model state offloading)
    • memory/{metric_infix}/start/offload/allocated/{device_id}: Currently allocated GPU memory on a specific device_id.
    • memory/{metric_infix}/start/offload/reserved/{device_id}: Currently reserved GPU memory on a specific device_id.
    • memory/{metric_infix}/start/offload/max_allocated/{device_id}: Peak allocated GPU memory from the start of this operation to the current moment on a specific device_id.
    • memory/{metric_infix}/start/offload/max_reserved/{device_id}: Peak reserved GPU memory from the start of this operation to the current moment on a specific device_id.
  • Memory snapshot after loading model state (before executing business logic)
    • memory/{metric_infix}/start/onload/allocated/{device_id}: Currently allocated GPU memory on a specific device_id.
    • memory/{metric_infix}/start/onload/reserved/{device_id}: Currently reserved GPU memory on a specific device_id.
    • memory/{metric_infix}/start/onload/max_allocated/{device_id}: Peak allocated GPU memory from the start of this operation to the current moment on a specific device_id.
    • memory/{metric_infix}/start/onload/max_reserved/{device_id}: Peak reserved GPU memory from the start of this operation to the current moment on a specific device_id.
  • Memory snapshot after executing business logic (before offloading model state)
    • memory/{metric_infix}/end/onload/allocated/{device_id}: Currently allocated GPU memory on a specific device_id.
    • memory/{metric_infix}/end/onload/reserved/{device_id}: Currently reserved GPU memory on a specific device_id.
    • memory/{metric_infix}/end/onload/max_allocated/{device_id}: Peak allocated GPU memory from the start of this operation to the current moment on a specific device_id.
    • memory/{metric_infix}/end/onload/max_reserved/{device_id}: Peak reserved GPU memory from the start of this operation to the current moment on a specific device_id.
    • memory/{metric_infix}/end/onload/max_allocated_frac/{device_id}: Fraction of peak allocated GPU memory relative to total GPU memory on a specific device_id.
    • memory/{metric_infix}/end/onload/max_reserved_frac/{device_id}: Fraction of peak reserved GPU memory relative to total GPU memory on a specific device_id.
  • Memory snapshot after offloading model state (at operation end)
    • memory/{metric_infix}/end/offload/allocated/{device_id}: Currently allocated GPU memory on a specific device_id.
    • memory/{metric_infix}/end/offload/reserved/{device_id}: Currently reserved GPU memory on a specific device_id.
    • memory/{metric_infix}/end/offload/max_allocated/{device_id}: Peak allocated GPU memory from the start of this operation to the current moment on a specific device_id.
    • memory/{metric_infix}/end/offload/max_reserved/{device_id}: Peak reserved GPU memory from the start of this operation to the current moment on a specific device_id.
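
Each snapshot corresponds to values PyTorch already exposes per device; the sketch below collects one snapshot with the key layout described above (the collection code itself is an assumption, not ROLL's).

import torch

def gpu_memory_snapshot(metric_infix: str, point: str, state: str) -> dict:
    """One snapshot, e.g. point='start'/'end', state='offload'/'onload'."""
    snapshot = {}
    for device_id in range(torch.cuda.device_count()):
        prefix = f"memory/{metric_infix}/{point}/{state}"
        total = torch.cuda.get_device_properties(device_id).total_memory
        snapshot[f"{prefix}/allocated/{device_id}"] = torch.cuda.memory_allocated(device_id)
        snapshot[f"{prefix}/reserved/{device_id}"] = torch.cuda.memory_reserved(device_id)
        snapshot[f"{prefix}/max_allocated/{device_id}"] = torch.cuda.max_memory_allocated(device_id)
        snapshot[f"{prefix}/max_reserved/{device_id}"] = torch.cuda.max_memory_reserved(device_id)
        snapshot[f"{prefix}/max_allocated_frac/{device_id}"] = (
            torch.cuda.max_memory_allocated(device_id) / total
        )
        snapshot[f"{prefix}/max_reserved_frac/{device_id}"] = (
            torch.cuda.max_memory_reserved(device_id) / total
        )
    return snapshot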

CPU Memory Metrics

  • memory/cpu/{metric_infix}/start/rss: Actual physical memory (Resident Set Size) occupied by the process at the start of the operation.
  • memory/cpu/{metric_infix}/start/vms: Virtual memory (Virtual Memory Size) occupied by the process at the start of the operation.
  • memory/cpu/{metric_infix}/end/rss: Actual physical memory occupied by the process at the end of the operation.
  • memory/cpu/{metric_infix}/end/vms: Virtual memory occupied by the process at the end of the operation.
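
rss and vms map directly onto the per-process memory info that psutil reports; a minimal sketch:

import psutil

def cpu_memory_snapshot(metric_infix: str, point: str) -> dict:
    """Record RSS/VMS for the current process; point is 'start' or 'end'."""
    mem = psutil.Process().memory_info()
    return {
        f"memory/cpu/{metric_infix}/{point}/rss": mem.rss,  # resident set size, bytes
        f"memory/cpu/{metric_infix}/{point}/vms": mem.vms,  # virtual memory size, bytes
    }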