ROLL Debugging Guide
Debugging is an essential part of developing with and using the ROLL framework. This guide describes several debugging methods to help you locate and resolve issues quickly.
1. Using Ray Debugger
ROLL is built on Ray, so you can use the debugging tools provided by Ray. Ray Debugger is a powerful tool that can help you debug distributed applications.
Enabling Ray Debugger
In the roll/utils/ray_utils.py file, you can enable Ray Debugger by setting an environment variable:
# For debugging
env_vars["RAY_DEBUG"] = "legacy"
You can also set this environment variable in your shell before starting the training script:
export RAY_DEBUG=legacy
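If you start training from a Python wrapper rather than a shell, the same variable can be passed to the child process. The following is a minimal sketch: build_debug_env and the placeholder command are hypothetical names used for illustration and are not part of ROLL.

```python
import os
import subprocess
import sys


def build_debug_env(base_env=None):
    """Return a copy of the environment with RAY_DEBUG set to "legacy"."""
    env = dict(os.environ if base_env is None else base_env)
    env["RAY_DEBUG"] = "legacy"
    return env


if __name__ == "__main__":
    # Placeholder command: replace it with your actual training entry point.
    cmd = [sys.executable, "-c", "import os; print(os.environ['RAY_DEBUG'])"]
    subprocess.run(cmd, env=build_debug_env(), check=True)
```

Copying the environment instead of mutating os.environ keeps the debug setting scoped to the launched process.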
Using Ray Debugger
After enabling Ray Debugger, you can use standard Python debuggers (such as pdb) for step-by-step debugging. When the program reaches a breakpoint, the debugger will pause execution, allowing you to inspect variables, call stacks, and other information.
For detailed usage of Ray Debugger, please refer to the official documentation: Ray Debugging Documentation
2. Debugging Code in Pipeline
Because the Pipeline runs in Ray's driver process, you can use pdb directly. To add a breakpoint, use:
import pdb; pdb.set_trace()
Do not use breakpoint() in the pipeline. In Ray debug mode, breakpoint() in the driver will not enter pdb.
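To leave breakpoints in pipeline code without pausing every run, you can gate pdb behind an environment variable. This is only a sketch: ROLL_PIPELINE_PDB, debug_here, and run_step are hypothetical names, not ROLL settings or APIs.

```python
import os
import pdb
import sys


def debug_here(flag="ROLL_PIPELINE_PDB"):
    """Start pdb in the caller's frame when the flag env var equals "1".

    Returns True if the debugger was started, False otherwise.
    """
    if os.environ.get(flag) != "1":
        return False
    # Call pdb explicitly instead of breakpoint(), and trace the caller's frame.
    pdb.Pdb().set_trace(sys._getframe(1))
    return True


def run_step(batch):
    debug_here()  # pauses here only when ROLL_PIPELINE_PDB=1
    return [x * 2 for x in batch]
```

With the variable unset, debug_here() is a no-op, so the call can stay in the code between debugging sessions.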
3. Local Debugging of Agentic Multi-Round Interaction Process
One feature of the ROLL framework is support for debugging the Agentic multi-round interaction process. This is very helpful for developing and optimizing Agentic applications.
Using Test Scripts
The tests/agentic/env_manager/test_traj_env_manager.py file provides demo test scripts for locally debugging the Agentic multi-round interaction process. You can extend it with new tests.
The script includes the following functions:
- test_debug_traj_env_manager(): Debug trajectory environment manager
- test_debug_vl_traj_env_manager(): Debug vision-language trajectory environment manager
- test_debug_step_env_manager(): Debug step environment manager
Running Debug Scripts
To run the debug script, follow these steps:
- Create and activate a Python environment:
conda create -n python310_torch260_em python=3.10
conda activate python310_torch260_em
- Install dependencies:
pip3 install torch torchvision torchaudio py-cpuinfo
pip install -r requirements_em_local_debug.txt
- Run the test script:
python tests/agentic/env_manager/test_traj_env_manager.py
This approach lets you debug the Agentic multi-round interaction process locally, which shortens the development loop for Agentic applications.
4. Other Debugging Techniques
Log Debugging [TODO]
The ROLL framework has a built-in detailed logging system. You can obtain more debugging information by adjusting the log level:
# Set log level in configuration file
system_envs:
  ROLL_LOG_LEVEL: "DEBUG"
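For your own helper scripts, a similar environment-driven log level can be built with the standard logging module. The sketch below illustrates the general pattern only and is not ROLL's logger implementation. It assumes the level name is read from ROLL_LOG_LEVEL with a fallback to INFO, and resolve_log_level is a hypothetical helper.

```python
import logging
import os

# Level names accepted in this sketch.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(env=None, default=logging.INFO):
    """Map ROLL_LOG_LEVEL to a logging level; unknown or missing values use default."""
    env = os.environ if env is None else env
    return _LEVELS.get(env.get("ROLL_LOG_LEVEL", "").upper(), default)


logging.basicConfig(level=resolve_log_level())
logging.getLogger("my_debug_script").debug("shown only when the level is DEBUG")
```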
Performance Analysis
To get the training timeline, you can enable profiling in the YAML configuration:
system_envs:
  RAY_PROFILING: "1"
profiler_output_dir: /data/oss_bucket_0/llm/profile/${exp_name}
Then use the Perfetto UI tool for analysis.
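Before opening a large trace in Perfetto UI, a short script can give a rough view of where time goes. This sketch assumes the profiler output is in the Chrome trace event JSON format (either a bare list of events or an object with a traceEvents key), which Perfetto UI can load. Whether ROLL's output matches this layout is an assumption, and summarize_trace and summarize_trace_file are hypothetical helpers.

```python
import json
from collections import defaultdict


def summarize_trace(trace):
    """Sum durations (microseconds) of complete ("X") events, grouped by name."""
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for ev in events:
        # Only complete events carry a "dur" field; skip metadata and others.
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev.get("name", "<unnamed>")] += ev["dur"]
    # Largest total duration first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def summarize_trace_file(path):
    """Load a trace JSON file and summarize it."""
    with open(path) as f:
        return summarize_trace(json.load(f))
```

Calling summarize_trace_file on a file under profiler_output_dir returns (name, total duration) pairs, which can help decide which part of the timeline to inspect first.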
By properly using these debugging methods, you can more efficiently develop and optimize applications based on the ROLL framework.