v1.9.0
Release Date
TBD, 2026
SDK
New Features
- SDK: Support GPU parameter configuration (#1047)
- SDK: `rock datasets list` now supports fast listing across cross-region OSS (#1010)
- SDK: Add `rock storage get` command to download archived sandbox logs from OSS (#962)
Bug Fixes
- SDK: Fix illegal characters in generated Harbor job names (#1031)
- SDK: Fix OSS upload failure caused by `wget -c` resume not overwriting existing files (#992)
Sandbox
New Features
- Support sandbox restart functionality (#1001)
- Add `/delete` endpoint with cascade STOPPED → DELETED transition for `--rm` containers (#1038)
- Introduce SandboxStateMachine for unified lifecycle state management (#988)
- Add ops-jobs API with DB-persisted state and multi-pod safety (#1027)
- Add parameter validation for Admin API endpoints (#985)
- K8s Operator: Support disk quota limits (#994)
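
As a rough illustration of what a unified lifecycle state machine with a cascading STOPPED → DELETED step could look like, here is a minimal sketch. The state names other than STOPPED and DELETED, the transition table, and the `auto_remove` flag are all hypothetical; the actual SandboxStateMachine may be structured quite differently.

```python
from enum import Enum


class SandboxState(Enum):
    # Only STOPPED and DELETED appear in the release notes; the rest are assumed.
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"
    DELETED = "deleted"


# Hypothetical transition table: each state maps to the states it may move to.
ALLOWED = {
    SandboxState.PENDING: {SandboxState.RUNNING, SandboxState.STOPPED},
    SandboxState.RUNNING: {SandboxState.STOPPED},
    SandboxState.STOPPED: {SandboxState.RUNNING, SandboxState.DELETED},
    SandboxState.DELETED: set(),
}


class InvalidTransition(Exception):
    """Raised when a requested lifecycle transition is not in the table."""


def transition(current, target):
    """Return the target state if the move is allowed, otherwise raise."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.name} -> {target.name}")
    return target


def stop(current, auto_remove=False):
    """Stop a sandbox; cascade to DELETED when auto_remove (like --rm) is set."""
    state = transition(current, SandboxState.STOPPED)
    if auto_remove:
        state = transition(state, SandboxState.DELETED)
    return state
```

Keeping every allowed move in one table means an illegal request (for example, restarting a DELETED sandbox) fails in one place instead of being checked ad hoc by each caller.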
Bug Fixes
- Fix stop reason lost after #988 FSM refactor (#1021)
- Fix exception handling when actor not found in RayOperator.get_status() (#1062)
- Fix exception handling when CRD not found in K8sOperator.get_status() (#1068)
- Fix stop_time not written when start_time is absent on stop() (start-failed sandboxes) (#1020)
- Fix start() not properly delegating to start_async(), causing missing meta store write (#1051)
- Fix Admin SandboxTable retry on stale connection after DB restart (#987)
Refactoring
- Meta-store: Add Redis-merge semantics for archive and alive-key field filtering (#1037)
Deployments
New Features
- Split `docker run` into `docker create` + `docker start -a` for finer container lifecycle control (#1012)
- Share docker rootfs XFS project ID with sandbox log directory quota (#1013)
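
The `docker run` split can be pictured as building two separate command lines instead of one. The sketch below only constructs the argument lists and does not invoke Docker; the function name `split_run_command` and its parameters are hypothetical and are not taken from the deployment code.

```python
def split_run_command(image, command, name):
    """Build the two-step equivalent of `docker run --name NAME IMAGE CMD...`.

    Step 1 (`docker create`) creates the container without starting it.
    Step 2 (`docker start -a`) starts it and attaches to its output.
    """
    create = ["docker", "create", "--name", name, image, *command]
    start = ["docker", "start", "-a", name]
    return create, start


# Example: the caller can run extra setup between the two steps.
create_cmd, start_cmd = split_run_command("busybox", ["echo", "hi"], "demo")
```

The gap between the two commands is where a caller gets the finer lifecycle control: the container exists and can be inspected or prepared before any process in it starts.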
Scheduler
New Features
- Switch FileCleanupTask to `find -delete` with minimal path safety guards (#967)
- Add DB-driven SandboxLogArchiveTask, replacing legacy sentinel file design (#1025)
- Enhanced Ray log cleanup: (#1029)
  - PART 2c: Clean `runtime_env_setup-*` files (covers hex suffix)
  - PART 2d: Clean rotated daemon logs (`raylet.N.out`, `gcs_server.N.err`, etc.)
  - PID-aware cleanup for `session_latest/logs` + `logs/old` directory
  - Protect `agent-*` and other daemon files from PID probe false positives
- Deduplicate region scheduler.tasks via base config inheritance (#1003)
Bug Fixes
- FileCleanupTask: Fix `exclude_dirs` whitelist ineffective due to `-depth` disabling `-prune`, replaced with `-not -path` (#1072)
- FileCleanupTask: Fix PID/TID reuse false positives in `check_pid_exists`, add process name verification (#1074)
- FileCleanupTask: Use `find -type d` in `_discover_candidates` to skip daemon log files (#1025)
- ImageCleanupTask: Split idempotent prune from docuum launch logic (#1023)
- SandboxLogArchiveTask: Fix cross-event-loop asyncpg pool issue, dispatch DB calls to main loop (#1025)
- Scheduler: Add 60s timeout cap on cross-loop dispatch to prevent hang (#1025)
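
The cross-loop dispatch with a timeout cap can be sketched with standard `asyncio` primitives. This is an illustration under assumptions, not the scheduler's actual code: `dispatch_to_loop`, `start_background_loop`, and `DISPATCH_TIMEOUT_S` are hypothetical names, and the real fix may be wired differently. The idea is that a connection pool bound to one event loop should only be used from that loop, so a task running on another loop submits the coroutine to the owning loop and waits for the result with a bounded timeout.

```python
import asyncio
import threading

DISPATCH_TIMEOUT_S = 60  # cap taken from the release note; the name is assumed


async def dispatch_to_loop(coro, target_loop, timeout=DISPATCH_TIMEOUT_S):
    """Run `coro` on `target_loop` and await its result from the current loop.

    run_coroutine_threadsafe schedules the coroutine on the loop that owns the
    resource (for example a DB pool). wrap_future makes the returned
    concurrent future awaitable here, and wait_for bounds the wait so a stuck
    target loop cannot hang the caller indefinitely.
    """
    future = asyncio.run_coroutine_threadsafe(coro, target_loop)
    return await asyncio.wait_for(asyncio.wrap_future(future), timeout=timeout)


def start_background_loop():
    """Demo helper: start an event loop in a daemon thread and return it."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop
```

In this sketch the loop returned by `start_background_loop()` stands in for the main loop, and the caller would run something like `await dispatch_to_loop(query(), main_loop)`. On timeout, `wait_for` raises `asyncio.TimeoutError` and cancels the pending wait.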
Rocklet
New Features
- Add per-disk usage monitoring for rootfs, log, and kata DinD (#983)
Bug Fixes
- Fix `/execute` and `/read_file` returning 422 due to NonBlankStr regression from PR #985 (#1065)
- Fix `success` and `file_name` not set correctly in UploadResponse (#1060)
- Use cgroup metrics for container memory instead of psutil (fix inaccurate metrics in DinD) (#1017)
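
To illustrate the cgroup-based memory reading, here is a minimal sketch that reads the usage counter from a cgroup directory. The function name and the default path are assumptions; the file names are the standard kernel ones (`memory.current` for cgroup v2, `memory.usage_in_bytes` for cgroup v1). How Rocklet locates the right cgroup directory is not described in these notes.

```python
from pathlib import Path


def read_cgroup_memory_bytes(cgroup_dir="/sys/fs/cgroup"):
    """Return memory usage in bytes from a cgroup directory, or None.

    Tries the cgroup v2 file first, then the cgroup v1 file. The directory
    is a parameter because the correct path depends on the host layout.
    """
    base = Path(cgroup_dir)
    for name in ("memory.current", "memory.usage_in_bytes"):
        path = base / name
        if path.is_file():
            return int(path.read_text().strip())
    return None
```

System-wide memory statistics from psutil are read from `/proc`, which inside a container can reflect the host rather than the container, whereas the cgroup counter is scoped to the container's own accounting.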
Harbor (Agent Job)
New Features
- Add tracking support to Harbor environment config, job config, and api_key field (#999)
CI/Testing
- CI: Run admin+network tests only on push, skip on PRs (#1040)
- Fix docker disk-limit test cases and cross-platform CI compatibility (#967)