NebulaSQL: A Large-scale Feature Computation System for Online Recommendation
SIGMOD 2026 Industry
The Algorithm Platform team of the Intelligent Engine Business Unit at Alibaba Holding Group builds Alibaba Group's model training infrastructure and is responsible for the data and training Infra behind the HappyHorse and HappyOyster model families. The team maintains industry-leading pre-training and post-training frameworks for large language models, multimodal models, and generative models, as well as sample storage and compute systems. Its open-source projects include Megatron-LLaMA, ROLL, and RecIS. The team has published multiple papers at top venues, including NSDI, OSDI, and SIGMOD, and received an NSDI 2026 Outstanding Paper Award. Through distributed optimization, software-hardware co-design, and model-Infra co-design, the team improves large-model iteration efficiency across the pipeline from data processing to training, raises the ceiling of model quality, and builds frontier infrastructure for large models.
We are responsible for building Alibaba's large-scale training infrastructure, covering LLM training Infra, multi-modal large model training Infra, recommendation algorithm model training Infra, feature computation & processing Infra, and algorithm platform construction.
SIGMOD 2026 Industry
NSDI 2026 Spring · 🏆 Outstanding Paper Award
ICDE 2022
An open-source LLM training framework based on Megatron, supporting efficient distributed training.
Alibaba's open-source sparse model training framework for large-scale recommendation and advertising.
An open-source RL training framework for efficient distributed RL post-training at scale.
Alibaba's open-source distributed graph learning engine for large-scale GNN training.
A large model training framework for recommendation and advertising at industrial scale.