AReaL‑boba: Open‑Source Reinforcement Learning Training Framework v0.2 with SOTA Performance
The Ant Research Institute and Tsinghua University's Wu Yi team have jointly released AReaL‑boba 0.2, an open‑source reinforcement‑learning training framework that substantially accelerates large‑scale model training and achieves state‑of‑the‑art mathematical reasoning results. The full code, data, and training scripts are publicly available so that the SOTA reasoning models can be easily reproduced.
Key Highlights
Training Speed and Efficiency Breakthrough
Ultra‑fast training throughput: by integrating the SGLang inference framework (also used by xAI), the system improves training throughput by 35%, 60%, and 73% for 1.5B, 7B, and 32B models respectively.
Massive distributed support: 128 H800 GPUs can train a 1.5B model in one day, and 256 H800 GPUs can train a 7B model in two days.
Mathematical Reasoning Performance (SOTA)
7B model sets a new open‑source community record: using Qwen‑R1‑Distill‑7B as the base, large‑scale RL training achieves the best domain‑specific math reasoning scores—61.9 on AIME 2024 and 48.3 on AIME 2025.
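AIME scores such as those above are conventionally reported as the percentage of competition problems answered correctly, often averaged over several sampled generations per problem. A minimal sketch of that metric (the matching logic here is a simplification for illustration; AReaL‑boba's actual evaluation scripts are released in the repository):

```python
def aime_score(predictions, answers):
    """Average accuracy (%) over problems; each problem may have
    several sampled predictions (avg@k-style scoring)."""
    assert len(predictions) == len(answers)
    total = 0.0
    for samples, gold in zip(predictions, answers):
        # Fraction of sampled answers matching the reference answer.
        total += sum(1 for s in samples if s == gold) / len(samples)
    return 100.0 * total / len(answers)

# Toy example: 3 problems, 2 sampled answers each.
preds = [[204, 204], [73, 10], [500, 500]]
golds = [204, 73, 500]
print(aime_score(preds, golds))  # → 83.33...
```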
Full‑Process Open Verification
All training data (AReaL‑boba‑106k), training scripts, and evaluation scripts are released to ensure reproducibility.
Low‑Cost Replication of Large‑Model Effects
Through data distillation, a 32B model (based on R1‑Distill‑Qwen‑32B) is fine‑tuned on only 200 curated samples, scoring 78.8 on AIME 2024 (close to QwQ‑32B's 78.9) at a compute cost of just $200. The table below compares scores:
| Model | AIME 2024 |
| --- | --- |
| R1‑Distill‑Qwen‑32B | 72.6 |
| QwQ‑32B | 78.9 |
| AReaL‑boba‑SFT‑32B | 78.8 |
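The released training data ships as JSONL (one example per line, as in AReaL‑boba‑106k.jsonl). A hedged sketch of how a small distilled SFT set of the kind described above could be assembled — the field names `prompt` and `response` are assumptions for illustration, not necessarily the exact schema of the released files:

```python
import json

# Hypothetical distilled examples: prompts paired with reasoning
# traces distilled from a stronger teacher model.
examples = [
    {"prompt": "Find the remainder when 2^10 is divided by 7.",
     "response": "2^3 = 8 = 1 (mod 7), so 2^10 = 2^(3*3+1) = 2 (mod 7). Answer: 2."},
    {"prompt": "How many positive divisors does 36 have?",
     "response": "36 = 2^2 * 3^2, so it has (2+1)(2+1) = 9 divisors. Answer: 9."},
]

# Write one JSON object per line (the JSONL convention).
with open("distilled_sft.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Reload to verify the round trip.
with open("distilled_sft.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
print(len(rows))  # → 2
```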
Open‑Source Commitment
No‑restriction release: framework code, training data (including the full 106k dataset and 200 distilled samples), model weights, and documentation are all open‑source.
Community‑driven: PPO hyper‑parameters, reward function design, regularization strategies, and plans for asynchronous training and dataset upgrades are publicly shared.
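For math‑reasoning RL, the reward function is commonly a rule‑based verifier: extract the model's final answer and compare it to the known solution, yielding a binary reward. A minimal sketch of that idea — the `\boxed{}` extraction convention is a common choice in math‑RL pipelines, not necessarily AReaL's exact reward implementation:

```python
import re

def math_reward(completion: str, gold_answer: str) -> float:
    """Binary reward: 1.0 if the final \\boxed{...} answer in the
    completion matches the reference answer, else 0.0."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    if not matches:
        return 0.0  # no parseable final answer
    return 1.0 if matches[-1].strip() == gold_answer.strip() else 0.0

print(math_reward(r"... so the answer is \boxed{42}.", "42"))  # → 1.0
print(math_reward(r"... therefore \boxed{41}.", "42"))         # → 0.0
```

The binary signal is sparse but hard to reward‑hack, which is one reason rule‑verifiable domains like competition math are popular RL testbeds.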
Get Started
GitHub repository: https://github.com/inclusionAI/AReaL
Hugging Face collection: https://huggingface.co/collections/inclusionAI/areal-boba-67e9f3fa5aeb74b76dcf5f0a
Technical details: https://github.com/inclusionAI/AReaL/blob/main/blog/AReaL_v0_2.md
Training data: https://huggingface.co/datasets/inclusionAI/AReaL-boba-Data/blob/main/AReaL-boba-106k.jsonl
The AReaL team aims to democratize reinforcement‑learning technology, hoping the framework becomes as commonplace as a daily beverage for AI developers, enabling the community to explore the limitless possibilities of intelligent systems.