How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models
dots.llm1, an open‑source 142‑billion‑parameter Mixture‑of‑Experts language model from hi lab, achieves Qwen2.5‑72B‑level performance after training on 11.2 T high‑quality tokens, and the release includes full models, intermediate checkpoints, and detailed training pipelines for the research community.
Overview
dots.llm1 is a large‑scale Mixture‑of‑Experts (MoE) language model released by the Humane Intelligence Lab (hi lab). It contains 142 billion total parameters, activates 14 billion per token, and after training on 11.2 T high‑quality tokens reaches performance comparable to Qwen2.5‑72B.
Model Details
Parameters: 142 B total, 14 B active.
MoE configuration: top‑6 routing over 128 experts, plus 2 shared experts (8 expert FFNs active per token).
Training data: 11.2 T tokens from Common Crawl and proprietary web crawl, filtered and de‑duplicated.
Training efficiency: Interleaved 1F1B pipeline with All‑to‑All overlap and optimized grouped GEMM, yielding ~14 % forward and ~6.7 % backward speed‑ups on H800 GPUs.
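The top‑6‑of‑128 routing above can be sketched as a top‑k selection over router logits followed by gate renormalization. This is a minimal illustration, not hi lab's implementation; the function name and gating details are assumptions.

```python
import numpy as np

def moe_routing_weights(router_logits, k=6):
    """Top-k routing over 128 routed experts (dots.llm1: k=6).

    Returns the chosen expert indices and their softmax-normalized
    gate weights. The 2 shared experts process every token and are
    simply added on top, so 6 + 2 = 8 expert FFNs run per token.
    Illustrative sketch only; the paper's exact router may differ.
    """
    idx = np.argsort(router_logits)[-k:]           # indices of top-6 experts
    g = np.exp(router_logits[idx] - router_logits[idx].max())
    return idx, g / g.sum()                        # gates sum to 1

# One token's router logits over the 128 routed experts:
rng = np.random.default_rng(0)
experts, gates = moe_routing_weights(rng.standard_normal(128))
```

Only the 6 selected experts (plus the 2 shared ones) run their FFNs for a given token, which is how 142 B total parameters reduce to roughly 14 B active per token.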
Training Procedure
The pre‑training uses a decoder‑only Transformer architecture inspired by DeepSeek, with a WSD (warmup‑stable‑decay) learning‑rate schedule and batch‑size scaling from 64 M to 128 M tokens. Two fine‑tuning stages (base and instruct) then bring the model on par with Qwen2.5‑72B on multilingual, math, code, and alignment benchmarks.
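A WSD schedule has three phases: a linear warmup, a long stable plateau at the peak learning rate, and a final decay. The sketch below shows the shape; the peak learning rate and phase fractions are illustrative placeholders, not hi lab's published settings.

```python
def wsd_lr(step, total_steps, peak_lr=3e-4,
           warmup_frac=0.01, decay_frac=0.1):
    # Warmup-Stable-Decay: linear warmup, flat plateau at peak_lr,
    # then a linear decay to zero over the final decay_frac of steps.
    # All numeric defaults are assumptions for illustration.
    warmup = max(int(total_steps * warmup_frac), 1)
    decay = max(int(total_steps * decay_frac), 1)
    if step < warmup:
        return peak_lr * step / warmup            # warmup phase
    if step < total_steps - decay:
        return peak_lr                            # stable phase
    return peak_lr * (total_steps - step) / decay # decay phase
```

One practical appeal of WSD over cosine schedules is that checkpoints from the stable phase can be resumed and decayed later, which fits well with releasing intermediate checkpoints for continued pre‑training.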
Open‑Source Release
hi lab provides the final Instruct model, the base model, intermediate checkpoints every 1 T tokens, and detailed hyper‑parameters, enabling continued pre‑training, annealing, long‑document training, or supervised fine‑tuning. Model and code are hosted on Hugging Face and GitHub.
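Since the checkpoints are hosted on Hugging Face, a typical way to try them is via the `transformers` library. The repo id below is an assumption based on hi lab's organization name; verify the exact model names on the hub page before use.

```python
ASSUMED_REPO = "rednote-hilab/dots.llm1.inst"  # assumed id; check the hub

def load_dots(repo_id=ASSUMED_REPO, dtype="bfloat16"):
    # Lazy import so this module can be inspected without transformers
    # installed; trust_remote_code allows the repo's custom MoE modeling
    # code to run, if the release requires it.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        trust_remote_code=True,
        torch_dtype=dtype,
        device_map="auto",  # shard the 142 B-parameter weights across GPUs
    )
    return tok, model
```

The same pattern applies to the base model and the intermediate checkpoints, which is what makes the per‑trillion‑token snapshots useful for continued pre‑training or annealing studies.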
Resources
Model repository: https://huggingface.co/rednote-hilab (code also available on GitHub).
Xiaohongshu Tech REDtech