How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models
dots.llm1, an open‑source 142‑billion‑parameter Mixture‑of‑Experts language model from hi lab, achieves Qwen2.5‑72B‑level performance after training on 11.2 T high‑quality tokens, and the release includes full models, intermediate checkpoints, and detailed training pipelines for the research community.
Overview
dots.llm1 is a large‑scale Mixture‑of‑Experts (MoE) language model released by the Humane Intelligence Lab (hi lab). It contains 142 billion total parameters, activates 14 billion per token, and after training on 11.2 T high‑quality tokens reaches performance comparable to Qwen2.5‑72B.
Model Details
Parameters: 142 B total, 14 B active.
MoE configuration: top‑6 routing over 128 experts, plus 2 shared experts (8 expert FFNs active per token).
Training data: 11.2 T tokens from Common Crawl and proprietary web crawl, filtered and de‑duplicated.
Training efficiency: Interleaved 1F1B pipeline with All‑to‑All overlap and optimized grouped GEMM, yielding ~14 % forward and ~6.7 % backward speed‑ups on H800 GPUs.
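The top‑6‑of‑128 routing above can be sketched as a top‑k selection over router logits followed by gate renormalization. This is a minimal illustration, not hi lab's implementation; the function name and gating details are assumptions.

```python
import numpy as np

def moe_routing_weights(router_logits, k=6):
    """Top-k routing over 128 routed experts (dots.llm1: k=6).

    Returns the chosen expert indices and their softmax-normalized
    gate weights. The 2 shared experts process every token and are
    simply added on top, so 6 + 2 = 8 expert FFNs run per token.
    Illustrative sketch only; the paper's exact router may differ.
    """
    idx = np.argsort(router_logits)[-k:]           # indices of top-6 experts
    g = np.exp(router_logits[idx] - router_logits[idx].max())
    return idx, g / g.sum()                        # gates sum to 1

# One token's router logits over the 128 routed experts:
rng = np.random.default_rng(0)
experts, gates = moe_routing_weights(rng.standard_normal(128))
```

Only the 6 selected experts (plus the 2 shared ones) run their FFNs for a given token, which is how 142 B total parameters reduce to roughly 14 B active per token.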
Training Procedure
The pre‑training uses a decoder‑only Transformer architecture inspired by DeepSeek, with a WSD (warmup‑stable‑decay) learning‑rate schedule and batch‑size scaling from 64 M to 128 M tokens. Two fine‑tuning stages (base and instruct) then bring the model on par with Qwen2.5‑72B on multilingual, math, code, and alignment benchmarks.
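A WSD schedule has three phases: a linear warmup, a long stable plateau at the peak learning rate, and a final decay. The sketch below shows the shape; the peak learning rate and phase fractions are illustrative placeholders, not hi lab's published settings.

```python
def wsd_lr(step, total_steps, peak_lr=3e-4,
           warmup_frac=0.01, decay_frac=0.1):
    # Warmup-Stable-Decay: linear warmup, flat plateau at peak_lr,
    # then a linear decay to zero over the final decay_frac of steps.
    # All numeric defaults are assumptions for illustration.
    warmup = max(int(total_steps * warmup_frac), 1)
    decay = max(int(total_steps * decay_frac), 1)
    if step < warmup:
        return peak_lr * step / warmup            # warmup phase
    if step < total_steps - decay:
        return peak_lr                            # stable phase
    return peak_lr * (total_steps - step) / decay # decay phase
```

One practical appeal of WSD over cosine schedules is that checkpoints from the stable phase can be resumed and decayed later, which fits well with releasing intermediate checkpoints for continued pre‑training.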
Open‑Source Release
hi lab provides the final Instruct model, the base model, intermediate checkpoints every 1 T tokens, and detailed hyper‑parameters, enabling continued pre‑training, annealing, long‑document training, or supervised fine‑tuning. Model and code are hosted on Hugging Face and GitHub.
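Since the checkpoints are hosted on Hugging Face, a typical way to try them is via the `transformers` library. The repo id below is an assumption based on hi lab's organization name; verify the exact model names on the hub page before use.

```python
ASSUMED_REPO = "rednote-hilab/dots.llm1.inst"  # assumed id; check the hub

def load_dots(repo_id=ASSUMED_REPO, dtype="bfloat16"):
    # Lazy import so this module can be inspected without transformers
    # installed; trust_remote_code allows the repo's custom MoE modeling
    # code to run, if the release requires it.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        trust_remote_code=True,
        torch_dtype=dtype,
        device_map="auto",  # shard the 142 B-parameter weights across GPUs
    )
    return tok, model
```

The same pattern applies to the base model and the intermediate checkpoints, which is what makes the per‑trillion‑token snapshots useful for continued pre‑training or annealing studies.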
Resources
Model repository: https://huggingface.co/rednote-hilab (code also available on GitHub).
Xiaohongshu Tech REDtech