Tagged articles
2 articles
Baobao Algorithm Notes
May 22, 2026 · Artificial Intelligence

How LiteScale Cuts Wait Times in Large‑Model Post‑Training with Gradient Accumulation

The article examines the bottleneck of synchronous rollout in large‑model post‑training, proposes an asynchronous design using gradient accumulation and a global micro‑batch count to preserve loss equivalence, and introduces LogitsExpress for efficient top‑K knowledge‑distillation communication, all implemented in the lightweight LiteScale framework.

Knowledge Distillation · Large Language Models · asynchronous rollout
16 min read
Machine Learning Algorithms & Natural Language Processing
Mar 1, 2026 · Artificial Intelligence

From Traditional RL to LLM RL: Theory Derivation and Practical Engineering Improvements

This article walks through the fundamental derivation of policy‑based reinforcement learning, explains how traditional RL concepts extend to large‑language‑model RL, and details engineering enhancements such as GRPO memory reduction, asynchronous rollout, importance‑sampling corrections, and token‑flow management for stable industrial‑scale training.

GRPO · Importance Sampling · RLHF
11 min read