Tagged articles
2 articles
Data Party THU
May 27, 2026 · Artificial Intelligence

How Bengio’s TBA Decouples Sampling and Learning to Speed Up LLM RL by 50×

The article explains how rollout generation bottlenecks large‑language‑model post‑training. It then introduces Trajectory Balance with Asynchrony (TBA), a framework that separates a Searcher (which generates trajectories) from a Trainer (which updates the model) and reuses off‑policy trajectories through a Trajectory Balance objective. On math reasoning, preference fine‑tuning, and automated red‑teaming tasks, TBA delivers speed‑ups of up to 50× while preserving or improving performance.

Asynchronous Training · LLM · Large Models
9 min read
Machine Learning Algorithms & Natural Language Processing
May 12, 2026 · Artificial Intelligence

Breaking Off‑Policy Shift: Bengio’s TBA Decouples Sampling and Learning for 50× Faster LLM RL

Trajectory Balance with Asynchrony (TBA) separates sample generation (Searcher) from model updates (Trainer), uses a trajectory‑balance objective to incorporate off‑policy data, and achieves up to 50× speedup in large‑model RL post‑training while preserving or improving performance on math reasoning, preference fine‑tuning, and red‑team tasks.

Asynchronous Training · LLM · Off-Policy
10 min read
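Both summaries credit a trajectory‑balance (TB) objective with letting the Trainer learn from off‑policy data. The sketch below is a minimal illustration under stated assumptions, not the authors' code. It assumes a KL‑regularized target distribution proportional to π_ref(y|x)·exp(r(x,y)/β). It also assumes that each completion's token log‑probabilities are already summed, and that log Z is estimated per prompt from the batch (a VarGrad‑style choice). The helper name `vargrad_tb_loss` and the parameter `beta` are hypothetical.

```python
from statistics import fmean


def vargrad_tb_loss(logp_policy, logp_ref, rewards, beta):
    """Hypothetical TB loss for K completions of ONE prompt (illustrative sketch).

    Assumed target: p*(y|x) proportional to pi_ref(y|x) * exp(r(x, y) / beta).
    logp_policy[i]: log pi_theta(y_i | x), summed over tokens.
    logp_ref[i]:    log pi_ref(y_i | x), summed over tokens.
    rewards[i]:     scalar reward r(x, y_i).
    """
    if not rewards or not (len(logp_policy) == len(logp_ref) == len(rewards)):
        raise ValueError("need equal-length, non-empty inputs")
    # TB residual without the partition term:
    # log pi_theta(y|x) - log pi_ref(y|x) - r(x, y) / beta
    deltas = [lp - lr - r / beta for lp, lr, r in zip(logp_policy, logp_ref, rewards)]
    # VarGrad-style estimate (assumption): the log Z minimizing the summed
    # squared residuals is minus their mean, so no learned log Z is needed.
    log_z = -fmean(deltas)
    # Mean squared TB residual over the group (the variance of the deltas).
    return fmean((log_z + d) ** 2 for d in deltas)
```

Nothing in this loss refers to the policy that sampled the completions; it needs only their log‑probabilities and rewards. That is why, in principle, a Trainer can apply it to trajectories a Searcher produced with older weights. A real implementation would compute `logp_policy` inside an autodiff framework so the loss can be backpropagated.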