Tagged articles
Machine Learning Algorithms & Natural Language Processing
May 6, 2026 · Artificial Intelligence

How Qwen’s Mid‑Training with Value‑Document Guides Slashes Error Rates

Researchers at Claude applied the MSM (mid‑training) approach to Qwen models, inserting a value‑document pre‑training phase before alignment fine‑tuning. This reduced misalignment rates from 68%/54% to 5%/7% and cut the required fine‑tuning data by 40‑60×. The results indicate better generalization when MSM is combined with standard alignment.

AI alignment · Large Language Models · MSM
6 min read
Machine Heart
Apr 2, 2026 · Artificial Intelligence

Dual Alignment Theory Redefines Cross-Domain Offline RL Transfer

The paper revisits cross-domain offline reinforcement learning and shows that aligning both the dynamics and the value of source data is essential for effective policy transfer. It introduces the DVDF framework, which jointly filters source samples and achieves consistent performance gains across multiple robotic control benchmarks.

DVDF · Policy Optimization · cross-domain transfer
13 min read