Tagged articles
3 articles
Machine Heart
May 27, 2026 · Artificial Intelligence

AMD Paper Finds FP4 Training Instability Is Not Due to Randomness, 9‑10% Speedup

The authors demonstrate that FP4 training instability stems from structural micro‑scaling errors in the weight‑gradient path rather than insufficient randomness, and show that a deterministic Hadamard rotation restores convergence, delivering a 9‑10% end‑to‑end speedup on native FP4 hardware (AMD MI355X) while incurring only 8‑9% token overhead.

Deterministic Hadamard · FP4 · MXFP4
10 min read
Baobao Algorithm Notes
Mar 10, 2025 · Artificial Intelligence

Why DeepSeek V3’s FP8 Training Beats Traditional Schemes: A Deep Dive

This article provides a detailed technical analysis of FP8 training, comparing Nvidia’s TransformerEngine approach with DeepSeek V3’s novel scheme, and examines how block‑wise scaling, high‑precision accumulation, and vector length and correlation affect quantization error and signal‑to‑noise ratio in large‑language‑model training.

DeepSeek · FP8 · LLM
20 min read