AMD Paper: FP4 Training Instability Is Structural, Not a Lack of Randomness; Hadamard Rotation Gives 9-10% Speedup
The authors demonstrate that FP4 training instability stems from structural micro-scaling errors in the weight-gradient path, not from insufficient randomness. They show that a deterministic Hadamard rotation restores convergence and delivers a 9-10% end-to-end speedup on native FP4 hardware (AMD MI355X), at a cost of only 8-9% token overhead.
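To make the rotation idea concrete, here is a minimal NumPy sketch of a deterministic Hadamard rotation applied before block-scaled FP4 quantization of a weight-gradient product. It is an illustration under stated assumptions, not the paper's implementation: the function names (`hadamard`, `quantize_fp4_blocks`, `weight_grad`), the block size of 32, the real-valued per-block scale, and the choice to rotate along the token (reduction) axis are all assumptions made for this example. It relies only on the fact that a normalized Hadamard matrix H is orthogonal, so (HX)^T (H dY) equals X^T dY in exact arithmetic.

```python
import numpy as np

# Magnitudes representable by an E2M1 (FP4) element.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def hadamard(n):
    """Normalized Sylvester Hadamard matrix (orthogonal); n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def quantize_fp4_blocks(x, block=32):
    """Fake-quantize along the last axis with one shared scale per block.

    Assumption: a real-valued scale (block max mapped to 6.0) stands in for
    the hardware scale format, which this sketch does not model.
    """
    blocks = x.reshape(-1, block)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / FP4_GRID[-1], 1.0)
    scaled = blocks / scale
    # Round each magnitude to the nearest FP4 grid point, then restore the sign.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(scaled) * FP4_GRID[idx] * scale).reshape(x.shape)


def weight_grad(x, dy, rotate):
    """Quantized dW = X^T dY, optionally rotating both operands along the token axis."""
    if rotate:
        h = hadamard(x.shape[0])  # fixed matrix, no random signs
        x, dy = h @ x, h @ dy
    # Transpose so that quantization blocks run along the token (reduction) axis.
    xq = quantize_fp4_blocks(x.T).T
    dyq = quantize_fp4_blocks(dy.T).T
    return xq.T @ dyq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((256, 64))
    dy = rng.standard_normal((256, 48))
    x[7] *= 50.0  # one outlier token dominates the block scale it falls in
    exact = x.T @ dy
    for rotate in (False, True):
        err = np.linalg.norm(weight_grad(x, dy, rotate) - exact) / np.linalg.norm(exact)
        print(f"rotate={rotate}: relative error {err:.4f}")
```

Running the script prints the relative error of the quantized weight gradient with and without the rotation for one synthetic outlier case. The rotation here is fully deterministic, which matches the summary's point that randomness is not the ingredient being added; whether the error actually drops depends on the data, and this toy setup says nothing about convergence or throughput.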
