Tagged articles
2 articles
Machine Heart
May 27, 2026 · Artificial Intelligence

AMD Paper Finds FP4 Training Instability Is Not Due to Randomness, 9‑10% Speedup

The authors show that FP4 training instability stems from structural micro‑scaling errors in the weight‑gradient path, not from insufficient randomness. A deterministic Hadamard rotation restores convergence and delivers a 9‑10% end‑to‑end speedup on native FP4 hardware (AMD MI355X) at a cost of only 8‑9% token overhead.

Deterministic Hadamard · FP4 · MXFP4
10 min read
Data Party THU
Sep 4, 2025 · Artificial Intelligence

How MXFP4 Quantization Lets a 120‑Billion‑Parameter LLM Run on a Single 80GB GPU

This article analyzes the memory bottleneck of massive language models, models their memory requirements mathematically, evaluates the limits of traditional sharding, and details how GPT‑OSS’s MXFP4 quantization, combined with a Mixture‑of‑Experts architecture, reduces memory, bandwidth, and compute demands enough to fit a 120‑billion‑parameter model on a single 80 GB GPU with minimal accuracy loss.

FP4 · LLM · MXFP4
11 min read