Machine Heart
May 22, 2026 · Artificial Intelligence
Nvidia’s First Tri-Mode LLM Boosts Token Throughput 4× and Promises Long-Text Generation in Seconds
Nvidia has introduced a tri-mode large language model that can switch among autoregressive, diffusion, and self-speculative decoding. The model delivers up to 4× higher token throughput, reaches state-of-the-art accuracy on benchmarks, and shows significant speed gains on DGX Spark, RTX 6000 Pro, and GB200 hardware.
LLM · NVIDIA · Speculative Decoding
8 min read
