May 22, 2026 · Artificial Intelligence

Nvidia’s First Tri‑Mode LLM Boosts Token Throughput 4× and Promises Long‑Text Generation in Seconds

Nvidia introduces a tri‑mode large language model that can switch among autoregressive, diffusion, and self‑speculation decoding. The model delivers up to four times higher token throughput, reaches state‑of‑the‑art accuracy on benchmarks, and shows significant speed gains on DGX Spark, RTX 6000 Pro, and GB200 hardware.

LLM · NVIDIA · Speculative Decoding

