Nvidia’s First Tri‑Mode LLM Boosts Token Throughput 4× and Promises Seconds‑Scale Long‑Text Generation

Nvidia introduces a tri‑mode large language model that can switch among autoregressive, diffusion and self‑speculation decoding, delivering up to four times higher token throughput, achieving state‑of‑the‑art accuracy on benchmarks, and showing significant speed gains on DGX Spark, RTX 6000 Pro and GB200 hardware.

LLM · NVIDIA · Speculative Decoding

AI Info Trend

Mar 18, 2026 · Industry Insights

Which Large Language Model Leads in Intelligence, Speed, and Cost? 2026 Rankings Revealed

The 2026 Artificial Analysis report ranks the leading large language models worldwide by intelligence score, output speed in tokens per second, and cost per million tokens. Gemini 3.1 Pro Preview and GPT‑5.4 lead on intelligence, NVIDIA Nemotron 3 Super leads on speed, and DeepSeek V3.2 and gpt‑oss‑120B are the most cost‑effective options.

AI model ranking · Token throughput · Cost efficiency
