NVIDIA Unveils Nemotron 3 Ultra: The Largest US Open‑Source LLM Boosting Agent Capabilities
NVIDIA has announced Nemotron 3 Ultra, a 550 B‑parameter open‑source LLM with 55 B active MoE parameters, a hybrid Mamba‑Transformer architecture, and a 1 M‑token context window. Three core innovations underpin its reported lead on MMLU, code, and math benchmarks and up to 5× the throughput of rivals. The weights are not yet public.
NVIDIA announced Nemotron 3 Ultra, a base‑model checkpoint with 550 B total parameters and 55 B active parameters (90 % sparsity) using a Hybrid Mamba‑Transformer MoE architecture and NVFP4 precision. The model is intended for downstream fine‑tuning and RLHF research and does not include instruction tuning or alignment.
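The headline numbers are easy to check by hand. The sketch below recomputes the sparsity figure from the two parameter counts and gives a rough weight-memory estimate. It assumes exactly 4 bits per parameter for NVFP4 and ignores per-block scale factors and any layers kept in higher precision; the helper names and that storage assumption are illustrative and do not come from NVIDIA's documentation.

```python
TOTAL_PARAMS = 550e9   # total parameters, from the announcement
ACTIVE_PARAMS = 55e9   # parameters active per token, from the announcement


def sparsity(total: float, active: float) -> float:
    """Fraction of parameters that are NOT used for a given token."""
    return 1.0 - active / total


def weight_memory_gb(params: float, bits_per_param: float = 4.0) -> float:
    """Rough weight footprint in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored in exactly `bits_per_param` bits.
    """
    return params * bits_per_param / 8 / 1e9


print(f"sparsity: {sparsity(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
print(f"approx. weight memory at 4 bits: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

Under these assumptions the weights alone come to roughly 275 GB, which is consistent with the article's later point that the model needs GB200‑class hardware.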
Three Core Technologies
LatentMoE: Tokens are first compressed into a low‑rank latent space before MoE routing. This allows roughly four times more experts to be used at the same inference cost, increasing the model’s knowledge capacity.
Multi‑Token Prediction (MTP): A single forward pass predicts multiple future tokens. During training it improves chain‑of‑thought coherence; during inference it enables speculative decoding without an additional small model.
1 M Token Context: Mamba‑2 layers provide linear‑time complexity, making a one‑million‑token context practical compared with the quadratic cost of pure Transformers. This benefits long‑document processing, multi‑turn agent dialogue, and code‑base understanding.
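The linear-versus-quadratic argument for the 1 M context can be made concrete with a toy cost model. The sketch assumes self-attention work grows with the square of the sequence length and a state-space (Mamba-style) layer grows linearly. Constant factors, hidden sizes, and the model's actual mix of layer types are ignored, so only the ratios are meaningful.

```python
def attention_cost(n_tokens: int) -> int:
    """Pairwise token interactions in full self-attention: O(n^2)."""
    return n_tokens * n_tokens


def ssm_cost(n_tokens: int) -> int:
    """One recurrent state update per token in a state-space layer: O(n)."""
    return n_tokens


def growth(cost_fn, short: int, long: int) -> float:
    """How many times more work the longer context needs."""
    return cost_fn(long) / cost_fn(short)


SHORT, LONG = 128_000, 1_000_000  # 128 K vs 1 M tokens

print(f"attention: {growth(attention_cost, SHORT, LONG):.1f}x more work")
print(f"ssm:       {growth(ssm_cost, SHORT, LONG):.1f}x more work")
```

Going from 128 K to 1 M tokens multiplies the linear term by about 7.8 and the quadratic term by about 61, which is why a hybrid that replaces most attention layers with state-space layers scales more gracefully.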
Benchmark Comparison (GB200 NVL72)
Nemotron 3 Ultra was evaluated against GLM‑4.5‑355B (Zhipu AI) and Kimi‑K2‑1026B (Moonshot AI). Reported scores, in the order Nemotron 3 Ultra vs GLM‑4.5 vs Kimi‑K2:
MMLU Pro: 79.0 vs 65.6 vs 69.3
MMLU: 89.1 vs 86.3 vs 88.0
Code: 85.3 vs 76.2 vs 75.3
Math: 85.4 vs 72.1 vs 79.5
Common Sense: 81.0 vs 81.3 vs 81.6 (Kimi highest)
Multilingual: 89.0 vs 83.3 vs 84.2
Peak Throughput: 5× vs 1× vs ~2.5× (normalized to GLM‑4.5)
Key observations: Nemotron 3 Ultra leads GLM‑4.5 by 13.4 points and Kimi‑K2 by 9.7 points on MMLU Pro; its code and math scores both exceed 85, indicating strong reasoning ability; its peak throughput is five times that of GLM‑4.5 and roughly twice that of Kimi‑K2, a critical production metric; and its Common Sense score is within 0.6 points of the best competitor.
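The margins quoted in these observations follow directly from the reported scores. As a minimal sketch, the snippet below recomputes Nemotron 3 Ultra's lead over each competitor per benchmark; the `SCORES` table simply transcribes the figures above, and the function name is illustrative.

```python
# Reported scores in the order (Nemotron 3 Ultra, GLM-4.5, Kimi-K2).
SCORES = {
    "MMLU Pro":     (79.0, 65.6, 69.3),
    "MMLU":         (89.1, 86.3, 88.0),
    "Code":         (85.3, 76.2, 75.3),
    "Math":         (85.4, 72.1, 79.5),
    "Common Sense": (81.0, 81.3, 81.6),
    "Multilingual": (89.0, 83.3, 84.2),
}


def margins(scores):
    """Return {benchmark: (lead over GLM-4.5, lead over Kimi-K2)} in points."""
    return {
        name: (round(nemo - glm, 1), round(nemo - kimi, 1))
        for name, (nemo, glm, kimi) in scores.items()
    }


for name, (vs_glm, vs_kimi) in margins(SCORES).items():
    print(f"{name:<13} vs GLM-4.5: {vs_glm:+.1f}   vs Kimi-K2: {vs_kimi:+.1f}")
```

A negative margin marks a benchmark where a competitor scores higher; in this table that happens only for Common Sense.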
Third‑Party Evaluation
Artificial Analysis assigned the model an Intelligence Index of 48, the highest among U.S. open‑weight models (Gemma 4 31B: 39; Nemotron 3 Super: 36; gpt‑oss‑120b: 33). It ranks second globally, behind Kimi K2.6 (54). Measured inference speed exceeded 300 tokens/s, whereas DeepSeek and Kimi products typically achieve 50‑100 tokens/s.
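To put the speed gap in practical terms, the sketch below converts the quoted decode rates into wall-clock time for a single long response. The 6,000-token response length is a hypothetical chosen for illustration, and the model ignores time to first token, batching, and network latency.

```python
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to stream n_tokens at a steady decode rate."""
    return n_tokens / tokens_per_second


RESPONSE_TOKENS = 6_000  # hypothetical long agent response

# Decode rates quoted in the article (tokens per second).
RATES = {
    "Nemotron 3 Ultra (300 t/s)": 300,
    "competitor, upper bound (100 t/s)": 100,
    "competitor, lower bound (50 t/s)": 50,
}

for label, rate in RATES.items():
    print(f"{label:<36} {generation_seconds(RESPONSE_TOKENS, rate):6.0f} s")
```

At these rates the same response takes 20 seconds instead of one to two minutes, a difference that compounds in multi-step agent loops.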
Availability
Weights are expected in the first half of 2026. Currently only a usage cookbook and README are hosted on GitHub; the model cannot be downloaded or run yet.
Side‑by‑Side Comparison with Competitors
Models, in column order: Nemotron 3 Ultra; DeepSeek‑V3; Kimi‑K2; Llama 3.3
Total Parameters: 550 B; 685 B; ~1000 B; 70 B
Active Parameters: 55 B; ~37 B; ~33 B; 70 B (dense)
Context Length: 1 M; 128 K; 128 K; 128 K
Architecture: Mamba + Transformer MoE; Transformer MoE; Transformer MoE; Dense Transformer
Post‑Training: not performed; performed; performed; performed
Inference Speed: >300 t/s; 50‑100 t/s; 50‑100 t/s; relatively fast
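One quantity the comparison implies but does not state is the share of weights each model activates per token. The sketch below derives it from the total and active parameter counts listed above; the `ModelSpec` class is a hypothetical container, and the approximate ("~") figures are used as given, so the resulting percentages are rough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    total_b: float   # total parameters, in billions
    active_b: float  # parameters active per token, in billions

    @property
    def active_fraction(self) -> float:
        """Share of the weights used for each token."""
        return self.active_b / self.total_b


# Figures transcribed from the comparison; "~" values are approximate.
MODELS = [
    ModelSpec("Nemotron 3 Ultra", 550, 55),
    ModelSpec("DeepSeek-V3", 685, 37),
    ModelSpec("Kimi-K2", 1000, 33),
    ModelSpec("Llama 3.3", 70, 70),  # dense: every weight is active
]

for m in sorted(MODELS, key=lambda m: m.active_fraction):
    print(f"{m.name:<17} {m.active_fraction:6.1%} of weights active per token")
```

By this measure Nemotron 3 Ultra activates a larger share of its weights per token (about 10 %) than the other two MoE models in the comparison, while the dense Llama 3.3 activates all of them.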
Core strengths: 1 M context enabled by Mamba, five‑fold higher throughput, strong base capabilities. Core weaknesses: no post‑training, requires GB200‑class hardware, weights not yet released.
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
