NVIDIA Unveils Nemotron 3 Ultra: The Largest US Open‑Source LLM Boosting Agent Capabilities

NVIDIA announced Nemotron 3 Ultra, a 550 B‑parameter open‑source MoE LLM with 55 B active parameters, a hybrid Mamba‑Transformer architecture, and a 1 M‑token context. Three core innovations underpin its reported leads on MMLU, code, and math benchmarks and up to 5× the throughput of the compared rivals. The weights are not yet public.

Old Zhang's AI Learning

Jun 1, 2026

NVIDIA announced Nemotron 3 Ultra, a base‑model checkpoint with 550 B total parameters and 55 B active parameters (90 % sparsity) using a Hybrid Mamba‑Transformer MoE architecture and NVFP4 precision. The model is intended for downstream fine‑tuning and RLHF research and does not include instruction tuning or alignment.
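The headline figures can be sanity-checked with a few lines of arithmetic. The sketch below derives the active fraction from the stated parameter counts and estimates raw weight storage. It assumes exactly 4 bits per parameter for NVFP4 and ignores scale factors and any layers kept at higher precision, so the real footprint would likely be somewhat larger.

```python
TOTAL_PARAMS = 550e9   # total parameters, from the announcement
ACTIVE_PARAMS = 55e9   # parameters active per token, from the announcement
BITS_PER_PARAM = 4     # assumption: pure 4-bit storage, scale factors ignored

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
sparsity = 1 - active_fraction
weight_gb = TOTAL_PARAMS * BITS_PER_PARAM / 8 / 1e9  # decimal gigabytes

print(f"active fraction: {active_fraction:.0%}")
print(f"sparsity: {sparsity:.0%}")
print(f"approx. weight storage: {weight_gb:.0f} GB")
```

Under these assumptions the weights alone come to roughly 275 GB, which is consistent with the article's point that the model targets GB200‑class hardware.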

Three Core Technologies

LatentMoE : Tokens are first compressed into a low‑rank latent space before MoE routing. This allows roughly four times more experts to be used at the same inference cost, increasing the model’s knowledge capacity.
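The "four times more experts at the same cost" claim can be illustrated with a rough FLOP count. This is a minimal sketch under stated assumptions, not NVIDIA's implementation: the layer sizes, the two‑layer expert shape, and the function names are all hypothetical, chosen so that the latent width is one quarter of the model width.

```python
def expert_flops(d_in: int, d_ff: int, top_k: int) -> int:
    """FLOPs per token for top_k two-layer experts reading d_in-wide inputs."""
    # Each expert does an up-projection (d_in x d_ff) and a down-projection
    # (d_ff x d_in); one multiply-accumulate is counted as 2 FLOPs.
    return top_k * 2 * (2 * d_in * d_ff)


def latent_projection_flops(d_model: int, d_latent: int) -> int:
    """FLOPs per token for the shared compress and expand projections."""
    return 2 * (2 * d_model * d_latent)


# Hypothetical sizes, chosen only to make the 4x ratio visible.
D_MODEL, D_LATENT, D_FF = 4096, 1024, 2048

standard = expert_flops(D_MODEL, D_FF, top_k=2)         # experts at full width
latent_experts = expert_flops(D_LATENT, D_FF, top_k=8)  # 4x experts, latent width
overhead = latent_projection_flops(D_MODEL, D_LATENT)   # paid once per token

print(standard, latent_experts, overhead)
```

With these numbers, eight latent‑width experts cost the same as two full‑width experts. The shared projections add a fixed overhead that does not grow with the number of active experts.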

Multi‑Token Prediction (MTP) : A single forward pass predicts multiple future tokens. During training it improves chain‑of‑thought coherence; during inference it enables speculative decoding without an additional small model.
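The inference‑time use of MTP can be sketched as a draft‑and‑verify loop: the extra prediction heads propose several tokens, and the main model keeps the longest prefix it agrees with. The toy below assumes greedy decoding and uses a hypothetical `toy_next_token` function in place of a real model. A real system would verify all draft tokens in one batched forward pass rather than one call per token.

```python
from typing import Callable, List


def verify_draft(context: List[int], draft: List[int],
                 next_token: Callable[[List[int]], int]) -> List[int]:
    """Accept draft tokens while they match the main model's greedy choice."""
    accepted: List[int] = []
    for proposed in draft:
        expected = next_token(context + accepted)
        if proposed != expected:
            # First mismatch: keep the main model's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(proposed)
    # Every draft token matched, so verification yields one extra token.
    accepted.append(next_token(context + accepted))
    return accepted


def toy_next_token(tokens: List[int]) -> int:
    """Hypothetical stand-in for the main model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


print(verify_draft([1, 2, 3], [4, 5, 7], toy_next_token))  # [4, 5, 6]
print(verify_draft([1, 2, 3], [4, 5, 6], toy_next_token))  # [4, 5, 6, 7]
```

The output always matches what plain greedy decoding would have produced. The speedup comes from how many tokens are accepted per verification step.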

1 M Token Context : Mamba‑2 layers provide linear‑time complexity, making a one‑million‑token context practical compared with the quadratic cost of pure Transformers. This benefits long‑document processing, multi‑turn agent dialogue, and code‑base understanding.
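A quick calculation shows why the scaling law matters at this length. The sketch below compares how cost grows from a 128 K context to a 1 M context under linear and quadratic scaling. It assumes decimal token counts and ignores constant factors, so it shows only the growth ratio, not absolute cost.

```python
def relative_cost(n_tokens: int, baseline_tokens: int, exponent: int) -> float:
    """Cost of n_tokens relative to baseline_tokens for O(n**exponent) scaling."""
    return (n_tokens / baseline_tokens) ** exponent


LONG, BASE = 1_000_000, 128_000  # assumption: decimal token counts

linear = relative_cost(LONG, BASE, exponent=1)     # Mamba-style sequence mixing
quadratic = relative_cost(LONG, BASE, exponent=2)  # full self-attention

print(f"linear: {linear:.1f}x, quadratic: {quadratic:.1f}x")
```

Going from 128 K to 1 M tokens costs about 7.8× under linear scaling and about 61× under quadratic scaling. Because the architecture is a hybrid, its real behavior presumably falls somewhere between the two.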

Benchmark Comparison (GB200 NVL72)

Nemotron 3 Ultra was evaluated against GLM‑4.5‑355B (Zhipu AI) and Kimi‑K2‑1026B (Moonshot AI). Reported scores:

| Benchmark | Nemotron 3 Ultra | GLM‑4.5 | Kimi‑K2 |
|---|---|---|---|
| MMLU Pro | 79.0 | 65.6 | 69.3 |
| MMLU | 89.1 | 86.3 | 88.0 |
| Code | 85.3 | 76.2 | 75.3 |
| Math | 85.4 | 72.1 | 79.5 |
| Common Sense | 81.0 | 81.3 | 81.6 (highest) |
| Multilingual | 89.0 | 83.3 | 84.2 |
| Peak Throughput | 5× | 1× | ~2.5× |

Key observations: Nemotron 3 Ultra leads GLM‑4.5 by 13.4 points and Kimi‑K2 by 9.7 points on MMLU Pro. Its code and math scores both exceed 85, indicating strong reasoning ability. Its peak throughput is five times that of GLM‑4.5 and about twice that of Kimi‑K2, a critical metric in production. Common Sense is the one benchmark it does not lead, trailing the best competitor by 0.6 points.
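The margins can be recomputed directly from the reported scores. The snippet below is a small sketch that uses only the numbers listed in this section. The `margins` helper is an illustrative name, and each tuple is ordered Nemotron 3 Ultra, GLM‑4.5, Kimi‑K2.

```python
# Reported scores, ordered (Nemotron 3 Ultra, GLM-4.5, Kimi-K2).
SCORES = {
    "MMLU Pro": (79.0, 65.6, 69.3),
    "MMLU": (89.1, 86.3, 88.0),
    "Code": (85.3, 76.2, 75.3),
    "Math": (85.4, 72.1, 79.5),
    "Common Sense": (81.0, 81.3, 81.6),
    "Multilingual": (89.0, 83.3, 84.2),
}


def margins(scores):
    """Lead of the first model over the best competitor, per benchmark."""
    return {name: round(vals[0] - max(vals[1:]), 1)
            for name, vals in scores.items()}


for name, lead in margins(SCORES).items():
    print(f"{name}: {lead:+.1f}")
```

A negative value marks a benchmark where a competitor scores higher. Here that applies only to Common Sense.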

Third‑Party Evaluation

Artificial Analysis assigned an Intelligence Index of 48, the highest among open‑weight models from U.S. developers (Gemma 4 31B: 39, Nemotron 3 Super: 36, gpt‑oss‑120b: 33). The model ranks second globally, behind Kimi K2.6 (54). Measured inference speed exceeded 300 tokens/s, whereas DeepSeek and Kimi products typically achieve 50‑100 tokens/s.
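To put the tokens‑per‑second figures in context, the sketch below converts them into wall‑clock generation time. The 30,000‑token trace length is a hypothetical value for a multi‑step agent run, not a figure from the evaluation.

```python
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady decoding rate."""
    return n_tokens / tokens_per_second


TRACE_TOKENS = 30_000  # hypothetical length of a multi-step agent trace

for label, rate in [("300 t/s", 300), ("100 t/s", 100), ("50 t/s", 50)]:
    print(f"{label}: {generation_seconds(TRACE_TOKENS, rate):.0f} s")
```

At these rates the same trace takes about 100 seconds at 300 tokens/s, against 300 to 600 seconds at 50‑100 tokens/s.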

Availability

Weights are expected in the first half of 2026. Currently only a usage cookbook and README are hosted on GitHub; the model cannot be downloaded or run yet.

Horizontal Comparison with Competitors

| Attribute | Nemotron 3 Ultra | DeepSeek‑V3 | Kimi‑K2 | Llama 3.3 |
|---|---|---|---|---|
| Total Parameters | 550 B | 685 B | ~1000 B | 70 B |
| Active Parameters | 55 B | ~37 B | ~33 B | 70 B (dense) |
| Context Length | 1 M | 128 K | 128 K | 128 K |
| Architecture | Mamba + Transformer MoE | Transformer MoE | Transformer MoE | Dense Transformer |
| Post‑Training | not performed | performed | performed | performed |
| Inference Speed | >300 t/s | 50‑100 t/s | 50‑100 t/s | relatively fast |

Core strengths: 1 M context enabled by Mamba, five‑fold higher throughput, strong base capabilities. Core weaknesses: no post‑training, requires GB200‑class hardware, weights not yet released.
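One way to read the parameter rows together is to compute the share of each model that is active per token. The sketch below uses the approximate figures from the comparison above, in billions. The `active_ratio` helper is an illustrative name.

```python
# (total parameters, active parameters) in billions, approximate.
MODELS = {
    "Nemotron 3 Ultra": (550, 55),
    "DeepSeek-V3": (685, 37),
    "Kimi-K2": (1000, 33),
    "Llama 3.3": (70, 70),  # dense: every parameter is active
}


def active_ratio(total_b: float, active_b: float) -> float:
    """Fraction of parameters used for each token."""
    return active_b / total_b


for name, (total_b, active_b) in MODELS.items():
    print(f"{name}: {active_ratio(total_b, active_b):.1%} active")
```

By this measure Nemotron 3 Ultra activates a larger share of its parameters per token than the other two MoE models, so its throughput advantage would have to come from elsewhere, for example the Mamba layers and NVFP4 precision.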
