Nvidia Vera CPU Smashes Intel and AMD x86 Titans in AI Workloads
Nvidia's Vera, an 88‑core custom ARM CPU designed for AI agents, scores 55% higher than the Intel Xeon 6980P, 10% higher than the AMD EPYC 9575F and 63% higher than Nvidia's own Grace in Phoronix's geometric‑mean results. It pairs 1.2 TB/s of LPDDR5X bandwidth with a roughly 500 W power envelope and a single‑chip design that could reshape the server CPU market.
Why Nvidia Built a CPU for AI Agents
Large generative models are increasingly deployed as AI agents rather than run as standalone training jobs. Agents depend on heavy logical reasoning, tool invocation, sandbox execution and long‑context management. GPUs excel at tensor math but struggle with these control‑flow‑heavy tasks, which leaves the CPU as the performance bottleneck. Nvidia’s GPU lineup does not address this gap, so the company introduced Vera, its first fully custom ARM‑based CPU, purpose‑built for AI‑agent workloads.
Key Hardware Features
Custom Olympus cores: 88 cores built on Nvidia's own micro‑architecture, compatible with Armv9.2 and supporting FP8 precision, with roughly 50% higher per‑core performance than stock ARM core designs.
SMT (Simultaneous Multithreading): Each core runs two threads, yielding 176 hardware threads and roughly doubling dense‑task throughput.
Memory bandwidth: Integrated LPDDR5X provides 1.2 TB/s bandwidth, more than ten times the bandwidth of typical x86 server CPUs.
Single‑chip design: All 88 cores sit on one monolithic die with a 164 MB shared L3 cache plus 2 MB of L2 per core, avoiding the die‑to‑die latency of chiplet designs.
Power envelope: Peak TDP is 450 W for the CPU plus 50 W for memory, totaling about 500 W, comparable to dual‑socket x86 platforms but delivering roughly double the performance per watt.
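Several of the figures in the list above follow from simple arithmetic on the quoted specs. The sketch below is illustrative only: it uses the article's numbers, assumes 1 TB = 1000 GB, and the function and constant names are hypothetical. Nothing in it is a measurement.

```python
# Spec figures quoted in the article; nothing here is measured.
CORES = 88
THREADS_PER_CORE = 2
L2_PER_CORE_MB = 2
L3_SHARED_MB = 164
BANDWIDTH_GBPS = 1200  # 1.2 TB/s, assuming 1 TB = 1000 GB
CPU_TDP_W = 450
MEMORY_POWER_W = 50


def derived_figures():
    """Return quantities that follow directly from the quoted specs."""
    threads = CORES * THREADS_PER_CORE
    total_l2 = CORES * L2_PER_CORE_MB
    return {
        "hardware_threads": threads,
        "total_l2_mb": total_l2,
        "total_cache_mb": total_l2 + L3_SHARED_MB,
        "bandwidth_per_core_gbps": BANDWIDTH_GBPS / CORES,
        "bandwidth_per_thread_gbps": BANDWIDTH_GBPS / threads,
        "package_power_w": CPU_TDP_W + MEMORY_POWER_W,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.1f}")
```

On these numbers each core has roughly 13.6 GB/s of peak memory bandwidth available when all 88 cores are active, and the per‑core L2 adds up to 176 MB on top of the shared L3.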
Benchmark Results (Phoronix)
Phoronix tested Vera against AMD EPYC 9575F (64 cores @ 5 GHz, Zen 5), Intel Xeon 6980P (128 cores, Granite Rapids) and Nvidia’s previous Grace CPU. The geometric‑mean score shows:
Vera is **10% faster** than EPYC 9575F despite the latter’s high clock speed.
Vera outperforms Xeon 6980P by **55%**, meaning a single 88‑core Vera beats a dual‑socket 128‑core Xeon.
Vera surpasses Grace by **63%**, representing a generational leap.
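To read these percentages correctly, it can help to normalize Vera to 1.0 and see where the other chips land. The sketch below does that with the three leads reported above, and also shows why a geometric mean is the usual choice for combining per‑test ratios. The helper names are hypothetical, and the sample ratios in the second half are invented for illustration; they are not Phoronix data.

```python
import math

# Vera's reported geometric-mean lead over each chip (from the article).
VERA_LEAD = {"EPYC 9575F": 0.10, "Xeon 6980P": 0.55, "Grace": 0.63}


def geometric_mean(ratios):
    """Geometric mean of positive per-test performance ratios."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def relative_to_vera(lead):
    """Score of the other chip when Vera is normalized to 1.0."""
    return 1.0 / (1.0 + lead)


if __name__ == "__main__":
    for chip, lead in VERA_LEAD.items():
        print(f"{chip}: {relative_to_vera(lead):.3f} of Vera's score")

    # Hypothetical per-test ratios, NOT Phoronix data: one test twice as
    # fast, one half as fast. The geometric mean treats them as a wash,
    # whereas the arithmetic mean would report a 25% lead.
    sample = [2.0, 0.5]
    print(f"geometric mean: {geometric_mean(sample):.2f}")
    print(f"arithmetic mean: {sum(sample) / len(sample):.2f}")
```

Read this way, a 55% lead means the Xeon reaches about 65% of Vera's composite score, not 45%.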
Task‑specific findings include:
Code compilation: Large codebases such as the Godot engine and Node.js compile twice as fast as on Grace and match the single‑core performance of AMD’s 5 GHz flagship.
Memory bandwidth (STREAM test): LPDDR5X’s 1.2 TB/s beats DDR5 on x86 platforms; 7‑Zip decompression is 20% faster per core.
4K 10‑bit AV1 encoding: Vera outperforms AMD Zen 5 CPUs and is significantly faster than Intel's flagship.
Python/Java workloads: Python matches AMD high‑frequency cores, while Java workloads surpass Intel.
Database (ClickHouse) and compression (Zstd): Vera delivers the best observed performance, with compression twice as fast as Grace and noticeably faster than Intel.
Software Stack and Ecosystem
Vera benefits from early software support: mainstream ARM64 Linux distributions (Ubuntu, Fedora) run out‑of‑the‑box, and GCC 16.1+ and Clang 21 already generate optimized code for the Olympus cores. Nvidia also integrates Vera with its broader AI stack (Rubin GPUs, BlueField 4 DPUs and the MGX rack architecture), forming a tightly coupled “compute‑storage‑network” AI factory.
Market Implications
The arrival of Vera challenges the long‑standing x86 duopoly. Intel’s server share has slipped from 64.4% to 54.9% amid yield issues, and AMD faces new pressure as Vera demonstrates clear AI‑workload advantages before the next‑generation EPYC Venice arrives. ARM‑based server CPUs have grown from 11.5% to 17.7% market share, and Vera’s performance could accelerate this trend, creating a three‑way competitive landscape.
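The share figures above can be read either as percentage‑point shifts or as relative changes, and the two tell different stories. A minimal sketch using only the numbers quoted in this section (the function name is hypothetical):

```python
# Market-share figures quoted in the article, in percent.
SHARES = {
    "Intel": (64.4, 54.9),
    "ARM-based server CPUs": (11.5, 17.7),
}


def share_shift(before, after):
    """Return (change in percentage points, relative change in percent)."""
    points = after - before
    relative = points / before * 100.0
    return points, relative


if __name__ == "__main__":
    for name, (before, after) in SHARES.items():
        points, relative = share_shift(before, after)
        print(f"{name}: {points:+.1f} points, {relative:+.1f}% relative")
```

Intel's 9.5‑point loss is about a 15% relative decline, while ARM's 6.2‑point gain is growth of roughly 54% from a much smaller base.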
Limitations and Outlook
Vera is currently in pre‑production: power‑efficiency testing is limited, pricing is undisclosed, and volume availability is uncertain. Nevertheless, its benchmark results indicate that ARM server CPUs can compete head‑to‑head with flagship x86 designs, and Nvidia’s roadmap suggests a shift from a GPU‑centric company to a full‑stack AI infrastructure leader.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
