In‑Depth Analysis of AWS Graviton 3: Architecture, Performance, and Comparison with x86 Competitors

The article provides a comprehensive technical review of AWS’s Graviton 3 ARM server CPU, detailing its SVE support, branch prediction, front‑end, renamer, execution units, cache hierarchy, and performance comparisons with Neoverse N1, Intel Ice Lake, and AMD Zen 3, while discussing cloud‑centric design trade‑offs.

Architects' Tech Alliance

Jun 10, 2022

In late May 2022 AWS made Graviton 3 generally available. It is the first widely available server‑grade ARM CPU to implement the Scalable Vector Extension (SVE) instruction set, and it succeeds Graviton 2, which was built on Neoverse N1 cores.

The branch predictor in Graviton 3 shows a major improvement over N1, with a larger micro‑BTB and the ability to recognize patterns up to 16 outcomes long across as many as 512 branches, matching or exceeding the capabilities of Intel Ice Lake SP and AMD Zen 3.
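To make the pattern-length figure concrete, here is a toy model of a history-indexed predictor. It is only a sketch: the table of 2-bit counters, the function name `mispredict_rate`, and the test pattern are illustrative assumptions, not a description of how Graviton 3's predictor is actually built. It shows why a pattern of length 16 is only fully predictable when enough past outcomes are tracked.

```python
def mispredict_rate(pattern, history_bits, repeats=200):
    """Steady-state mispredict rate of a toy history-indexed predictor.

    pattern      -- repeating list of outcomes (1 = taken, 0 = not taken)
    history_bits -- how many past outcomes index the counter table (>= 1)
    """
    if history_bits < 1:
        raise ValueError("history_bits must be at least 1")
    counters = {}                            # history -> 2-bit saturating counter
    history = (0,) * history_bits
    warmup = len(pattern) * (repeats // 2)   # first half of the run only trains
    misses = total = 0
    for i in range(len(pattern) * repeats):
        taken = pattern[i % len(pattern)]
        counter = counters.get(history, 1)   # default: weakly not taken
        predicted = 1 if counter >= 2 else 0
        if i >= warmup:
            total += 1
            misses += int(predicted != taken)
        # Train the counter toward the real outcome, then shift the history.
        counters[history] = min(3, counter + 1) if taken else max(0, counter - 1)
        history = history[1:] + (taken,)
    return misses / total


pattern = [1] + [0] * 15                     # one taken branch every 16 outcomes
print(mispredict_rate(pattern, history_bits=16))  # 0.0
print(mispredict_rate(pattern, history_bits=4))   # 0.0625
```

With 16 outcomes of history every position in the pattern has a unique signature, so the toy predictor stops missing after warm-up. With only 4, the long run of not-taken outcomes looks identical right before the taken branch, and it misses once per repetition.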

Graviton 3’s front‑end features a four‑wide decoder and a 3K‑entry micro‑op cache, similar to Intel and AMD designs, and adds advanced jump‑fusion and NOP‑fusion techniques that increase instruction throughput.

The renamer appears to be six‑wide, giving Graviton 3 per‑cycle rename capacity comparable to Zen 3, and it can break certain register‑to‑register dependencies, though not as aggressively as Intel’s Sunny Cove.

Reorder buffer (ROB) tests suggest a 512‑entry capacity, but because NOPs are fused the effective capacity may be only 256 entries; the register file includes 125 vector registers, each 256 bits wide.
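The relationship between the two ROB figures is simple arithmetic if each ROB entry is assumed to hold a pair of fused NOPs. The sketch below makes that assumption explicit; the function name and the `nops_per_entry` parameter are hypothetical, introduced only for illustration.

```python
def effective_rob_entries(measured_nops_in_flight: int, nops_per_entry: int = 2) -> int:
    """Estimate real ROB entries from a NOP-based capacity measurement.

    Assumes every entry holds `nops_per_entry` fused NOPs, so counting
    NOPs in flight overstates the true number of entries.
    """
    if nops_per_entry < 1:
        raise ValueError("nops_per_entry must be at least 1")
    return measured_nops_in_flight // nops_per_entry


print(effective_rob_entries(512))  # 256
```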

Execution units are robust, with four integer ALUs, three memory pipelines, and two 256‑bit SVE FP pipelines capable of two operations per cycle, delivering floating‑point throughput comparable to AVX‑enabled x86 cores.
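A rough way to read the floating‑point figure is to convert pipe count and vector width into peak FLOPs per cycle. The sketch below assumes both pipes can issue a fused multiply‑add every cycle and counts each FMA as two FLOPs; those assumptions, the function name, and the use of the 2.6 GHz clock quoted elsewhere in this article are for illustration only, and peak numbers say nothing about sustained throughput.

```python
def peak_flops_per_cycle(pipes: int, vector_bits: int, element_bits: int,
                         flops_per_lane: int = 2) -> int:
    """Theoretical peak FLOPs per cycle for a core.

    flops_per_lane=2 assumes a fused multiply-add counted as two FLOPs.
    """
    lanes = vector_bits // element_bits      # elements per vector register
    return pipes * lanes * flops_per_lane


fp64 = peak_flops_per_cycle(pipes=2, vector_bits=256, element_bits=64)
fp32 = peak_flops_per_cycle(pipes=2, vector_bits=256, element_bits=32)
print(fp64)                     # 16
print(fp32)                     # 32
print(f"{fp64 * 2.6:.1f}")      # 41.6 (peak FP64 GFLOPS per core at 2.6 GHz)
```

The same formula gives 16 FP64 FLOPs per cycle for any core with two 256‑bit FMA pipes, which is one way to see why the throughput is described as comparable to AVX‑enabled x86 cores.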

Cache hierarchy improvements include a 64 KB L1‑D, an L2 of unchanged size with latency reduced by two cycles, and a much lower‑latency L3 than Ampere Altra’s; memory latency benefits from DDR5, and memory bandwidth is superior to that of the x86 rivals.

SVE support is a notable differentiator, though software ecosystems are still catching up; most existing workloads will not yet exploit SVE, and the advantage may be limited for several years.
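Part of the reason software has to catch up is that SVE code is written without a fixed vector width: the same loop runs on any implementation, with a predicate masking off lanes past the end of the data. The Python emulation below is only a sketch of that idea. The function name `vla_add` is hypothetical, and real code would rely on a compiler or SVE intrinsics rather than Python lists.

```python
def vla_add(a, b, vector_bits, element_bits=32):
    """Add two equal-length lists one emulated vector at a time."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # SVE permits vector lengths from 128 to 2048 bits in 128-bit steps.
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector_bits must be a multiple of 128 in [128, 2048]")
    lanes = vector_bits // element_bits
    out = []
    for base in range(0, len(a), lanes):
        # Predicate: a lane is active only while base + lane is in bounds.
        # This mirrors the role of SVE's WHILELT-style loop predicates.
        active = [base + lane < len(a) for lane in range(lanes)]
        out.extend(a[base + lane] + b[base + lane]
                   for lane in range(lanes) if active[lane])
    return out


data = list(range(10))
ones = [1] * 10
# The result is identical whether the hardware vector is 128 or 256 bits wide.
print(vla_add(data, ones, 128) == vla_add(data, ones, 256))  # True
```

The loop never mentions a specific width, so the 256‑bit Graviton 3 and a 128‑bit implementation would produce the same result from the same code; only the number of iterations changes.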

From AWS’s perspective, Graviton 3 is built on a 5 nm process with a modest 2.6 GHz clock to maximize density and power efficiency, allowing three chips per node and lower cost per core while delivering a clear performance uplift over Graviton 2.

Overall, Graviton 3 narrows the performance gap with Intel’s Ice Lake and AMD’s Zen 3 in branch prediction, reorder capacity, and execution resources, but it still lags behind in raw clock speed and some cache bandwidth metrics.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
