Hard‑Core Cloud Foundations Power Agentic AI: Highlights from re:Invent 2025 Peter & Dave Keynote
At re:Invent 2025, AWS executives Peter DeSantis and Dave Brown detailed a series of hardware and service innovations, including Graviton5, Trainium3/4, Lambda Managed Instances, Project Mantle, and S3 Vectors. They argued that security, availability, elasticity, cost, and agility become even more critical in the AI era, and cited concrete performance results from customers such as Airbnb, Apple, and Twelve Labs.
Keynote Overview
On December 5, 2025 (Beijing time), AWS senior vice presidents Peter DeSantis and Dave Brown delivered a technical keynote titled “Infrastructure Innovation” at re:Invent 2025, outlining the core value of cloud infrastructure for the AI era.
Core Infrastructure Themes
The speakers emphasized that AI is reshaping application development, but the fundamental attributes of cloud computing—security, availability, elasticity, cost efficiency, and agility—are becoming even more essential. Security must be a top priority because AI also empowers attackers; availability must withstand unprecedented AI workloads; elasticity must match the scale of AI services; cost control is critical given the high expense of AI training and inference; and agility is required to rapidly start, optimize, and adjust AI workloads.
Hardware Innovations
AWS Graviton5 introduces a 192‑core processor with unified fast memory access, a 5× larger L3 cache, and a 33 % reduction in fan power thanks to a direct‑silicon cooling design. The M9g instance built on Graviton5 delivers 25 % higher performance than the previous M8g, which AWS describes as the best price‑performance in EC2.
Early customer benchmarks demonstrate the impact: Airbnb achieved a 25 % performance uplift, Atlassian reduced latency by 20 %, Honeycomb saw a 36 % per‑core performance increase, and SAP HANA experienced a 60 % boost in OLTP query performance.
Apple Swift on Graviton – Apple’s cloud platform team rewrote core services in Swift and migrated them to Graviton, resulting in a 40 % performance improvement and a 30 % cost reduction. Apple also open‑sourced Swift and collaborated with AWS to provide the first official Swift toolchain for Amazon Linux.
Serverless Evolution
Dave Brown recounted the origin of AWS Lambda in 2013, when a small team set out to let developers submit code without managing servers. The service grew from an image‑thumbnailing need in the S3 team into a core serverless offering. Lambda Managed Instances extends that model: functions run on EC2 instances, so customers keep control over instance type and hardware while Lambda manages configuration, caching, availability, and scaling. This hybrid model opens serverless to workloads such as video processing, ML preprocessing, and high‑throughput analytics.
Inference Engine – Project Mantle
Project Mantle is a purpose‑built inference engine that processes requests in four stages—tokenization, pre‑fill, decoding, and detokenization—each with distinct resource profiles (CPU‑bound, GPU‑bound, memory‑bandwidth‑bound, latency‑sensitive). The system exposes three priority channels (Priority, Standard, Flex) that isolate customer queues, ensuring one customer’s traffic spikes do not affect others. A journal system based on DynamoDB and S3 captures request state for fault‑tolerant recovery, and the scheduler can pause long‑running jobs during traffic spikes and resume them later.
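The queue isolation described above can be illustrated with a toy scheduler. This is a minimal sketch, not Project Mantle's actual design: the class name `IsolatedScheduler`, the strict tier ordering, and the round-robin policy across customers are all assumptions made for illustration. Journaling and pause/resume are not modeled.

```python
from collections import OrderedDict, deque

# Tier names come from the text; serving them in strict order is an assumption.
TIERS = ("priority", "standard", "flex")


class IsolatedScheduler:
    """Toy scheduler: one FIFO queue per (tier, customer), served round-robin."""

    def __init__(self):
        # tier -> OrderedDict mapping customer -> deque of pending requests
        self.queues = {tier: OrderedDict() for tier in TIERS}

    def submit(self, tier, customer, request):
        self.queues[tier].setdefault(customer, deque()).append(request)

    def next_request(self):
        """Return (tier, customer, request), or None when nothing is pending."""
        for tier in TIERS:  # higher tiers are drained first
            customers = self.queues[tier]
            if not customers:
                continue
            # Serve the customer at the front of the rotation.
            customer, pending = next(iter(customers.items()))
            request = pending.popleft()
            if pending:
                customers.move_to_end(customer)  # rotate to the back of the line
            else:
                del customers[customer]
            return tier, customer, request
        return None


# Hypothetical traffic: "noisy" floods the standard tier, "quiet" sends one request.
sched = IsolatedScheduler()
for i in range(3):
    sched.submit("standard", "noisy", f"n{i}")
sched.submit("standard", "quiet", "q0")
sched.submit("flex", "batch", "b0")
sched.submit("priority", "vip", "p0")

order = [sched.next_request()[2] for _ in range(6)]
print(order)  # ['p0', 'n0', 'q0', 'n1', 'n2', 'b0']
```

Because each customer has its own queue, the quiet customer's request is served after only one of the noisy customer's requests rather than behind all three. A production scheduler would also need a policy to keep lower tiers from starving under sustained high-tier load, which this sketch does not attempt.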
Vector Search and Multimodal Embeddings
Peter DeSantis explained that vectors enable computers to reason about physical attributes, expressions, and relationships similarly to the human brain, using high‑dimensional spaces (often >3,000 dimensions). AWS launched Amazon Nova Multimodal Embeddings, an embedding model supporting text, documents, images, video, and audio, and integrated vector capabilities across all data services.
Amazon S3 Vectors stores billions of embedding vectors directly in S3 buckets, delivering sub‑100 ms query latency at massive scale. Customer case: Twelve Labs uses S3 Vectors to power its Marengo and Pegasus models, processing millions of video hours without data migration, dramatically improving unit economics. Arc XP leverages the same embeddings to quickly locate relevant video segments for news story creation.
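To make the idea of querying embeddings concrete, here is a minimal sketch of similarity search over a handful of vectors. It is a brute-force, exact search in plain Python, not the S3 Vectors API; a service operating on billions of vectors would rely on approximate indexes instead. The clip names and four-dimensional vectors are made up for illustration, whereas real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (key, score) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Hypothetical embeddings for three video segments.
index = {
    "clip-goal": [0.9, 0.1, 0.0, 0.2],
    "clip-interview": [0.1, 0.8, 0.3, 0.0],
    "clip-crowd": [0.7, 0.2, 0.1, 0.3],
}

results = top_k([1.0, 0.0, 0.0, 0.2], index, k=2)
for key, score in results:
    print(f"{key}: {score:.3f}")
```

The query points in nearly the same direction as `clip-goal`, so that segment ranks first, followed by `clip-crowd`.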
AI Accelerators – Trainium
Trainium3 powers the Amazon EC2 Trn3 UltraServers, which combine 144 Trainium3 chips across two racks into a single AI supercomputer delivering 360 PFLOPS of FP8 compute, 4.4× the performance of Trn2 UltraServers. Each UltraServer provides 20 TB of high‑bandwidth memory with 700 TB/s of bandwidth (3.9× the previous generation) and achieves more than five times the tokens‑per‑megawatt output of Trainium2 on GPT‑OSS‑120B.
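The headline figures above can be broken down per chip with simple division. The sketch below does only that arithmetic; it assumes decimal units (1 TB = 1,000 GB) and treats the quoted numbers as exact, so the derived values are approximations rather than official per-chip specifications.

```python
# Figures quoted in the text for one Trn3 UltraServer.
chips = 144
fp8_pflops = 360        # total FP8 compute, PFLOPS
hbm_tb = 20             # total high-bandwidth memory, TB
hbm_bw_tbps = 700       # total memory bandwidth, TB/s

# Per-chip shares, obtained by dividing the totals evenly.
per_chip_pflops = fp8_pflops / chips
per_chip_hbm_gb = hbm_tb * 1000 / chips
per_chip_bw_tbps = hbm_bw_tbps / chips

# Previous-generation totals implied by the quoted multipliers.
implied_trn2_pflops = fp8_pflops / 4.4
implied_trn2_bw_tbps = hbm_bw_tbps / 3.9

print(f"FP8 per chip:        {per_chip_pflops:.2f} PFLOPS")
print(f"HBM per chip:        {per_chip_hbm_gb:.1f} GB")
print(f"Bandwidth per chip:  {per_chip_bw_tbps:.2f} TB/s")
print(f"Implied Trn2 FP8:    {implied_trn2_pflops:.1f} PFLOPS")
print(f"Implied Trn2 HBM bw: {implied_trn2_bw_tbps:.1f} TB/s")
```

Under these assumptions each chip contributes 2.5 PFLOPS of FP8 compute, roughly 139 GB of HBM, and roughly 4.9 TB/s of memory bandwidth.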
System‑level innovations include the first integration of Trainium, Graviton, and Nitro chips on a single board, robot‑ready modular components, a dedicated neuron switch for full‑duplex bandwidth and ultra‑low latency, and Elastic Fabric Adapter enabling direct memory sharing among thousands of Trainium servers.
Micro‑architectural optimizations—such as micro‑scaling, accelerated Softmax, tensor dereferencing, background transposition, traffic shaping, memory‑add‑write, and memory‑scatter—are not listed in official specs but significantly improve real‑world workloads.
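Of the optimizations listed above, micro-scaling is the easiest to illustrate in software. The sketch below assumes the term refers to block-scaled number formats, where a small group of values shares one power-of-two scale factor; the keynote summary does not spell this out. It uses 8-bit integers as the element type for readability and is a simplified illustration, not Trainium's hardware behavior or any published format specification. The function names and block size are hypothetical.

```python
import math


def quantize_block(values, max_q=127):
    """Quantize one block to signed integers sharing a power-of-two scale."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # Smallest exponent such that max_abs / 2**exponent fits within max_q.
    exponent = math.ceil(math.log2(max_abs / max_q))
    scale = 2.0 ** exponent
    # Clamp guards against floating-point edge cases in the exponent choice.
    quantized = [max(-max_q, min(max_q, round(v / scale))) for v in values]
    return exponent, quantized


def dequantize_block(exponent, quantized):
    """Recover approximate values from a block's shared exponent and integers."""
    scale = 2.0 ** exponent
    return [q * scale for q in quantized]


def quantize(values, block_size=32):
    """Split values into fixed-size blocks and quantize each independently."""
    return [
        quantize_block(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


# Two blocks with very different magnitudes (block_size=4 keeps the demo short).
values = [0.01, -0.02, 0.015, 0.005, 100.0, -50.0, 25.0, 12.5]
blocks = quantize(values, block_size=4)
restored = [x for exp, q in blocks for x in dequantize_block(exp, q)]

for exp, q in blocks:
    print(f"shared exponent {exp}: {q}")
```

Each block picks its own exponent, so the small values in the first block keep their precision. With a single scale chosen for all eight values, the large entries would force a scale of 1 and the first four values would all round to zero.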
Roadmap: Trainium4 is under development and promises 6× the FP4 compute performance, 4× memory bandwidth, and 2× HBM capacity compared with Trainium3, securing AWS’s leadership in AI chips.
Developer Tools
Upcoming releases include NKI (Neuron Kernel Interface), a full‑stack open‑source design slated for Q1 2026 that combines matrix‑operation simplicity with instruction‑level hardware access; Neuron Profiler, a hardware‑based performance analyzer that runs without impacting production code; Neuron Explorer, an interactive UI that visualizes profiling data, auto‑detects bottlenecks, and suggests optimizations; and native PyTorch support for Trainium, expected in early 2026, allowing a simple .to("neuron") call to run models on Trainium.
Conclusion
Peter DeSantis concluded that AI makes the foundational properties of cloud infrastructure more important than ever. The continuous investment from Amazon Nitro to Graviton to Trainium is not only solving past technical pain points but also preparing the platform for the upcoming Agentic AI era. The announced achievements demonstrate AWS’s dominant position in cloud infrastructure and its commitment to enabling limitless AI possibilities.
Amazon Cloud Developers
Official technical community of Amazon Cloud. Shares practical AI/ML, big data, database, modern app development, IoT content, offers comprehensive learning resources, hosts regular developer events, and continuously empowers developers.
