Why Microsoft Is Shifting AI Workloads from GPUs to Its Own Maia Accelerators

Microsoft, after buying massive GPU inventories from Nvidia and AMD, is accelerating its move to custom AI accelerators like Maia to improve cost‑performance in its data centers, even though its first‑generation chips still lag behind industry leaders.

21CTO
21CTO
21CTO
Why Microsoft Is Shifting AI Workloads from GPUs to Its Own Maia Accelerators

Microsoft has purchased large quantities of GPUs from Nvidia and AMD, but its leadership now aims to shift most AI workloads from GPUs to its own custom‑designed accelerators.

Unlike Amazon and Google, which have been building custom CPUs and AI accelerators for years, Microsoft only launched its first Maia AI accelerator at the end of 2023.

The shift is driven by a focus on cost‑performance, the single most important metric for a hyperscale cloud provider.

Microsoft CTO Kevin Scott told CNBC that Nvidia currently offers the best price‑performance, but he is open to any solution that meets demand.

Looking ahead, Scott said Microsoft wants to use its own chips for the majority of data‑center workloads.

When asked if the long‑term plan is to rely primarily on Microsoft‑made chips in data centers, he answered, “Yes, absolutely.”

He added that this decision involves the entire system design, including networking and cooling, to freely make the necessary choices for optimal compute performance.

With its first internal AI accelerator, Maia 100, Microsoft moved OpenAI’s GPT‑3.5 onto its own silicon in 2023, freeing GPU capacity. However, the chip’s BF16 performance (800 teraFLOPS), HBM2e memory (64 GB), and bandwidth (1.8 TB/s) lag far behind competing GPUs from Nvidia and AMD.

Reports indicate that a second‑generation Maia accelerator will be launched next year, promising more competitive compute, memory, and interconnect performance.

Even though the mix of GPUs and AI ASICs in Microsoft’s data centers may evolve, it is unlikely that Nvidia and AMD chips will be completely displaced.

Google and Amazon have deployed tens of thousands of TPUs and Trainium accelerators, primarily to accelerate their own internal workloads and win customers such as Anthropic.

Consequently, large‑scale deployments of Nvidia and AMD GPUs continue on these cloud platforms because many customers still require them.

It is also worth noting that AI accelerators are not Microsoft’s only custom silicon; the company also designs its own CPU, called Cobalt, and a suite of security chips to accelerate encryption and protect key exchanges across its massive data‑center domains.

Author: Luo Yi
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Cloud ComputingGPUMicrosoftAI acceleratorcustom chipMaia
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.