
How Huawei’s CloudMatrix 384 Challenges Nvidia’s AI Supercomputers

Huawei’s CloudMatrix 384, built from 384 Ascend 910C chips linked in an all‑to‑all topology, delivers up to 300 PFLOPs of dense BF16 compute, nearly twice that of Nvidia’s GB200 NVL72, while exposing supply‑chain dependencies on foreign fabs, higher power consumption, and a rapid push to scale China’s domestic semiconductor capabilities.


Huawei has introduced a rack‑scale AI system called CloudMatrix 384, built on its Ascend 910C accelerators, that directly competes with Nvidia’s GB200 NVL72 and, on some metrics, surpasses it.


CloudMatrix 384 consists of 384 Ascend 910C chips connected in an all‑to‑all topology. The system delivers 300 PFLOPs of dense BF16 compute, almost double that of the GB200 NVL72, along with 3.6× the total memory capacity and 2.1× the memory bandwidth, though its power draw is 4.1× higher.
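A quick back‑of‑the‑envelope comparison, using only the ratios quoted above (the “almost double” compute claim is approximated here as 2.0×), shows what those figures imply per chip and per watt:

```python
# Sketch: derive per-chip throughput and relative efficiency from the
# article's own numbers. The 2.0x compute ratio is an approximation of
# the "almost double" claim, not an exact published figure.
CM384_BF16_PFLOPS = 300   # dense BF16, whole system
CM384_CHIPS = 384

COMPUTE_RATIO = 2.0       # CloudMatrix vs. GB200 NVL72 (approximate)
POWER_RATIO = 4.1         # CloudMatrix draws 4.1x the power

per_chip_pflops = CM384_BF16_PFLOPS / CM384_CHIPS
perf_per_watt_ratio = COMPUTE_RATIO / POWER_RATIO

print(f"Per-chip dense BF16: ~{per_chip_pflops:.2f} PFLOPs")
print(f"Relative perf/W vs. GB200 NVL72: ~{perf_per_watt_ratio:.2f}x")
```

Under these assumptions each 910C contributes roughly 0.78 PFLOPs of dense BF16, and the system lands at about half the performance per watt of the Nvidia rack, which is the trade‑off the article returns to below.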

While the Ascend 910C chips are designed in China, their manufacturing still relies heavily on foreign fabs such as Samsung (for HBM), TSMC (for wafers), and equipment from the United States, the Netherlands, and Japan. Huawei secured roughly 13 million HBM stacks from Samsung before the HBM export ban, enough for about 1.6 million Ascend 910C packages.
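The two HBM figures quoted above are mutually consistent, as a one‑line check shows (the ~8 stacks per package this implies matches the 910C’s reported dual‑die design, though that detail is an assumption here):

```python
# Sanity-check: how many HBM stacks per Ascend 910C package do the
# article's supply figures imply?
hbm_stacks = 13_000_000   # stacks secured from Samsung
packages = 1_600_000      # Ascend 910C packages those stacks enable

stacks_per_package = hbm_stacks / packages
print(f"~{stacks_per_package:.1f} HBM stacks per Ascend 910C package")
```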

Domestic fabs like SMIC and CXMT are expanding rapidly; SMIC’s monthly capacity is approaching 50,000 wafers and could increase further if yields improve, potentially supplying a significant portion of the required chips.

The full CloudMatrix system spans 16 racks: 12 compute racks each host 32 GPUs, and four scale‑up switch racks provide inter‑rack connectivity over optical fiber. This architecture enables scaling to hundreds of GPUs, a feat that Nvidia’s DGX H100 NVL256 “Ranger” platform could not achieve due to cost, power, and networking complexity.

Overall, Huawei’s solution demonstrates that system‑level engineering—covering networking, optics, and software—can offset chip‑level performance gaps, but the approach faces challenges from higher power consumption and continued dependence on foreign semiconductor supply chains.

Key Specifications

384 Ascend 910C chips

300 PFLOPs dense BF16 compute

3.6× memory capacity, 2.1× bandwidth

Power consumption 4.1× that of GB200 NVL72

Supply‑Chain Highlights

HBM sourced mainly from Samsung (13 million stacks)

Wafer production largely on TSMC’s 7 nm process

Domestic fabs (SMIC, CXMT) expanding capacity and equipment base

Tags: High Performance Computing, AI accelerator, Huawei, Ascend 910C, CloudMatrix 384, semiconductor supply chain
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
