How Huawei’s Kunpeng 960 Uses the τ Law and 3‑D Stacking to Defy Moore’s Law
Amid global semiconductor bottlenecks, Huawei’s Kunpeng line is shifting from 2‑D planar CPUs to 3‑D stacked architectures. The 950 introduces a two‑layer design, and the upcoming 960 promises a three‑layer stack and 4 GHz clocks, about 54% higher than the 950’s 2.6 GHz. Together they illustrate the τ scaling theory and a new supernode strategy that could reshape domestic and global server markets.
1. Kunpeng 950 – The Two‑Layer Pioneer
While the worldwide semiconductor industry wrestles with EUV process limits and attempts to cram more transistors onto a 2‑D plane, Huawei’s Kunpeng family takes a different route: 3‑D stacking. This transforms the CPU from a flat “pancake” into a multi‑layer “burger,” opening a new performance path despite process constraints.
At the ISCAS 2026 forum, HiSilicon Fellow Xia Jing unveiled the Kunpeng 950, the first 3‑D‑stacked Kunpeng CPU, and hinted at the upcoming 960. The talk referenced the newly published “τ (Taò) scaling theory” paper, which frames the architectural approach.
The 950 is a two‑layer stacked CPU offered in two configurations:
96‑core “big‑core” version – targets high‑performance AI host and database workloads.
192‑core “small‑core” version – optimized for virtualization, containers and big‑data tasks.
Why stack? The article likens a traditional 2‑D chip to squeezing a crowd onto a single A4 sheet: as more is packed in, wires get longer, heat rises, and performance stalls. Splitting the die into two vertical layers lets short vertical interconnects replace long planar wires and raises transistor density per unit of footprint. According to the article, latency is cut roughly in half and power consumption drops substantially.
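The wiring argument can be sketched with a toy geometric model: fold a fixed amount of silicon into several layers, and each layer's footprint shrinks, so the longest in‑plane route gets shorter. This is only an illustration, not Huawei's methodology. The 600 mm² area and 0.05 mm layer gap are made‑up values, and `worst_case_wire_mm` is a hypothetical helper.

```python
import math


def worst_case_wire_mm(total_area_mm2, layers, layer_gap_mm=0.05):
    """Longest corner-to-corner route in a toy model of a stacked die.

    Assumptions (not from the article): each layer is a square holding
    an equal share of the total area, routes follow Manhattan paths,
    and each hop between adjacent layers adds layer_gap_mm.
    """
    side = math.sqrt(total_area_mm2 / layers)  # edge of one square layer
    planar = 2 * side                          # corner to corner, in plane
    vertical = (layers - 1) * layer_gap_mm     # bottom layer to top layer
    return planar + vertical


flat = worst_case_wire_mm(600.0, 1)
for n in (1, 2, 3):
    d = worst_case_wire_mm(600.0, n)
    print(f"{n} layer(s): {d:.2f} mm, {d / flat:.0%} of the planar case")
```

In this model the worst‑case route scales roughly with 1/√n, so two layers give about 71% of the planar length and three layers about 58%. The model does not reproduce the article's "roughly half" latency figure, which would depend on far more than wire length.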
Key specifications of the 950:
Base clock: 2.6 GHz with a dual‑thread “Lingxi” core design.
96‑core variant excels in single‑thread‑intensive scenarios such as AI inference and databases.
192‑core variant provides high density for virtualized and big‑data workloads.
First Kunpeng CPU with confidential‑computing capabilities, featuring four‑layer isolation for finance and government use cases.
2. Kunpeng 960 – Pushing the Stack to Three Layers
The 960 is positioned as the performance “nuke.” Building on the 950, it adds a third stacked layer and raises the clock to 4 GHz, about 54% higher than the 950’s 2.6 GHz. The article describes this jump as a “game‑changing” boost in an era of stagnant process scaling.
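The 54% figure follows directly from the two clock speeds quoted in the article. A minimal check (the function name `clock_uplift` is just for illustration):

```python
def clock_uplift(new_ghz, old_ghz):
    """Relative clock increase of new_ghz over old_ghz, as a fraction."""
    return (new_ghz - old_ghz) / old_ghz


uplift = clock_uplift(4.0, 2.6)  # 960 clock vs. 950 base clock
print(f"Clock uplift: {uplift:.1%}")
```

Note that this is a clock ratio, not a measured benchmark result. It translates into an equal single‑thread gain only if instructions per cycle stay the same and the workload is not limited by memory or I/O.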
Performance expectations:
The high‑performance 96‑core model keeps the same core count but is expected to gain more than 50% in single‑core speed, in line with the clock increase.
The high‑density version scales to ≥256 cores / 512 threads, effectively delivering the compute power of two to three separate chips and enabling what the author calls “performance freedom” for virtualization and big‑data workloads.
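The core and thread figures above can be cross‑checked with simple arithmetic. The sketch below assumes the 960 keeps the dual‑thread core design described for the 950, which is consistent with the quoted 256 cores / 512 threads but is not stated outright. The dictionary labels are informal names, not product SKUs.

```python
THREADS_PER_CORE = 2  # dual-thread core design, assumed for both generations


def threads(cores):
    """Hardware threads for a given core count."""
    return cores * THREADS_PER_CORE


def core_ratio(new_cores, old_cores):
    """How many old chips' worth of cores the new chip carries."""
    return new_cores / old_cores


configs = {
    "950 big-core": 96,
    "950 small-core": 192,
    "960 high-density": 256,  # article says "at least 256"
}

for name, cores in configs.items():
    print(f"{name}: {cores} cores, {threads(cores)} threads")

print(f"960 vs 96-core 950:  {core_ratio(256, 96):.2f}x cores")
print(f"960 vs 192-core 950: {core_ratio(256, 192):.2f}x cores")
```

By core count alone, the "two to three separate chips" comparison holds against the 96‑core part. Against the 192‑core part the ratio is closer to 1.3, so the claim presumably also leans on the higher clock.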
Behind these gains is the τ law: when Moore’s law slows, Huawei abandons further planar scaling and instead folds the architecture vertically (3‑D folding, system stacking, super‑node interconnect). This approach packs more transistors without relying on EUV, achieving a compute leap through architectural innovation.
The three‑layer stack distributes responsibilities across layers—compute, cache, and I/O—connected vertically, delivering nanosecond‑level latency while keeping power in check.
3. Stacking as a Strategic Breakthrough
3‑D stacking is technically daunting: vertical interconnect, thermal management, signal integrity, and yield control are all “hell‑level” challenges. Huawei persisted, moving from the 2‑layer testbed of the 950 to the 3‑layer explosion of the 960, essentially creating its own rules when existing ones proved insufficient.
Industry implications:
Domestic impact: 3‑D stacking sidesteps EUV limitations, giving Chinese server CPUs a new growth path that can keep pace with—or even surpass—global competitors without depending on advanced process nodes.
Broader industry impact: The 950/960 are part of Huawei’s “super‑node + stacking” strategy. CPUs and Ascend NPUs are stacked and linked via the Lingqu bus, folding thousands of chips into a single massive logical chip (TaiShan super‑node). This challenges traditional mainframes and reshapes the paradigm of general‑purpose and AI‑focused compute infrastructure.
Looking ahead, as 3‑D stacking and super‑node interconnect technologies mature, Huawei plans to add more layers and increase core counts—moving from 96‑core to 256‑core and beyond—gradually turning “impossible” into “possible.”
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
