Overview of Huawei Kunpeng 920 Processor Architecture and Subsystems
The article provides a detailed technical overview of Huawei's Kunpeng 920 processor, describing its ARM‑based RISC architecture, chip organization, core and cache hierarchy, security features, IMU management, and the design of its I/O, interrupt, network, SAS, and PCIe subsystems.
Kunpeng Processor Overview
The Kunpeng processor is built on the ARM architecture and uses a RISC (Reduced Instruction Set Computer) design, in contrast to the x86 CISC (Complex Instruction Set Computer) instruction set used by Intel and AMD CPUs.
1. Processor Organization
Chip: The packaged processor, containing large‑scale integrated circuits on one or more silicon dies.
DIE: The smallest physical unit of silicon; Kunpeng 920 contains three DIEs (two compute DIEs and one I/O DIE).
Core: The actual compute unit visible to the operating system.
Cluster: A group of cores; each compute DIE has eight clusters, and each cluster contains four cores.
SoC: System on Chip, integrating the CPU, RoCE NIC, SAS controller, and other peripherals.
2. Kunpeng 920 Chip Architecture
A single SoC includes three DIEs (2 compute, 1 I/O). Each compute DIE hosts 8 clusters and each cluster has 4 cores, for a total of 2 × 8 × 4 = 64 cores. Each core has private L1 and L2 caches, while all cores share an L3 cache. The I/O DIE integrates the network and PCIe modules, and the DIEs are interconnected via a high‑speed internal bus.
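The core count above can be checked with a short sketch. The die, cluster, and core counts come from the article; the linear numbering of cores (die first, then cluster, then core) and the names `CoreLocation` and `locate_core` are assumptions made for illustration, not the numbering the hardware or operating system necessarily uses.

```python
from dataclasses import dataclass

COMPUTE_DIES = 2        # from the article
CLUSTERS_PER_DIE = 8    # from the article
CORES_PER_CLUSTER = 4   # from the article

CORES_PER_DIE = CLUSTERS_PER_DIE * CORES_PER_CLUSTER
TOTAL_CORES = COMPUTE_DIES * CORES_PER_DIE


@dataclass(frozen=True)
class CoreLocation:
    """Position of a core inside the SoC (hypothetical helper type)."""
    die: int
    cluster: int
    core: int


def locate_core(core_id: int) -> CoreLocation:
    """Map a flat core index to (die, cluster, core).

    Assumes cores are numbered linearly: all cores of die 0 first,
    cluster by cluster, then all cores of die 1.
    """
    if not 0 <= core_id < TOTAL_CORES:
        raise ValueError(f"core_id must be in [0, {TOTAL_CORES}), got {core_id}")
    die, within_die = divmod(core_id, CORES_PER_DIE)
    cluster, core = divmod(within_die, CORES_PER_CLUSTER)
    return CoreLocation(die=die, cluster=cluster, core=core)


if __name__ == "__main__":
    print("total cores:", TOTAL_CORES)
    print("core 37 is at:", locate_core(37))
```

Under this assumed numbering, cores that share a cluster index on the same die are the natural candidates for placing threads that communicate heavily, since they are the closest neighbours in the hierarchy described above.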
3. System Security & IMU
The platform supports Secure Boot and ARM TrustZone to ensure a trusted execution environment. The Intelligent Management Unit (IMU) provides data‑center node management, fault preprocessing, security root of trust, energy‑efficiency management, and internal chip monitoring.
4. Additional Subsystems
The processor includes compute, storage, I/O, interrupt, and virtualization subsystems. An AMBA bus interconnects the two compute DIEs, the I/O DIE, and eight DDR4 channels.
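To give a feel for what eight DDR4 channels imply, the sketch below computes a theoretical peak memory bandwidth. The channel count is from the article; the 64-bit (8-byte) data bus per channel is standard for DDR4. The transfer rate is not stated in the article, so `ASSUMED_MTS = 2933` is only an illustrative assumption, and real sustained bandwidth is lower than this peak.

```python
BYTES_PER_TRANSFER = 8   # a DDR4 channel has a 64-bit data bus
DDR4_CHANNELS = 8        # from the article


def peak_memory_bandwidth_gbs(transfer_rate_mts: int,
                              channels: int = DDR4_CHANNELS) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfer_rate_mts is the DDR4 speed grade in mega-transfers per second.
    """
    mb_per_s = transfer_rate_mts * BYTES_PER_TRANSFER * channels
    return mb_per_s / 1000


# Assumption: DDR4-2933 DIMMs; the article does not specify a speed grade.
ASSUMED_MTS = 2933

if __name__ == "__main__":
    total = peak_memory_bandwidth_gbs(ASSUMED_MTS)
    print(f"{DDR4_CHANNELS} channels at DDR4-{ASSUMED_MTS}: {total:.1f} GB/s peak")
```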
5. I/O Subsystem
The I/O DIE integrates on‑chip controllers such as 100 GbE NICs and SAS controllers, and supports PCIe 4.0 for expansion cards such as GPUs and additional NICs.
6. Interrupt Subsystem
Based on the ARM GIC specification, the processor implements generic interrupts, message interrupts, and Locality‑specific Peripheral Interrupts (LPIs), with support for the ITS (Interrupt Translation Service) and MBIGEN (message‑based interrupt generator) technologies, allowing dynamic routing and prioritization of interrupts across CPU cores.
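As an illustration of how interrupt types are distinguished, the sketch below classifies an interrupt ID (INTID) by range. It assumes the GICv3 INTID layout; the article names only "the ARM GIC specification", but LPIs and the ITS were introduced with GICv3. The function name `classify_intid` is hypothetical, and the ranges here say nothing about how Kunpeng 920 assigns individual devices.

```python
def classify_intid(intid: int) -> str:
    """Classify a GICv3 interrupt ID by its architectural range.

    SGI: software-generated, used for inter-processor interrupts.
    PPI: private peripheral interrupt, local to one core.
    SPI: shared peripheral interrupt, routable to any core.
    LPI: locality-specific peripheral interrupt, message-based,
         typically delivered through the ITS.
    """
    if intid < 0:
        raise ValueError("INTID must be non-negative")
    if intid <= 15:
        return "SGI"
    if intid <= 31:
        return "PPI"
    if intid <= 1019:
        return "SPI"
    if intid <= 1023:
        return "special"
    if intid <= 8191:
        # Reserved in GICv3; GICv3.1 places extended PPI/SPI ranges here.
        return "reserved/extended"
    return "LPI"


if __name__ == "__main__":
    for sample in (3, 27, 100, 1023, 8192):
        print(sample, "->", classify_intid(sample))
```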
7. Network Subsystem
Comprises Network ICL and RoCE engines. RoCE v2 provides low‑latency, low‑CPU‑utilization RDMA over Ethernet: it carries the InfiniBand transport protocol over UDP/IP, so it runs on standard routed Ethernet networks.
8. SAS Subsystem
Features two x8 SAS 3.0 controllers, supporting direct‑connect and expander configurations, with backward compatibility with SAS 2.0/1.0 and SATA 3.0/2.0/1.0. Each controller can attach up to eight SAS or SATA drives directly.
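The aggregate SAS bandwidth implied by this configuration can be estimated with a small sketch. The controller and lane counts are from the article; the per-lane line rates (12/6/3 Gb/s for SAS 3.0/2.0/1.0 and 6/3/1.5 Gb/s for SATA 3.0/2.0/1.0) and the 8b/10b encoding are the standard figures for those generations. The result is a theoretical per-direction payload rate, not a measured throughput, and the helper name `payload_mb_per_s` is hypothetical.

```python
# Line rate per lane (phy) in Mb/s for each supported standard.
LINE_RATE_MBPS = {
    "SAS 3.0": 12000,
    "SAS 2.0": 6000,
    "SAS 1.0": 3000,
    "SATA 3.0": 6000,
    "SATA 2.0": 3000,
    "SATA 1.0": 1500,
}

CONTROLLERS = 2           # from the article
LANES_PER_CONTROLLER = 8  # "x8", from the article


def payload_mb_per_s(standard: str, lanes: int = 1) -> int:
    """Theoretical payload bandwidth in MB/s for the given number of lanes.

    All listed standards use 8b/10b encoding: 8 payload bits travel in
    every 10 line bits. Dividing by 8 converts bits to bytes.
    """
    line_rate = LINE_RATE_MBPS[standard]
    per_lane = line_rate * 8 // 10 // 8
    return per_lane * lanes


if __name__ == "__main__":
    per_controller = payload_mb_per_s("SAS 3.0", LANES_PER_CONTROLLER)
    print("SAS 3.0 per lane:      ", payload_mb_per_s("SAS 3.0"), "MB/s")
    print("SAS 3.0 per controller:", per_controller, "MB/s")
    print("SAS 3.0 both controllers:", per_controller * CONTROLLERS, "MB/s")
```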
9. PCIe Subsystem
Supports PCIe Gen1‑4, up to 40 lanes across three PCIe cores (16 + 16 + 8 lanes). Each core can function as a Root Port or Endpoint, includes an embedded DMA engine, and offers features such as SR‑IOV, shared virtual memory, CCIX, and peer‑to‑peer communication.
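The lane split and generation support above translate into theoretical link bandwidth as sketched below. The lane counts (16 + 16 + 8) are from the article; the per-generation transfer rates and encodings (8b/10b for Gen1 and Gen2, 128b/130b for Gen3 and Gen4) are the standard PCIe figures. The numbers are per direction and ignore packet and protocol overhead, so achievable throughput is lower. The name `link_bandwidth_gbs` is hypothetical.

```python
from fractions import Fraction

# generation -> (transfer rate per lane in GT/s, encoding efficiency)
PCIE_GENERATIONS = {
    1: (Fraction(5, 2), Fraction(8, 10)),
    2: (Fraction(5), Fraction(8, 10)),
    3: (Fraction(8), Fraction(128, 130)),
    4: (Fraction(16), Fraction(128, 130)),
}

CORE_LANES = (16, 16, 8)  # three PCIe cores, from the article


def link_bandwidth_gbs(generation: int, lanes: int) -> Fraction:
    """Theoretical one-direction bandwidth in GB/s for a PCIe link.

    Exact rational arithmetic avoids floating-point rounding surprises;
    convert with float() for display.
    """
    rate_gts, efficiency = PCIE_GENERATIONS[generation]
    return rate_gts * efficiency / 8 * lanes


if __name__ == "__main__":
    total_lanes = sum(CORE_LANES)
    print("total lanes:", total_lanes)
    for gen in sorted(PCIE_GENERATIONS):
        x16 = float(link_bandwidth_gbs(gen, 16))
        all_lanes = float(link_bandwidth_gbs(gen, total_lanes))
        print(f"Gen{gen}: x16 = {x16:.2f} GB/s, all {total_lanes} lanes = {all_lanes:.2f} GB/s")
```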
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.