
Overview of Huawei Kunpeng 920 Processor Architecture and Subsystems

The article provides a detailed technical overview of Huawei's Kunpeng 920 processor, describing its ARM‑based RISC architecture, chip organization, core and cache hierarchy, security features, IMU management, and the design of its I/O, interrupt, network, SAS, and PCIe subsystems.

Architects' Tech Alliance

Kunpeng Processor Overview

The Kunpeng processor is built on the ARM architecture, a RISC (Reduced Instruction Set Computer) design that differs from the CISC (Complex Instruction Set Computer) x86 instruction set used by Intel and AMD CPUs.

1. Processor Organization

Chip: The packaged processor, built from large-scale integrated circuits on silicon; one Kunpeng 920 chip packages several DIEs.

DIE: The smallest physical unit of silicon; Kunpeng 920 contains three DIEs (two compute DIEs and one I/O DIE).

Core: The actual compute unit visible to the operating system.

Cluster: A group of cores; each compute DIE has eight clusters of four cores each.

SoC: System on Chip, integrating the CPU, RoCE NIC, SAS controller, and other peripherals on one chip.

2. Kunpeng 920 Chip Architecture

A single SoC includes three DIEs (2 compute, 1 I/O). Each compute DIE hosts 8 clusters of 4 cores, giving 2 × 8 × 4 = 64 cores in total. Each core has private L1 and L2 caches, while all cores share an L3 cache. The I/O DIE integrates the network and PCIe modules, and the DIEs are interconnected via a high-speed internal bus.
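To make the 2 × 8 × 4 hierarchy concrete, here is a minimal Python sketch that maps a flat core index to its compute DIE, cluster, and core position. The contiguous numbering (die by die, then cluster by cluster) is an assumption for illustration only; the core numbering that firmware and the OS actually expose may differ, and `locate_core` and `CoreLocation` are hypothetical names.

```python
from dataclasses import dataclass

# Topology figures from the text: 2 compute DIEs x 8 clusters x 4 cores.
COMPUTE_DIES = 2
CLUSTERS_PER_DIE = 8
CORES_PER_CLUSTER = 4
CORES_PER_DIE = CLUSTERS_PER_DIE * CORES_PER_CLUSTER  # 32
TOTAL_CORES = COMPUTE_DIES * CORES_PER_DIE            # 64


@dataclass(frozen=True)
class CoreLocation:
    die: int      # compute DIE index, 0..1
    cluster: int  # cluster index within the DIE, 0..7
    core: int     # core index within the cluster, 0..3


def locate_core(core_id: int) -> CoreLocation:
    """Map a flat core index to (die, cluster, core).

    Hypothetical numbering: assumes IDs are assigned contiguously,
    die by die and then cluster by cluster.
    """
    if not 0 <= core_id < TOTAL_CORES:
        raise ValueError(f"core_id must be in [0, {TOTAL_CORES})")
    die, within_die = divmod(core_id, CORES_PER_DIE)
    cluster, core = divmod(within_die, CORES_PER_CLUSTER)
    return CoreLocation(die, cluster, core)


print(TOTAL_CORES)      # 64
print(locate_core(37))  # CoreLocation(die=1, cluster=1, core=1)
```

Using `divmod` twice mirrors the two levels of the hierarchy, so changing the per-level counts is enough to model a different core configuration.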

3. System Security & IMU

The platform supports Secure Boot and ARM TrustZone to ensure a trusted execution environment. The Intelligent Management Unit (IMU) provides data‑center node management, fault preprocessing, security root of trust, energy‑efficiency management, and internal chip monitoring.
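Secure Boot builds a chain of trust: each boot stage checks the next image before handing over control, and boot stops at the first image that fails the check. The sketch below shows only that control flow. As a simplifying assumption it compares SHA-256 digests against a trusted table, whereas real secure boot verifies digital signatures against a key anchored in hardware; the stage names and `verify_boot_chain` are hypothetical and do not describe Kunpeng's actual boot flow.

```python
import hashlib


def digest(image: bytes) -> str:
    """SHA-256 digest of a boot image (stand-in for signature verification)."""
    return hashlib.sha256(image).hexdigest()


def verify_boot_chain(stages: list[tuple[str, bytes]], trusted: dict[str, str]) -> list[str]:
    """Verify each stage in order and halt at the first mismatch.

    Returns the names of the stages that were allowed to run.
    """
    booted = []
    for name, image in stages:
        if trusted.get(name) != digest(image):
            raise RuntimeError(f"secure boot halted: {name} failed verification")
        booted.append(name)  # verified, so control may pass to this stage
    return booted


# Hypothetical boot stages and images.
stages = [("bootloader", b"bl-image"), ("firmware", b"fw-image"), ("kernel", b"kernel-image")]
trusted = {name: digest(image) for name, image in stages}
print(verify_boot_chain(stages, trusted))  # ['bootloader', 'firmware', 'kernel']
```

Replacing the kernel image with different bytes makes `verify_boot_chain` raise before that stage is recorded as booted.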

4. Additional Subsystems

The processor includes compute, storage, I/O, interrupt, and virtualization subsystems. It uses an AMBA bus to interconnect two CPU DIEs, one I/O DIE, and eight DDR4 channels.
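The eight DDR4 channels set an upper bound on memory bandwidth: channels × bytes per transfer × transfer rate. The sketch below evaluates that formula. The DDR4 speed grades used (2933 and 3200 MT/s) are assumptions for illustration because the text does not state one, and each channel is assumed to carry the standard 64-bit data bus (ECC bits excluded). Sustained bandwidth in practice is lower than this theoretical peak.

```python
def peak_ddr_bandwidth_gbs(channels: int, mt_per_s: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (decimal units).

    channels x (bus width in bytes) x (mega-transfers per second),
    converted from MB/s to GB/s.
    """
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * mt_per_s / 1000


# Eight channels, as stated in the text; the speed grades are assumed.
for speed in (2933, 3200):
    print(f"DDR4-{speed}: {peak_ddr_bandwidth_gbs(8, speed):.1f} GB/s")
```

With eight channels this gives about 187.7 GB/s at DDR4-2933 and 204.8 GB/s at DDR4-3200.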

5. I/O Subsystem

The I/O DIE integrates on-chip devices such as 100 GbE NICs and SAS controllers, and supports PCIe 4.0 for expansion cards such as GPUs and additional NICs.

6. Interrupt Subsystem

Based on the ARM GIC specification, the processor implements generic interrupts, message-based interrupts, and locality-specific peripheral interrupts (LPIs). It supports the ITS (Interrupt Translation Service) and MBIGEN (Message-Based Interrupt Generator), which allow interrupts to be dynamically routed and prioritized across CPU cores.
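The message-based path can be modeled as two lookups. A wired interrupt entering MBIGEN is turned into a message carrying a DeviceID and an EventID; the ITS then translates that pair into an LPI INTID and a collection, and the collection is mapped to a target CPU. The Python sketch below is a simplified model under those assumptions: the class names, pin numbers, and IDs are hypothetical, priorities are omitted, and remapping a collection only roughly mirrors what the ITS MAPC command does in hardware.

```python
from dataclasses import dataclass, field

LPI_BASE = 8192  # in GICv3, LPI INTIDs start at 8192


@dataclass
class Its:
    # DeviceID -> {EventID -> (LPI INTID, collection ID)}; models the device and translation tables
    events: dict = field(default_factory=dict)
    # collection ID -> target CPU (stands in for a redistributor)
    collections: dict = field(default_factory=dict)

    def map_event(self, device_id: int, event_id: int, intid: int, collection: int) -> None:
        if intid < LPI_BASE:
            raise ValueError("LPI INTIDs start at 8192")
        self.events.setdefault(device_id, {})[event_id] = (intid, collection)

    def map_collection(self, collection: int, cpu: int) -> None:
        self.collections[collection] = cpu

    def translate(self, device_id: int, event_id: int) -> tuple[int, int]:
        """Return (LPI INTID, target CPU) for an incoming message."""
        intid, collection = self.events[device_id][event_id]
        return intid, self.collections[collection]


@dataclass
class Mbigen:
    # wired pin -> (DeviceID, EventID) carried by the generated message
    pins: dict

    def assert_pin(self, pin: int, its: Its) -> tuple[int, int]:
        device_id, event_id = self.pins[pin]
        return its.translate(device_id, event_id)


its = Its()
its.map_event(device_id=0x10, event_id=3, intid=8195, collection=1)
its.map_collection(1, cpu=5)
mbigen = Mbigen(pins={42: (0x10, 3)})
print(mbigen.assert_pin(42, its))  # (8195, 5)

its.map_collection(1, cpu=9)       # re-route the whole collection to another CPU
print(mbigen.assert_pin(42, its))  # (8195, 9)
```

Routing through a collection rather than directly to a CPU is what lets many interrupts be moved to another core with a single table update.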

7. Network Subsystem

The network subsystem comprises the Network ICL and RoCE engines. RoCE v2 provides low-latency RDMA over Ethernet with low CPU utilization; it reuses the InfiniBand transport layer and verbs programming interface while running over standard Ethernet/IP networks.
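RoCE v2 carries InfiniBand transport packets inside ordinary UDP/IP datagrams, which is what lets RDMA traffic cross routed Ethernet networks. A packet is laid out as Ethernet, IP, UDP (destination port 4791), the 12-byte InfiniBand Base Transport Header (BTH), payload, and an invariant CRC (ICRC). The sketch below packs only the UDP header and BTH with Python's `struct` module; it leaves the SE, M, PadCnt, and TVer bits at zero and omits the IP header, payload, and ICRC. Helper names and field values are illustrative and say nothing about how the Kunpeng RoCE engine builds packets internally.

```python
import struct

ROCEV2_UDP_PORT = 4791  # IANA-assigned UDP destination port for RoCE v2
RC_SEND_ONLY = 0x04     # BTH opcode for a reliable-connection SEND Only


def build_bth(opcode: int, p_key: int, dest_qp: int, psn: int, ack_req: bool = False) -> bytes:
    """Pack a 12-byte Base Transport Header (SE/M/PadCnt/TVer bits left at 0)."""
    if not (0 <= dest_qp < 1 << 24 and 0 <= psn < 1 << 24):
        raise ValueError("DestQP and PSN are 24-bit fields")
    return struct.pack(
        ">BBHII",
        opcode,                      # byte 0: OpCode
        0,                           # byte 1: SE, M, PadCnt, TVer
        p_key,                       # bytes 2-3: partition key
        dest_qp,                     # bytes 4-7: reserved byte + 24-bit destination QP
        (int(ack_req) << 31) | psn,  # bytes 8-11: AckReq bit, reserved bits, 24-bit PSN
    )


def build_udp_header(src_port: int, payload_len: int) -> bytes:
    """UDP header for RoCE v2; checksum left at 0 (permitted for UDP over IPv4)."""
    return struct.pack(">HHHH", src_port, ROCEV2_UDP_PORT, 8 + payload_len, 0)


bth = build_bth(RC_SEND_ONLY, p_key=0xFFFF, dest_qp=0x12, psn=100, ack_req=True)
udp = build_udp_header(src_port=49152, payload_len=len(bth))  # UDP payload here is just the BTH
print(len(bth), udp.hex())  # 12 c00012b700140000
```

Because the outer headers are plain UDP/IP, ordinary Ethernet switches and IP routers can forward these packets without understanding the InfiniBand transport inside.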

8. SAS Subsystem

The SAS subsystem features two x8 SAS 3.0 controllers that support both direct-connect and expander configurations. They are backward compatible with SAS 2.0/1.0 and SATA 3.0/2.0/1.0, and each controller can directly attach up to eight SAS or SATA drives.
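Because each SAS 3.0 lane runs at 12 Gbit/s with 8b/10b encoding, the usable bandwidth of one lane, and of an x8 controller, follows from simple arithmetic. The sketch below applies it to the SAS and SATA generations listed above. It counts only line-encoding overhead and ignores protocol overhead, so real throughput is lower, and a lane attached to a SATA or older SAS drive runs at that drive's lower rate. `usable_mb_per_s` is a hypothetical helper name.

```python
# Line rates in Mbit/s; all of these generations use 8b/10b encoding.
LINE_RATE_MBPS = {
    "SAS 3.0": 12000, "SAS 2.0": 6000, "SAS 1.0": 3000,
    "SATA 3.0": 6000, "SATA 2.0": 3000, "SATA 1.0": 1500,
}


def usable_mb_per_s(standard: str, lanes: int = 1) -> int:
    """Per-direction payload bandwidth in MB/s after 8b/10b overhead."""
    payload_mbps = LINE_RATE_MBPS[standard] * 8 // 10  # 8 data bits per 10 line bits
    return payload_mbps // 8 * lanes                   # bits -> bytes, times lane count


print(usable_mb_per_s("SAS 3.0"))           # 1200 (MB/s, one lane)
print(usable_mb_per_s("SAS 3.0", lanes=8))  # 9600 (MB/s, x8 controller)
print(usable_mb_per_s("SATA 3.0"))          # 600 (MB/s, one SATA 3.0 drive)
```

Integer arithmetic is used on purpose: every rate in the table divides evenly, so the results are exact.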

9. PCIe Subsystem

The PCIe subsystem supports PCIe Gen1-4 with up to 40 lanes across three PCIe cores (16 + 16 + 8 lanes). Each PCIe core can operate as a Root Port or an Endpoint, includes an embedded DMA engine, and supports features such as SR-IOV, shared virtual memory, CCIX, and peer-to-peer communication.
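Per-lane PCIe bandwidth depends on the generation's transfer rate and line encoding (8b/10b for Gen1 and Gen2, 128b/130b for Gen3 and Gen4). The sketch below computes the theoretical per-direction bandwidth of the three PCIe cores described above using exact fractions. It ignores TLP and DLLP protocol overhead, so achievable throughput is lower; names such as `PCIE_CORE_LANES` are hypothetical.

```python
from fractions import Fraction

# Transfer rate (GT/s) and line-encoding efficiency for each PCIe generation.
PCIE_GEN = {
    1: (Fraction(5, 2), Fraction(8, 10)),
    2: (Fraction(5), Fraction(8, 10)),
    3: (Fraction(8), Fraction(128, 130)),
    4: (Fraction(16), Fraction(128, 130)),
}
PCIE_CORE_LANES = (16, 16, 8)  # lane split across the three PCIe cores, per the text


def lane_gb_per_s(gen: int) -> Fraction:
    """Per-lane, per-direction bandwidth in GB/s."""
    rate, efficiency = PCIE_GEN[gen]
    return rate * efficiency / 8  # Gbit/s of payload -> GB/s


def link_gb_per_s(gen: int, lanes: int) -> Fraction:
    """Per-direction bandwidth of a link with the given lane count."""
    return lane_gb_per_s(gen) * lanes


for lanes in PCIE_CORE_LANES:
    print(f"Gen4 x{lanes}: {float(link_gb_per_s(4, lanes)):.1f} GB/s")
total = sum(link_gb_per_s(4, lanes) for lanes in PCIE_CORE_LANES)
print(f"All 40 lanes: {float(total):.1f} GB/s")
```

This gives about 31.5 GB/s per direction for each x16 core, 15.8 GB/s for the x8 core, and about 78.8 GB/s across all 40 lanes at Gen4.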
