
Comprehensive Overview of CPU Architecture, Components, and Operation

This article provides a comprehensive overview of CPU architecture, covering the distinction between x86 and non‑x86 processors, packaging types, internal components such as the IHS, TIM, die and interposer, the execution pipeline of a Core i7, the cache hierarchy, performance metrics, and common cache replacement algorithms.

Architects' Tech Alliance

The CPU market can be divided into x86 and non‑x86 processors. x86 CPUs are produced only by Intel, AMD, and VIA; they are mutually compatible at the operating‑system level and cover more than 90% of the desktop computer market. Non‑x86 CPUs are produced by companies and institutions such as IBM, Sun, HP, ARM, MIPS, Hitachi, Samsung, Hyundai, and the Chinese Academy of Sciences; they are used mainly in large servers and embedded systems, are largely incompatible with one another, and hold a very small share of the desktop market.

Although Intel and AMD CPUs are comparable in performance and software compatibility, their supporting hardware platforms are not fully compatible; for example, they require different motherboards. Improvements in the manufacturing process and hardware bug fixes drive stepping upgrades: given two CPUs of the same model, the one with the newer stepping generally overclocks better and dissipates slightly less heat.

Most CPUs use LGA or FC‑PGA packaging. FC‑PGA packages the CPU core on a substrate, shortening interconnects and facilitating cooling. LGA uses a pin‑less contact form. A CPU consists of a semiconductor silicon die, a substrate, pins or contact points, thermal interface material, and a metal case.

(1) Integrated Heat Spreader (IHS). The CPU's metal case is made of nickel‑plated copper and protects the core from physical damage. Its surface is very smooth, which ensures good contact with the heatsink.

(2) Thermal Interface Material (TIM). Between the metal case and the composite ceramic inside, a layer of thermal material—usually thermal paste—is filled. It has excellent insulation and thermal conductivity, transferring heat from the CPU core to the metal case.

(3) CPU core (die). The core is a thin silicon chip, typically about 12 mm × 12 mm × 1 mm. Modern CPUs contain multiple cores (2, 4, 6, or 8). An 8‑core Intel Xeon integrates up to 2.4 billion transistors.

(4) Interposer layer. This layer sits between the core and the substrate, serving three purposes: routing the tiny signal lines from the core to the pins, protecting the fragile core, and fixing the core onto the substrate. It is made of composite material with good insulation and thermal properties, using photolithography to connect directly to the core’s circuitry, and solder balls to connect to the substrate.

(5) Substrate. Sitting beneath the interposer, the substrate connects the interposer to the pins and carries circuitry that prevents high‑frequency signals from the core from interfering with the motherboard.

(6) Resistors and capacitors. Located in the middle of the substrate’s bottom, they are used to eliminate interference from the CPU to external circuits and to match impedance with the motherboard. Their arrangement varies across different CPU series.

(7) Pins. Gold‑plated contact points below the substrate form the channel through which the CPU connects to external circuitry.

In a Core i7 CPU, the core is divided into the core part (execution pipeline, L1/L2 caches) and the uncore part (L3 cache, integrated memory controller, QuickPath Interconnect, power and clock control units).

CPU operation proceeds roughly as follows: before execution, instructions and data are first loaded into memory or the CPU’s caches (L1/L2/L3). This process is called caching. The CPU fetches an instruction from the cache or memory according to the address indicated by the program counter (PC); then it performs branch prediction, known as the fetch stage (IF).

After fetching, the CPU decodes the instruction into micro‑operations (μOP) in the decode stage (DEC). Decoding yields the opcode and operand addresses, after which the operands are fetched. The CPU then allocates required resources (registers, ALUs, etc.) in the instruction control or dispatch stage (ICU).

Once operands are retrieved, the execution unit (e.g., ALU) performs the operation as directed by the opcode (execution stage, EXE). After execution, the result is written back to the CPU’s register file and, if necessary, to the cache or memory in the retire/write‑back stage (Retire).
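The fetch–decode–execute–retire cycle described above can be sketched as a tiny interpreter loop. The two‑operand instruction format, register names, and sample program below are invented for illustration; a real Core i7 decodes variable‑length x86 instructions into micro‑operations and executes them out of order.

```python
# Minimal sketch of the IF -> DEC -> EXE -> Retire cycle.
def run(program):
    registers = {"r0": 0, "r1": 0}
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        instr = program[pc]       # IF: fetch the instruction addressed by PC
        op, dst, src = instr      # DEC: split into opcode and operands
        if op == "mov":
            registers[dst] = src  # EXE + Retire: compute, then write back
        elif op == "add":
            registers[dst] += registers[src]
        pc += 1                   # advance to the next instruction (no branches)
    return registers

program = [("mov", "r0", 5), ("mov", "r1", 7), ("add", "r0", "r1")]
print(run(program))  # → {'r0': 12, 'r1': 7}
```

The single `pc += 1` stands in for sequential fetch; branch prediction exists precisely because real code changes `pc` conditionally, and the fetch stage must guess the target before the branch is resolved.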

A Core i7 CPU contains dozens of system units. From an architectural viewpoint, its internal structure mainly includes cache units, fetch unit (IF), decode unit (DEC), control unit (ICU), execution unit (EXE), and retire unit (RU).

Each Core i7 core includes five 64‑bit integer ALUs and three 128‑bit floating‑point units (FPUs). Theoretically, per clock cycle a core can: fetch 128 bits of instruction or data; decode four x86 instructions (one complex, three simple); issue seven μops; reorder/rename four μops; dispatch six μops to execution units; execute either 320 bits of integer operations or 384 bits of floating‑point operations; and retire four 128‑bit μops. At 3.2 GHz the peak floating‑point performance is 51 GFLOPS (double‑precision) or 102 GFLOPS (single‑precision).
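The peak figures quoted above follow from multiplying clock frequency by FLOPs completed per cycle. As a rough check, assume a 4‑core part whose 128‑bit floating‑point units complete one packed add and one packed multiply per cycle, i.e. 4 double‑precision or 8 single‑precision FLOPs per core per cycle (these assumptions fit Nehalem‑era parts but are not stated in the article):

```python
# Peak FLOPS = clock rate x cores x FLOPs per core per cycle (assumed values).
clock_hz = 3.2e9
cores = 4
dp_flops_per_cycle = 4   # 128-bit unit: 2 doubles x (1 add + 1 mul)
sp_flops_per_cycle = 8   # 128-bit unit: 4 singles x (1 add + 1 mul)

peak_dp = clock_hz * cores * dp_flops_per_cycle / 1e9
peak_sp = clock_hz * cores * sp_flops_per_cycle / 1e9
print(f"{peak_dp:.1f} GFLOPS double, {peak_sp:.1f} GFLOPS single")
# → 51.2 GFLOPS double, 102.4 GFLOPS single
```

This reproduces the article's 51/102 GFLOPS figures to within rounding, and shows why single‑precision peak is exactly double the double‑precision peak: the same 128‑bit registers hold twice as many single‑precision values.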

The cache hit rate is the probability that the required data is found in the cache when the CPU accesses the storage system; a higher hit rate (closer to 1) is better.

When the CPU accesses memory, it first checks the cache. Because not all required data resides in the cache, a miss may occur, forcing the CPU to access main memory, which incurs additional latency. Properly sizing the cache relative to memory can keep the hit rate high.
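The cost of misses can be made concrete with the standard average‑memory‑access‑time formula. The cycle counts below are illustrative assumptions, not figures from the article:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit cost plus expected miss cost (cycles)."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 4-cycle cache hit, 200-cycle main-memory access.
print(amat(hit_time=4, miss_rate=0.05, miss_penalty=200))  # → 14.0 cycles
print(amat(hit_time=4, miss_rate=0.02, miss_penalty=200))  # → 8.0 cycles
```

Cutting the miss rate from 5% to 2% nearly halves the average access time here, which is why even small hit‑rate improvements from a larger or better‑managed cache pay off.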

To maintain a high cache hit rate, common replacement algorithms such as Least Recently Used (LRU) are employed. Modern CPUs achieve cache hit rates above 95%.
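LRU replacement can be sketched in a few lines: on a hit, mark the key as most recently used; on a miss with a full cache, evict the least recently used key. This is a software model of the policy, not how hardware implements it (real caches approximate LRU per set with a few status bits):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU replacement sketch: evict the least recently used key."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order tracks recency

    def access(self, key, value=None):
        if key in self.data:                  # hit: refresh recency
            self.data.move_to_end(key)
        else:                                 # miss: insert, evicting if full
            if len(self.data) >= self.capacity:
                self.data.popitem(last=False) # drop the least recently used
            self.data[key] = value

cache = LRUCache(2)
cache.access("a", 1)
cache.access("b", 2)
cache.access("a")        # "a" becomes most recently used
cache.access("c", 3)     # cache full → evicts "b", the LRU entry, not "a"
print(list(cache.data))  # → ['a', 'c']
```

The key behavior is visible in the last step: because "a" was touched more recently than "b", the replacement falls on "b", keeping the hot entry resident.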

Statement: Thanks to the original author for their hard work. All reproduced articles will be properly credited; please contact us if there are copyright issues.

