
Understanding CPU Execution, Architecture, and Multicore/Multithreading

This article explains how a CPU executes programs through the fetch‑decode‑execute cycle, describes instruction sets, registers, pipelines, superscalar and multithreaded designs, and details the cache hierarchy from registers to L3, providing a comprehensive overview of modern processor fundamentals.

Architects' Tech Alliance

The execution of a program is essentially a continuous cycle where the CPU fetches an instruction from memory, decodes it to determine its type and operands, and then executes it; this fetch‑decode‑execute loop repeats until the program terminates.

Each CPU implements its own instruction set architecture (ISA); for example, x86 processors cannot run ARM binaries and vice‑versa, which is why Intel/AMD use x86 while most smartphones use ARM.

CPUs provide a variety of registers: general‑purpose registers for temporary data, and special registers such as the Program Counter (PC) that holds the address of the next instruction, the stack pointer that points to the current stack frame, and the Program Status Word (PSW) that contains mode and priority bits. During a context switch, the contents of these registers are saved to memory and restored later.

To move data between memory and registers the CPU offers load/store instructions, and it includes an Arithmetic Logic Unit (ALU) for basic operations like addition, subtraction, and logical functions; multiplication and division are slower because they are derived from more primitive operations.

Modern CPUs improve performance by separating the fetch, decode, and execute stages into independent units, forming a pipeline. Further enhancements include superscalar designs that duplicate these units to allow multiple instructions to be processed in parallel.

Physical CPUs may contain multiple cores, each appearing as an independent processor to the operating system. Hyper‑threading (simultaneous multithreading) presents two or more logical threads per core, sharing that core's resources; the OS treats each logical thread as a separate CPU, but threads compete for the same execution resources.

Cache hierarchy starts with the fastest, smallest storage—registers (<1 KB). Below registers are L1 caches (split into instruction and data caches), followed by larger but slower L2 and L3 caches. Each core typically has its own L1 cache, while L2 may be private or shared, and L3 is usually shared across cores.

Example code snippets illustrate basic concepts: c=a+b shows a simple arithmetic operation, and the pipeline sequence is represented as fetch -> decode -> execute.

Tags: architecture, Cache, Multithreading, CPU, pipeline, Instruction set
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
