
Understanding CPU Execution, Architecture, and Multicore/Multithreading

This article explains how a CPU executes programs through the fetch‑decode‑execute cycle, describes instruction sets, registers, pipelines, superscalar and multithreaded designs, and details the cache hierarchy from registers to L3, providing a comprehensive overview of modern processor fundamentals.

Architects' Tech Alliance

The execution of a program is essentially a continuous cycle where the CPU fetches an instruction from memory, decodes it to determine its type and operands, and then executes it; this fetch‑decode‑execute loop repeats until the program terminates.

Each CPU implements its own instruction set architecture (ISA); for example, x86 processors cannot run ARM binaries and vice‑versa, which is why Intel/AMD use x86 while most smartphones use ARM.

CPUs provide a variety of registers: general‑purpose registers for temporary data, and special registers such as the Program Counter (PC) that holds the address of the next instruction, the stack pointer that points to the current stack frame, and the Program Status Word (PSW) that contains mode and priority bits. During a context switch, the contents of these registers are saved to memory and restored later.

To move data between memory and registers the CPU offers load/store instructions, and it includes an Arithmetic Logic Unit (ALU) for basic operations like addition, subtraction, and logical functions; multiplication and division are slower because they are derived from more primitive operations.

Modern CPUs improve performance by separating the fetch, decode, and execute stages into independent units, forming a pipeline. Further enhancements include superscalar designs that duplicate these units to allow multiple instructions to be processed in parallel.

Physical CPUs may contain multiple cores, each appearing as an independent processor to the operating system. Hyper‑threading (simultaneous multithreading) presents two or more logical threads per core, sharing that core's resources; the OS treats each logical thread as a separate CPU, but threads compete for the same execution resources.

Cache hierarchy starts with the fastest, smallest storage—registers (<1 KB). Below registers are L1 caches (split into instruction and data caches), followed by larger but slower L2 and L3 caches. Each core typically has its own L1 cache, while L2 may be private or shared, and L3 is usually shared across cores.

Example code snippets illustrate basic concepts: c=a+b shows a simple arithmetic operation, and the pipeline sequence is represented as fetch -> decode -> execute.

Tags: architecture, Cache, Multithreading, CPU, pipeline, Instruction set
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
