Understanding the Physical Structure of Memory and the Root Cause of Memory Alignment
The article explains how memory chips are organized into banks and matrices, why eight consecutive bytes are distributed across eight banks for parallel I/O, and how this hardware design makes 64‑bit (8‑byte) alignment essential for optimal performance.
Memory Physical Structure
Most people know that memory alignment improves performance, but the underlying reason lies in the physical construction of memory. A memory module carries several black memory chips, as shown in Figure 1, and each chip contains eight banks.
Figure 1: Physical appearance of a memory module
Each chip is built from eight banks. Inside each bank is a two‑dimensional matrix where every element stores one byte (8 bits).
Figure 2: Internal structure of a chip
Figure 3: Internal structure of a bank
Memory Addressing Method
When a program accesses eight consecutive bytes, e.g., 0x0000‑0x0007, one might assume they reside in the first bank, but actually each byte is stored in a different bank. Physically the bytes are not contiguous; the diagram below illustrates the real distribution.
Figure 4: Physical distribution of eight consecutive bytes
The reason is circuit efficiency: the eight banks can operate in parallel. To read 0x0000‑0x0007, each bank supplies one byte simultaneously, producing the full 8‑byte word in a single I/O operation. If the bytes were all in one bank, the reads would have to be serialized, requiring eight separate accesses and slowing performance.
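The distribution described above can be sketched in a few lines of Python. This is only an illustration of the article's simplified model: the modulo mapping, the name `locate`, and the constant `NUM_BANKS` are assumptions made for the example, and real memory controllers use more involved address mappings.

```python
NUM_BANKS = 8  # banks per chip in the article's simplified model (assumption)


def locate(addr: int) -> tuple[int, int]:
    """Map a byte address to (bank index, slot within that bank).

    Hypothetical mapping: consecutive addresses rotate through the banks.
    """
    return addr % NUM_BANKS, addr // NUM_BANKS


# Eight consecutive bytes land in eight different banks, all at the same slot,
# so every bank can supply its byte in parallel.
for addr in range(0x0000, 0x0008):
    bank, slot = locate(addr)
    print(f"0x{addr:04X} -> bank {bank}, slot {slot}")
```

Under this mapping, address 0x0008 wraps around to bank 0 again but at the next slot, which is why the following eight bytes form the next unit that can be read in parallel.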
Conclusion
Therefore, the root cause of memory alignment is that memory I/O works in 8‑byte (64‑bit) units. With 64‑bit wide memory (and a 64‑bit CPU), each I/O operation reads one byte from each of the eight banks and assembles them into a single 8‑byte word. Addresses 0‑7 can be fetched in one operation, as can 8‑15, and so on.
If you request a range that is not 8‑byte aligned, such as 0x0001‑0x0008, the memory controller must first read 0x0000‑0x0007, then 0x0008‑0x000F, and combine the results, which incurs extra latency. This hardware limitation explains why misaligned accesses are slower.
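The cost of a misaligned request can be made concrete by counting how many 8‑byte units a byte range touches. The sketch below assumes the article's model of one I/O operation per 8‑byte unit; the function name `io_operations` and the constant `IO_UNIT` are made up for illustration.

```python
IO_UNIT = 8  # bytes delivered per memory I/O in the article's 64-bit model


def io_operations(start: int, length: int) -> int:
    """Count the 8-byte units touched by the range [start, start + length)."""
    if length <= 0:
        return 0
    first_unit = start // IO_UNIT
    last_unit = (start + length - 1) // IO_UNIT
    return last_unit - first_unit + 1


print(io_operations(0x0000, 8))  # aligned 0x0000-0x0007     -> 1
print(io_operations(0x0001, 8))  # misaligned 0x0001-0x0008  -> 2
```

The aligned request fits inside one unit, while the misaligned request of the same size spans two units and therefore needs two reads plus the work of combining them.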
Extension 1: Compilers and linkers automatically align variables for developers, but they cannot guarantee ideal alignment in every case.
Extension 2: Between the CPU and memory sits the CPU cache, which is managed by the hardware rather than by the operating system. A cache line is typically 64 bytes, eight times the memory I/O unit, so filling a cache line uses whole 8‑byte I/O units and no I/O cycle is wasted.
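The automatic alignment mentioned in Extension 1 can be illustrated with a small layout calculation. This is a rough sketch, not how any particular compiler works: it assumes each field is aligned to its own size and that the total size is padded to the largest field alignment, and the helpers `align_up` and `layout` are hypothetical names. Real ABIs differ in the details.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(fields):
    """Return ({name: offset}, total_size) for fields given as (name, size) pairs.

    Assumption: a field's alignment equals its size, and the total size is
    padded to the largest alignment seen.
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in fields:
        offset = align_up(offset, size)  # insert padding before the field
        offsets[name] = offset
        offset += size
        max_align = max(max_align, size)
    return offsets, align_up(offset, max_align)


# A 1-byte flag followed by an 8-byte counter: 7 padding bytes are inserted
# so that the counter starts on an 8-byte boundary.
print(layout([("flag", 1), ("counter", 8)]))
```

With this rule the 8‑byte counter starts at offset 8 instead of offset 1, so it sits inside a single 8‑byte I/O unit at the price of seven bytes of padding.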
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.