Deep Dive into Java volatile: CPU Cache Architecture, MESI Protocol, JMM and Happens‑Before
This article thoroughly explains the low‑level implementation of Java's volatile keyword by analysing CPU multi‑level cache design, the MESI cache‑coherency protocol, the Java Memory Model, memory barriers, the happens‑before principle, and the impact on singleton patterns and synchronized blocks.
The author opens by noting how little time day-to-day work leaves for personal study, and treats a recent technical sharing session on the volatile keyword as an opportunity to review CPU cache hierarchies, the MESI protocol, and the Java Memory Model (JMM).
CPU Multi‑Level Cache Architecture
Modern computers follow the von Neumann architecture with a processor, memory and I/O devices. The most critical components for program execution are the CPU and memory. The cache hierarchy is illustrated as Register > L1 > L2 > L3 > Main Memory > Local Storage > Remote Storage. Each level communicates only with its immediate neighbour, so a read first checks L1, then L2, and so on until it reaches main memory.
Cache Lines and Locality
Cache lines (typically 64 bytes on x86) are the unit of caching. Because of temporal and spatial locality, the CPU loads an entire cache line even when only a single byte is needed, which can cause false sharing in multithreaded programs.
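The padding trick for avoiding false sharing can be sketched as follows (class and field names are illustrative, not from the original article). Two hot fields declared next to each other will likely land on the same 64-byte cache line; inserting seven `long` fields of padding pushes the second counter onto a different line:

```java
// Sketch (hypothetical class names): counters that may share a cache line,
// and a padded variant intended to keep them on separate 64-byte lines.
public class FalseSharingDemo {
    static class Counters {              // a and b likely share one cache line
        volatile long a;
        volatile long b;
    }
    static class PaddedCounters {
        volatile long a;
        long p1, p2, p3, p4, p5, p6, p7; // 56 bytes of padding after a
        volatile long b;                 // b now likely sits on another line
    }

    public static void main(String[] args) throws InterruptedException {
        PaddedCounters c = new PaddedCounters();
        // Each thread owns one counter; with padding, their cache lines
        // no longer ping-pong between cores on every write.
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1_000_000; i++) c.a++; });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1_000_000; i++) c.b++; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.a + " " + c.b);
    }
}
```

Since Java 8, the JDK-internal `@jdk.internal.vm.annotation.Contended` annotation achieves the same effect without manual padding, though it requires a JVM flag to honour it in user code.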
MESI Cache‑Coherency Protocol
The MESI protocol defines four states for a cache line: Modified (M), Exclusive (E), Shared (S) and Invalid (I). The article includes a table describing each state and the required actions. When a core writes to a cache line, it transitions to M, broadcasts invalidation messages, and other cores must invalidate their copies before accessing the line.
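The state transitions can be sketched as a small model (this is a simplified illustration, not real hardware behaviour; write-backs and bus messages are not modelled):

```java
// Minimal MESI sketch: one cache line's state as seen by one core,
// driven by local and remote read/write events.
public class MesiDemo {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    static State onLocalWrite(State s) {
        // A local write takes exclusive ownership: the line becomes MODIFIED
        // and other cores must invalidate their copies (not modelled here).
        return State.MODIFIED;
    }

    static State onRemoteWrite(State s) {
        // Another core wrote the line, so this core's copy is stale.
        return State.INVALID;
    }

    static State onRemoteRead(State s) {
        // A remote read demotes M/E to SHARED (a MODIFIED line would be
        // written back to memory first, which is not modelled here).
        return (s == State.INVALID) ? State.INVALID : State.SHARED;
    }

    public static void main(String[] args) {
        State s = State.EXCLUSIVE;   // this core loaded the line alone
        s = onLocalWrite(s);         // E -> M
        System.out.println(s);
        s = onRemoteRead(s);         // M -> S
        System.out.println(s);
        s = onRemoteWrite(s);        // S -> I
        System.out.println(s);
    }
}
```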
Why Java Still Needs volatile
Although MESI keeps hardware caches coherent, volatile addresses a higher‑level problem: the Java Virtual Machine (JVM) must provide visibility and ordering guarantees at the language level. Without volatile, a write may sit in a CPU store buffer, or be reordered by the compiler, and fail to become visible to other threads in a timely way.
JMM Overview
The Java Memory Model separates main memory and per‑thread working memory. All shared variables reside in main memory, while each thread works on its own copy. The JMM defines eight atomic actions (read, load, use, assign, store, write, lock, unlock) that move values between these memories.
Volatile Visibility
When a variable is declared volatile, every write is immediately flushed to main memory and every read fetches the latest value from main memory. Example code demonstrates the difference:

```java
public class SharedObject {
    public boolean flag; // non-volatile
}
```

```java
public class SharedObject {
    public volatile boolean flag; // volatile
}
```

The latter guarantees that thread A sees the update performed by thread B.
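The guarantee can be demonstrated with a reader thread that spins on the flag (a standard illustration; class and field names are mine). With volatile, the reader is guaranteed to observe the writer's update and terminate; with a plain field, it may spin forever:

```java
// Sketch: a writer flips a volatile flag; a reader spins until it sees it.
public class VisibilityDemo {
    static volatile boolean flag = false;   // remove volatile and the reader
                                            // may loop forever on some JVMs

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!flag) { /* spin until the write becomes visible */ }
            System.out.println("reader saw flag=true");
        });
        reader.start();
        Thread.sleep(100);   // give the reader time to start spinning
        flag = true;         // volatile write: guaranteed visible to the reader
        reader.join();
    }
}
```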
Volatile Does Not Guarantee Atomicity
Using i++ on a volatile int can still lose updates because the operation consists of a read‑modify‑write sequence that is not atomic.
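The lost-update problem, and the standard fix with `java.util.concurrent.atomic.AtomicInteger`, can be sketched like this (class names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: volatile does not make count++ atomic; AtomicInteger does.
public class AtomicityDemo {
    static volatile int volatileCount = 0;
    static final AtomicInteger atomicCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                volatileCount++;                 // read-modify-write: NOT atomic
                atomicCount.incrementAndGet();   // atomic CAS-based increment
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // The atomic counter always reaches 200000; the volatile counter
        // can only fall short of it when increments interleave and get lost.
        System.out.println("atomic=" + atomicCount.get());
        System.out.println("volatile<=200000: " + (volatileCount <= 200_000));
    }
}
```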
Instruction Reordering and Memory Barriers
Compilers and CPUs may reorder independent instructions to improve performance. The article lists three types of memory barriers (write, read, full) and shows the corresponding x86 fence instructions sfence, lfence and mfence. The JVM inserts the appropriate barriers (e.g., OrderAccess::storeload()) around a volatile write; on x86, HotSpot typically achieves the StoreLoad effect with a lock‑prefixed instruction rather than an explicit mfence.
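Since Java 9, the same barrier kinds are exposed to Java code through the static fence methods on `java.lang.invoke.VarHandle` (these method names are real JDK API; the surrounding example is a minimal sketch of where each fence would sit, not a complete two-thread protocol):

```java
import java.lang.invoke.VarHandle;

// Sketch: the three barrier kinds the article describes, as VarHandle fences.
public class FenceDemo {
    static int data;
    static boolean ready;

    public static void main(String[] args) {
        data = 42;
        VarHandle.releaseFence(); // write barrier: data is stored before ready
        ready = true;
        VarHandle.acquireFence(); // read barrier: later loads stay after it
        VarHandle.fullFence();    // full (StoreLoad) barrier: what a volatile
                                  // write needs, the most expensive of the three
        System.out.println(data + " " + ready);
    }
}
```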
Happens‑Before Principle
The happens‑before relation defines ordering guarantees such as program order, monitor lock rule, volatile variable rule, thread start/termination, and transitivity. These rules explain why a write to a volatile variable becomes visible to another thread that subsequently reads it.
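Three of those rules combine in the classic safe-publication idiom (the class below is my illustration of the rules, not code from the article): program order puts the payload write before the volatile write, the volatile rule puts the volatile write before the volatile read that observes it, and transitivity chains them together.

```java
// Sketch: publishing a plain field through a volatile flag.
public class HappensBeforeDemo {
    static int payload;                  // plain field, published via flag
    static volatile boolean published;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            payload = 42;                // (1) program order: before (2)
            published = true;            // (2) volatile write
        });
        Thread reader = new Thread(() -> {
            while (!published) { }       // (3) volatile read that sees (2)
            // transitivity: (1) hb (2) hb (3), so payload must be 42 here
            System.out.println("payload=" + payload);
        });
        reader.start();
        writer.start();
        reader.join();
    }
}
```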
Synchronized vs. Volatile
Synchronized blocks provide visibility (unlock → lock) but do not prevent instruction reordering inside the block. Experiments show that a synchronized block can still exhibit reordering, so volatile is still required for safe double‑checked locking (DCL) singleton implementations.
Double‑Checked Locking (DCL) Singleton
The article presents the classic DCL code and explains how instruction reordering can produce a half‑initialized object. Even though the synchronized block prevents reordering of the lock acquisition, the write to the instance reference may still be reordered, so the volatile modifier remains necessary.
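The pattern in question looks like this (a standard rendering of DCL; the class name `Singleton` is conventional). Without volatile, `instance = new Singleton()` may be reordered so that the reference is stored before the constructor finishes, letting the first unsynchronized check return a half-initialized object:

```java
// The classic double-checked-locking singleton; volatile is the
// load-bearing detail that forbids the dangerous reordering.
public class Singleton {
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                  // first check: no lock taken
            synchronized (Singleton.class) {
                if (instance == null) {          // second check: under the lock
                    instance = new Singleton();  // volatile write: constructor
                }                                // completes before the store
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        Singleton a = getInstance();
        Singleton b = getInstance();
        System.out.println(a == b);   // same instance both times
    }
}
```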
Comparison of volatile and synchronized
volatile: variable‑level, guarantees visibility, does not guarantee atomicity.
synchronized: block/method‑level, guarantees visibility and atomicity, but may still allow reordering inside the block.
In high‑concurrency scenarios, excessive use of synchronized can degrade performance, whereas volatile offers a lighter‑weight alternative when only visibility and ordering are required.
Conclusion
Modern CPUs use multi‑level caches and may reorder instructions, leading to visibility and ordering problems. Java solves these issues with the volatile keyword, which the HotSpot JVM implements using lock‑prefixed instructions to enforce the necessary memory barriers.
New Oriental Technology