How False Sharing Slows Java Programs and How to Eliminate It
This article explains the concept of false sharing in CPU caches, demonstrates its performance impact with Java code, analyzes the results, and shows how to prevent it using the @Contended annotation and appropriate JVM flags.
CPU Cache and False Sharing
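CPU caches are typically organized into three levels (L1 to L3), and each level stores data in fixed-size cache lines, commonly 64 bytes on current x86 processors. When a core writes to a cache line, the coherence protocol invalidates the copies of that line held by other cores, so their next access misses and must re-fetch the line from another core's cache, a shared cache level, or main memory.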
Definition and Causes of False Sharing
False sharing occurs when threads running on different cores write to logically unrelated variables that happen to reside on the same cache line. Because every write invalidates the whole line in the other cores' caches, each thread keeps missing on data that no other thread actually changed, which can reduce performance dramatically.
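The effect described above can be sketched with two threads that each increment their own counter. This is a minimal illustration rather than a rigorous benchmark: the class name, the iteration count, and the 64-byte line size are assumptions, and timings vary by machine and JVM. Slots 0 and 1 of the array are 8 bytes apart and very likely share a cache line, while slots 0 and 16 are 128 bytes apart and cannot.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical demo class; names and iteration counts are illustrative.
class FalseSharingSketch {
    static final int ITERATIONS = 5_000_000;

    /** Two threads each increment their own slot; returns elapsed nanoseconds. */
    static long run(AtomicLongArray counters, int slotA, int slotB) throws InterruptedException {
        Thread a = new Thread(() -> bump(counters, slotA));
        Thread b = new Thread(() -> bump(counters, slotB));
        long start = System.nanoTime();
        a.start();
        b.start();
        a.join();
        b.join();
        return System.nanoTime() - start;
    }

    private static void bump(AtomicLongArray counters, int slot) {
        for (int i = 0; i < ITERATIONS; i++) {
            counters.incrementAndGet(slot);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Adjacent slots: 8 bytes apart, so they very likely share one cache line.
        AtomicLongArray adjacent = new AtomicLongArray(32);
        long adjacentNanos = run(adjacent, 0, 1);

        // Spaced slots: 128 bytes apart, so they fall on different 64-byte lines.
        AtomicLongArray spaced = new AtomicLongArray(32);
        long spacedNanos = run(spaced, 0, 16);

        System.out.println("adjacent slots: " + adjacentNanos / 1_000_000 + " ms");
        System.out.println("spaced slots:   " + spacedNanos / 1_000_000 + " ms");
    }
}
```

Neither thread ever reads the other's counter, so any slowdown in the adjacent case comes from the shared cache line, not from shared data.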
Code Demonstration
<code>public class FalseShared {
    private static final int SIZE = 1024;
    private static final int[][] arr2continuous = new int[SIZE][SIZE];
    private static final int[][] arr2notcontinuous = new int[SIZE][SIZE];

    static {
        for (int i = 0; i < SIZE; i++) {
            arr2continuous[0][i] = i * 2 + 1;
            arr2notcontinuous[i][0] = i * 2 + 1;
        }
    }

    // Row-wise traversal: consecutive reads hit adjacent elements of the same row.
    private static void readByContinuous() {
        long start = System.currentTimeMillis();
        long sum = 0; // accumulated and printed so the JIT cannot discard the reads
        for (int row = 0; row < SIZE; row++) {
            for (int col = 0; col < SIZE; col++) {
                sum += arr2continuous[row][col];
            }
        }
        long end = System.currentTimeMillis();
        System.out.println("read arr by continuous with time : " + (end - start) + " (sum=" + sum + ")");
    }

    // Column-wise traversal: consecutive reads jump to a different row each time.
    private static void readByNotContinuous() {
        long start = System.currentTimeMillis();
        long sum = 0; // accumulated and printed so the JIT cannot discard the reads
        for (int row = 0; row < SIZE; row++) {
            for (int col = 0; col < SIZE; col++) {
                sum += arr2notcontinuous[col][row];
            }
        }
        long end = System.currentTimeMillis();
        System.out.println("read arr by not continuous with time : " + (end - start) + " (sum=" + sum + ")");
    }

    public static void main(String[] args) throws Exception {
        new Thread(() -> readByContinuous()).start();
        new Thread(() -> readByNotContinuous()).start();
    }
}
</code>
Running the program repeatedly shows the row‑wise (continuous) traversal completing roughly twice as fast as the column‑wise (non‑continuous) one. Both arrays have the same layout in memory; only the order in which they are read differs, which confirms that the access pattern affects cache efficiency.
Result Analysis
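The row‑wise (continuous) traversal reads adjacent elements, so each loaded cache line serves several reads and the hardware prefetcher can fetch the next lines ahead of time.
The column‑wise (non‑continuous) traversal jumps to a different row on every read, forcing the CPU to load a new cache line for almost every element and causing far more misses.
Accessing data contiguously improves execution speed by reducing cache‑miss penalties.
This benchmark measures cache locality rather than false sharing itself. False sharing is the opposite situation: independent variables written by different threads sit on the same cache line, so the remedy is to move them onto separate lines.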
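The difference between the two traversal orders can be made concrete by counting how often each one moves to a different cache line. The sketch below is a simplified model, not a measurement: it assumes 64-byte lines and treats the matrix as rows laid end to end, whereas a real Java int[][] stores each row as a separate heap object. The class and method names are hypothetical.

```java
// Hypothetical model: counts how often a traversal moves to a different cache line.
class CacheLineTouchCount {
    static final int LINE_BYTES = 64; // assumed cache-line size
    static final int INT_BYTES = 4;

    /** Counts line switches for an n x n int matrix, modeled as rows laid end to end. */
    static long lineSwitches(int n, boolean rowMajor) {
        long switches = 0;
        long previousLine = -1;
        for (int outer = 0; outer < n; outer++) {
            for (int inner = 0; inner < n; inner++) {
                int row = rowMajor ? outer : inner;
                int col = rowMajor ? inner : outer;
                // Modeled byte address of element [row][col].
                long address = ((long) row * n + col) * INT_BYTES;
                long line = address / LINE_BYTES;
                if (line != previousLine) {
                    switches++;
                    previousLine = line;
                }
            }
        }
        return switches;
    }

    public static void main(String[] args) {
        System.out.println("row-wise:    " + lineSwitches(1024, true));
        System.out.println("column-wise: " + lineSwitches(1024, false));
    }
}
```

Under this model the row‑wise order switches lines 65,536 times (once per 16 ints), while the column‑wise order switches on every one of the 1,048,576 reads. Actual miss counts depend on cache size, associativity, and prefetching, which the model ignores.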
Java Solutions to False Sharing
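The JVM provides the @Contended annotation (sun.misc.Contended in Java 8, jdk.internal.vm.annotation.Contended from Java 9 onward) to keep fields off each other's cache lines by adding padding around them. The annotation can be applied to individual fields or to whole classes. On application classes it is ignored unless the JVM is started with -XX:-RestrictContended, which switches off the default restriction to JDK‑internal classes. The padding width, 128 bytes by default, can be changed with -XX:ContendedPaddingWidth, for example -XX:ContendedPaddingWidth=256 .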
<code>// Example of using @Contended on a field
import sun.misc.Contended; // Java 8; from Java 9 the annotation is jdk.internal.vm.annotation.Contended

public class ContendedTest1 {
    @Contended
    private Object contendedField1;
    private Object plainField1;
    private Object plainField2;
    private Object plainField3;
    private Object plainField4;
}
// JVM output shows contendedField1 placed far from the other fields (offset 156 vs 12‑24).
</code>
Applying the annotation makes the JVM pad around the contended field so that it does not share a cache line with the other fields, eliminating false sharing between them.
Usage Guidelines
Add @Contended to fields or classes that are frequently written by different threads.
Run the application with -XX:-RestrictContended so that the annotation takes effect on application classes, and optionally set -XX:ContendedPaddingWidth (for example, 256) to change the amount of padding.
Consider separating hot and cold data based on access patterns to maximize cache utilization.
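Because @Contended is silently ignored on application classes when the flag is missing, a startup check can catch a misconfigured launch. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, and it only inspects the arguments reported by the standard RuntimeMXBean, so it looks for the exact flag string and nothing more.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup check; class and method names are illustrative.
class ContendedFlagCheck {
    static final String UNLOCK_FLAG = "-XX:-RestrictContended";

    /** Returns true when the given JVM arguments include the flag that unlocks @Contended. */
    static boolean contendedUnlocked(List<String> jvmArgs) {
        return jvmArgs.contains(UNLOCK_FLAG);
    }

    public static void main(String[] args) {
        // JVM options only (not the program's own arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (contendedUnlocked(jvmArgs)) {
            System.out.println(UNLOCK_FLAG + " is set; @Contended applies to application classes.");
        } else {
            System.out.println("Warning: " + UNLOCK_FLAG + " is not set; @Contended on application classes is ignored.");
        }
    }
}
```

Running it as `java -XX:-RestrictContended ContendedFlagCheck.java` should take the first branch, and running it without the flag should print the warning.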
Xiaokun's Architecture Exploration Notes
10 years of backend architecture design | AI engineering infrastructure, storage architecture design, and performance optimization | Former senior developer at NetEase, Douyu, Inke, etc.