How False Sharing Slows Down Multithreaded Java Apps—and How to Fix It
This article explains the hidden performance killer of cache false sharing in multicore Java applications, demonstrates its impact with benchmark code, and shows how padding, the @Contended annotation, and Caffeine's internal structures can eliminate the issue for faster execution.
In high‑concurrency multicore workloads, false sharing is an invisible performance killer. When different threads frequently modify independent variables that reside on the same cache line, the CPU's cache‑coherency protocol must keep the entire line consistent across cores, and the resulting stream of invalidations can slow the program dramatically.
False Sharing
False sharing is a memory access pattern that degrades performance on multicore CPUs with per‑core caches. Although the variables are logically independent, they sit close enough in memory to share a cache line, so an update to any one of them invalidates that line in every other core's cache. The other cores must then fetch the line again, which results in frequent cache misses and slower execution.
To illustrate how false sharing occurs in the CPU cache, we first introduce the cache hierarchy.
CPU caches are typically organized into three levels (L1, L2, L3). The cache line, usually 64 bytes (128 bytes on some processors), is the basic unit of data transferred between memory and cache. With 64‑byte lines and 8‑byte long values, loading one element of a long[] array also brings in up to seven neighboring elements that share its cache line.
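The arithmetic behind that claim can be sketched in a few lines. This is a simplified model, not a measurement: it assumes a 64‑byte line and assumes element 0 starts exactly on a line boundary, which a real JVM does not guarantee because of the array header. The class and method names are illustrative.

```java
public class CacheLineMath {
    // Assumption: 64-byte cache lines. This is common but hardware dependent.
    static final int ASSUMED_LINE_BYTES = 64;

    /** Number of elements of the given size that fit in one cache line. */
    static int elementsPerLine(int lineBytes, int elementBytes) {
        return lineBytes / elementBytes;
    }

    /**
     * True if two array indices map to the same cache line, under the
     * simplifying assumption that element 0 is aligned to a line boundary.
     */
    static boolean sameLine(int i, int j, int lineBytes, int elementBytes) {
        int perLine = elementsPerLine(lineBytes, elementBytes);
        return i / perLine == j / perLine;
    }

    public static void main(String[] args) {
        // 64 / 8 = 8 longs per line: one element plus seven neighbors
        System.out.println(elementsPerLine(ASSUMED_LINE_BYTES, Long.BYTES));
        // indices 0 and 7 share a line in this model, 7 and 8 do not
        System.out.println(sameLine(0, 7, ASSUMED_LINE_BYTES, Long.BYTES));
        System.out.println(sameLine(7, 8, ASSUMED_LINE_BYTES, Long.BYTES));
    }
}
```

Under this model, two threads writing to indices 0 and 7 would contend for the same line, while indices 0 and 8 would not.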
Suppose core 1 continuously updates variable X while core 2 updates variable Y, and both variables share a cache line. Each write invalidates the copy of the line held by the other core, forcing that core to reload the line before its next access. This back‑and‑forth causes a noticeable performance drop and is the false sharing problem.
import org.junit.Test; // JUnit 4 assumed; with JUnit 5 use org.junit.jupiter.api.Test

public class TestFalseSharing {

    static class Pointer {
        // two volatile variables to ensure visibility; each write is
        // published to other cores through the cache-coherency protocol
        volatile long x;
        volatile long y;

        @Override
        public String toString() {
            return "x=" + x + ", y=" + y;
        }
    }

    @Test
    public void testFalseSharing() throws InterruptedException {
        Pointer pointer = new Pointer();
        long start = System.currentTimeMillis();

        // x++ on a volatile is not atomic, but each field has exactly one writer here
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100_000_000; i++) {
                pointer.x++;
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100_000_000; i++) {
                pointer.y++;
            }
        });

        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(System.currentTimeMillis() - start); // elapsed milliseconds
        System.out.println(pointer);
    }
}

In the author's run, this code takes about 3709 ms. Adding seven padding long fields between x and y places the two variables on separate 64‑byte cache lines and reduces the execution time to roughly 473 ms. Exact timings depend on the hardware and the JVM.
public class TestFalseSharing {

    static class Pointer {
        volatile long x;
        // padding: 7 x 8 = 56 bytes, so y starts 64 bytes after x
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long y;

        @Override
        public String toString() {
            return "x=" + x + ", y=" + y;
        }
    }

    @Test
    public void testFalseSharing() throws InterruptedException {
        // ... same test as before ...
    }
}

Caffeine’s Solution to Cache False Sharing
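The article “The Beauty of Caffeine” describes the WriterBuffer data structure, which is responsible for recording write‑behind tasks. Its inheritance hierarchy (shown in a figure in the original article) contains several classes that exist only to eliminate false sharing. For example, BaseMpscLinkedArrayQueuePad1 declares 120 byte fields (p000 through p119), so that the fields declared in its subclasses land on a different cache line from whatever precedes the padding, whether the line size is 64 bytes or 128 bytes. Putting the padding in a superclass matters because HotSpot lays out superclass fields before subclass fields, which keeps the padding ahead of producerIndex.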
abstract class BaseMpscLinkedArrayQueuePad1<E> extends AbstractQueue<E> {
    byte p000, p001, p002, p003, p004, p005, p006, p007;
    byte p008, p009, p010, p011, p012, p013, p014, p015;
    // ... (continues up to p119) ...
    byte p116, p117, p118, p119;
}

abstract class BaseMpscLinkedArrayQueueProducerFields<E> extends BaseMpscLinkedArrayQueuePad1<E> {
    // Producer index (does not correspond to buffer index)
    protected long producerIndex;
}

Beyond Caffeine, JDK 8 introduced the @Contended annotation (sun.misc.Contended, which moved to jdk.internal.vm.annotation.Contended in JDK 9), and it can also prevent false sharing. The following excerpt from the JDK 8 ConcurrentHashMap shows its usage:
public class ConcurrentHashMap<K,V> extends AbstractMap<K,V>
        implements ConcurrentMap<K,V>, Serializable {
    // ...
    @sun.misc.Contended
    static final class CounterCell {
        volatile long value;
        CounterCell(long x) { value = x; }
    }
}

The @Contended annotation can be applied to classes or fields. The JVM pads the annotated fields so that they do not share a cache line with other fields. JDK classes such as ConcurrentHashMap get this treatment by default. For application classes the annotation is ignored unless the JVM is started with the option -XX:-RestrictContended.
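Application code can also benefit from @Contended indirectly, without any JVM flag, by using JDK classes that apply it internally. The sketch below assumes that java.util.concurrent.atomic.LongAdder fits the use case: it spreads updates over internal cells, and the JDK annotates that cell class with @Contended. The class name ContendedCounterDemo and the thread and iteration counts are illustrative and not taken from the original article.

```java
import java.util.concurrent.atomic.LongAdder;

public class ContendedCounterDemo {

    /** Increments one shared LongAdder from several threads and returns the final sum. */
    static long countWithLongAdder(int threads, int incrementsPerThread) {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    adder.increment(); // under contention, updates are spread over padded cells
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // join makes all increments visible before sum()
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return adder.sum();
    }

    public static void main(String[] args) {
        System.out.println(countWithLongAdder(4, 1_000_000)); // prints 4000000
    }
}
```

This is a usage sketch and not a benchmark. Whether LongAdder outperforms a single AtomicLong depends on how heavily the counter is contended.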
Further Thoughts on False Sharing
The primary way to avoid false sharing is careful code inspection, because the issue only arises when different threads modify distinct variables that happen to share a cache line. Local variables and thread‑local storage are not sources of false sharing. Padding trades additional memory for speed, so it should be applied judiciously, to fields that are actually hot and contended, to avoid excessive memory consumption.
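The point about local variables can be sketched as follows. Each thread accumulates into a local variable and writes to the shared object once at the end, so the shared cache line is written twice in total and not on every iteration. The class and method names are illustrative and not from the original article, and the loop is deliberately trivial: it shows the pattern, not a benchmark.

```java
public class LocalAccumulationDemo {

    static class Totals {
        // adjacent fields that would share a cache line without padding
        volatile long x;
        volatile long y;
    }

    /** Runs two workers that count locally and publish their result once. */
    static Totals run(int iterations) {
        Totals totals = new Totals();
        Thread t1 = new Thread(() -> {
            long local = 0;              // lives in a register or on this thread's stack
            for (int i = 0; i < iterations; i++) {
                local++;
            }
            totals.x = local;            // single write to the shared line
        });
        Thread t2 = new Thread(() -> {
            long local = 0;
            for (int i = 0; i < iterations; i++) {
                local++;
            }
            totals.y = local;            // single write to the shared line
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return totals;
    }

    public static void main(String[] args) {
        Totals totals = run(100_000_000);
        System.out.println("x=" + totals.x + ", y=" + totals.y);
    }
}
```

This restructuring only applies when other threads do not need to observe intermediate values. When they do, padding or @Contended remains the relevant tool.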
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.
