
How False Sharing Slows Down Multithreaded Java Apps—and How to Fix It

This article explains the hidden performance killer of cache false sharing in multicore Java applications, demonstrates its impact with benchmark code, and shows how padding, the @Contended annotation, and Caffeine's internal structures can eliminate the issue for faster execution.

JD Tech Talk

In high‑concurrency multicore scenarios, cache false sharing is an invisible performance killer. When different threads frequently modify independent variables that reside on the same cache line, the CPU cache‑coherency protocol forces synchronization of the entire line, causing an invalidation storm that dramatically slows overall efficiency.

False Sharing

False sharing is a memory access pattern that degrades performance on modern multicore CPUs with per‑core caches. Although the variables are logically independent, they sit on the same cache line, so an update to any one of them invalidates that line in the other cores' caches. Those cores must then reload the line before they can continue, which slows execution.

To illustrate how false sharing occurs in the CPU cache, we first introduce the cache hierarchy.

Figure: CPU cache hierarchy (cpu_cache.png)

CPU caches are typically organized into three levels (L1, L2, L3). The cache line, usually 64 bytes (128 bytes on some CPUs), is the basic unit of data transferred between memory and the cache. A long occupies 8 bytes, so a 64‑byte line holds eight of them: when one element of a long[] array is loaded, up to seven neighbouring elements are fetched into the same cache line along with it.
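The arithmetic behind that statement can be sketched in a few lines. The class name CacheLineMath and its methods are hypothetical, and the sketch assumes a 64‑byte line and an array whose first element starts exactly on a line boundary, which the JVM does not guarantee.

```java
// Sketch: which long[] indices would fall on the same cache line.
// Assumptions (not from the article): 64-byte lines, and element 0
// starting exactly on a line boundary.
final class CacheLineMath {
    static final int LINE_BYTES = 64;

    /** Number of 8-byte longs that fit into one cache line of the given size. */
    static int longsPerLine(int lineBytes) {
        return lineBytes / Long.BYTES;
    }

    /** Cache line number that element `index` falls on under the alignment assumption. */
    static int lineOf(int index) {
        return (index * Long.BYTES) / LINE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(longsPerLine(LINE_BYTES));   // 8
        System.out.println(lineOf(0) == lineOf(7));     // true: same line
        System.out.println(lineOf(7) == lineOf(8));     // false: next line starts at index 8
    }
}
```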

Figure: the false sharing problem (伪共享问题.drawio.png)

Core 1 continuously updates variable X, while core 2 updates variable Y. Each modification invalidates the cache line in the other core, forcing a reload and causing a noticeable performance drop—this is the false sharing problem.

import org.junit.Test; // assumes JUnit 4; with JUnit 5 use org.junit.jupiter.api.Test

public class TestFalseSharing {
    static class Pointer {
        // two volatile variables to ensure visibility
        volatile long x;
        volatile long y;
        @Override
        public String toString() {
            return "x=" + x + ", y=" + y;
        }
    }
    @Test
    public void testFalseSharing() throws InterruptedException {
        Pointer pointer = new Pointer();
        long start = System.currentTimeMillis();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100_000_000; i++) {
                pointer.x++;
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100_000_000; i++) {
                pointer.y++;
            }
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(System.currentTimeMillis() - start);
        System.out.println(pointer);
    }
}

Running this code takes about 3709 ms. Adding seven padding long fields between x and y changes the picture: x and its padding together occupy 64 bytes, so y lands on a different cache line, and the execution time drops to roughly 473 ms.

public class TestFalseSharing {
    static class Pointer {
        volatile long x;
        long p1, p2, p3, p4, p5, p6, p7; // padding
        volatile long y;
        @Override
        public String toString() {
            return "x=" + x + ", y=" + y;
        }
    }
    @Test
    public void testFalseSharing() throws InterruptedException {
        // ... same test as before ...
    }
}
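Field padding depends on the JVM keeping the fields in their declared order, which it is not obliged to do. Array elements, by contrast, are contiguous, so the same effect can be explored with an array. The following is a minimal sketch, not the article's benchmark: the class ArraySlotDemo and its run method are hypothetical, and the spacing of 16 slots (128 bytes) is an assumption chosen to cover both 64‑byte and 128‑byte lines. Timings are printed rather than asserted because they vary by machine.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch: two threads update two array slots that are either adjacent
// (8 bytes apart, almost always on one cache line) or 16 slots apart
// (128 bytes, always on different lines for 64- or 128-byte lines).
final class ArraySlotDemo {

    /** Each thread performs `iterations` volatile read-then-write updates on its own slot. */
    static long run(AtomicLongArray slots, int a, int b, int iterations) {
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) slots.set(a, slots.get(a) + 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) slots.set(b, slots.get(b) + 1);
        });
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
    }

    public static void main(String[] args) {
        int iterations = 100_000_000;
        AtomicLongArray slots = new AtomicLongArray(32);
        System.out.println("adjacent slots: " + run(slots, 0, 1, iterations) + " ms");
        System.out.println("spaced slots:   " + run(slots, 0, 16, iterations) + " ms");
    }
}
```

A single timed run like this is only indicative; a harness such as JMH would give more trustworthy numbers.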

Caffeine’s Solution to Cache False Sharing

In the article “The Beauty of Caffeine”, the WriteBuffer data structure is responsible for recording write‑behind tasks. Its inheritance hierarchy is shown below:

Figure: WriteBuffer inheritance hierarchy (WriteBuffer.drawio.png)

The highlighted classes are used to eliminate false sharing. For example, BaseMpscLinkedArrayQueuePad1 declares 120 fields of type byte. Together with the object header (typically 12 or 16 bytes on 64‑bit HotSpot), they fill at least 128 bytes, so the fields declared in subclasses land on a different cache line whether the line size is 64 bytes or 128 bytes.

abstract class BaseMpscLinkedArrayQueuePad1<E> extends AbstractQueue<E> {
    byte p000, p001, p002, p003, p004, p005, p006, p007;
    byte p008, p009, p010, p011, p012, p013, p014, p015;
    // ... (continues up to p119) ...
    byte p116, p117, p118, p119;
}

abstract class BaseMpscLinkedArrayQueueProducerFields<E> extends BaseMpscLinkedArrayQueuePad1<E> {
    // Producer index (does not correspond to buffer index)
    protected long producerIndex;
}
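The same layering can be applied to a simple counter. This is a minimal sketch under stated assumptions, not Caffeine's code: all class names are hypothetical, the pads use 15 long fields (120 bytes) instead of 120 byte fields for brevity, and it relies on HotSpot placing superclass fields before subclass fields, which holds in practice but is not mandated by the Java specification.

```java
// Sketch: padding through a class hierarchy, in the style of the listing above.
abstract class CounterPad1 {
    long p01, p02, p03, p04, p05, p06, p07, p08;
    long p09, p10, p11, p12, p13, p14, p15;      // 120 bytes before the hot field
}

abstract class CounterValue extends CounterPad1 {
    volatile long value;                          // the field that is updated frequently
}

abstract class CounterPad2 extends CounterValue {
    long q01, q02, q03, q04, q05, q06, q07, q08;
    long q09, q10, q11, q12, q13, q14, q15;      // 120 bytes after the hot field
}

final class PaddedCounter extends CounterPad2 {
    /** Safe only when a single thread writes to this counter. */
    void increment() { value++; }
    long get() { return value; }
}

final class PaddedCounterDemo {
    /** Two threads each increment their own counter; returns both final values. */
    static long[] run(int iterations) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) a.increment(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) b.increment(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.get(), b.get() };
    }

    public static void main(String[] args) {
        long[] r = run(1_000_000);
        System.out.println(r[0] + " " + r[1]);
    }
}
```

The trailing pad matters because two counters allocated back to back could otherwise have one object's value next to the start of the other object.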

Beyond Caffeine, JDK 1.8 introduced the @Contended annotation, which can also prevent false sharing. The following excerpt from ConcurrentHashMap shows its usage:

public class ConcurrentHashMap<K,V> extends AbstractMap<K,V>
        implements ConcurrentMap<K,V>, Serializable {
    // ...
    @sun.misc.Contended
    static final class CounterCell {
        volatile long value;
        CounterCell(long x) { value = x; }
    }
}

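The @Contended annotation can be applied to classes or fields. The JVM pads the annotated fields so that they do not share a cache line with other fields. Classes inside the JDK, such as ConcurrentHashMap, get this treatment by default; for application classes the annotation is ignored unless the JVM option -XX:-RestrictContended is supplied. In JDK 9 and later the annotation lives in the internal package jdk.internal.vm.annotation, so application code must additionally export that package (for example with --add-exports) before it can use it.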
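Whether the option is actually in effect can be checked at runtime. The following is a minimal sketch, assuming a HotSpot JVM that exposes HotSpotDiagnosticMXBean; the class name ContendedFlags is hypothetical, and on other JVMs the helper simply reports "unavailable". ContendedPaddingWidth is the HotSpot flag that sets how many bytes of padding are added around contended fields.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Sketch: read the HotSpot flags that govern @Contended.
final class ContendedFlags {

    /** Returns the current value of a HotSpot flag, or "unavailable" if it cannot be read. */
    static String flag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException | LinkageError e) {
            // Unknown flag, non-HotSpot JVM, or management module not present.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        // "false" means -XX:-RestrictContended was supplied and the
        // annotation is honoured in application classes.
        System.out.println("RestrictContended     = " + flag("RestrictContended"));
        System.out.println("ContendedPaddingWidth = " + flag("ContendedPaddingWidth"));
    }
}
```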

Further Thoughts on False Sharing

The primary way to find false sharing is careful code inspection, because the issue arises only when different threads frequently modify distinct variables that happen to sit on the same cache line. Local variables and thread‑local storage are not sources of false sharing. Padding trades additional memory for speed, so it should be reserved for fields that are truly contended rather than applied everywhere.
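Because local variables are not shared, one padding‑free way to sidestep the problem is to accumulate into a local and write to shared memory only once. This is a minimal sketch of that idea; the class LocalAccumulation and its method are hypothetical and not taken from the article.

```java
// Sketch: each worker sums into a local variable and publishes the result once,
// so the adjacent slots of `results` are not written in the hot loop.
final class LocalAccumulation {

    /** Each of `threads` workers computes 0 + 1 + ... + (iterations - 1). */
    static long[] sumInParallel(int threads, int iterations) {
        long[] results = new long[threads];      // adjacent slots, each written exactly once
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            workers[t] = new Thread(() -> {
                long local = 0;                   // private to this thread
                for (int i = 0; i < iterations; i++) {
                    local += i;
                }
                results[slot] = local;            // single write to shared memory
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();                         // join makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        for (long r : sumInParallel(4, 1_000)) {
            System.out.println(r);                // 499500 for each worker
        }
    }
}
```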
