
Understanding Memory Consistency Models: From Sequential Consistency to x86‑TSO and Weak Memory Models

This article explains how modern multiprocessor hardware and compiler optimizations affect program behavior, introduces memory consistency models such as sequential consistency, x86‑TSO, and ARM/POWER weak models, demonstrates their differences with litmus tests, and discusses the DRF‑SC guarantee for data‑race‑free programs.

Cognitive Technology Team

Long ago, when programs were single‑threaded, the most effective way to speed them up was to let the hardware run faster; any optimization that did not change observable behavior was considered valid. With the advent of multicore processors, operating systems expose parallelism via threads, creating new challenges for language designers, compiler writers, and programmers.

Many optimizations invisible in single‑threaded code become visible in multithreaded programs, forcing us to decide which optimizations remain valid and which programs must be deemed invalid.

Consider a simple C-like example in which two threads share the variables x and done: one thread writes them, the other reads them. Whether the program can print 0 depends on both the hardware and the compiler: on x86 it always prints 1, on ARM or POWER it may print 0, and compiler reordering can change the result on any architecture.

// thread 1           // thread 2
x = 1;                while (done == 0) { /* loop */ }
done = 1;             print(x);

The discussion then moves to memory consistency models, which define the guarantees hardware provides to programmers. Initially, hardware memory models were defined for assembly programmers; later, language memory models (e.g., Java, C++) were added, making the definitions more complex.

Section 2: Sequential Consistency

Leslie Lamport introduced sequential consistency in 1979, requiring that the result of any execution be equivalent to some interleaving of operations that respects each thread’s program order. Sequential consistency is considered the ideal model for programmers.

Litmus tests such as the message‑passing test are used to distinguish models. Under sequential consistency, the outcome r1=1, r2=0 is impossible.

// thread 1           // thread 2
x = 1;                r1 = y;
y = 1;                r2 = x;

The model can be visualized as a single shared memory that processes one read or write at a time, imposing a total order on all memory accesses.

Section 3: x86 Total Store Order (x86‑TSO)

x86 implements a Total Store Order (TSO) where each processor has a write‑buffer queue. Writes become visible to other processors only after they reach the shared memory, but a processor can see its own writes immediately.

Under TSO, the same message‑passing litmus test still cannot produce r1=1, r2=0 because the write order is preserved.

// thread 1           // thread 2
x = 1;                r1 = y;
y = 1;                r2 = x;

However, other tests differentiate TSO from sequential consistency. In the store-buffer test, each thread writes one variable and then reads the other; TSO allows both reads to see 0, an outcome sequential consistency forbids.

Section 4: Evolution of x86‑TSO

Early x86 manuals gave little detail about the memory model. Real‑world experience (e.g., Plan 9, Linux kernel discussions) revealed surprising behaviors such as both reads seeing 0, prompting Intel and AMD to publish formal memory‑ordering white papers.

These papers introduced the total-lock-order + causal-consistency (TLO+CC) model, which is weaker than TSO: it permits certain outcomes that TSO forbids.

Section 5: ARM/POWER Weak Memory Models

ARM and POWER adopt weaker models where each processor may have its own copy of memory and writes propagate independently, allowing many litmus tests that are impossible on x86 to succeed.

For example, the message-passing test can produce r1=1, r2=0 on ARM/POWER because different processors may observe writes in different orders.

// thread 1           // thread 2
x = 1;                r1 = y;
y = 1;                r2 = x;

Other tests such as store‑buffer and load‑buffer also succeed on these architectures, illustrating the lack of a global total order.

Section 6: Weak Ordering and Data‑Race‑Free Sequential Consistency (DRF‑SC)

Adve and Hill’s 1990 paper defines weak ordering via a synchronization model. If a program is data‑race‑free (DRF), hardware that satisfies minimal ordering guarantees will behave as if it were sequentially consistent (DRF‑SC).

This contract allows programmers to reason about their code using the simple sequential consistency model, provided they correctly use synchronization primitives.

Section 7: Acknowledgements

The author thanks colleagues at Google for discussions and feedback and assumes full responsibility for any errors or controversial viewpoints.

Tags: concurrency, memory model, x86, ARM, weak consistency, DRF-SC, litmus test