Read‑Write Semaphore (rw_semaphore) and Per‑CPU rwsem in the Linux Kernel (ARM64)
The article explains Linux kernel read‑write semaphores, detailing the classic rw_semaphore’s optimistic‑spinning and hand‑off mechanisms, then introduces the per‑CPU rwsem for ARM64, which replaces global counters with per‑CPU data and an RCU fast‑path to cut cache‑coherency traffic at the cost of losing optimistic spinning.
Overview: This article provides a step-by-step analysis of Linux kernel synchronization mechanisms, focusing on the read-write semaphore (rw_semaphore) and its per-CPU variant (percpu-rwsem) on ARM64 platforms.
5. Read‑Write Semaphore – rw_semaphore
The rw_semaphore uses optimistic spinning and a handoff mechanism that are conceptually similar to those of a mutex, though the implementation differs slightly.
5.1 Principle Introduction
In many workloads, reads vastly outnumber writes. Because readers do not modify the shared data, they can safely run concurrently, while writers must be serialized; allowing reader concurrency is what gives a read-write lock its throughput advantage over a plain mutex.
5.1.2 Why early kernel versions did not let the writer optimistic‑spin
A writer typically holds the lock only briefly, so spinning while a writer owns the lock behaves much like mutex optimistic spinning. When readers own the lock, however, there is no single owner to watch, and the reader-side critical section may last a long time, so a writer busy-waiting on readers can burn CPU for little gain.
Later kernels therefore bound the writer's spin on a reader-owned lock with a timeout; once the budget expires, the writer stops spinning and goes to sleep on the wait list.
5.1.3 Hand‑off Mechanism
When a waiter's spin budget expires, it sets a hand-off flag in the lock word. From that point on, newly arriving tasks stop optimistic spinning and lock stealing, and the lock is handed directly to the flagged waiter on release, giving a more deterministic (starvation-free) hand-off.
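The hand-off idea can be sketched with a toy lock word. All names and the bit layout below are illustrative, not the kernel's (the real code uses RWSEM_FLAG_HANDOFF inside sem->count); the point is that once the flag is set, stealing stops and release passes ownership straight to the waiter.

```c
#include <stdatomic.h>

/* Toy model: bit 0 = write-locked, bit 1 = hand-off pending. */
enum { LOCKED = 1, HANDOFF = 2 };

atomic_uint lockword = 0;

/* A "stealing" trylock: succeeds only if the lock is completely free
 * (no owner and no hand-off claim). */
int steal_trylock(void)
{
    unsigned int old = 0;
    return atomic_compare_exchange_strong(&lockword, &old, LOCKED);
}

/* A waiter whose spin/wait budget expired marks itself as the
 * hand-off candidate; from now on nobody may steal the lock. */
void claim_handoff(void)
{
    atomic_fetch_or(&lockword, HANDOFF);
}

/* The releasing owner hands ownership straight to the flagged waiter
 * instead of simply clearing the lock bit. Returns 1 if handed off. */
int release_and_handoff(void)
{
    if (atomic_load(&lockword) & HANDOFF) {
        atomic_store(&lockword, LOCKED); /* waiter now owns the lock */
        return 1;
    }
    atomic_store(&lockword, 0);          /* normal release */
    return 0;
}
```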
5.2 Code Implementation
5.2.1 Key Data Structures
struct rw_semaphore
struct rwsem_waiter
Important fields include count and owner in rw_semaphore, and timeout in rwsem_waiter; the source also mentions a last_owner field without explaining its purpose.
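For reference, the two structures look roughly like this, paraphrased from recent kernel sources (around v5.10 and later); the exact field set varies with kernel version and config options, so treat this as a sketch rather than a literal definition:

```c
struct rw_semaphore {
    atomic_long_t count;              /* reader count + writer/waiter/handoff flag bits */
    atomic_long_t owner;              /* owning task, used by optimistic spinning */
#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
    struct optimistic_spin_queue osq; /* MCS-style spinner queue */
#endif
    raw_spinlock_t wait_lock;
    struct list_head wait_list;       /* sleeping rwsem_waiter entries */
};

struct rwsem_waiter {
    struct list_head list;
    struct task_struct *task;
    enum rwsem_waiter_type type;      /* reader or writer */
    unsigned long timeout;            /* deadline after which hand-off is claimed */
    bool handoff_set;
};
```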
5.2.2 Key Function Interfaces
down_read → __down_read
Supporting functions: rwsem_read_trylock, rwsem_down_read_slowpath.
Reader optimistic‑spin flow and writer down‑write flow are illustrated with extensive diagrams (omitted here for brevity).
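The reader fast path can be modelled in a few lines of userspace C11. The constants below are illustrative, but the shape mirrors the kernel's rwsem_read_trylock: optimistically add a reader bias to the count, then check whether a writer got there first and back out if so.

```c
#include <stdatomic.h>

/* Toy layout: low bit flags a writer; readers add a large bias. */
#define READER_BIAS   256L
#define WRITER_LOCKED 1L

atomic_long count_word = 0;

/* Reader fast path: add the bias first, check for a writer after.
 * On conflict, undo the add; real code then takes the slow path. */
int read_trylock(void)
{
    long c = atomic_fetch_add(&count_word, READER_BIAS) + READER_BIAS;
    if (c & WRITER_LOCKED) {
        atomic_fetch_sub(&count_word, READER_BIAS);
        return 0;
    }
    return 1;
}

void read_unlock(void)  { atomic_fetch_sub(&count_word, READER_BIAS); }
void write_lock(void)   { atomic_fetch_or(&count_word, WRITER_LOCKED); }
void write_unlock(void) { atomic_fetch_and(&count_word, ~WRITER_LOCKED); }
```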
5.3 Application Scenarios
Typical use-cases involve data structures that are read-heavy and write-light; a well-known example is a process's memory map, whose VMA structures are protected by an rw_semaphore (mmap_lock).
6. percpu‑rwsem
6.1 Background
On an 8-core ARM64 system with, say, 256 threads taking the lock for read, every down_read/up_read performs an atomic update on the single shared count, bouncing its cache line between cores and generating heavy cache-coherency traffic. The per-CPU rwsem avoids these global updates by giving each CPU its own reader counter.
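The idea can be sketched with padded per-"CPU" counters in userspace C (a model, not the kernel implementation): each reader touches only its own cache-line-sized slot, and only the rare writer pays the cost of walking all the slots.

```c
#include <stdatomic.h>

#define NCPU 8

/* One counter per CPU, padded to a cache line so reader fast paths
 * on different CPUs never share (and never bounce) a line. */
struct { _Alignas(64) atomic_int val; } read_count[NCPU];

void cpu_down_read(int cpu) { atomic_fetch_add(&read_count[cpu].val, 1); }
void cpu_up_read(int cpu)   { atomic_fetch_sub(&read_count[cpu].val, 1); }

/* Only the (rare) writer sums all per-CPU slots to learn whether any
 * reader is still inside the critical section. */
int readers_active(void)
{
    int sum = 0;
    for (int cpu = 0; cpu < NCPU; cpu++)
        sum += atomic_load(&read_count[cpu].val);
    return sum != 0;
}
```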
6.2 Code Practice
struct percpu_rw_semaphore
Key differences from rw_semaphore:
Optimistic spinning is removed.
The owner field is replaced by a dedicated writer field on which the single writer waits.
A new atomic block variable marks whether a writer is trying to acquire the lock.
An rcu_sync member, rss, is introduced to give readers an RCU-protected fast path.
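For reference, the structure looks roughly like this, paraphrased from recent kernel sources (field set and names can differ between versions, so treat it as a sketch):

```c
struct percpu_rw_semaphore {
    struct rcu_sync        rss;        /* RCU machinery backing the reader fast path */
    unsigned int __percpu *read_count; /* per-CPU reader counter */
    struct rcuwait         writer;     /* the (single) writer sleeps here */
    wait_queue_head_t      waiters;    /* blocked readers wait here */
    atomic_t               block;      /* set while a writer wants/holds the lock */
};
```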
6.2.2 Key Interfaces
percpu_down_read
percpu_up_read
percpu_down_write
The writer must wait until all readers have left the critical section before the function returns.
percpu_up_write
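Putting the pieces together, here is a deliberately simplified userspace model of the reader fast path and the writer's drain loop. It is not the kernel code: in particular it omits the rcu_sync grace period that makes flipping the "writer pending" switch race-free against readers that are mid-fast-path, and the real writer sleeps rather than failing.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NCPU 4

atomic_int  pcpu_readers[NCPU];
atomic_bool writer_pending = false; /* stands in for rcu_sync + block */

/* Reader fast path: if no writer is around, bump only this CPU's
 * counter -- no shared cache line is touched. */
bool my_down_read(int cpu)
{
    if (!atomic_load(&writer_pending)) {
        atomic_fetch_add(&pcpu_readers[cpu], 1);
        return true;            /* fast path taken */
    }
    return false;               /* real code would take the slow path */
}

void my_up_read(int cpu) { atomic_fetch_sub(&pcpu_readers[cpu], 1); }

/* Writer: flip the switch, then check that the sum of all per-CPU
 * counters has drained to zero before entering the critical section. */
bool my_down_write_trylock(void)
{
    atomic_store(&writer_pending, true);
    int sum = 0;
    for (int cpu = 0; cpu < NCPU; cpu++)
        sum += atomic_load(&pcpu_readers[cpu]);
    if (sum == 0)
        return true;            /* all readers gone, writer enters */
    atomic_store(&writer_pending, false);
    return false;               /* real code would sleep, not fail */
}
```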
6.3 Thought Questions
Removing optimistic spinning can cost performance: a waiter always sleeps, even when the current holder is about to release the lock, so short critical sections pay a full sleep/wake round trip.
Per-CPU rwsem greatly reduces the opportunity for lock stealing, though low-probability cases remain.
The rcu_sync member rss gives readers a fast path that avoids atomic read-modify-write operations on shared data, but it does not by itself eliminate every cache-line-bouncing issue.
Per-CPU rwsem cannot completely replace the classic rw_semaphore: it eliminates reader-side cache-line bouncing at the cost of losing optimistic spinning and of a much more expensive write path.
Overall, the article deepens the understanding of Linux kernel read‑write lock design, the trade‑offs between fairness, performance, and scalability, and introduces a per‑CPU optimization to mitigate cache‑coherency overhead.
OPPO Kernel Craftsman
Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials