Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level
Async-fork shifts the costly page-table copying from the Redis parent process to its child, so the parent can resume handling queries almost immediately. In the reported tests this cut snapshot-induced latency spikes by over 98%, improving tail latency during AOF rewrites, RDB backups, and master-slave full synchronization.
This article discusses Async-fork, a mechanism designed to mitigate the query latency spikes caused by fork-based snapshots in in-memory databases such as Redis. Redis calls fork() to create a child process for tasks such as AOF file rewriting, RDB backup generation, and master-slave full synchronization. The call itself can cause significant latency: the parent stays in kernel mode while its page table is copied, and it cannot serve queries until that copy finishes.
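To give a sense of why the page table copy is expensive, the following sketch estimates how much page-table memory backs a given amount of mapped data. It is a back-of-the-envelope calculation, not a timing model. It assumes x86-64 four-level paging with 4 KiB pages and 8-byte entries (512 entries per table), and a single contiguous, aligned mapping. The function name `page_table_footprint` is made up for this illustration.

```python
# Assumptions (not from the article): x86-64, 4-level paging, 4 KiB pages,
# 8-byte entries, one contiguous and aligned mapping, no huge pages.
PAGE_SIZE = 4096          # bytes per base page, also the size of one table
ENTRIES_PER_TABLE = 512   # 4096-byte table / 8-byte entries
LEVELS = ("PTE", "PMD", "PUD", "PGD")  # leaf level first


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoids float rounding."""
    return -(-a // b)


def page_table_footprint(mapped_bytes: int) -> dict:
    """Count the tables needed at each level to map `mapped_bytes`."""
    result = {}
    entries = ceil_div(mapped_bytes, PAGE_SIZE)  # one leaf entry per page
    for level in LEVELS:
        tables = max(1, ceil_div(entries, ENTRIES_PER_TABLE))
        result[level] = tables
        entries = tables  # each table needs one entry in the level above
    result["total_bytes"] = sum(result[lv] for lv in LEVELS) * PAGE_SIZE
    return result


if __name__ == "__main__":
    info = page_table_footprint(8 * 2**30)  # an 8 GiB dataset
    print(info)
```

Under these assumptions an 8 GiB mapping needs 4096 leaf-level tables, roughly 16 MiB of page-table memory, and nearly all of it sits at the lowest level. The footprint grows linearly with the dataset, which is why larger instances see longer stalls.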
The Async-fork mechanism moves the page table copying work from the parent process to the child process, allowing the parent process to quickly return to user mode and handle user queries while the child process completes the page table copying. This significantly reduces the tail latency of requests during snapshot operations.
The article explains the basic concepts of physical memory addresses, virtual address spaces, page tables, and virtual memory areas (VMAs). It then details the challenges of implementing Async-fork, chiefly snapshot consistency: if the parent's page table is modified before the child has finished copying it, the child could see a state that differs from the moment of the fork, so an active synchronization mechanism is needed to handle such modifications during the copy.
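The consistency problem can be illustrated with a toy model. The sketch below is not the kernel implementation: it represents a page table as a plain Python list, runs single-threaded, and omits all locking. The class `ToyAsyncFork` and its methods are hypothetical names invented for this example. It shows only the core rule: before the parent changes an entry the child has not copied yet, that entry is first synchronized to the child.

```python
class ToyAsyncFork:
    """Toy model: the child builds a snapshot of the parent's table entry by entry."""

    def __init__(self, parent_table):
        self.parent = list(parent_table)           # live table, may change
        self.child = [None] * len(self.parent)     # snapshot under construction
        self.copied = [False] * len(self.parent)   # which entries are done

    def _copy_entry(self, i):
        # Copy entry i once; later calls are no-ops so the snapshot
        # always holds the value from the moment of the "fork".
        if not self.copied[i]:
            self.child[i] = self.parent[i]
            self.copied[i] = True

    def child_copy_step(self, i):
        """Background work done by the child."""
        self._copy_entry(i)

    def parent_modify(self, i, value):
        """Parent-side change: synchronize the old entry first, then modify."""
        self._copy_entry(i)
        self.parent[i] = value

    def done(self):
        return all(self.copied)


if __name__ == "__main__":
    original = ["A", "B", "C", "D"]
    fork = ToyAsyncFork(original)
    fork.child_copy_step(0)        # child has copied only entry 0 so far
    fork.parent_modify(2, "C2")    # parent touches an entry not yet copied
    for i in range(len(original)):
        fork.child_copy_step(i)    # child finishes the remaining entries
    print(fork.child, fork.parent)
```

In this run the child ends up with the original four entries while the parent keeps its modification. Without the synchronization step in `parent_modify`, the child would have copied the already modified entry and the snapshot would no longer reflect a single point in time.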
The article also presents test results comparing Async-fork with the traditional fork in Redis, covering both the execution time of the fork() call and TP100 (the maximum observed latency) during benchmark runs. The tests show that Async-fork significantly reduces latency spikes, with improvements of over 98% in single-machine test scenarios.
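For readers who want to observe the baseline effect themselves, here is a minimal sketch that times how long a process spends inside an ordinary fork() as its touched memory grows. It is not the article's benchmark and it does not exercise Async-fork, which requires a patched kernel. It assumes a Unix-like system where `os.fork` is available, and the helper name `time_fork` is made up for this example. Absolute numbers depend on the machine, the kernel, and settings such as transparent huge pages.

```python
import os
import time

PAGE = 4096  # assumed base page size


def time_fork(resident_mib: int) -> float:
    """Return seconds the parent spends in fork() with ~resident_mib MiB touched."""
    buf = bytearray(resident_mib * 1024 * 1024)
    for off in range(0, len(buf), PAGE):
        buf[off] = 1                 # touch each page so it is really mapped
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)                  # child: leave at once, no snapshot work
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)               # reap the child
    return elapsed


if __name__ == "__main__":
    for mib in (16, 64, 128):
        print(f"{mib:4d} MiB touched: fork() took {time_fork(mib) * 1e6:.0f} us")
```

On a running Redis instance, the duration of the most recent fork is also reported as `latest_fork_usec` in the output of `INFO stats`, which is usually the easier number to watch in production.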
The article concludes that Async-fork provides substantial performance benefits for in-memory databases such as Redis, particularly when adding slave nodes, backing up RDB files, and rewriting AOF persistence files.
DeWu Technology