Async IO Thread in Redis 8.0 M3: Design, Implementation, and Performance Evaluation
The article explains why Redis needs asynchronous IO threading, describes the shortcomings of previous IO‑thread models, details the design of the new async IO thread architecture with event‑notified client queues and thread‑safety mechanisms, and presents performance test results showing up to double the QPS and significantly lower latency.
High Performance Demand
Before discussing Redis IO multithreading, the article asks whether IO multithreading is necessary and if single‑threaded Redis performance is sufficient. It notes that most workloads do not hit CPU limits, and IO multithreading mainly helps IO‑intensive scenarios such as hot keys, large‑scale servers, mismatched performance/capacity, TLS overhead, and cluster limitations.
Hotspot data: sudden spikes (e.g., sports events, flash sales) create extremely high QPS on a small set of keys; because each key maps to exactly one shard, horizontal scaling cannot absorb this load.
Machine specs: on large VMs the single Redis thread wastes CPU cycles; adding IO threads can better utilize resources.
Performance‑capacity mismatch: high QPS with modest data size makes sharding costly.
TLS degradation: encryption consumes CPU, hurting single‑threaded throughput.
Cluster mode limits: some commands still require single‑slot placement, leading users to prefer master‑replica setups.
Thus, IO multithreading gives Redis internal performance scaling and more architectural flexibility.
Analysis of Existing Versions
The article shows a flame graph of SET/GET in single‑threaded Redis, highlighting that network IO dominates CPU usage while command execution is minimal. Redis 6.0 introduced IO multithreading to offload request reading, command parsing, and reply sending to IO threads, keeping the main thread focused on command processing.
However, the existing IO‑thread model has drawbacks:
The main thread blocks while IO threads perform reads/writes, and IO threads busy‑wait for the main thread, preventing effective multi‑core parallelism.
When client connections increase, IO threads stay at near‑100% CPU due to busy‑wait, making bottleneck identification hard and wasting CPU.
Enabling TLS together with io-threads-do-reads can cause race conditions and crashes (see Issue #12540).
To address these issues, the article proposes an asynchronous IO thread design where IO threads use an event‑driven model and operate in parallel with the main thread.
Async IO Thread Implementation
The new design keeps the rule that all client commands must run on the main thread, preserving Redis's original single‑threaded semantics while moving only network read/write and command parsing to independent IO threads with their own event loops and epoll‑based multiplexing.
The interaction between the main thread and IO threads follows four steps:
The main thread accepts a new connection and assigns the client to an IO thread.
The IO thread reads the request and parses the command, then notifies the main thread.
The main thread processes the command, generates a reply, and sends it back to the corresponding IO thread.
The IO thread writes the reply to the client and continues handling further read/write events.
Event‑notified client queue
Each IO thread and the main thread own two event‑notified client queues. These queues store pending clients and use eventfd (or pipe) integrated with epoll for notification. A lightweight pthread_mutex protects the tiny critical section of pointer swaps, minimizing contention.
Thread Safety
Switching to an asynchronous model introduces thread‑safety challenges. The main thread may need to access or modify IO‑thread data (e.g., client buffers). An IO‑thread pause/resume mechanism is introduced, using busy‑wait for state confirmation and atomic variables for memory ordering. Certain client types (replication, monitor, subscription, tracking) remain handled by the main thread to avoid severe contention.
Observability
The design assigns new clients to the IO thread with the fewest connections. INFO output now includes a THREADS section showing per‑thread client counts and read/write statistics:
# Threads
io_thread_0:clients=0,reads=0,writes=0
io_thread_1:clients=2,reads=100,writes=100
io_thread_2:clients=2,reads=100,writes=100
The CLIENT LIST output also adds an io-thread field to indicate the assigned thread:
id=244 addr=127.0.0.1:41870 laddr=127.0.0.1:6379 ... resp=2 lib-name= lib-ver= io-thread=1
Operators can monitor CPU usage with top -H -p $redis_pid to identify whether the main thread or IO threads are the bottleneck, and adjust io-threads accordingly.
Performance Testing
PR #13665 demonstrates Redis IO multithreading on an AMD Ryzen 9 7950X. The author reproduced the tests on three Alibaba Cloud ECS instances (c8i.4xlarge, 16 vCPU 32 GiB) using a master-replica Redis setup and memtier_benchmark as the load generator.
Two test groups differed only in KV size (32 B vs 512 B). Each run used 3 million keys, 400 clients, 60 seconds, and varied write/read ratios (1:0, 0:1, 1:1, 1:10). IO threads were set to 6, and the benchmark client ran with 8 threads.
KV size 32 bytes – QPS and P99 latency
memtier_benchmark -s xxx --data-size 32 --ratio xxx --key-pattern P:P --key-minimum=1 --key-maximum 3000000 --distinct-client-seed --test-time 60 -c 50 -t 8 --hide-histogram
[Chart: QPS]
[Chart: P99 latency]
KV size 512 bytes – QPS and P99 latency
memtier_benchmark -s xxx --data-size 512 --ratio xxx --key-pattern P:P --key-minimum=1 --key-maximum 3000000 --distinct-client-seed --test-time 60 -c 50 -t 8 --hide-histogram
[Chart: QPS]
[Chart: P99 latency]
Compared with Redis 7.4, the Async IO Thread can double QPS and reduce P99 latency by over 30 % in most scenarios. In a 1:10 write‑read test, Redis 7.4 suffered >50 % key‑miss rate, while the new version stayed around 35 %.
Note: When switching to a smaller instance (c8i.2xlarge, 8 vCPU 16 GiB) while keeping io-threads=6, performance dropped more than 10 %, likely due to insufficient vCPU capacity. Users should benchmark according to their own hardware.
Conclusion
Async IO Thread introduced in Redis 8.0 M3 markedly improves performance, offering higher throughput and lower latency for modern applications. Detailed implementation and test results can be found in PR #13665.
Cognitive Technology Team