How KeyDB Transforms Redis into a Multi‑Threaded Database
KeyDB, a Redis fork, replaces the single‑threaded architecture with a multi‑threaded model using a main thread and worker I/O threads, SO_REUSEPORT, per‑thread connection management, fastlock spin‑lock mechanisms, and active‑replica support, enabling concurrent data access and improved performance.
KeyDB is a fork of Redis that maintains 100% compatibility with the Redis API while converting Redis into a multi‑threaded system.
Thread Model
KeyDB splits Redis's original main thread into a main thread and multiple worker threads. Each worker thread is an I/O thread responsible for listening on ports, accepting connections, reading data, and parsing the protocol.
KeyDB uses the SO_REUSEPORT feature, allowing multiple threads to bind to the same listening port.
Each worker thread is also bound to a specific CPU, using the SO_INCOMING_CPU socket option to designate which CPU receives the data.
After parsing the protocol, each thread operates on in‑memory data protected by a global lock to control concurrent access.
The main thread is itself a worker thread (index 0 in the worker array) and additionally performs tasks that only the main thread can execute.
Main thread responsibilities (implemented in serverCron) include:
Processing statistics
Client connection management
Resizing and reshaping DB data
Handling AOF
Replication master‑slave synchronization
Cluster‑mode tasks
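The thread layout described above, with the main thread acting as worker 0 and all data access serialized by a global lock, might be sketched with POSIX threads as follows. Everything here is illustrative: `run_workers`, `worker_main`, the counter standing in for the data set, and the use of a `pthread_mutex_t` in place of KeyDB's own lock are all assumptions.

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_WORKERS 16

/* Stand-ins for the global lock and the in-memory data it protects. */
static pthread_mutex_t sketch_global_lock = PTHREAD_MUTEX_INITIALIZER;
static long sketch_shared_counter;

struct sketch_worker {
    int index; /* position in the worker array; 0 is the main thread */
    int ops;   /* number of simulated commands to execute */
};

static void *worker_main(void *arg)
{
    struct sketch_worker *w = arg;
    for (int i = 0; i < w->ops; i++) {
        /* Parsing would happen outside the lock; only the data
         * access itself is serialized. */
        pthread_mutex_lock(&sketch_global_lock);
        sketch_shared_counter++;
        pthread_mutex_unlock(&sketch_global_lock);
    }
    return NULL;
}

/* Runs `nworkers` workers and returns the final counter, or -1 on error. */
long run_workers(int nworkers, int ops_per_worker)
{
    struct sketch_worker workers[SKETCH_MAX_WORKERS];
    pthread_t tids[SKETCH_MAX_WORKERS];
    int started = 1; /* worker 0 needs no extra thread */
    int failed = 0;

    if (nworkers < 1 || nworkers > SKETCH_MAX_WORKERS) return -1;
    sketch_shared_counter = 0;

    for (int i = 0; i < nworkers; i++) {
        workers[i].index = i;
        workers[i].ops = ops_per_worker;
    }
    for (int i = 1; i < nworkers; i++) {
        if (pthread_create(&tids[i], NULL, worker_main, &workers[i]) != 0) {
            failed = 1;
            break;
        }
        started = i + 1;
    }

    /* The main thread does the same work as every other worker. */
    worker_main(&workers[0]);

    for (int i = 1; i < started; i++)
        pthread_join(tids[i], NULL);
    return failed ? -1 : sketch_shared_counter;
}
```

A real main thread would additionally run its periodic tasks between iterations; the sketch omits that to keep the locking pattern visible.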
Connection Management
In Redis, all connection management is handled by a single thread. In KeyDB, each worker thread manages its own set of connections, inserting them into a thread‑local connection list; creation, operation, and destruction of a connection must occur within the same thread.
<code>int iel; /* the event loop index we're registered with */</code>
KeyDB maintains three key data structures for connection management:
clients_pending_write: thread‑local list for synchronously sending data to client connections
clients_pending_asyncwrite: thread‑local list for asynchronously sending data to client connections
clients_to_close: global list for connections that need to be closed asynchronously
Separating synchronous and asynchronous queues handles cases where a publishing thread differs from the subscriber's thread, ensuring correct delivery of messages.
When a local thread needs to send data asynchronously, it checks whether the client belongs to the local thread; if not, it obtains the client's owning thread ID and enqueues a write event via AE_ASYNC_OP::CreateFileEvent. The owning thread then processes the pipe message and adds the request to its write events.
<code>int fdCmdWrite; // write pipe
int fdCmdRead; // read pipe</code>
Some client‑close requests are not executed in the thread that owns the connection, so KeyDB maintains a global asynchronous close list.
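The cross-thread hand-off over a pipe can be sketched as below. Only the field names fdCmdWrite and fdCmdRead come from the text; `struct sketch_loop`, `struct sketch_cmd`, and the three functions are hypothetical, and the command record is reduced to a single client descriptor.

```c
#include <unistd.h>

/* One command pipe per worker thread, mirroring fdCmdWrite / fdCmdRead. */
struct sketch_loop {
    int fdCmdWrite; /* other threads write commands here */
    int fdCmdRead;  /* the owning thread reads commands here */
};

/* Hypothetical command record. It is far smaller than PIPE_BUF, so a
 * single write() to the pipe is atomic. */
struct sketch_cmd {
    int client_fd;
};

int sketch_loop_init(struct sketch_loop *l)
{
    int fds[2];
    if (pipe(fds) < 0) return -1;
    l->fdCmdRead = fds[0];
    l->fdCmdWrite = fds[1];
    return 0;
}

/* Returns 0 if the caller owns the client and may write directly,
 * 1 if the request was handed to the owning thread, -1 on error. */
int sketch_request_write(struct sketch_loop *loops, int self, int owner,
                         int client_fd)
{
    if (owner == self) return 0;

    struct sketch_cmd cmd = { client_fd };
    ssize_t n = write(loops[owner].fdCmdWrite, &cmd, sizeof(cmd));
    return n == (ssize_t)sizeof(cmd) ? 1 : -1;
}

/* Owner side: read one pending command and return its client fd, or -1.
 * This blocks when the pipe is empty; an event loop would only call it
 * after the read end has been reported readable. */
int sketch_next_command(struct sketch_loop *l)
{
    struct sketch_cmd cmd;
    ssize_t n = read(l->fdCmdRead, &cmd, sizeof(cmd));
    return n == (ssize_t)sizeof(cmd) ? cmd.client_fd : -1;
}
```

Routing the request through the owner's pipe keeps every operation on a connection inside the thread that created it, which is the invariant the connection-management design relies on.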
Lock Mechanism
KeyDB implements a spin‑lock‑like mechanism called fastlock.
Key data structures of fastlock:
<code>struct ticket {
uint16_t m_active; // unlock +1
uint16_t m_avail; // lock +1
};
struct fastlock {
volatile struct ticket m_ticket;
volatile int m_pidOwner; // thread ID that currently holds the lock
volatile int m_depth; // recursion depth for the owning thread
};</code>
Atomic operations such as __atomic_load_2, __atomic_fetch_add, and __atomic_compare_exchange compare m_active and m_avail to determine lock acquisition.
fastlock provides two acquisition methods:
try_lock: returns immediately if the lock cannot be obtained
lock: busy‑waits, performing up to 1024 × 1024 iterations before yielding the CPU with sched_yield
KeyDB combines try_lock with the event loop to avoid busy‑waiting: each client has a dedicated lock; before reading client data, the lock is attempted, and if it fails, the operation is deferred to the next epoll_wait cycle.
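A ticket lock of this general shape can be written with the GCC/Clang `__atomic` builtins. The following is a simplified sketch and not KeyDB's fastlock: the `sketch_` names are hypothetical, the owner and recursion-depth fields are left out, and try_lock is implemented with a compare-and-swap on the `avail` counter alone.

```c
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t active; /* ticket currently being served; unlock adds 1 */
    uint16_t avail;  /* next ticket to hand out; lock adds 1 */
} sketch_ticketlock;

/* Take a ticket, then spin until it is being served. */
void sketch_lock(sketch_ticketlock *l)
{
    uint16_t my = __atomic_fetch_add(&l->avail, 1, __ATOMIC_ACQ_REL);
    unsigned loops = 0;

    while (__atomic_load_n(&l->active, __ATOMIC_ACQUIRE) != my) {
        /* Give up the CPU once every 1024 * 1024 spins. */
        if ((++loops % (1024u * 1024u)) == 0)
            sched_yield();
    }
}

/* Succeeds only when nobody holds or waits for the lock, that is,
 * when avail still equals active. Never spins. */
bool sketch_try_lock(sketch_ticketlock *l)
{
    uint16_t active = __atomic_load_n(&l->active, __ATOMIC_ACQUIRE);
    uint16_t expected = active;

    return __atomic_compare_exchange_n(&l->avail, &expected,
                                       (uint16_t)(active + 1), false,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}

/* Serve the next ticket. */
void sketch_unlock(sketch_ticketlock *l)
{
    __atomic_fetch_add(&l->active, 1, __ATOMIC_RELEASE);
}
```

An event loop built on this would call `sketch_try_lock` on a client's lock when its socket becomes readable and, on failure, simply leave the socket registered so the read is retried on the next wake-up instead of spinning.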
Active‑Replica
KeyDB implements an active‑replica mechanism where each replica can be writable (non‑read‑only) and synchronizes data with other replicas. Key features include:
Each replica has a UUID to eliminate circular replication
A new rreplay API packages incremental commands with the local UUID
Keys and values carry a timestamp version, and writes with older timestamps are rejected; the version is the current time shifted left 20 bits, with the low 20 bits used as an increment counter (the time therefore occupies the upper 44 bits of a 64‑bit value)
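The versioning and loop-avoidance rules in the list above can be expressed in a few lines. This is a sketch under assumptions: the function names are hypothetical, the time unit is left to the caller, a write carrying a version equal to the stored one is accepted, and UUIDs are modeled as 16 raw bytes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Low bits reserved for the increment counter. */
#define SKETCH_TIME_SHIFT 20

/* Next version given the previous one and the current time. While the
 * clock has not advanced, only the counter in the low bits is bumped,
 * so versions are strictly increasing. */
uint64_t sketch_next_version(uint64_t prev, uint64_t now)
{
    if ((prev >> SKETCH_TIME_SHIFT) >= now)
        return prev + 1;
    return now << SKETCH_TIME_SHIFT;
}

/* Reject writes whose version is older than the one already stored.
 * Accepting an equal version is an assumption of this sketch. */
bool sketch_accept_write(uint64_t stored, uint64_t incoming)
{
    return incoming >= stored;
}

/* A replayed command that originated on this node is skipped, which is
 * what breaks replication cycles between writable replicas. */
bool sketch_should_apply(const unsigned char origin[16],
                         const unsigned char local[16])
{
    return memcmp(origin, local, 16) != 0;
}
```

With a 20-bit shift, just over a million versions can be issued per time unit before the counter would spill into the time field.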
Project Address
https://github.com/JohnSully/KeyDB
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.