How Pika’s Native Distributed Cluster Overcomes Redis Capacity Limits
This article explains Pika's native distributed cluster architecture: the deployment structure, table- and slot-based data distribution, the request processing flow, both non-consistent and Raft-based log replication, and the enhanced metadata management that enables scalable, highly available storage beyond single-node Redis limits.
Background
Pika is a persistent, large‑capacity Redis‑compatible storage service that supports most string, hash, list, zset, and set interfaces, addressing the memory bottleneck of Redis when handling massive data volumes. To meet growing demand for distributed clusters, the native Pika cluster (v3.4) was released.
Cluster Deployment Structure
The example shows a three-node Pika cluster. Deployment involves four steps:

1. Deploy an Etcd cluster to store Pika Manager metadata.
2. Install Pika Manager on each of the three physical machines; every instance registers with Etcd and competes to become leader, and only the leader writes cluster metadata.
3. Deploy Pika nodes on the three machines and register their information with the Pika Manager.
4. Register the Pika service ports with LVS for load balancing.
Data Distribution
Pika introduces the concept of tables to isolate business data; keys are hashed to slots, each slot having multiple replicas forming a replication group. One replica acts as leader, handling read/write operations, while followers replicate data. The manager can schedule slot migration for balanced load and horizontal scaling.
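The key-to-slot mapping can be sketched as below. The CRC32 hash and the slot count of 1024 are illustrative assumptions, not Pika's documented internals; in Pika the slot count is chosen per table at creation time.

```python
import zlib

# Hypothetical slot count; in Pika this is configured per table at creation.
NUM_SLOTS = 1024

def slot_for_key(key: bytes, num_slots: int = NUM_SLOTS) -> int:
    """Hash a key to a slot index. CRC32 is assumed here purely for
    illustration; the actual hash function is a Pika implementation detail."""
    return zlib.crc32(key) % num_slots
```

Because the mapping is deterministic, every node routes the same key to the same slot, and the Manager only needs to track which node owns each slot.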
Pika uses RocksDB as the storage engine; each slot creates a RocksDB instance supporting all Redis data structures. However, creating many slots can lead to excessive RocksDB instances and resource consumption, which future versions aim to mitigate.
Data Processing
1. The parsing layer interprets the Redis protocol and passes the parsed command to the router.
2. The router hashes the key to its slot and checks whether the slot resides on the local node.
3. If the slot is remote, a forwarding task is created and the request is sent to the peer node; the response is returned to the client once the peer has processed it.
4. If the slot is local, the request is processed directly.
A write request first generates a binlog entry through the replication manager, which asynchronously ships it to the other replicas of the slot; the leader replica then applies the write to the database through Blackwidow, Pika's storage layer that encapsulates RocksDB and maps Redis data structures onto it.
Clients interact with the cluster without needing to be aware of an external proxy, and Pika service ports can be load‑balanced through LVS.
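The local-versus-remote routing decision above can be sketched as follows. The slot count, the `SLOT_TO_NODE` table, and the string results standing in for processing and forwarding are all hypothetical; Pika's real router works against its internal slot metadata.

```python
import zlib

NUM_SLOTS = 16  # small hypothetical slot count for the sketch

# Hypothetical routing table: slot index -> owning node address.
SLOT_TO_NODE = {s: f"node{s % 3}:9221" for s in range(NUM_SLOTS)}
LOCAL_NODE = "node0:9221"

def handle(key: bytes, command: str) -> str:
    slot = zlib.crc32(key) % NUM_SLOTS
    owner = SLOT_TO_NODE[slot]
    if owner == LOCAL_NODE:
        # Slot is local: process the request directly.
        return f"processed {command} locally in slot {slot}"
    # Slot is remote: create a task and forward the request to the peer node;
    # the peer's response is relayed back to the client.
    return f"forwarded {command} for slot {slot} to {owner}"
```

Because any node can forward, clients may connect to any node in the cluster, which is what makes the external proxy unnecessary.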
Log Replication
Non‑Consistent Log Replication
In this mode, the processing thread writes the binlog and updates the database immediately, then returns the response to the client. An auxiliary thread sends a BinlogSync request to follower slots, which acknowledge with BinlogSyncAck.
1. The processing thread receives the client request, takes the lock, writes the binlog, and updates the DB.
2. The processing thread returns the response to the client.
3. An auxiliary thread sends a BinlogSync request to the follower slots.
4. The followers return BinlogSyncAck responses.
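The ordering above — respond first, replicate later — can be sketched with a queue and a background thread. All names are illustrative, not Pika's actual APIs, and slot locking is elided.

```python
import threading
from queue import Queue

binlog = []            # leader's binlog
db = {}                # leader's database
follower_binlog = []   # a single follower's binlog (stand-in for the network)
replication_queue = Queue()

def replicate_worker():
    # Auxiliary thread: ship binlog entries to the follower (BinlogSync);
    # appending here stands in for the follower applying and acking.
    while True:
        entry = replication_queue.get()
        if entry is None:
            break
        follower_binlog.append(entry)

def write(key, value):
    # Processing thread: write binlog, update DB, then respond immediately.
    binlog.append((key, value))
    db[key] = value
    replication_queue.put((key, value))
    return "OK"  # the client is answered before any follower acks

worker = threading.Thread(target=replicate_worker)
worker.start()
write("k1", "v1")
replication_queue.put(None)
worker.join()
```

The trade-off is visible in the sketch: the client's `OK` does not wait on the follower, so a leader crash before replication can lose acknowledged writes.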
Consistent (Raft) Log Replication
Here, the processing thread writes the binlog and sends a BinlogSync request to followers. The request is committed only after a majority of followers acknowledge, ensuring consistency before writing to the database.
1. The processing thread writes the request to the binlog file.
2. A BinlogSync request is sent to the followers.
3. The followers return BinlogSyncAck responses.
4. After a majority of acknowledgments is received, the request is applied to the DB.
5. The response is returned to the client.
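The majority-commit rule in the steps above can be sketched as follows. The function name, the callable followers, and the ack transport are assumptions for illustration; only the counting rule reflects the described protocol.

```python
def commit_with_majority(entry, followers, apply_to_db):
    """Apply `entry` to the DB only once a majority of the replication
    group (leader included) has acknowledged the binlog entry."""
    acks = 1  # the leader's own binlog write counts toward the majority
    for follower in followers:
        if follower(entry):  # send BinlogSync; True stands for BinlogSyncAck
            acks += 1
    needed = (len(followers) + 1) // 2 + 1  # strict majority of the group
    if acks >= needed:
        apply_to_db(entry)   # write to the DB only after commit
        return "OK"          # then answer the client
    return "RETRY"           # no majority: the entry stays uncommitted
```

For example, with a leader and three followers the majority is three, so the write commits as long as at least two followers acknowledge it.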
Cluster Metadata Handling
Based on a customized Codis‑dashboard, the Pika Manager (PM) serves as the global control node, storing cluster metadata and routing information.
- Supports creating multiple tables to isolate business data.
- Allows specifying the slot count and replica count per table.
- Replaces Codis's group concept with slot-level replication groups.
- Enables table-level password authentication.
- Supports slot migration for scaling.
- Integrates a sentinel module that monitors node health and promotes the most up-to-date follower to leader when needed.
- Persists metadata in Etcd for high availability.
- Achieves PM high availability through lock competition in Etcd.
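The lock-competition pattern can be sketched in-process. In the real cluster the lock lives in Etcd (typically a lease-backed key), which the `threading.Lock` below merely simulates; the instance names and role strings are illustrative.

```python
import threading

# Simulated leader election: every PM instance races for one shared lock.
# In production this lock would be an Etcd key held under a lease, so a
# crashed leader's lock expires and a standby can take over.
election_lock = threading.Lock()
roles = {}

def pm_instance(name: str) -> None:
    if election_lock.acquire(blocking=False):
        roles[name] = "leader"    # only this instance writes cluster metadata
    else:
        roles[name] = "standby"   # waits to take over if the leader fails

for pm in ("pm1", "pm2", "pm3"):
    pm_instance(pm)
```

Exactly one instance wins the race; the others hold no state of their own, since all metadata is in Etcd, so failover is just a new round of lock competition.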
Afterword
The native Pika cluster removes the single‑node disk capacity limitation, allowing horizontal scaling according to business needs. Remaining issues include the lack of an internal Raft‑based leader election, range‑based data distribution, and monitoring dashboards, which will be addressed in future releases.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.