Design, Deployment, and Lessons Learned from Codis and RebornDB: A Proxy‑Based Distributed Redis Solution
This article presents an in‑depth overview of Codis and its next‑generation project RebornDB, covering Redis, Redis Cluster, the proxy‑based architecture, consistency trade‑offs, production deployment experiences, operational pitfalls, and broader perspectives on distributed databases and architectures.
The talk, led by Huang Dongxu, CTO of PingCAP and co‑author of the open‑source project Codis, is organized around five major topics: Redis and Redis Cluster, Codis itself, consistency considerations, production experiences, and views on distributed databases.
Redis, Redis Cluster and Codis – Redis is a ubiquitous cache layer, but its single‑node nature raises capacity and availability concerns. Codis addresses these by introducing a stateless proxy layer that shards data across multiple Redis instances, storing routing information in Zookeeper (or etcd) and allowing dynamic horizontal scaling.
Unlike Redis Cluster, which tightly couples storage and distribution logic, Codis separates them, enabling smoother scaling, hot‑slot migration, and the possibility of plugging in alternative storage engines such as RebornDB’s qdb.
Consistency – The author explains why Codis avoids read‑write separation: replication is asynchronous, so replicas may serve stale reads. Likewise, Codis's MGET/MSET cannot guarantee atomicity across slots, which makes distributed transactions difficult. Lua scripts are supported but require careful key placement, typically using hashtag syntax such as {uid1}age so that related keys reside on the same shard.
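The hashtag trick can be illustrated with a small Go sketch: the slot is computed from the substring inside {...} when one is present, so {uid1}age and {uid1}name always land on the same shard and can be touched by a single Lua script. The hashTag helper here is illustrative, not Codis's actual code.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"strings"
)

// hashTag returns the substring inside the first non-empty {...} pair,
// falling back to the whole key when no tag is present.
func hashTag(key string) string {
	if i := strings.Index(key, "{"); i >= 0 {
		if j := strings.Index(key[i+1:], "}"); j > 0 {
			return key[i+1 : i+1+j]
		}
	}
	return key
}

// slotFor hashes only the tag, so keys sharing a tag share a slot.
func slotFor(key string) int {
	return int(crc32.ChecksumIEEE([]byte(hashTag(key))) % 1024)
}

func main() {
	fmt.Println(slotFor("{uid1}age") == slotFor("{uid1}name")) // prints "true"
}
```

Both keys hash the tag "uid1", so a Lua script reading one and writing the other never crosses a slot boundary.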
Future Design Directions – Planned improvements include embedding Raft in the proxy to replace Zookeeper, abstracting the storage engine layer for automated lifecycle management, and implementing replication‑based migration to reduce migration latency and memory overhead.
Production Experience and Pitfalls – The speaker shares practical tips on multi‑product line deployment, Zookeeper sizing (odd number of nodes, ≥3), HA strategies for both proxy and Redis layers, dashboard usage, Go version recommendations, queue design, master‑slave handling, cross‑data‑center considerations, and proxy placement.
Views on Distributed Databases – The author reflects on the evolution from single‑node to distributed systems, the challenges of providing strong transactional guarantees, and cites Google Spanner as a model for NewSQL that combines KV storage, SQL querying, and distributed transactions.
Q&A Highlights – Key questions cover Codis's lack of a multi‑replica concept, Zookeeper's role, the sharding algorithm (crc32(key) % 1024), authentication, cross‑data‑center replication, and details of Codis's auto‑rebalance implementation.
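The auto‑rebalance answer can be sketched as a naive slot‑assignment pass in Go: spread the 1024 slots as evenly as possible across group IDs. This is a deliberately simplified, hypothetical version; a real rebalance also has to migrate each reassigned slot safely and may weigh per‑group load rather than assuming all groups are equal.

```go
package main

import "fmt"

const numSlots = 1024

// rebalance assigns slots round-robin across the given group IDs,
// a naive sketch of what an auto-rebalance pass computes: the target
// slot-to-group plan, before any migration is carried out.
func rebalance(groups []int) map[int]int {
	plan := make(map[int]int, numSlots) // slot -> group ID
	for slot := 0; slot < numSlots; slot++ {
		plan[slot] = groups[slot%len(groups)]
	}
	return plan
}

func main() {
	plan := rebalance([]int{1, 2, 3})
	fmt.Println(plan[0], plan[1], plan[2], plan[3]) // prints "1 2 3 1"
}
```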
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers, sharing cutting‑edge technology trends and topics and giving mid‑to‑senior technical professionals a free venue for discussion and learning.