Replacing Redis with KVROCKS on SSD: Design, Implementation, and Performance Evaluation
This article details Ctrip's investigation of SSD‑based KVROCKS as a low‑cost, Redis‑compatible alternative, covering background, solution selection, secondary development steps, extensive performance testing on different SSDs, cost‑saving analysis, encountered pitfalls, and future plans.
Background – In 2019 Ctrip needed a cheap, high‑capacity Redis‑compatible service for overseas customers. Public‑cloud Redis was about ten times more expensive than private‑cloud alternatives, prompting a study of Redis‑on‑SSD solutions.
Research and Selection – The team evaluated Redis-On-Flash (commercial), Pika, and KVROCKS. Pika was ruled out because of incompatibilities with the Redis binary client protocol, heavyweight Rsync-based replication, and a tangled code base. KVROCKS, though newer, offered closer semantic compatibility with Redis and cleaner code, so it was selected for further development.
Secondary Development – To make KVROCKS act as a Redis slave, the authors added support for the SYNC/PSYNC replication protocol, implemented a state machine mirroring the Redis slave states, and extended the server to store replication metadata (repl_offset, repl_id, slave_mode). They also wrapped RDB parsing and command propagation, and added the Redis commands that monitoring needed but KVROCKS lacked.
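The PSYNC handshake that such a slave has to speak can be sketched as follows. This is a minimal illustration in Python, not the authors' C++ implementation: the `ReplState` names and the `ReplMeta` class are hypothetical simplifications (Redis itself tracks more states), and only the wire format of the `PSYNC` request and the `+FULLRESYNC` / `+CONTINUE` replies follows the Redis replication protocol.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ReplState(Enum):
    # Hypothetical, simplified states; Redis uses a longer list internally.
    CONNECTING = auto()
    FULL_SYNC = auto()   # master will send an RDB snapshot next
    STREAMING = auto()   # master will stream propagated commands
    RETRY = auto()       # error or unknown reply; reconnect later


@dataclass
class ReplMeta:
    # Mirrors the metadata named in the article; "?" and -1 mean "no history".
    repl_id: str = "?"
    repl_offset: int = -1


def build_psync(meta: ReplMeta) -> bytes:
    """Encode the PSYNC request as a RESP array of bulk strings."""
    args = [b"PSYNC", meta.repl_id.encode(), str(meta.repl_offset).encode()]
    out = b"*%d\r\n" % len(args)
    for arg in args:
        out += b"$%d\r\n%s\r\n" % (len(arg), arg)
    return out


def handle_psync_reply(line: bytes, meta: ReplMeta) -> ReplState:
    """Update replication metadata from the master's reply and pick the next state."""
    parts = line.rstrip(b"\r\n").decode().split()
    if len(parts) == 3 and parts[0] == "+FULLRESYNC":
        # Full resync: remember the master's replication ID and starting offset.
        meta.repl_id = parts[1]
        meta.repl_offset = int(parts[2])
        return ReplState.FULL_SYNC
    if parts and parts[0] == "+CONTINUE":
        # Partial resync; newer masters may append a new replication ID.
        if len(parts) == 2:
            meta.repl_id = parts[1]
        return ReplState.STREAMING
    # Error replies such as -NOMASTERLINK or -LOADING land here.
    return ReplState.RETRY
```

A fresh slave would send `build_psync(ReplMeta())`, which requests a full resync, and then feed the first reply line to `handle_psync_reply` to decide whether to expect an RDB payload or a command stream.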
Data and Performance
Extensive testing (roughly 100 versions, two months in production) yielded the following insights:
KVROCKS on SSD shows latency comparable to in-memory Redis at low thread counts (4 threads serving client commands).
Increasing threads beyond four does not improve average response time.
Optane (Intel SSD P480) outperforms regular SSDs, delivering roughly one-third the latency and three times the QPS in SET benchmarks.
Below 10k QPS per instance, KVROCKS is a cost-effective replacement for Redis; above that, the CPU saturates because of RocksDB's compaction overhead.
Cost Savings – By moving from a 6 GB Redis instance to a KVROCKS instance that uses about 1 GB of memory and a full CPU, Ctrip estimated a 60-80% reduction in infrastructure cost, with a real-world cluster coming in at roughly 63%.
Pitfalls Encountered
Jemalloc must be built with --with-jemalloc-prefix=je_ for container compatibility.
Binary incompatibility across CPU generations caused illegal‑instruction crashes.
RocksDB fails on certain virtual file systems; switching to XFS resolves the issue.
Type mismatches for setbit required a key‑prefix workaround.
Concurrent access to libevent evbuffer caused crashes; adding a lock fixed it.
A KVROCKS pub/sub deadlock was reported upstream and quickly patched.
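The evbuffer fix in the list above follows a common pattern: every access to a buffer shared between threads goes through one lock. The sketch below shows that pattern in Python as an analogy only; `LockedBuffer` and `demo` are hypothetical names, and this is neither libevent's C API nor the patch the authors applied.

```python
import threading


class LockedBuffer:
    """Byte buffer whose every read and write is serialized by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        # Writers take the lock, so appends cannot interleave with a drain.
        with self._lock:
            self._data += chunk

    def drain(self) -> bytes:
        # Copy out and reset the buffer atomically with respect to writers.
        with self._lock:
            out = bytes(self._data)
            self._data.clear()
            return out


def demo(writers: int = 4, per_writer: int = 1000) -> int:
    """Run several writer threads against one buffer and return the bytes collected."""
    buf = LockedBuffer()

    def work() -> None:
        for _ in range(per_writer):
            buf.append(b"x")

    threads = [threading.Thread(target=work) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(buf.drain())
```

The lock is deliberately coarse: one lock per buffer keeps the invariant simple, at the price of serializing all writers to that buffer.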
Future Outlook – Over 50 % of public‑cloud Redis instances have already been migrated to KVROCKS, with plans to replace all replaceable instances, support Redis‑slave‑of‑KVROCKS, and eventually open‑source the solution.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.