Scaling Zhihu’s Moneta Service with TiDB: Architecture, Performance, and Lessons Learned
This article describes how Zhihu's Moneta application migrated more than a trillion rows of user-read data to TiDB, an open-source, MySQL-compatible NewSQL database. It covers the architectural redesign, the performance improvements, the lessons learned during migration, and what the team expects from TiDB 3.0.
Zhihu, a Chinese question-and-answer platform similar to Quora, stores about 1.3 trillion rows of records of the posts users have read in its Moneta service. The data grows by roughly 100 billion rows per month, and the service must meet strict latency (≤90 ms) and high-throughput requirements.
To meet these challenges, the team evaluated TiDB, an open-source, MySQL-compatible NewSQL database with hybrid transactional/analytical processing (HTAP) capabilities, and chose it for its strong consistency, horizontal scalability, and cloud-native design.
The system architecture had to provide high availability, handle more than 40,000 writes per second, store massive amounts of historical data, process millions of queries per second, and tolerate false positives in content filtering.
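The tolerance for false positives is what makes approximate membership structures usable for this kind of read filtering. The sketch below is a minimal Bloom filter in Python, shown only as an illustration: the source does not say which structure Moneta uses, and the class name, sizes, and the `user_id:post_id` key format are assumptions. In this sketch, a false positive means an unread post is occasionally treated as already read; a read post is never reported as unread.

```python
import hashlib
from typing import Iterator, List


class BloomFilter:
    """Tiny Bloom filter: may return false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def filter_unread(user_id: str, candidates: List[str], seen: BloomFilter) -> List[str]:
    """Keep only the posts that the filter says the user has not read."""
    return [p for p in candidates if not seen.might_contain(f"{user_id}:{p}")]


# Hypothetical usage: user "u1" has read post "p1".
seen_posts = BloomFilter()
seen_posts.add("u1:p1")
unread = filter_unread("u1", ["p1", "p2"], seen_posts)
```

The trade-off is memory for accuracy: a larger bit array or a better-tuned hash count lowers the false-positive rate, but the filter can never fully replace the authoritative records in the database.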
The new architecture consists of three layers: a stateless, scalable client API and proxy at the top; a soft‑state layer with Redis caching in the middle; and a TiDB cluster (TiDB servers, TiKV storage, PD meta‑service, and TiSpark) at the bottom, all orchestrated by Kubernetes for self‑healing and global fault monitoring.
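The relationship between the soft-state cache layer and the database layer can be sketched as a read-through cache. This is a minimal illustration rather than Moneta's actual code: a plain dictionary stands in for Redis, a callable stands in for a TiDB query, and all names are hypothetical.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Soft-state cache in front of an authoritative store."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}    # stand-in for Redis
        self._load_from_db = load_from_db   # stand-in for a TiDB query
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        # Serve from the cache when possible.
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # On a miss, fall back to the database and populate the cache.
        self.misses += 1
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value
        return value

    def clear(self) -> None:
        # Losing the cache loses no data: it is rebuilt from the database.
        self._cache.clear()


# Hypothetical usage with an in-memory "database".
database = {"u1:p1": "read"}
cache = ReadThroughCache(database.get)
first = cache.get("u1:p1")    # miss, loaded from the database
second = cache.get("u1:p1")   # hit, served from the cache
```

Because the cache holds only soft state, a restarted cache node costs some latency while it warms up but does not affect correctness.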
After migration, Moneta achieved significant performance gains: peak write throughput exceeded 40,000 TPS, the 99th-percentile response time dropped to about 25 ms, and the 99.9th-percentile response time to about 50 ms. The average response time is far lower than these tail figures.
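For readers less familiar with tail-latency metrics, the following Python sketch shows one common way to compute them, the nearest-rank method. The percentile is passed in per-mille (990 for the 99th percentile, 999 for the 99.9th, which is also why the latter is sometimes written as the "999th percentile"). The function name and the sample data are illustrative and are not Moneta's measurements.

```python
from typing import List, Sequence


def percentile_permille(samples: Sequence[float], permille: int) -> float:
    """Nearest-rank percentile, with the percentile given in per-mille."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < permille <= 1000:
        raise ValueError("permille must be in the range 1..1000")
    ordered = sorted(samples)
    # Integer ceiling of permille * n / 1000 avoids floating-point rounding.
    rank = -(-permille * len(ordered) // 1000)
    return ordered[rank - 1]


# Synthetic latencies of 1 ms, 2 ms, ..., 1000 ms.
latencies: List[float] = [float(ms) for ms in range(1, 1001)]
p99 = percentile_permille(latencies, 990)    # 990.0
p999 = percentile_permille(latencies, 999)   # 999.0
```

A mean over the same samples would hide the slowest requests, which is why latency targets such as the 90 ms budget are usually checked against percentiles.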
Key lessons learned include separating latency‑sensitive queries into dedicated TiDB instances, using SQL hints and low‑precision timestamps to improve execution plans, and leveraging TiDB’s distributed transaction layer to reduce network round‑trips.
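Two of these lessons can be sketched as SQL issued from application code. The Python snippet below only builds the statements. The table, column, and index names are hypothetical. `USE INDEX` is the MySQL-style index hint that TiDB accepts, and `tidb_low_resolution_tso` is, to the best of our knowledge, the TiDB system variable that enables low-precision timestamps; check the documentation for your TiDB version before relying on either.

```python
# Session setting that trades a little read freshness for fewer
# timestamp requests to PD (assumed variable name; verify for your version).
ENABLE_LOW_PRECISION_TSO = "SET @@session.tidb_low_resolution_tso = 1"


def hinted_lookup_sql(table: str, index: str) -> str:
    """Build a point lookup that pins the optimizer to one index.

    The table and index names are interpolated directly, so they must come
    from trusted code, never from user input. The values use placeholders.
    """
    return (
        f"SELECT post_id FROM {table} USE INDEX ({index}) "
        "WHERE user_id = %s AND post_id = %s"
    )


# Hypothetical names, not Moneta's real schema.
query = hinted_lookup_sql("read_history", "idx_user_post")
```

A hint like this is most useful when the optimizer occasionally picks a slower plan for a latency-sensitive query; for other workloads, leaving the choice to the optimizer is usually the simpler option.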
Future work focuses on TiDB 3.0 features such as the Titan storage engine (reducing write amplification), table partitioning (improving query performance), gRPC batch messaging, multi-threaded Raftstore, SQL plan management, and TiFlash for column-oriented analytical workloads.
These enhancements are expected to further lower latency, simplify cluster management, and enable seamless horizontal scaling as the data grows beyond the current 1.3 trillion rows.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.