Scaling Zhihu’s Moneta Service with TiDB: Architecture, Performance, and Lessons Learned
Zhihu’s Moneta service, which stores over a trillion rows and takes billions of writes a day, migrated from MySQL to TiDB and gained millisecond‑level query latency, high availability, and horizontal scalability. This article covers the architecture, performance metrics, migration challenges, and lessons learned from this large‑scale deployment.
Zhihu’s Moneta service stores about 1.3 trillion rows recording which posts users have already read, and it adds roughly 100 billion new rows each month. At that size, high availability, high write throughput, and low‑latency queries are hard to deliver together.
The requirements included millisecond‑level query response while handling 40 k writes per second and 30 k queries per second. The team evaluated MySQL sharding and MHA but found them unsuitable at this scale.
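As a rough cross-check, the monthly growth figure can be converted into an average write rate and compared with the 40 k writes per second requirement. The sketch below assumes a 30-day month and perfectly even traffic, neither of which the article states; the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: the article does not define "month"


def average_rows_per_second(rows_per_month, days=DAYS_PER_MONTH):
    """Average write rate implied by a monthly row count, assuming even traffic."""
    return rows_per_month / (days * SECONDS_PER_DAY)


# 100 billion new rows per month, as quoted in the article.
avg = average_rows_per_second(100_000_000_000)
print(f"average: {avg:,.0f} rows/s")
```

Under these assumptions the average comes out a little below 40 k rows per second, so the stated write requirement is of the same order as the growth figure. Real traffic is bursty, so peak load would sit above this average.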
They adopted TiDB, an open‑source MySQL‑compatible NewSQL database with HTAP capabilities, deploying its components (TiDB servers, TiKV, PD, TiSpark) and auxiliary tools (Syncer, DM, Lightning) to replace the previous MySQL backend.
The new architecture consists of a stateless SQL layer, a distributed transactional key‑value store, and optional analytical engines, all orchestrated by Kubernetes for high availability and automatic failover.
Performance measurements after migration showed 99th‑percentile query latency of around 25 ms, 99.9th‑percentile latency of around 50 ms, and sustained write throughput exceeding 40 k TPS, confirming the scalability and reliability of TiDB at this scale.
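For readers unfamiliar with tail-latency figures, the 99th and 99.9th percentiles (the latter is often written "p999") are the latencies that 99% and 99.9% of queries stay at or below. A minimal sketch of the nearest-rank method follows. The article does not say how its percentiles were computed, and the sample data here is synthetic, not Zhihu's.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids float rounding for values like 99.9.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic latencies: 1 ms, 2 ms, ..., 1000 ms (illustration only).
latencies_ms = list(range(1, 1001))
p99 = percentile(latencies_ms, 99)
p999 = percentile(latencies_ms, 99.9)
print(p99, p999)
```

With 1,000 samples, the 99.9th percentile is decided by a single slow query, which is why it is a much stricter target than the 99th.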
Key lessons learned include the importance of separating latency‑sensitive queries into dedicated TiDB instances, using SQL hints and plan management to guide the optimizer, and provisioning sufficient hardware resources because TiDB’s Raft‑based replication requires at least three replicas.
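The hardware point follows from how Raft works: a write commits only once a majority of replicas acknowledge it, and every replica holds a full copy of the data. The sketch below shows that arithmetic. The function names are illustrative and are not part of any TiDB API, and the storage figure ignores compression and indexes.

```python
def quorum(replicas):
    """Smallest majority of a Raft group with the given number of replicas."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """Replicas that can be lost while a majority can still be formed."""
    return replicas - quorum(replicas)


def stored_row_copies(logical_rows, replicas=3):
    """Total row copies kept across all replicas (before compression)."""
    return logical_rows * replicas


for n in (1, 2, 3, 5):
    print(f"{n} replicas: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")

# About 1.3 trillion logical rows, as quoted in the article, kept three times.
print(f"{stored_row_copies(1_300_000_000_000):,} row copies")
```

Two replicas tolerate no failures at all, so three is the smallest group that survives the loss of one node. That is why the storage footprint is roughly three times the logical data size.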
Future work explores TiDB 3.0 features such as the Titan storage engine, table partitioning, gRPC batch messaging, multi‑threaded Raftstore, and TiFlash for analytical workloads, which are expected to further reduce latency and improve query performance.
The article concludes that TiDB’s horizontal scalability, strong consistency, and MySQL compatibility enable Zhihu to handle petabyte‑scale data while maintaining low latency, and the team plans to continue contributing to the open‑source ecosystem.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.