
Design and Implementation of Zhihu's High‑Throughput Read Service Using TiDB

This article is a detailed case study of Zhihu's read-service architecture: its design goals of high availability, high performance, and easy scalability; its key components, including the Proxy, Cache, and TiDB storage layers; its performance figures; and the migration from MySQL to TiDB, with practical lessons learned.

IT Architects Alliance

Zhihu operates a massive knowledge platform with billions of answers and hundreds of millions of users, requiring an efficient system to filter already‑read content from the homepage feed and personalized push notifications.

The read service was built around three core design goals: high availability, high performance, and easy scalability. High availability comes from systematic fault detection, self-healing components, and failure isolation; high performance from slot-based caching, multi-replica buffers, and compression; easy scalability from minimizing stateful components and running stateless services orchestrated by Kubernetes.
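The slot-based caching idea can be sketched as follows: a stateless proxy hashes each user to a fixed cache slot, so repeated requests for the same user always land on the same cache node, and routing needs no shared state. This is a minimal sketch; the slot count, node names, and hash choice are assumptions, not details from the article.

```python
import hashlib

NUM_SLOTS = 512  # hypothetical slot count; the real deployment is not specified

# Hypothetical mapping of slot ranges to cache nodes.
CACHE_NODES = ["cache-0", "cache-1", "cache-2", "cache-3"]


def slot_for_user(member_id: str) -> int:
    """Hash a user id to a stable slot so the same user always hits the same slot."""
    digest = hashlib.md5(member_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SLOTS


def node_for_slot(slot: int) -> str:
    """Assign contiguous slot ranges to cache nodes."""
    return CACHE_NODES[slot * len(CACHE_NODES) // NUM_SLOTS]


# The proxy stays stateless: routing is a pure function of the request,
# so any proxy replica can serve any request and scale horizontally.
slot = slot_for_user("member-42")
print(node_for_slot(slot))
```

Because routing is deterministic, adding proxy replicas requires no coordination; only the slot-to-node map changes when cache capacity is resized.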

Key components include a stateless Proxy that routes requests to cache slots, a multi‑layered Cache employing Bloom filters, write‑through and read‑through strategies, and a storage layer that migrated from MySQL to TiDB to handle trillions of records with strong consistency and horizontal scaling.
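A Bloom filter suits the "has this user already read this answer?" check because it never yields false negatives: if the filter says unread, the content is definitely unread, and only the rare false positive needs a lookup in the backing store. A minimal sketch of that idea (sizes and key format are illustrative, not Zhihu's actual parameters):

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive independent bit positions by salting the hash input.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(key))


bf = BloomFilter()
bf.add("member-42:answer-1001")          # record one read event
print(bf.might_contain("member-42:answer-1001"))  # True: reads are never missed
print(bf.might_contain("member-42:answer-9999"))  # almost certainly False
```

In a read-through setup, a miss in the filter short-circuits the request entirely, while a hit falls through to the cache and then storage, which keeps the vast majority of "unread" checks off the database.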

Performance measurements show the service handling up to 40,000 writes per second, 30,000 queries per second, and 12 million document checks per second, with 99th-percentile latency under 25 ms, demonstrating the system's ability to meet strict latency requirements.

The migration to TiDB involved using TiDB DM for incremental binlog capture and TiDB Lightning for bulk data import, followed by query optimizations such as isolated TiDB instances for latency‑sensitive queries, SQL hints, low‑precision TSO, and prepared‑statement reuse. TiDB 3.0 further contributed features like gRPC batch messages, multi‑threaded Raft stores, Plan Management, TiFlash, and the Titan storage engine, which together improved both write throughput and query latency for the read‑service and a related anti‑fraud system.
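The query-side optimizations above can be sketched in SQL. `tidb_low_resolution_tso` is a real TiDB session variable, but the table, index, and parameter names below are illustrative; the article does not give Zhihu's exact statements.

```sql
-- Session-level knob for latency-sensitive reads: reuse a cached timestamp
-- instead of fetching a fresh TSO from PD on every read, trading a little
-- staleness for lower latency.
SET @@tidb_low_resolution_tso = 1;

-- Pin the index choice so the plan stays stable under load
-- (read_records and idx_member are assumed names).
SELECT answer_id
FROM read_records USE INDEX (idx_member)
WHERE member_id = 'member-42';

-- Prepared-statement reuse: parse and plan once, execute many times.
PREPARE check_read FROM
  'SELECT 1 FROM read_records WHERE member_id = ? AND answer_id = ?';
SET @m = 'member-42', @a = 1001;
EXECUTE check_read USING @m, @a;
```

Isolating latency-sensitive queries on dedicated TiDB instances complements these knobs: analytical or batch traffic cannot then evict the plans and caches that the hot read path depends on.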

In conclusion, the architecture demonstrates how a well‑designed, cloud‑native backend can achieve high availability, performance, and scalability while supporting massive data volumes, and highlights the benefits of adopting open‑source technologies such as TiDB and contributing back to the community.

Tags: performance, backend architecture, high availability, caching, TiDB, read service
Written by

IT Architects Alliance

A community for discussion and exchange on systems, internet, large-scale distributed, high-availability, and high-performance architectures, as well as big data, machine learning, and AI, including real-world large-scale architecture case studies. Open to architects who have ideas and enjoy sharing.
