Understanding TiKV: Features, Architecture, and Large‑Scale Operational Challenges
This article introduces the distributed transactional KV store TiKV, explains its role as TiDB’s storage engine, details its multi‑layered architecture and Raft‑based consistency model, and discusses the performance and resource challenges encountered at massive data scales along with the engineering solutions implemented to address them.
TiKV is an open‑source distributed transactional key‑value database originally developed by PingCAP and now a graduated CNCF project. It serves as the storage engine for TiDB, offering strong consistency (linearizability), high availability, and seamless horizontal scaling.
The system is built on a multi‑layered design: a network layer using high‑performance gRPC, a transaction layer implementing Percolator‑based optimistic and pessimistic transactions, a consistency layer providing Raft‑based strong consistency and multi‑Raft sharding, and a storage layer powered by RocksDB. Each layer interacts only through well‑defined interfaces, reducing coupling and easing testing.
TiKV clusters consist of multiple TiKV nodes and a Placement Driver (PD) cluster that stores metadata and routes keys to the appropriate Raft groups (Regions). Regions are range‑sharded, typically 96 MiB each, and managed by independent Raft groups; PD balances load across nodes.
When handling large‑scale data (e.g., 100 TB), TiKV faces three main challenges: thread bottlenecks, excessive resource consumption, and performance jitter. The original single‑thread Raft driver could not keep up with millions of Regions, leading to high latency.
To overcome the thread bottleneck, TiKV adopted an actor model in which each Raft state machine has its own mailbox and is scheduled onto a shared thread pool, so Regions progress concurrently and CPU utilization improves.
Resource consumption was mitigated by introducing Hibernate Region, which pauses idle Regions to free CPU and network resources, and by employing Region Merge to combine small Regions, reducing the total number of Raft groups.
Performance jitter caused by IO latency and RocksDB contention was addressed with asynchronous Raft IO, which separates CPU‑intensive Raft processing from IO‑intensive log persistence, and with a planned move to per‑Region RocksDB instances with larger Region sizes to lower lock contention.
These engineering efforts collectively improve TiKV’s throughput, latency, and stability in massive data environments, allowing developers to focus on business logic while relying on TiKV for reliable, scalable storage.
Thank you for reading.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.