TiDB Overview: Architecture, Core Features, and Performance Compared with MySQL
The article explains why traditional MySQL sharding is discouraged, introduces the distributed database TiDB, details its architecture, core capabilities such as HTAP and Raft‑based consistency, compares feature support and resource usage with MySQL, and presents benchmark results demonstrating TiDB’s advantages at large scale.
Why Sharding Is Not Recommended
When a MySQL instance reaches a certain scale, performance degrades, and many teams turn to sharding middleware such as Mycat, ShardingSphere, or TDDL. These approaches, however, bring their own problems: cross-shard pagination, distributed-transaction complexity, painful data migration and capacity expansion, changed development patterns, cross-database queries, and business-level trade-offs.
TiDB Introduction
TiDB, an open-source distributed relational database developed by PingCAP since September 2015, supports both OLTP and OLAP workloads. It offers horizontal scalability, high availability, real-time HTAP, cloud-native deployment, and MySQL 5.7 protocol compatibility, making it well suited to scenarios that demand high availability, strong consistency, and large data volumes.
Core Features
Financial‑grade high availability
Online horizontal scaling with compute‑storage separation
Cloud‑native, deployable on public, private, or hybrid clouds
Real‑time HTAP with TiKV row store and TiFlash column store
MySQL protocol and ecosystem compatibility
Strongly consistent distributed transactions
Seamless migration from MySQL with minimal code changes
Consistency-first metadata management: PD favors consistency and partition tolerance (CP in CAP terms)
Use Cases
Financial systems requiring strong consistency, high reliability, and disaster recovery
High‑concurrency OLTP scenarios with massive data and scalability needs
Data aggregation and secondary processing pipelines
Architecture
TiDB Server : Stateless SQL layer exposing MySQL endpoints, parses queries, contacts PD for region location, interacts with TiKV, and returns results; horizontally scalable behind load balancers.
PD Server : Manages cluster metadata, performs scheduling, leader election, and allocates globally unique, monotonically increasing transaction IDs; typically deployed as an odd‑numbered quorum (minimum three nodes).
TiKV Server : Distributed transactional key‑value store handling OLTP data; data is partitioned into Regions, each covering a key range, and replicated (default three replicas) using the Raft consensus algorithm.
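As a rough illustration of how a key maps to a Region, here is a minimal Python sketch. The Region boundaries and ids are made up for illustration; real TiKV encodes table rows into ordered keys and splits Regions dynamically as they grow.

```python
import bisect

# Hypothetical sorted list of Region start keys; each Region covers
# [start_key, next_start_key). Real TiKV derives these from encoded row keys.
region_starts = [b"", b"k2000", b"k4000", b"k6000"]
region_ids = [1, 2, 3, 4]

def locate_region(key: bytes) -> int:
    """Return the id of the Region whose key range contains `key`."""
    # bisect_right finds the first start key greater than `key`;
    # the Region just before that position owns the key.
    idx = bisect.bisect_right(region_starts, key) - 1
    return region_ids[idx]

print(locate_region(b"k1234"))  # Region 1
print(locate_region(b"k5000"))  # Region 3
```

This is the lookup TiDB Server performs (via PD's cached Region metadata) before routing a read or write to the right TiKV node.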
How TiKV Guarantees No Data Loss
Data is replicated across multiple nodes; writes are committed via the Raft protocol, requiring a majority of replicas (usually two of three) to acknowledge before confirming success.
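The majority rule above can be sketched in a few lines of Python. This is a toy model of the commit condition only, not a Raft implementation; real TiKV also handles log replication, leader election, and term checks.

```python
def write_committed(acks: int, replicas: int = 3) -> bool:
    """A write is durable once a strict majority of replicas acknowledge it."""
    return acks >= replicas // 2 + 1

print(write_committed(2))     # True: 2 of 3 is a majority
print(write_committed(1))     # False: one node alone cannot commit
print(write_committed(3, 5))  # True: 3 of 5
```

The majority requirement is why losing any single node never loses acknowledged data: at least one surviving replica holds every committed write.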
Distributed Transaction Support
TiKV supports multi‑key transactions across regions, following Google Percolator’s model; see the original Percolator paper for details.
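The Percolator model can be caricatured as a two-phase commit where every key in a transaction locks itself against a designated primary key. The sketch below is a toy in-memory model with illustrative names; real TiKV keeps separate lock/write/data column families per key and uses PD-assigned timestamps.

```python
# Toy Percolator-style two-phase commit over an in-memory store.
store = {}   # key -> {"data": {start_ts: value}, "lock": primary-or-None,
             #         "write": {commit_ts: start_ts}}

def _cf(key):
    return store.setdefault(key, {"data": {}, "lock": None, "write": {}})

def prewrite(keys_values, primary, start_ts):
    """Phase 1: lock every key (pointing at the primary) and buffer data."""
    for key in keys_values:
        if _cf(key)["lock"] is not None:
            return False                   # write-write conflict: abort
    for key, value in keys_values.items():
        cf = _cf(key)
        cf["lock"] = primary
        cf["data"][start_ts] = value
    return True

def commit(keys, start_ts, commit_ts):
    """Phase 2: publish the committed version and release the locks."""
    for key in keys:
        cf = _cf(key)
        cf["write"][commit_ts] = start_ts  # make the version visible
        cf["lock"] = None

# A transfer-style transaction touching two keys (conceptually two Regions).
if prewrite({"a": 90, "b": 110}, primary="a", start_ts=10):
    commit(["a", "b"], start_ts=10, commit_ts=11)
print(store["b"]["data"][10])  # 110
```

In the real protocol the primary's commit record is the atomic switch: once the primary is committed, secondaries can be committed lazily, which is what makes the transaction atomic across Regions.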
Comparison with MySQL
Supported Features
Distributed transactions based on Google Percolator (which itself builds on Bigtable)
Optimistic locking + MVCC (MySQL uses pessimistic locking + MVCC); TiDB detects write-write conflicts only at commit time
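The optimistic model amounts to "record the version you read, validate it at commit". A minimal sketch, with illustrative names rather than TiDB's actual API:

```python
# Each key carries a version; a transaction remembers the version it read
# and validates it at commit time (optimistic concurrency control).
db = {"balance": (100, 1)}  # key -> (value, version)

def begin(key):
    _, version = db[key]
    return {"key": key, "read_version": version, "new_value": None}

def commit(txn):
    value, version = db[txn["key"]]
    if version != txn["read_version"]:
        return False                        # write-write conflict: retry
    db[txn["key"]] = (txn["new_value"], version + 1)
    return True

t1 = begin("balance"); t2 = begin("balance")
t1["new_value"] = 150
t2["new_value"] = 80
print(commit(t1))  # True
print(commit(t2))  # False: t1 bumped the version first
```

Under pessimistic locking, t2 would instead have blocked on a row lock at its first write; under the optimistic model the conflict surfaces only when t2 commits, so applications are expected to retry.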
Unsupported Features
No stored procedures, functions, or triggers
Auto-increment IDs are guaranteed unique and increasing only within a single TiDB server, not monotonically across the cluster
No foreign key constraints, temporary tables, or MySQL optimizer trace
XA syntax is not exposed via SQL (TiDB uses two‑phase commit internally)
Resource Usage Comparison
TiDB compresses data heavily: 10.8 TB in MySQL becomes 3.2 TB in TiDB (3.4:1 space ratio). For comparable workloads, TiDB uses far fewer nodes, CPU cores, and storage than MySQL.
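The quoted space ratio is simple arithmetic; as a sanity check on the figures above:

```python
mysql_tb, tidb_tb = 10.8, 3.2
ratio = mysql_tb / tidb_tb
print(f"{ratio:.1f}:1")  # 3.4:1
```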
Performance Tests
Test Report 1
Benchmarks on various AWS instance types (t2.medium to m5.24xlarge) with a 70 GB MySQL dataset vs. a 30 GB TiDB dataset (compressed). Sample queries include simple counts, group‑by, filtered scans, and complex aggregations. Results show TiDB’s response times improve relative to MySQL as CPU resources increase.
```sql
select count(*) from ontime;

select count(*), year from ontime group by year order by year;

select * from ontime
where UniqueCarrier='DL' and TailNum='N317NB' and FlightNum='2'
  and Origin='JFK' and Dest='FLL'
limit 10;

select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest
FROM ontime
WHERE DestState not in ('AK','HI','PR','VI')
  and OriginState not in ('AK','HI','PR','VI')
  and flightdate > '2015-01-01'
  and ArrDelay < 15
  and cancelled = 0 and Diverted = 0 and DivAirportLandings='0'
ORDER by DepDelay DESC
LIMIT 10;
```
System Benchmark
Using Sysbench on an m4.16xlarge instance, TiDB achieved higher transactions per second than MySQL for point‑select workloads with thread counts up to 128.
Test Report 2
Another benchmark with 1 M and 13 M records, followed by a JMeter load of 100 k operations, shows MySQL outperforms TiDB on small datasets due to TiDB’s distributed overhead, but TiDB scales better as data volume grows.
Conclusion
TiDB provides a mature, cloud‑native, HTAP‑capable distributed database that eliminates the need for traditional sharding; it is advantageous for large‑scale workloads, while smaller datasets may not justify its deployment.