Why and How to Implement Database Sharding: Strategies, Middleware, and Best Practices
This article explains the need for database sharding as applications scale, describes horizontal and vertical partitioning techniques, compares popular sharding middleware, and offers practical guidance on choosing and deploying sharding solutions for different workloads.
Why Sharding (Database Partitioning) Is Needed
As a startup or a fast‑growing department scales from tens of thousands to millions of users, a single database table quickly reaches tens of millions of rows and concurrent requests can surge from a few hundred to several thousand per second, overwhelming a single MySQL instance's CPU, I/O, and storage capacity.
Sharding (splitting databases and tables) distributes data across multiple databases or tables, reducing per‑node load, improving query performance, and allowing the system to handle higher traffic and larger data volumes.
Horizontal Partitioning (Sharding)
Horizontal partitioning copies the same schema into many databases/tables, but each holds a distinct subset of rows (e.g., based on user ID). This spreads load and storage, enabling higher QPS and larger total capacity.
Vertical Partitioning
Vertical partitioning splits a wide table into multiple tables or databases, each containing a different set of columns. Frequently accessed columns are kept together to improve cache efficiency, while less‑used columns are moved elsewhere.
Table‑Level Splitting
When a single table grows beyond a few hundred thousand rows, it is split into multiple tables (e.g., keeping each table under 2 million rows) to keep SQL execution fast.
Common Sharding Middleware
Cobar – an Alibaba proxy‑layer solution (no longer actively maintained, lacks read/write splitting, stored procedures, cross‑db joins, pagination).
TDDL – a client‑layer solution from Taobao (supports basic CRUD and read/write splitting, but depends on Alibaba's Diamond config service).
Atlas – a 360 open‑source proxy (community activity stopped about five years ago).
Sharding‑jdbc (now ShardingSphere ) – a client‑layer library that supports sharding, read/write splitting, distributed ID generation, and flexible transactions; latest version 4.0.0‑RC1 as of 2019.4.
Mycat – a proxy‑layer solution derived from Cobar, actively maintained, offers comprehensive features but requires separate deployment and operational effort.
Sharding‑jdbc vs Mycat
Sharding‑jdbc is lightweight, requires no extra servers, and offers high performance, but upgrades require updating every client application, creating tighter coupling.
Mycat adds a transparent proxy layer, increasing operational overhead but isolating projects from sharding logic, making upgrades easier for large organizations.
Impact of Sharding (Before vs After)
Before Sharding
After Sharding
Concurrency Support
Single‑node MySQL cannot handle high QPS
Multiple MySQL nodes increase concurrency manyfold
Disk Usage
Single‑node disk nearly full
Distributed disks lower utilization per node
SQL Performance
Large tables make queries slower
Smaller tables improve execution speed
Sharding Strategies
Range‑based sharding – allocate a continuous range (e.g., by date) to each database; simple to expand but can create hotspot traffic on recent data.
Hash‑based sharding – hash a key (e.g., user ID) to distribute rows evenly; balances load but requires data migration when adding new shards.
Practical Recommendations
For small‑to‑medium projects, the client‑side Sharding‑jdbc (now ShardingSphere ) is usually sufficient due to low operational cost and ease of integration.
For large enterprises with many services and high traffic, a proxy‑layer solution like Mycat provides better transparency and scalability, albeit with higher maintenance effort.
When designing your own sharding scheme, consider vertical splitting for wide tables, horizontal sharding for massive row counts, and table‑level splitting to keep each table within a manageable size (typically 1‑2 million rows).
Finally, choose a middleware that matches your team’s expertise and operational capacity, and ensure the sharding key aligns with your most common query patterns to avoid cross‑shard joins.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.