When and How to Implement Database Sharding and Partitioning for High‑Scale Systems
This article explains why database sharding and partitioning become necessary, outlines performance bottlenecks, presents software and hardware optimization techniques, details horizontal and vertical splitting strategies, and discusses the added complexity such as cross‑database joins, distributed transactions, and ID generation.
Why Sharding and Partitioning?
When a database reaches performance bottlenecks—such as request blocking, slow SQL queries, or storage pressure—simply upgrading hardware becomes costly, so software‑level solutions like sharding are preferred.
Database Optimization Strategies
Optimizations fall into a software layer (the first five items below) and a hardware layer (the last item):
SQL tuning
Table structure optimization
Read/write separation
Database clustering
Sharding (database and table partitioning)
Hardware upgrades
SQL Tuning
Enable slow query logging in MySQL:
<code>slow_query_log=on
long_query_time=1
slow_query_log_file=/path/to/log
</code>
Use EXPLAIN to check whether a query hits an index; aim for an access type (the type column of the EXPLAIN output) of range or better.
Table Structure Optimization
Redundant fields (e.g., storing the user nickname in the order table) can reduce join overhead, but they should be used only for rarely updated data, because every change has to be propagated to each copy.
Architecture Optimization
Scale by adding read replicas and a caching layer (e.g., Redis) first; move to sharding only when the database remains a bottleneck after those steps.
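As a rough illustration of read/write separation, the sketch below sends reads to replicas in round-robin order and everything else to the primary. The class name `ReadWriteRouter` and the node names are hypothetical, and the "starts with SELECT" rule is a simplifying assumption; real middleware also has to handle cases such as `SELECT ... FOR UPDATE` and replication lag.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        # Assumption: only statements that start with SELECT count as reads.
        if first_word == "SELECT":
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
target_for_read = router.route("SELECT * FROM orders WHERE id = 1")
target_for_write = router.route("UPDATE orders SET status = 'paid' WHERE id = 1")
```

In this sketch the routing decision lives in the application; proxy or driver-level middleware can make the same decision transparently.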
Hardware Optimization
Hardware upgrades provide early gains but yield diminishing returns as the system grows.
Detailed Sharding Process
Single Application, Single Database
Early-stage monolithic apps use one database for all modules.
Multiple Applications, Single Database
Modules are split into services but still share one database to minimize impact.
Multiple Applications, Multiple Databases
Each service gets its own database, eliminating contention and preparing for sharding.
Table Partitioning
When a single table grows rapidly (e.g., >5 million rows), horizontal or vertical splitting is needed.
Horizontal vs. Vertical Splitting
Vertical splitting separates columns into different tables; horizontal splitting distributes rows across multiple tables or databases. Time-based horizontal splits are a common choice:
Daily tables store only the day's data.
Monthly tables aggregate a month’s data.
Historical tables archive data older than a threshold.
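The time-based split above can be sketched as a function that picks a physical table name from a row's date. The naming scheme (`orders_20240305`, `orders_202403`, `orders_history`) and the 90-day archive threshold are assumptions made for illustration, not values from the article.

```python
from datetime import date


def daily_table(base, day):
    """Name of the table holding a single day's rows, e.g. orders_20240305."""
    return f"{base}_{day:%Y%m%d}"


def monthly_table(base, day):
    """Name of the table holding a month's rows, e.g. orders_202403."""
    return f"{base}_{day:%Y%m}"


def table_for(base, day, today, archive_after_days=90):
    """Pick the table for a row dated `day`.

    Rows older than `archive_after_days` (hypothetical threshold) go to the
    historical table; newer rows go to the monthly table.
    """
    if (today - day).days > archive_after_days:
        return f"{base}_history"
    return monthly_table(base, day)


recent = table_for("orders", date(2024, 3, 1), today=date(2024, 3, 5))
old = table_for("orders", date(2023, 1, 1), today=date(2024, 3, 5))
```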
Single‑Database vs. Multi‑Database Splitting
Horizontal splits can reside in the same DB or be spread across multiple DBs to overcome storage limits.
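A minimal sketch of how one routing rule can cover both layouts: the sharding key is mapped to a slot, and the slot is split into a database index and a table index. It assumes an integer key and plain modulo routing; the names `locate` and `physical_name` and the `orders_db_N.orders_M` naming are hypothetical.

```python
def locate(user_id, db_count=2, tables_per_db=4):
    """Map an integer sharding key to (database index, table index)."""
    slot = user_id % (db_count * tables_per_db)
    return slot // tables_per_db, slot % tables_per_db


def physical_name(user_id, base="orders", db_count=2, tables_per_db=4):
    """Build a qualified table name such as orders_db_1.orders_1."""
    db, table = locate(user_id, db_count, tables_per_db)
    return f"{base}_db_{db}.{base}_{table}"


# db_count=1 keeps every split table in the same database;
# db_count>1 spreads the same tables across several databases.
same_db = physical_name(13, db_count=1)
multi_db = physical_name(13, db_count=2)
```

Plain modulo is easy to reason about, but changing `db_count` or `tables_per_db` later remaps most keys, so the counts are usually chosen with future growth in mind.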
Complexities Introduced by Sharding
Cross‑Database Joins
Solutions include field redundancy, data aggregation (ETL), global tables, or application‑level assembly.
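Of the options above, application-level assembly is the easiest to show in code: query one database, collect the foreign keys, fetch the matching rows from the other database in a single batch, and join in memory. In this sketch `fetch_users` is a hypothetical stand-in for a query against the user database, and the dictionaries stand in for rows.

```python
def assemble_orders(orders, fetch_users):
    """Attach each order's user nickname without a cross-database JOIN."""
    # Collect the distinct user ids so the user database is queried once.
    user_ids = sorted({order["user_id"] for order in orders})
    users_by_id = {user["id"]: user for user in fetch_users(user_ids)}
    # Join in memory; orders whose user is missing get nickname None.
    return [
        {**order, "nickname": users_by_id.get(order["user_id"], {}).get("nickname")}
        for order in orders
    ]


# In-memory stand-ins for the two databases (illustrative data only).
USER_ROWS = [{"id": 10, "nickname": "ann"}, {"id": 11, "nickname": "bob"}]


def fetch_users_from_memory(user_ids):
    return [row for row in USER_ROWS if row["id"] in user_ids]


order_rows = [{"id": 1, "user_id": 10}, {"id": 2, "user_id": 11}]
enriched = assemble_orders(order_rows, fetch_users_from_memory)
```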
Distributed Transactions
Use reliable messaging, two‑phase commit, or flexible (BASE, eventually consistent) transaction patterns such as TCC or Saga.
Sorting, Pagination, and Function Computation
Run the sort or aggregate function on each shard, then merge the partial results in the application or middleware.
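A small sketch of that merge step, with in-memory lists standing in for shards. For pagination, each shard has to return its own first `offset + limit` rows in sorted order, because the global page can come entirely from one shard. For an average, each shard returns a `(sum, count)` pair, since averaging per-shard averages would be wrong when shards differ in size. The function names are hypothetical.

```python
import heapq
from itertools import islice


def merged_page(shards, key, offset, limit):
    """Global ORDER BY key with OFFSET/LIMIT across several shards."""
    # Each shard contributes its own sorted top (offset + limit) rows.
    per_shard = [sorted(rows, key=key)[: offset + limit] for rows in shards]
    # Merge the already sorted partial results and cut out the requested page.
    merged = heapq.merge(*per_shard, key=key)
    return list(islice(merged, offset, offset + limit))


def global_avg(shard_stats):
    """Combine per-shard (sum, count) pairs into one average."""
    total = sum(s for s, _ in shard_stats)
    count = sum(c for _, c in shard_stats)
    return total / count if count else None


shards = [[5, 1, 9], [2, 8], [3, 7, 4]]
page = merged_page(shards, key=lambda value: value, offset=2, limit=3)
```

The cost grows with the offset: deep pages force every shard to return many rows, which is one reason deep pagination is often restricted on sharded tables.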
Distributed ID Generation
Common approaches: UUID, dedicated ID tables, segment allocation, Redis, Snowflake, Baidu uid‑generator, Meituan Leaf, Didi TinyID.
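To make one of these approaches concrete, here is a simplified Snowflake-style generator: a millisecond timestamp, a worker ID, and a per-millisecond sequence packed into one integer. The bit widths (10 worker bits, 12 sequence bits) follow the commonly described layout, while the custom epoch and the class name are assumptions for this sketch; it is not the implementation used by any of the libraries listed above.

```python
import threading
import time


class SnowflakeGenerator:
    """Simplified Snowflake-style ID generator (illustrative sketch)."""

    EPOCH_MS = 1_577_836_800_000  # 2020-01-01 00:00:00 UTC, an arbitrary assumed epoch
    WORKER_BITS = 10
    SEQ_BITS = 12

    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                # This sketch simply refuses to issue IDs after a clock rollback.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & ((1 << self.SEQ_BITS) - 1)
                if self._seq == 0:
                    # Sequence exhausted in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = int(time.time() * 1000)
            else:
                self._seq = 0
            self._last_ms = now
            timestamp_part = (now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQ_BITS)
            return timestamp_part | (self.worker_id << self.SEQ_BITS) | self._seq


generator = SnowflakeGenerator(worker_id=3)
order_id = generator.next_id()
```

IDs from a single generator increase over time, which keeps B-tree index inserts mostly append-only, unlike random UUIDs. Each process needs a distinct `worker_id`, and how that ID is assigned is outside this sketch.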
Multiple Data Sources
Middleware such as ShardingSphere (formerly Sharding‑JDBC) or Mycat can abstract multiple databases.
Conclusion
Before resorting to sharding, exhaust conventional optimizations; sharding adds significant complexity and should be applied only when truly necessary.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.