
Database Sharding Strategies: Common Approaches, Pitfalls, and Best Practices

Effective MySQL sharding needs a design that can keep growing and that keeps data skew low. Hash-based methods are generally preferred, provided the database and table counts are coprime or the scheme uses two-stage partitioning, a routing table, or consistent hashing. Capacity can then be expanded by doubling or, more flexibly, by growing a consistent-hash layout, which helps avoid hot spots and uneven data distribution.

vivo Internet Technology

This article is a guide to horizontal sharding strategies in MySQL, that is, splitting data across multiple databases and tables. The author discusses what makes a sharding solution good, emphasizing two factors: sustainability (scalability) and minimal data skew.

Key Criteria for Good Sharding:

1. Sustainability: The solution should support smooth capacity expansion as data volume and traffic grow. Common expansion methods include doubling expansion and consistent hashing.

2. Data Skew: Data should be evenly distributed across databases and tables. The maximum data skew rate is defined as (largest shard size - smallest shard size) / smallest shard size, and should ideally stay within 5%.
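The skew definition above can be checked with a few lines of code. This is a minimal sketch: the function name `max_skew_rate` and the row counts are hypothetical and only illustrate the formula.

```python
def max_skew_rate(row_counts):
    """Return (max - min) / min for a list of per-shard row counts."""
    largest, smallest = max(row_counts), min(row_counts)
    if smallest == 0:
        # An empty shard makes the ratio unbounded.
        return float("inf")
    return (largest - smallest) / smallest


# Hypothetical row counts for four tables.
counts = [1030, 1000, 1015, 1042]
print(f"{max_skew_rate(counts):.1%}")  # 4.2%, inside the 5% target
```

A shard that holds no rows at all is reported as infinite skew, which is the case the hash-based mistakes in the next section produce.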

Common Sharding Approaches:

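1. Range-based Sharding: Data is partitioned by value ranges (e.g., by year). It is simple, but it suffers from hot spots because the current year's data receives most of the load.

2. Hash-based Sharding: The most common approach. The author details several common mistakes:

- Mistake 1: Computing the database index and the table index from the same hash with counts that are not coprime (for example, hash % 4 and hash % 8). The two indices are then correlated, which causes severe data skew: some tables never receive data.

- Mistake 2: Calculating the table index using the database count. Table placement then depends on the number of databases, so adding databases also moves rows between tables, which makes expansion difficult.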
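Both mistakes are easy to reproduce. The sketch below is an illustration under assumptions, not the article's original code: the counts (4 databases, 8 tables per database), the function names, and the use of consecutive integers as hash values are all hypothetical.

```python
from collections import Counter

DB_COUNT, TABLE_COUNT = 4, 8  # hypothetical sizes; gcd(4, 8) = 4, so not coprime


def naive_shard(key_hash, db_count=DB_COUNT, table_count=TABLE_COUNT):
    # Mistake 1: both indices come from the same hash with independent moduli.
    return key_hash % db_count, key_hash % table_count


def db_dependent_shard(key_hash, db_count=DB_COUNT, table_count=TABLE_COUNT):
    # Mistake 2: the table index is derived using the database count.
    return key_hash % db_count, (key_hash // db_count) % table_count


# Mistake 1: count how many (database, table) pairs ever receive a row.
used = Counter(naive_shard(h) for h in range(10_000))
print(len(used), "of", DB_COUNT * TABLE_COUNT, "tables receive data")  # 8 of 32

# Mistake 2: growing from 4 to 8 databases also changes the table index.
before = db_dependent_shard(13, db_count=4)
after = db_dependent_shard(13, db_count=8)
print(before, after)  # (1, 3) (5, 1)
```

In the first case the database index is fully determined by the table index, so only 8 of the 32 tables are ever used. In the second case the key moves to a different table as well as a different database, so expansion cannot be done by relocating whole tables.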

Correct Hash-based Methods:

a) Standard Two-stage Partitioning: Compute a slot over the total number of tables (database count times tables per database), then derive the database index and the table index from that slot.

b) Routing Table Method: Store the mapping from shard key to database in a separate table, which allows flexible, weight-based distribution.

c) Gene Method: Use different portions of the shard key for the database and table calculations.

d) GCD Elimination: Remove the factors shared by the database and table counts when calculating the table index, so the two indices are no longer correlated.

e) Consistent Hashing: Map ranges of hash values to database nodes.
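Of the methods listed above, standard two-stage partitioning is the simplest to sketch. The example below assumes 4 databases with 8 tables each and uses consecutive integers as hash values; the names `two_stage_shard`, `DB_COUNT`, and `TABLES_PER_DB` are hypothetical.

```python
from collections import Counter

DB_COUNT, TABLES_PER_DB = 4, 8  # hypothetical sizes


def two_stage_shard(key_hash, db_count=DB_COUNT, tables_per_db=TABLES_PER_DB):
    # Stage 1: map the hash onto one global slot per physical table.
    slot = key_hash % (db_count * tables_per_db)
    # Stage 2: split the slot into a database index and a table index.
    return slot // tables_per_db, slot % tables_per_db


load = Counter(two_stage_shard(h) for h in range(32_000))
print(len(load), min(load.values()), max(load.values()))  # 32 1000 1000
```

Because every slot corresponds to exactly one (database, table) pair, all 32 tables receive data and a uniform hash spreads rows evenly, whether or not the two counts share a factor.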

Expansion Methods:

1. Doubling Expansion: Double the database count each time, promoting replicas (slaves) to serve as the new databases. This requires a brief write lock during the transition.

2. Consistent Hash Expansion: More flexible, because individual nodes can be expanded based on their load instead of doubling the entire cluster.
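Both expansion methods can be sketched under simple assumptions. The first part checks the property that makes doubling work with replica promotion: with a modulo-based database index, every key in database i lands in either i or i + old_count after doubling. The second part models consistent hashing as hash ranges over a fixed slot space and splits a single range. The class `HashRangeRouter`, the slot count of 1024, the node names, and the choice of CRC32 are all hypothetical.

```python
import bisect
import zlib

RING_SIZE = 1024  # hypothetical number of hash slots


def db_index(key_hash, db_count):
    return key_hash % db_count


# Doubling: each key stays in its old database i or moves to i + old_count.
old_count = 4
doubling_ok = all(
    db_index(h, old_count * 2) % old_count == db_index(h, old_count)
    for h in range(10_000)
)


class HashRangeRouter:
    """Maps slots in [0, RING_SIZE) to nodes by exclusive upper bound."""

    def __init__(self, ranges):
        # ranges: list of (exclusive upper bound, node name) covering the ring.
        self.ranges = sorted(ranges)

    def node_for_slot(self, slot):
        uppers = [upper for upper, _ in self.ranges]
        return self.ranges[bisect.bisect_right(uppers, slot)][1]

    def node_for_key(self, shard_key):
        slot = zlib.crc32(shard_key.encode("utf-8")) % RING_SIZE
        return self.node_for_slot(slot)

    def add_node(self, upper, node):
        # The new node takes the lower part of the range that contains upper - 1.
        bisect.insort(self.ranges, (upper, node))


router = HashRangeRouter([(256, "db0"), (512, "db1"), (768, "db2"), (1024, "db3")])
before = [router.node_for_slot(s) for s in range(RING_SIZE)]
router.add_node(384, "db4")  # relieve db1 only
after = [router.node_for_slot(s) for s in range(RING_SIZE)]
moved = [s for s in range(RING_SIZE) if before[s] != after[s]]
print(doubling_ok, len(moved), min(moved), max(moved))  # True 128 256 383
```

Only slots 256 through 383 change owner, from db1 to db4. The other three nodes are untouched, which is what allows a single hot node to be expanded without doubling the whole cluster.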

The article includes code examples demonstrating proper and improper sharding implementations, along with analysis of data distribution patterns.
