Databases 9 min read

Understanding Database Sharding: Reasons, Methods, and Common Challenges

This article explains why database sharding (both vertical and horizontal) is needed to overcome performance bottlenecks, outlines practical thresholds for splitting tables, describes vertical and horizontal partitioning techniques, and discusses key issues such as distributed transactions, cross‑node queries, global primary keys, and scaling strategies.

Python Programming Learning Circle

Sep 30, 2022

Understanding Database Sharding: Reasons, Methods, and Common Challenges

Database sharding (分库分表) becomes necessary when a single database can no longer handle the load, leading to request blocking, slow SQL queries, storage pressure, and limited write concurrency.

Typical thresholds suggested by Alibaba's P3C guideline recommend sharding when a single table exceeds 5 million rows or 2 GB in size; otherwise, avoid premature partitioning.

Vertical partitioning separates databases by business domain (e.g., order database vs. promotion database) to reduce contention, while vertical table splitting moves rarely used columns into separate tables to simplify development and avoid wide‑table issues.

Horizontal partitioning splits a table into multiple tables based on a key (range or hash) within the same database; when necessary, these tables can be distributed across multiple databases ( horizontal sharding ).

Key challenges and solutions include:

Distributed transactions : use two‑phase commit (2PC) or compensating transactions (TCC); middleware such as Spring JTA or Alibaba Fescar can help.

Cross‑node JOIN queries : mitigate by field redundancy, data abstraction (ETL), global tables, or application‑level assembly.

Cross‑node sorting, pagination, and function calculations : employ a secondary search store (Elasticsearch/Solr) for analytics queries while keeping the primary sharded store for transactional writes.

Global primary key generation : options include UUID (32‑character hex string), Redis‑based incremental IDs with distributed locks, Twitter's Snowflake algorithm (timestamp + machine ID + sequence), or database sequences with a step size equal to the number of nodes.

Scaling and expansion : set the number of tables to a power of two; when adding tables, use the same power‑of‑two strategy to minimize data migration.

Overall, the sharding strategy should start with a single‑database‑single‑table design, move to partitioning only when data volume or access patterns justify it, and consider the trade‑offs of each technique.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Scalability database sharding global ID distributed transactions horizontal partitioning Vertical Partitioning

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.