
Database Optimization for Billion‑Scale Data: Partitioning, Sharding, and Vertical Splitting in MySQL

This article explains how a high‑traffic messaging platform with tens of millions of users and billions of daily records can be optimized using MySQL partitioning, sharding (both client‑side and proxy‑side), vertical database splitting, and practical migration scripts to maintain performance and availability.

Architecture Digest

In the mobile‑Internet era, massive volumes of user data are generated every day. As data volume grows, database performance degrades, TPS drops, and in the worst case services become unavailable.

Many developers lean on DBAs, or assume Oracle is the answer to performance problems, but not every project has a dedicated DBA or an Oracle budget. MySQL, by contrast, can comfortably handle tables with hundreds of millions of rows and is the workhorse at most internet companies.

The case study focuses on an enterprise‑level unified messaging platform handling over 50 million customers, processing 10 million messages per minute and about 100 million messages per day.

A single MySQL table cannot efficiently store billions of rows, so the solution combines three common techniques: partitioning, sharding (database‑and‑table splitting), and vertical database splitting, with NoSQL/NewSQL systems as auxiliary storage.

Partitioning: A partitioned table consists of multiple underlying tables that share the same storage engine and index definitions. The project's customer‑binding table grew past 5 billion rows, and its data files (.myd) reached roughly 5 GB. Of MySQL's four partition types (RANGE, LIST, HASH, KEY), range and list did not fit the access pattern, so HASH and KEY partitioning were used. An extra integer column (binding time) was added to serve as the HASH partition key; later, a random digit taken from the 16‑19‑digit card number was combined with the card number itself to form a KEY partition column. Crucially, indexes must include the partition key and queries must filter on it, so that MySQL can prune partitions instead of scanning all of them.
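As a concrete illustration (the table and column names here are hypothetical, not from the original project), a HASH‑partitioned binding table keyed on an integer binding‑time column might look like this:

```sql
-- Hypothetical customer-binding table, HASH-partitioned on an
-- integer bind_time column (e.g. a yyyymmdd value added for this purpose).
CREATE TABLE customer_binding (
    id          BIGINT      NOT NULL,
    card_no     VARCHAR(19) NOT NULL,
    customer_id BIGINT      NOT NULL,
    bind_time   INT         NOT NULL,   -- added solely to act as the partition key
    PRIMARY KEY (id, bind_time)         -- every unique key must include the partition key
) ENGINE = InnoDB
PARTITION BY HASH (bind_time)
PARTITIONS 12;

-- Only queries that filter on the partition key can be pruned:
SELECT customer_id
  FROM customer_binding
 WHERE bind_time = 20240601 AND card_no = '6222020200112233445';
```

A KEY partition works the same way but lets MySQL hash non‑integer columns, e.g. `PARTITION BY KEY (card_no)`.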

For document‑type bindings, the table was split into two: one for ID cards (using the month part of the ID number as the partition key, creating 12 partitions) and another for other document types, which did not require partitioning.
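To sketch the ID‑card variant (schema and column names are illustrative): the birth month is extracted from the ID number into its own column, and hashing on it yields exactly 12 partitions:

```sql
-- Hypothetical ID-card binding table. The birth month (01-12) is
-- extracted from the ID number by the application and stored as the
-- partition key, giving one partition per month.
CREATE TABLE id_card_binding (
    id          BIGINT   NOT NULL,
    id_no       CHAR(18) NOT NULL,
    birth_month TINYINT  NOT NULL,   -- month digits of id_no, maintained on insert
    PRIMARY KEY (id, birth_month)
) ENGINE = InnoDB
PARTITION BY HASH (birth_month)
PARTITIONS 12;
```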

After partitioning, each table of roughly 20 million rows was spread across 10‑12 much smaller partition files, dramatically improving query speed.

Sharding (Database‑and‑Table Splitting): Popular middleware such as Alibaba TDDL/DRDS, Cobar, Sharding‑JDBC, MyCAT, Atlas, and Zebra were reviewed. These fall into two architectures: client‑side (a library embedded in the application) and proxy‑side (a standalone middle tier). The team preferred the client‑side approach for its simplicity and lower overhead, ultimately adopting Sharding‑JDBC. The sharding column was chosen from the business logic, specifically the business number, so that each business's data lands in its own table and queries can be routed efficiently.
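The physical layout behind such a scheme can be sketched as follows (the table names and modulo rule are assumptions for illustration; in practice the routing is configured in Sharding‑JDBC rather than written in SQL):

```sql
-- Hypothetical physical shards behind the logical table t_message.
-- Sharding-JDBC rewrites references to "t_message" into one of these
-- based on the sharding column, e.g. business_no % 4.
CREATE TABLE t_message_0 (
    msg_id      BIGINT NOT NULL,
    business_no BIGINT NOT NULL,
    content     TEXT,
    PRIMARY KEY (msg_id)
);
CREATE TABLE t_message_1 LIKE t_message_0;
CREATE TABLE t_message_2 LIKE t_message_0;
CREATE TABLE t_message_3 LIKE t_message_0;

-- A query carrying the sharding column touches exactly one shard:
--   SELECT * FROM t_message WHERE business_no = 1006;
-- With a "business_no % 4" rule, this routes to t_message_2.
```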

Vertical Splitting: To absorb more than 10 million messages per minute, the system was scaled out in stages: first table splitting, then database splitting, and finally partitioning. Vertical splitting by business domain moves low‑coupling tables into separate databases, which aligns naturally with a microservice architecture. Cross‑database queries are avoided where possible, or handled through data redundancy and application‑level joins.
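A minimal sketch of such a split (the database and table names are invented for illustration): each business domain gets its own database, and a former SQL join becomes two single‑database queries assembled in application code:

```sql
-- After a vertical split, each low-coupling domain owns its own database:
CREATE DATABASE msg_user;      -- user / customer tables
CREATE DATABASE msg_delivery;  -- message delivery tables

-- A former cross-table join such as
--   SELECT u.name, d.status FROM user u JOIN delivery d ON u.id = d.user_id
-- is replaced by two single-database queries, joined in the application:
-- 1) SELECT id, name FROM msg_user.user WHERE id = ?;
-- 2) SELECT status  FROM msg_delivery.delivery WHERE user_id = ?;
```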

Large daily tables were further partitioned by index columns (e.g., phone number and batch number). Historical data migration was automated with scripts such as:

CREATE TABLE test_a_serial_1 LIKE test_a_serial;            -- empty clone of the live table
ALTER TABLE test_a_serial RENAME TO test_a_serial_{date};   -- archive the full table under a dated name
ALTER TABLE test_a_serial_1 RENAME TO test_a_serial;        -- promote the empty clone as the new live table
-- Copy rows newer than the cutoff back into the live table, so the
-- dated archive keeps only historical rows:
INSERT INTO test_a_serial (col1, col2, ...)
    SELECT col1, col2, ... FROM test_a_serial_{date}
    WHERE LEFT(create_time, 8) > '{date}';
COMMIT;

Data older than six months is purged simply by dropping the dated tables that fall outside the retention window. Queries are routed by time range: the application issues the same condition against each relevant dated table, aggregates the per‑table results into a temporary table, and returns the combined result.
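Sketching that routing in SQL (the dated table names, columns, and values are illustrative): a query spanning two months fans out to the two dated tables and is merged with UNION ALL:

```sql
-- The application picks the dated tables covering the requested range,
-- runs the same condition against each, and merges the results.
CREATE TEMPORARY TABLE tmp_result AS
    SELECT phone, batch_no, create_time
      FROM test_a_serial_20240501
     WHERE phone = '13800000000'
    UNION ALL
    SELECT phone, batch_no, create_time
      FROM test_a_serial_20240601
     WHERE phone = '13800000000';

SELECT * FROM tmp_result ORDER BY create_time;

-- Retention is enforced by dropping tables past the six-month window:
DROP TABLE test_a_serial_20231101;
```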

The overall design reduces each partition to roughly 5 million rows, enables efficient indexing, and supports real‑time statistics across more than 100 dimensions using additional technologies such as streaming computation, message queues, and caching.

Tags: Big Data, Sharding, MySQL, Database Optimization, Partitioning, Vertical Splitting
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
