
Managing Large MySQL Tables: Evaluation, Partitioning, Sharding, and Cold-Data Archiving Strategies

This article explains how to assess MySQL table size, understand performance degradation caused by large B+‑tree indexes, and apply three practical solutions—table partitioning, database sharding, and cold‑data archiving—to improve query speed and maintainability.


When a business database table grows to tens of millions of rows, insert and query latencies increase, schema changes become costly, and often only recent data is actually needed. This article walks through diagnosing and solving these issues.

Evaluating Table Size

Table size can be assessed from three angles: table capacity, disk space usage, and instance capacity.

Table Capacity

Consider record count, average row length, growth rate, read/write volume, and total size. For OLTP tables, keep rows under 20 million and total size under 15 GB, with read/write throughput below 1,600 ops/s.

Typical SQL to count rows:

select count(*) from table_name;  -- full scan; may time out on large tables
select count(1) from table_name;  -- same cost as count(*) under InnoDB

To view detailed table status:

use database_name;
show table status like 'table_name'\G

Disk Space

Query database and table disk usage via information_schema.tables:

select table_schema as 'Database',
       table_name   as 'Table',
       table_rows   as 'Rows',
       truncate(data_length/1024/1024,2) as 'DataSize(MB)',
       truncate(index_length/1024/1024,2) as 'IndexSize(MB)'
from information_schema.tables
order by data_length desc, index_length desc;
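Building on the same information_schema.tables columns, a per-database rollup (a sketch; adjust the filter to taste) makes it easy to spot which schema dominates disk usage:

```sql
-- Aggregate data + index size per database, largest first
select table_schema as 'Database',
       truncate(sum(data_length + index_length)/1024/1024, 2) as 'TotalSize(MB)'
from information_schema.tables
where table_schema not in ('mysql', 'information_schema', 'performance_schema', 'sys')
group by table_schema
order by sum(data_length + index_length) desc;
```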

As a rule of thumb, keep disk usage below 70% of total capacity; fast-growing data can be archived to cheaper, slower storage.

Instance Capacity

MySQL uses a thread‑per‑connection model, so a single instance may become a bottleneck under high concurrency; consider scaling out or using multiple instances.

Root Causes of Slow Queries on Large Tables

When a table reaches tens of millions of rows, index maintenance costs rise because the B+‑tree height increases, leading to more disk I/O per lookup.

InnoDB stores data in 16 KB pages: a leaf node can hold roughly 16 rows of 1 KB each, while internal nodes store 8‑byte primary keys plus 6‑byte page pointers, allowing about 1,170 entries per node. Consequently, a B+‑tree of height 2 can store ~18,720 rows, and one of height 3 ~21.9 million rows, which matches typical large‑table scales.
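The capacity estimates above can be verified with simple arithmetic (16,384 / (8 + 6) ≈ 1,170 entries per internal node):

```sql
-- Each internal node holds ~1,170 entries; each leaf holds ~16 rows
select 1170 * 16        as height2_rows,  -- 18,720
       1170 * 1170 * 16 as height3_rows;  -- 21,902,400
```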

How to Solve the Problem of Large Tables and Slow Queries

Three main approaches are presented: table partitioning, database sharding (horizontal/vertical splitting), and hot‑cold data archiving.

Solution 1: Table Partitioning

Partitioning divides a table’s rows into separate physical files based on a condition, reducing query range and improving index hit rates. Benefits include easier data deletion (dropping a partition is far cheaper than a bulk DELETE), better parallel aggregation, and higher I/O throughput when partitions sit on different disks. Limitations: at most 1,024 partitions per table (8,192 as of MySQL 5.6.7); every unique key, including the primary key, must include all partitioning columns; foreign keys are not supported; and partitioning applies to both data and indexes.

Check partition support (in MySQL 5.5 and earlier; the have_partitioning variable was removed in later versions, where you can check the partition entry in SHOW PLUGINS instead, and in MySQL 8.0 InnoDB supports partitioning natively):

mysql> show variables like '%partition%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| have_partitioning | YES   |
+-------------------+-------+
1 row in set (0.00 sec)
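As an illustration, here is a hypothetical RANGE-partitioned table (the names orders and created_at are invented for this example); note how the primary key includes the partition column:

```sql
-- Partition by year; each partition is a separate physical file
create table orders (
  id bigint not null,
  created_at date not null,
  amount decimal(10,2),
  primary key (id, created_at)   -- must include the partitioning column
) partition by range (year(created_at)) (
  partition p2022 values less than (2023),
  partition p2023 values less than (2024),
  partition pmax  values less than maxvalue
);

-- Dropping a partition removes its rows far faster than DELETE:
alter table orders drop partition p2022;
```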

Solution 2: Database Sharding (Horizontal & Vertical)

Sharding reduces per‑table row count, lowering B+‑tree height and I/O. Horizontal sharding splits rows across multiple tables or databases (e.g., 40 M rows into four 10 M tables). Vertical sharding separates frequently accessed columns from rarely used ones, storing them in different tables.

Common sharding strategies:

Modulo: Distribute rows by id % N. Simple, but re‑sharding is hard when N changes, since most existing rows must be relocated.

Range: Allocate rows to tables based on value ranges (e.g., id 0‑12 M → table1, 12‑24 M → table2). Easier to expand but can create hotspots.

Hash‑plus‑Range: Combine hash modulo for database selection and range for table selection, balancing load and scalability.
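As a sketch of modulo routing (the application layer normally computes this; the id value and table names orders_0..orders_3 are arbitrary examples):

```sql
-- Route order id 123456789 to one of 4 shard tables
select concat('orders_', 123456789 % 4) as target_table;  -- orders_1
```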

Solution 3: Hot‑Cold Data Archiving

When data exhibits clear hot‑cold patterns (e.g., recent week/month is hot), move older rows to archive tables or separate databases, keeping the hot tables small for fast queries.

Archiving steps:

Create an archive table with the same schema as the original.

Initialize the archive with historical data.

Continuously migrate new cold data according to business rules.
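The three steps above might look like this in SQL (the table and column names orders, orders_archive, created_at, and the 90‑day cutoff are illustrative assumptions, not from the article):

```sql
-- 1. Create an archive table with the same schema
create table orders_archive like orders;

-- 2. Initialize the archive with historical data
insert into orders_archive
select * from orders
where created_at < now() - interval 90 day;

-- 3. Periodically repeat the INSERT for newly cold rows, then
--    delete them from the hot table in batches to avoid long locks
delete from orders
where created_at < now() - interval 90 day
limit 10000;
```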

Choosing the Right Strategy

Selection depends on workload characteristics: use partitioning for range‑based queries and easy data pruning, sharding for very high write throughput or when horizontal scaling is needed, and hot‑cold archiving when data access frequency varies over time. Often a combination yields the best results.

Conclusion

Evaluate table size along capacity, disk, and instance dimensions; understand how B+‑tree height drives per-lookup I/O; then apply partitioning, sharding, or cold‑data archiving, alone or in combination, to match your workload.


Tags: Performance Optimization, MySQL, database sharding, B-Tree, table partitioning, Cold Data Archiving
Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
