
Optimizing a 20‑Million‑Row MySQL Table: Design, Indexing, Partitioning, and Migration Strategies

This article describes how to improve the performance of a massive MySQL 5.6 user‑log table by redesigning the schema, applying proper indexes, using partitioning, considering table sharding, and evaluating migration paths to compatible cloud or big‑data databases, with concrete SQL examples and cost analysis.


The author inherited a poorly designed MySQL 5.6 table that stores six months of user‑web‑access logs (≈20 million rows) and retains a year’s data (≈40 million rows). Queries were extremely slow, often freezing the service.

Solution Overview

Three practical approaches are presented in order of increasing cost and effort:

Optimize the existing MySQL database without code changes.

Upgrade to a 100% MySQL‑compatible database (cloud or self‑hosted).

Adopt a big‑data engine (NewSQL/NoSQL) for massive scale.

1. Optimizing the Existing MySQL Database

Key recommendations include:

Design tables with performance in mind: avoid NULLs, prefer INT over BIGINT, use ENUM/CHAR instead of VARCHAR, keep column count low, store IPs as integers, and use TIMESTAMP rather than DATETIME.
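These rules can be sketched as a hypothetical log‑table definition (the table and column names here are illustrative, not from the original system):

```sql
-- Hypothetical access-log table applying the design rules above.
CREATE TABLE access_log (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,      -- INT, not BIGINT, when the range suffices
  user_id     INT UNSIGNED NOT NULL,                     -- NOT NULL everywhere
  http_method ENUM('GET','POST','PUT','DELETE') NOT NULL, -- ENUM instead of VARCHAR
  url         VARCHAR(255) NOT NULL,
  client_ip   INT UNSIGNED NOT NULL,                     -- store IPs as integers
  accesstime  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, -- TIMESTAMP, not DATETIME
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Convert between dotted-quad and integer form at the boundary:
INSERT INTO access_log (user_id, http_method, url, client_ip)
VALUES (42, 'GET', '/index.html', INET_ATON('192.168.0.1'));

SELECT INET_NTOA(client_ip) FROM access_log;
```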

Index wisely: create indexes only on columns used in WHERE, ORDER BY, or JOIN conditions; avoid indexing low‑cardinality columns; use prefix indexes for long strings; avoid NULL checks in WHERE clauses; prefer composite indexes with column order matching query predicates.
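As an illustration (column names are hypothetical), a composite index whose column order matches the query predicates, and a prefix index on a long string column:

```sql
-- Composite index: the leading column (user_id) must appear in the WHERE
-- clause for the index to be usable.
ALTER TABLE access_log ADD INDEX idx_user_time (user_id, accesstime);

-- Served by idx_user_time, since user_id is the leading column:
SELECT id, url FROM access_log
WHERE user_id = 42 AND accesstime >= '2018-11-01'
ORDER BY accesstime;

-- Prefix index: index only the first 32 characters of a long VARCHAR.
ALTER TABLE access_log ADD INDEX idx_url (url(32));
```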

Write efficient SQL: limit result sets, avoid SELECT *, use LIMIT for pagination, replace sub‑queries with JOINs, split large DELETE/INSERT statements, enable slow‑query logs, avoid functions on indexed columns, replace OR with IN, and avoid LIKE patterns that start with a wildcard.

Additional tactics such as using EXPLAIN to verify index usage and converting columns to NOT NULL were emphasized.
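A few of these rewrites, sketched against the same hypothetical table:

```sql
-- Avoid SELECT * and paginate with LIMIT:
SELECT id, url, accesstime FROM access_log
WHERE user_id = 42
ORDER BY id
LIMIT 100;

-- Replace OR chains with IN:
SELECT id FROM access_log WHERE user_id IN (1, 2, 3);

-- Avoid functions on indexed columns: rewrite MONTH(accesstime) = 11
-- as a range predicate the index can use.
SELECT id FROM access_log
WHERE accesstime >= '2018-11-01' AND accesstime < '2018-12-01';

-- Verify index usage with EXPLAIN (check the key and rows columns):
EXPLAIN SELECT id FROM access_log WHERE user_id = 42;
```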

2. Partitioning and Sharding

MySQL’s native partitioning (available since 5.1) can horizontally split the table without code changes. Types include RANGE, LIST, HASH, and KEY. The author first tried RANGE partitioning by month (12 partitions) with modest speed gains, then switched to HASH partitioning on the id column with 64 partitions, achieving a significant performance boost.

Benefits of partitioning:

Allows larger logical tables and easier maintenance (e.g., dropping whole partitions).

Improves query speed when predicates limit the scan to a few partitions.

Supports distributing partitions across different physical devices.

Limitations include a maximum of 1024 partitions per table (raised to 8192 in MySQL 5.6.7), the requirement that every unique key — including the primary key — contain all partitioning columns, no support for foreign keys, and the requirement that all partitions use the same storage engine.

Example partition definition and queries:

```sql
-- Repartition the existing table by hash of the primary key into 64 partitions.
ALTER TABLE readroom_website PARTITION BY HASH (id) PARTITIONS 64;

SELECT COUNT(*) FROM readroom_website;  -- 11,901,336 rows

-- Note: a predicate on MONTH(accesstime) cannot prune HASH(id) partitions,
-- so this query still touches all 64 partitions.
SELECT * FROM readroom_website WHERE MONTH(accesstime) = 11 LIMIT 10;
```
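For comparison, the RANGE‑by‑month layout the author tried first might look like the following sketch (schema and partition names are hypothetical; note that the partitioning column must be part of the primary key):

```sql
-- RANGE partitioning by month. UNIX_TIMESTAMP() is the only function
-- MySQL allows on a TIMESTAMP partitioning column.
CREATE TABLE access_log_ranged (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  accesstime TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  url        VARCHAR(255) NOT NULL,
  PRIMARY KEY (id, accesstime)   -- partition column must be in the primary key
)
PARTITION BY RANGE (UNIX_TIMESTAMP(accesstime)) (
  PARTITION p201801 VALUES LESS THAN (UNIX_TIMESTAMP('2018-02-01 00:00:00')),
  PARTITION p201802 VALUES LESS THAN (UNIX_TIMESTAMP('2018-03-01 00:00:00')),
  -- ... one partition per month ...
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Dropping a whole month is near-instant compared with a bulk DELETE:
ALTER TABLE access_log_ranged DROP PARTITION p201801;
```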

3. Table Splitting (Vertical/Horizontal)

If partitioning is insufficient, the table can be split into multiple tables (e.g., routing each row to a table named tableName_(id % 100)) or even into separate databases, but this requires code changes and higher development cost, so it is discouraged for already‑deployed systems.
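A minimal sketch of modulo‑based horizontal splitting (table names hypothetical); the application must compute the target table for every query, which is why code changes are unavoidable:

```sql
-- 100 shard tables with identical structure, user_log_00 ... user_log_99.
CREATE TABLE user_log_00 (
  id  INT UNSIGNED NOT NULL,
  url VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
);
-- ... repeat for user_log_01 through user_log_99 ...

-- Application-side routing: shard = id % 100, so id 4217 maps to user_log_17.
INSERT INTO user_log_17 (id, url) VALUES (4217, '/index.html');
SELECT url FROM user_log_17 WHERE id = 4217;
```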

4. Upgrading the Database Engine

When MySQL performance is inadequate, the author suggests moving to a fully compatible solution:

Open‑source options: TiDB, CUBRID.

Alibaba Cloud POLARDB (MySQL‑compatible, up to 100 TB, 6× performance).

Alibaba Cloud OceanBase, HybridDB for MySQL (HTAP).

Tencent Cloud DCDB (MySQL‑compatible, horizontal sharding).

Each option balances cost, operational overhead, and feature set; POLARDB was highlighted as a cost‑effective, high‑performance choice.

5. Big‑Data Migration

For data exceeding hundreds of millions of rows, the author recommends a big‑data stack:

Open‑source: Hadoop ecosystem (HBase, Hive) – high operational cost.

Cloud services: Alibaba MaxCompute + DataWorks (SQL, MapReduce, AI scripts) – low cost, serverless‑style processing.

Using MaxCompute, the author processed the data with ~300 lines of SQL for under ¥100, achieving the required analytics without managing infrastructure.

Overall, the article provides a step‑by‑step guide to diagnosing MySQL performance bottlenecks, applying schema and index optimizations, leveraging partitioning, evaluating migration paths, and choosing appropriate cloud or big‑data solutions based on scale and budget.

Tags: Indexing, Performance Tuning, MySQL, Database Optimization, Large Tables, Partitioning, Cloud Databases
Written by

Architect's Tech Stack

Java backend, microservices, distributed systems, containerized programming, and more.
