Databases 14 min read

MySQL Large‑Table Index Optimization and Maintenance Strategies

This article analyzes performance problems of a massive MySQL table, demonstrates how to use pt‑query‑digest and EXPLAIN to identify inefficient composite indexes, proposes index redesign, compares online DDL with pt‑osc for index rebuilding, and presents batch‑delete techniques to reduce SLA‑impacting latency.

Selected Java Interview Questions
Selected Java Interview Questions
Selected Java Interview Questions
MySQL Large‑Table Index Optimization and Maintenance Strategies

Background

The production MySQL instance (one master, one slave) reports nightly SLA alerts due to significant master‑slave replication lag caused by slow queries on the arrival_record table, which stores over 100 million rows.

Slow‑query logs show many queries taking several seconds, especially SELECT arrival_record … and DELETE arrival_record … statements.

Analysis

Using pt-query-digest --since=148h mysql-slow.log | less reveals that in the last week the total slow‑query execution time is 25 403 s, the longest query runs 266 s, and the average execution time is 5 s while scanning about 17.66 million rows per query.

The most frequent slow query is:

select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G

EXPLAIN shows it uses the composite index IXFK_arrival_record(product_id,station_no,sequence,receive_time,arrival_time) but only the first column product_id is effective; the column has low cardinality, causing the optimizer to scan millions of rows.

Other slow queries such as select arrival_record … scan up to 56 million rows, and the DELETE statements perform full table scans because no suitable index is available.

Optimization Recommendations

Drop the low‑selectivity composite index IXFK_arrival_record .

Create a new composite index idx_sequence_station_no_product_id(product_id,sequence,station_no) that matches the actual query predicates.

Create a single‑column index idx_receive_time(receive_time) to accelerate time‑range queries.

Delete Statement Optimization

The DELETE FROM arrival_record WHERE receive_time < STR_TO_DATE('2019-02-23','%Y-%m-%d') query scans the entire table (≈109 million rows) and takes >260 s. Adding idx_receive_time allows the delete to use an index range scan, reducing scanned rows dramatically.

Testing Procedure

The table was copied to a test instance, backed up with mydumper (parallel compression, 1.2 GB dump from a 48 GB table), and restored with myloader (≈126 min). Both online DDL and pt‑osc were used to drop the old index and add the new ones.

Online DDL completed in ~34 min, while pt‑osc took ~57 min, making online DDL roughly twice as fast for this workload.

Post‑Optimization Results

After adding idx_receive_time , the same SELECT queries now scan only a few thousand rows (e.g., 291 k → 291 k rows, filtered 16.66 %). The DELETE still takes ~77 s because it removes millions of rows, but the index‑driven range scan limits the work.

To further reduce impact, a batch‑delete approach was implemented: repeatedly delete 20 000 rows using the primary key range until the target timestamp is cleared, sleeping briefly between batches.

Conclusion

When dealing with very large tables, it is essential to monitor not only query latency but also maintenance costs such as DDL duration and bulk deletes. Choosing the right index strategy and using online DDL tools can dramatically shorten schema‑change windows, while batch‑delete techniques mitigate replication lag and SLA violations.

Performance TuningMySQLIndex OptimizationLarge TablesDDLBatch Delete
Selected Java Interview Questions
Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.