Understanding MySQL Pagination Performance and Index Optimization
The article explores why MySQL pagination with large offsets is slow, explains index structures such as clustered and secondary indexes, describes logical query operators, and presents two practical optimization strategies to improve query performance.
Five years ago the author noticed that a MySQL query with pagination (e.g., SELECT * FROM table WHERE status = xx LIMIT 10 OFFSET 10000 ) took several seconds even on a dataset of only 100,000 rows, prompting an investigation into the time complexity of retrieving the nth largest value using indexes.
The analysis revealed that using an index to locate rows is inefficient for large offsets because the engine cannot know which rows will be discarded, leading to many random I/O operations. MySQL’s B+‑tree index stores leaf nodes in a linked list, allowing O(N) traversal, but this still incurs significant cost when millions of rows are skipped.
Two key concepts were highlighted:
Clustered index: contains the primary key and the actual row data; leaf nodes are data nodes.
Secondary (auxiliary) index: a non‑primary index whose leaf nodes store the primary key values.
When a query like SELECT * FROM table WHERE status = xx LIMIT 10 OFFSET 10000 is executed, MySQL first scans the secondary index, then fetches each matching primary key from the clustered index, resulting in thousands of random I/O operations.
To address this, the article introduces logical query operators (DataSource, Selection, Projection, Join) and shows how a query is transformed into a logical plan, for example:
SELECT b FROM t1, t2 WHERE t1.c = t2.c AND t1.a > 5 becomes DataSource → Join → Selection → Projection.
Two optimization solutions are proposed:
Solution 1
Replace traditional offset‑based pagination with keyset pagination, using the last retrieved index value (e.g., primary key) as the starting point for the next page, which eliminates the need for large offsets.
Solution 2
Leverage index covering: first retrieve only the primary keys from a secondary index with the desired limit/offset, then fetch the full rows from the clustered index using those keys, reducing random I/O to the number of rows returned (e.g., ten).
Example query for index covering:
SELECT xxx, xxx FROM (SELECT id FROM table WHERE second_index = xxx LIMIT 10 OFFSET 10000) AS sub
This approach dramatically improves performance for pagination scenarios that truly require offset handling.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.