Optimizing Pagination Queries for Billion‑Row MySQL Tables
The article analyzes the performance problems of deep pagination on massive MySQL tables storing billions of fan records and presents three progressive solutions—simple LIMIT, tag‑record (maxId) pagination, and range‑limited pagination with async and offline minId caching—along with general indexing best‑practices for high‑throughput queries.
In a system that stores user‑follow relationships for a large e‑commerce platform, fan lists for popular items can reach tens of millions or even billions of rows, spread across 16 shards and 256 tables, each holding over 200 million records.
Problem description : Using the naive LIMIT offset, n approach leads to severe performance degradation as the offset grows, because MySQL must scan offset + n rows before discarding the first offset rows.
Solution 1 – Simple LIMIT : Works for small fan counts; easy to implement but performance drops sharply for deep pages.
Solution 2 – Tag‑Record Method : Store the primary‑key id of the last row returned (maxId) and on the next request query with WHERE id < #{lastId} and a small LIMIT . This avoids scanning large offsets and keeps query time in the tens of milliseconds for tens of millions of rows, though it cannot jump to arbitrary pages.
select id,biz_content,pin FROM follow_fans_1 where biz_content = #{bizContent} and id < #{lastId} order by id desc limit 10;
Solution 3 – Range‑Limit Method : Add a lower bound minId to the query to guarantee termination on the last page: WHERE id < #{lastId} AND id >= #{minId} . The minId is obtained asynchronously (cached) or via an offline batch job that computes the minimum id per business object.
select id,biz_content,pin FROM follow_fans_1 where biz_content = #{bizContent} and id < #{lastId} and id >= #{minId} order by id desc limit 10;
When the table size reaches billions, the async SELECT MIN(id) can still time‑out, so an offline data‑processing pipeline periodically calculates and caches minId , reducing load on the transactional database.
select min(id) from follow_fans_1 where biz_content = #{bizContent}
General SQL Optimization Recommendations :
Ensure query conditions are indexed; understand clustered vs. secondary indexes in InnoDB.
Follow the left‑most prefix rule and avoid functions on indexed columns.
Prefer high‑cardinality columns for indexes and extend existing indexes rather than creating many new ones.
Use covering indexes to avoid back‑table lookups; verify with EXPLAIN that Extra shows Using index .
By selecting the appropriate pagination strategy based on data volume and by applying proper indexing techniques, the system can maintain sub‑100 ms response times even when fan tables contain billions of rows.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.