Elasticsearch Pagination Techniques: from+size, scroll, and search_after
This article explains three Elasticsearch pagination methods—shallow pagination using from+size, deep pagination with scroll, and cursor‑based pagination with search_after—detailing their principles, usage examples, performance considerations, and how to manage scroll contexts.
1. Shallow Pagination with from + size
With from + size, Elasticsearch collects the top from + size hits and throws away the first from of them. For example, with from=10 and size=10 it retrieves the first 20 documents, discards the first 10, and returns documents 11‑20, so the work spent on the discarded hits is wasted. The from parameter sets the offset and size sets the number of hits returned (defaults: from=0, size=10).
GET test_dev/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "age": 28 } }
      ]
    }
  },
  "size": 10,
  "from": 20,
  "sort": [
    { "timestamp": { "order": "desc" } },
    { "_id": { "order": "desc" } }
  ]
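}
Because an index is split into shards, every shard has to produce its own top from + size hits. A request with from=100 and size=10 therefore pulls 110 documents from each shard; the coordinating node merges and sorts them, then returns only hits 101‑110. The cost grows with from and with the number of shards.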
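The per-shard cost can be illustrated with a small pure-Python simulation. This is only a sketch of the merge logic, not Elasticsearch code: the five shards, the random integers standing in for sort values, and the function names shard_top and paged_search are all assumptions made for the example.

```python
import heapq
import random


def shard_top(shard_docs, from_, size):
    """A shard cannot know which of its hits survive the global merge,
    so it has to return its own best from_ + size candidates."""
    return heapq.nlargest(from_ + size, shard_docs)


def paged_search(shards, from_, size):
    """Coordinator step: gather every shard's candidates, merge, slice."""
    candidates = []
    for docs in shards:
        candidates.extend(shard_top(docs, from_, size))
    merged = sorted(candidates, reverse=True)  # descending, like the sort above
    return merged[from_:from_ + size], len(candidates)


random.seed(7)
# Five hypothetical shards holding 1,000 unique sort values each.
values = random.sample(range(1_000_000), 5_000)
shards = [values[i::5] for i in range(5)]

page, gathered = paged_search(shards, from_=100, size=10)
print(gathered)   # 550 candidates collected in order to return 10 hits
print(len(page))  # 10
```

Raising from_ to 10,000 in the same sketch makes every shard hand over 10,010 candidates for the same 10 returned hits, which is the growth the paragraph above describes.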
2. Deep Pagination with scroll
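By default, Elasticsearch rejects any request where from + size exceeds 10,000 (the index.max_result_window setting), and shallow pagination is already inefficient well before that limit. The scroll API is meant for walking such large result sets. It works like a SQL cursor: each request returns one page plus a _scroll_id that must be sent to fetch the next page. Pages can only be read in order; jumping to an arbitrary page is not supported.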
GET test_dev/_search?scroll=5m
{
  "query": { "bool": { "filter": [ { "term": { "age": 28 } } ] } },
  "size": 10,
  "from": 0,
  "sort": [ { "timestamp": { "order": "desc" } }, { "_id": { "order": "desc" } } ]
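}
scroll=5m keeps the search context behind the scroll ID alive for 5 minutes; each follow-up request that passes scroll again resets that timer.
When using scroll, from must be 0 (or omitted).
size determines how many documents each request returns.
The results are a snapshot taken at the time of the initial request, so later index changes are not visible to the scroll.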
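The full scroll lifecycle (open the scroll, keep sending the latest _scroll_id until a page comes back empty, then clear the context) can be sketched as a Python generator. The transport is deliberately left abstract: send(method, path, body) is a hypothetical callable that performs the HTTP request and returns the decoded JSON response, and scroll_all is a name chosen for this example. The cleanup step uses the request-body form of the clear-scroll call rather than putting the ID in the URL.

```python
def scroll_all(send, index, body, keep_alive="5m"):
    """Yield every hit for `body`, one scroll page at a time.

    `send(method, path, body)` is a hypothetical transport callable
    (an assumption of this sketch) returning the decoded JSON response.
    """
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}", body)
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            # Always send the most recent ID and renew the keep-alive.
            resp = send("POST", "/_search/scroll",
                        {"scroll_id": scroll_id, "scroll": keep_alive})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the search context even if the caller stops early.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Putting the cleanup in a finally block means the context is released when the results are exhausted and also when the consumer abandons the generator early, instead of lingering until the keep-alive expires.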
To fetch the next page, send the returned _scroll_id to the scroll endpoint:
GET _search/scroll
{
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAJZ9Fnk1d......",
  "scroll": "5m"
}
Delete a scroll when it is no longer needed to free resources:
DELETE _search/scroll/DnF1ZXJ5VGhlbkZldGNo.....
Delete all scrolls:
DELETE _search/scroll/_all
3. Deep Pagination with search_after
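The search_after method uses the sort values of the last hit on the previous page as a cursor for the next page. Because each request is a fresh query, pages reflect index changes in near real time. The sort must end with a field that is unique per document (e.g., _id, _uid on versions older than 7.0, or a business‑level ID) so that the cursor position is unambiguous. As with scroll, jumping to an arbitrary page is not supported.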
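Why the unique field matters can be seen in a toy model. The sketch below is not Elasticsearch itself: it treats search_after as "return hits whose sort key comes strictly after the cursor", which is a simplification, and the nine documents, their timestamps, and the function names are made up for the example.

```python
def search_after_page(docs, key, size, after=None):
    """Next `size` docs whose sort key is strictly past `after` (descending)."""
    ordered = sorted(docs, key=key, reverse=True)
    if after is not None:
        ordered = [d for d in ordered if key(d) < after]
    return ordered[:size]


def paginate(docs, key, size):
    """Walk all pages, feeding the last hit's key back in as the cursor."""
    seen, after = [], None
    while True:
        page = search_after_page(docs, key, size, after)
        if not page:
            return seen
        seen.extend(page)
        after = key(page[-1])


# Nine hypothetical documents; every timestamp is shared by three of them.
docs = [{"id": f"doc{i}", "timestamp": 1000 + i // 3} for i in range(9)]

by_ts = paginate(docs, key=lambda d: (d["timestamp"],), size=2)
by_ts_id = paginate(docs, key=lambda d: (d["timestamp"], d["id"]), size=2)
print(len(by_ts), len(by_ts_id))  # 6 9
```

Sorting on timestamp alone loses one document per timestamp group, because the cursor cannot tell documents with equal sort values apart. Adding the unique id as a tie-breaker makes every cursor position unambiguous, and all nine documents come back.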
GET test_dev/_search
{
  "query": { "bool": { "filter": [ { "term": { "age": 28 } } ] } },
  "size": 20,
  "from": 0,
  "sort": [ { "timestamp": { "order": "desc" } }, { "_id": { "order": "desc" } } ]
}
search_after must be used with from=0.
timestamp and _id together form the unique sort key; _id is the tie‑breaker for documents that share a timestamp.
Pass the last hit’s sort array to search_after for the next page.
GET test_dev/_search
{
  "size": 10,
  "from": 0,
  "search_after": [1541495312521, "d0xH6GYBBtbwbQSP0j1A"],
  "sort": [ { "timestamp": { "order": "desc" } }, { "_id": { "order": "desc" } } ]
}
These three techniques allow developers to choose the most suitable pagination strategy based on data volume, performance requirements, and whether real‑time consistency is needed.
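To round out section 3, the search_after cursor can be driven from client code with a loop like the following. It is a minimal sketch under stated assumptions: search(body) is a hypothetical callable that sends the request body to the index's _search endpoint and returns the decoded JSON response, search_after_all is a name chosen here, and the body passed in is expected to contain a sort that ends in a unique field.

```python
def search_after_all(search, body, size=10):
    """Yield every hit, using the last hit's `sort` array as the cursor.

    `search(body)` is a hypothetical callable (an assumption of this
    sketch) that runs the request and returns the decoded JSON response.
    """
    request = dict(body, size=size)
    while True:
        hits = search(request)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The next page starts right after the last hit of this page.
        request = dict(request, search_after=hits[-1]["sort"])
```

Unlike scroll, nothing is held open on the cluster between requests, so there is no context to clear when the loop ends or is abandoned.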
Architecture Digest