Elasticsearch Pagination: From/Size, Deep Paging Issues, and Alternative Methods (Scroll, Search After, PIT)
This article explains how Elasticsearch pagination works with from/size, why deep paging can cause performance problems, and compares alternative techniques such as Scroll, Scroll‑Scan, Sliced Scroll, Search After, and point‑in‑time (PIT) searches for handling large result sets efficiently.
Elasticsearch is a real‑time distributed search and analytics engine. This article introduces pagination in Elasticsearch, focusing on the default from and size parameters and the drawbacks of deep paging.
From/Size Parameters
The default query returns the top 10 hits. To paginate, you specify from (number of hits to skip) and size (maximum number of hits to return). Example request:
POST /my_index/my_type/_search
{
"query": { "match_all": {} },
"from": 100,
"size": 10
}
This returns 10 documents starting from the 101st hit.
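The mapping from a page number to from and size is simple arithmetic. The helper below is a minimal sketch; the function name page_request and the 1-based page convention are assumptions made for illustration, not part of any Elasticsearch client.

```python
def page_request(page: int, page_size: int = 10) -> dict:
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # maximum number of hits to return
    }


# Page 11 with 10 hits per page skips the first 100 hits,
# which matches the request shown above.
body = page_request(page=11, page_size=10)
```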
How the Query Is Executed
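Elasticsearch executes a search in two phases. In the query phase, each shard builds its own priority queue of size from + size and sends the _id and sort value (by default _score) of those hits to the coordinating node, which merges the candidates from all shards and selects the requested window. In the fetch phase, the coordinating node retrieves the actual document data for only the hits in that window.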
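The two phases can be illustrated with a toy model in plain Python. This is only a sketch under simplifying assumptions: hits are (score, id) tuples, shards are lists held in memory, and the names shard_top_hits and coordinate are made up for the example rather than taken from Elasticsearch.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def shard_top_hits(shard_hits: List[Hit], from_: int, size: int) -> List[Hit]:
    """Query phase on one shard: keep only the best from_ + size hits."""
    return heapq.nlargest(from_ + size, shard_hits)


def coordinate(shards: List[List[Hit]], from_: int, size: int) -> List[str]:
    """Merge every shard's candidates, skip from_ hits, return the next ids."""
    candidates: List[Hit] = []
    for shard in shards:
        candidates.extend(shard_top_hits(shard, from_, size))
    merged = sorted(candidates, reverse=True)  # global order, best score first
    window = merged[from_:from_ + size]
    # Only these ids would be requested from the shards in the fetch phase.
    return [doc_id for _score, doc_id in window]


shards = [
    [(0.9, "a"), (0.5, "b"), (0.1, "c")],
    [(0.8, "d"), (0.7, "e"), (0.2, "f")],
]
page_ids = coordinate(shards, from_=2, size=2)
```

Note that every shard has to report from_ + size candidates even though only size documents are ever fetched.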
Deep Paging Problems
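When from is large, every shard must collect and return from + size candidates, so the coordinating node has to sort number_of_shards × (from + size) entries just to return size documents. The cost in CPU, memory, I/O, and network grows in proportion to the page depth and the shard count. To guard against this, index.max_result_window defaults to 10 000; a request whose from + size exceeds it is rejected unless the setting is raised.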
PUT _settings
{
"index": { "max_result_window": "10000000" }
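}
Raising the limit does not make deep pages cheaper. Even though the query phase sends only _id and _score values, deep paging still means large transfers between the shards and the coordinating node.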
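A quick back-of-the-envelope calculation shows how the work scales with page depth. The function name and the shard count of 5 are assumptions chosen for illustration.

```python
def candidates_merged(from_: int, size: int, shards: int) -> int:
    """Entries the coordinating node must sort to serve a single page."""
    return shards * (from_ + size)


shallow = candidates_merged(from_=100, size=10, shards=5)     # 550
deep = candidates_merged(from_=10_000, size=10, shards=5)     # 50050
```

Both requests return 10 documents, but the deep one makes the cluster handle roughly 90 times as many candidate entries.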
Alternative Pagination Methods
Scroll
Scroll creates a snapshot of the index and is suited for batch processing large data sets, not real‑time queries. Initialization returns a _scroll_id which is used for subsequent fetches.
POST /twitter/tweet/_search?scroll=1m
{
"size": 100,
"query": { "match": { "title": "elasticsearch" } }
}
Subsequent calls use the returned _scroll_id to retrieve the next batch.
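The full scroll loop can be sketched as follows. The send callable is a hypothetical transport (method, path, body) that returns the parsed JSON response; it stands in for whatever HTTP client is in use. The sketch also assumes a typeless path (/twitter/_search), since mapping types are gone in recent Elasticsearch versions. FakeScrollTransport exists only so the example runs without a cluster.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def scroll_all(send: Send, index: str, query: dict,
               batch_size: int = 100, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit for `query`, one scroll batch at a time."""
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": batch_size, "query": query})
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            # Always pass the most recent _scroll_id; it may change between calls.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": resp["_scroll_id"]})
    finally:
        # Release the search context instead of waiting for it to time out.
        send("DELETE", "/_search/scroll", {"scroll_id": resp["_scroll_id"]})


class FakeScrollTransport:
    """In-memory stand-in for a cluster, only so the sketch runs offline."""

    def __init__(self, docs):
        self.docs, self.pos, self.size, self.cleared = docs, 0, 0, False

    def __call__(self, method, path, body):
        if method == "DELETE":
            self.cleared = True
            return {"succeeded": True}
        if "scroll=" in path:  # the initial search request
            self.pos, self.size = 0, body["size"]
        batch = self.docs[self.pos:self.pos + self.size]
        self.pos += self.size
        return {"_scroll_id": "fake-id", "hits": {"hits": batch}}


transport = FakeScrollTransport([{"_id": str(i)} for i in range(250)])
hits = list(scroll_all(transport, "twitter", {"match": {"title": "elasticsearch"}}))
```

An empty hits array signals the end of the result set; clearing the scroll afterwards frees the resources held on each shard.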
Scroll‑Scan
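Scroll‑Scan adds search_type=scan to skip sorting, which improves performance when ordering is not required. With scan, the size parameter applies per shard, so each batch can contain up to size × number_of_primary_shards hits. Note that search_type=scan was deprecated in Elasticsearch 2.1 and removed in 5.0; on later versions, a regular scroll sorted by _doc gives the same benefit.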
POST /my_index/my_type/_search?search_type=scan&scroll=1m&size=50
{
"query": { "match_all": {} }
}
Sliced Scroll
Sliced scroll splits a scroll request into multiple slices that can be consumed in parallel. Each slice is identified by an id (starting at 0) and by max, the total number of slices.
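The per-slice request bodies differ only in the slice id, so they can be generated in a loop. This is a small sketch; slice_bodies is a hypothetical helper name, and each resulting body would be sent as its own scroll request by a separate worker.

```python
def slice_bodies(query: dict, max_slices: int) -> list:
    """Return one scroll request body per slice, with ids 0 .. max_slices - 1."""
    if max_slices < 2:
        raise ValueError("slicing needs at least two slices")
    return [
        {"query": query, "slice": {"id": slice_id, "max": max_slices}}
        for slice_id in range(max_slices)
    ]


bodies = slice_bodies({"match_all": {}}, max_slices=5)
```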
POST /index/type/_search?scroll=1m
{
"query": { "match_all": {} },
"slice": { "id": 0, "max": 5 }
}
Search After
Introduced in ES 5, search_after uses the sort values of the last hit from the previous page to fetch the next page, avoiding deep paging.
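The cursor logic can be sketched as a small generator. As a hedge: send is a hypothetical transport (method, path, body) returning parsed JSON, and make_fake_search is an in-memory stand-in that sorts on made-up date and id fields so the example runs without a cluster.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def search_after_pages(send: Send, index: str, query: dict, sort: list,
                       page_size: int = 10) -> Iterator[list]:
    """Yield pages of hits, using the last hit's sort values as the cursor."""
    body = {"size": page_size, "query": query, "sort": sort}
    while True:
        hits = send("POST", f"/{index}/_search", body)["hits"]["hits"]
        if not hits:
            return
        yield hits
        # Every hit carries a "sort" array; the last one marks where to resume.
        body = {**body, "search_after": hits[-1]["sort"]}


def make_fake_search(docs):
    """Offline stand-in: serves docs in ascending (date, id) order."""
    ordered = sorted(docs, key=lambda d: (d["date"], d["id"]))

    def send(method, path, body):
        after = body.get("search_after")
        rest = [d for d in ordered
                if after is None or [d["date"], d["id"]] > after]
        page = rest[:body["size"]]
        return {"hits": {"hits": [
            {"_source": d, "sort": [d["date"], d["id"]]} for d in page
        ]}}

    return send


# Dates repeat on purpose, so the id field is needed as a tiebreaker.
docs = [{"date": 1000 + i // 2, "id": f"{i:03d}"} for i in range(25)]
pages = list(search_after_pages(
    make_fake_search(docs), "twitter", {"match_all": {}},
    [{"date": "asc"}, {"id": "asc"}], page_size=10))
```

Because each request starts from a sort position rather than an offset, the cost per page stays the same no matter how far into the result set the client has moved.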
POST /twitter/_search
{
"size": 10,
"query": { "match": { "title": "es" } },
"sort": [ { "date": "asc" }, { "_id": "desc" } ]
}
After obtaining the last hit’s sort array, the next request includes it:
GET /twitter/_search
{
"size": 10,
"query": { "match": { "title": "es" } },
"search_after": [124648691, "624812"],
"sort": [ { "date": "asc" }, { "_id": "desc" } ]
}
Point‑in‑Time (PIT) with Search After
From ES 7.10, a point in time (PIT) can be opened to keep the view of the index stable across multiple search_after requests, so that concurrent writes do not shift results between pages.
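The PIT lifecycle (open, page with search_after, close) can be sketched as below. The send transport is again hypothetical, and FakePitTransport is an in-memory stand-in that freezes its data when the PIT is opened; neither is part of an Elasticsearch client.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def pit_search_pages(send: Send, index: str, query: dict, sort: list,
                     page_size: int = 1000, keep_alive: str = "1m") -> Iterator[list]:
    """Open a PIT, page through it with search_after, then close it."""
    pit_id = send("POST", f"/{index}/_pit?keep_alive={keep_alive}", None)["id"]
    try:
        body = {"size": page_size, "query": query, "sort": sort,
                "pit": {"id": pit_id, "keep_alive": keep_alive}}
        while True:
            # No index in the path: the PIT already determines the target.
            resp = send("POST", "/_search", body)
            hits = resp["hits"]["hits"]
            if not hits:
                return
            yield hits
            pit_id = resp.get("pit_id", pit_id)  # the id may be refreshed
            body = {**body, "pit": {"id": pit_id, "keep_alive": keep_alive},
                    "search_after": hits[-1]["sort"]}
    finally:
        send("DELETE", "/_pit", {"id": pit_id})  # free the PIT explicitly


class FakePitTransport:
    """In-memory stand-in: the PIT sees only data present when it was opened."""

    def __init__(self, timestamps):
        self.live, self.snapshot, self.calls = list(timestamps), None, []

    def __call__(self, method, path, body):
        self.calls.append(method + " " + path.split("?")[0])
        if method == "DELETE":
            return {"succeeded": True}
        if "/_pit" in path:  # open request: freeze the current data
            self.snapshot = sorted(self.live)
            return {"id": "fake-pit"}
        self.live.append(0)  # a concurrent write, invisible to the PIT
        after = body.get("search_after")
        rest = [t for t in self.snapshot if after is None or [t] > after]
        return {"pit_id": "fake-pit",
                "hits": {"hits": [{"sort": [t]} for t in rest[:body["size"]]]}}


transport = FakePitTransport(range(1, 8))
pages = list(pit_search_pages(transport, "my-index-000001", {"match_all": {}},
                              [{"@timestamp": "asc"}], page_size=3))
```

Writes that arrive while the client is paging do not appear in later pages, which is the consistency guarantee a plain search_after loop lacks.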
POST /my-index-000001/_pit?keep_alive=1m
The PIT ID is then supplied in the search request:
GET /_search
{
"size": 10000,
"query": { "match": { "user.id": "elkbee" } },
"pit": { "id": "<PIT ID returned by the _pit request>", "keep_alive": "1m" },
"sort": [ { "@timestamp": { "order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type": "date_nanos" } } ]
}
Performance Comparison
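From/size works well for small result windows (<10 000). Scroll solves deep paging but keeps a search context open on every shard for the lifetime of the scroll, which is costly under many concurrent users. Search After offers the best performance for large, real‑time pagination, but it requires a sort that includes a unique tiebreaker field and can only move forward page by page, not jump to an arbitrary page.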
Conclusion
If the data set is small (within 10 000 hits) and only top‑N results are needed, use from/size.
For massive data sets and batch processing, use scroll (or scroll‑scan when sorting is unnecessary).
For large data sets with real‑time, high‑concurrency queries, prefer search_after (optionally with PIT).
Both Scroll and Search After rely on cursor‑like mechanisms to avoid the cost of deep paging, but they are not a complete cure; deep paging should be avoided whenever possible.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn