
Deep Dive into Elasticsearch Pagination: from/size, Scroll, and Search After

This article explains how Elasticsearch handles deep pagination, compares the traditional from/size method with Scroll and Search After techniques, details their internal query and fetch phases, provides practical code examples, and offers guidance on choosing the right approach for large‑scale search workloads.

Top Architect

Elasticsearch is a real‑time distributed search and analytics engine widely used for storing and retrieving massive amounts of unstructured data. While it excels at fast search, it suffers from the same deep‑pagination problem as relational databases when the from + size values become large.

1. from/size Pagination

The simplest form of pagination uses the from (starting offset) and size (page length) parameters, similar to SQL LIMIT ... OFFSET. A sample request looks like:

GET /wms_order_sku/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 20
}

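During the query phase, each shard builds its own priority queue of from + size entries and returns the document IDs and sort values of those entries to the coordinating node. The coordinator merges them into a global queue, discards the first from entries, and keeps the next size. The fetch phase then retrieves the actual documents for those IDs from the shards that hold them. The work on every shard therefore grows with from + size, which is why deep pages are expensive.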
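
The cost of the two phases can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not Elasticsearch's implementation: the function names `shard_top_hits` and `coordinate` are hypothetical, and each shard is modelled as a plain list of (score, doc_id) pairs.

```python
import heapq


def shard_top_hits(shard_docs, from_, size):
    """Query phase on one shard: keep only the best from_ + size (score, doc_id) pairs."""
    return heapq.nlargest(from_ + size, shard_docs)


def coordinate(shards, from_, size):
    """Merge per-shard candidates, skip the first from_ hits, and return one page.

    Returns the page's document IDs plus the number of candidate entries
    the coordinator had to hold while merging.
    """
    candidates = [
        hit for shard in shards for hit in shard_top_hits(shard, from_, size)
    ]
    merged = heapq.nlargest(from_ + size, candidates)
    page = merged[from_:from_ + size]
    return [doc_id for _score, doc_id in page], len(candidates)


if __name__ == "__main__":
    shards = [
        [(0.9, "a"), (0.5, "b"), (0.1, "c")],
        [(0.8, "d"), (0.7, "e"), (0.2, "f")],
    ]
    print(coordinate(shards, from_=2, size=2))
```

In this model an index with 5 shards asked for from = 10000 and size = 20 would send up to 5 × 10,020 = 50,100 entries to the coordinator in order to return 20 documents.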

2. Scroll Pagination

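Scroll works like a database cursor. The first request specifies a scroll keep‑alive (e.g., 1m) and returns a _scroll_id. Each subsequent request sends the most recently returned _scroll_id to read the next batch from the same snapshot of the index, so Elasticsearch does not have to collect and skip all preceding hits again for every page. Each scroll request renews the keep‑alive, so it only needs to cover the time taken to process one batch.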

GET /wms_order_sku2021_10/_search?scroll=1m
{
  "query": { "bool": { "must": [ { "range": { "shipmentOrderCreateTime": { "gte": "2021-10-04 00:00:00", "lt": "2021-10-15 00:00:00" } } } ] } },
  "size": 20
}

The _scroll_id from the response is then passed to the scroll endpoint to fetch the next batch:

GET /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNo..."
}

Scroll is ideal for batch export or migration tasks where real‑time consistency is not required.
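
A batch export typically wraps the two requests above in a loop. The following is a minimal sketch, assuming a `client` object that exposes `search`, `scroll`, and `clear_scroll` with the keyword arguments used by the 8.x official Python client; the helper name `scroll_all` and the default page size are hypothetical.

```python
def scroll_all(client, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit matching `query`, reading one scroll batch at a time."""
    response = client.search(
        index=index, query=query, size=page_size, scroll=keep_alive
    )
    scroll_id = response["_scroll_id"]
    try:
        # An empty batch means the scroll has been exhausted.
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # Always continue with the most recently returned scroll ID.
            scroll_id = response["_scroll_id"]
    finally:
        # Release the search context instead of waiting for the keep-alive to expire.
        client.clear_scroll(scroll_id=scroll_id)
```

Clearing the scroll in a `finally` block frees the server-side context even when the consumer stops early or an error occurs mid-export.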

3. Search After Pagination

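Search After, introduced in ES 5, takes the sort values of the last hit on the current page (e.g., _id and a timestamp) and passes them as a cursor in the next request, which returns only the hits that sort after them. The first page is requested without search_after. Because no snapshot is kept, each page reflects the current state of the index.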

GET /wms_order_sku2021_10/_search
{
  "query": { "bool": { "must": [ { "range": { "shipmentOrderCreateTime": { "gte": "2021-10-12 00:00:00", "lt": "2021-10-15 00:00:00" } } } ] } },
  "size": 20,
  "sort": [ { "_id": { "order": "desc" } }, { "shipmentOrderCreateTime": { "order": "desc" } } ],
  "search_after": ["SO-460_152-1447931043809128448-100017918838", 1634077436000]
}

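Search After requires the sort to include a field whose value is unique per document (often _id) as a tiebreaker; otherwise hits that share identical sort values can be skipped or repeated across pages. The search_after array must list its values in the same order as the sort clauses. The server keeps no state between pages, which makes the method suitable for high‑concurrency user‑facing pagination, but it can only move forward to the next page, not jump to an arbitrary one.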
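
The cursor handling can be sketched as a small generator. This is an illustrative helper rather than part of any client library: `search_after_pages` is a hypothetical name, and `search` stands for any callable that takes a request body dict and returns a response dict shaped like an Elasticsearch search response.

```python
import copy


def search_after_pages(search, body):
    """Yield pages of hits, feeding each page's last sort values into the next request."""
    request = copy.deepcopy(body)      # leave the caller's body untouched
    request.pop("search_after", None)  # the first page carries no cursor
    while True:
        hits = search(request)["hits"]["hits"]
        if not hits:
            return
        yield hits
        # The "sort" array of the last hit becomes the cursor for the next page.
        request["search_after"] = hits[-1]["sort"]
```

The body passed in must contain a sort clause with a unique tiebreaker, since each hit's "sort" array is only present when the request sorts explicitly.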

4. Comparison and Recommendations

If from + size stays within 10,000 (the default index.max_result_window), from/size is simple and fast; requests beyond that limit are rejected unless the setting is raised. For large result sets or batch processing, use Scroll. When real‑time results and high concurrency are needed, prefer Search After. Raising index.max_result_window is discouraged because every shard must then sort and return from + size entries, so memory and CPU costs grow with page depth.
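
These rules of thumb can be written down as a tiny decision helper. It is only a sketch of the guidance in this section; the function name `choose_pagination` and its parameters are hypothetical, and real workloads may weigh other factors.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def choose_pagination(from_, size, batch_export=False,
                      max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Return the pagination method suggested by the guidance above."""
    if batch_export:
        # Offline export or migration: a snapshot-based cursor is acceptable.
        return "scroll"
    if from_ + size <= max_result_window:
        # Shallow pages: the simple offset-based approach is fine.
        return "from/size"
    # Deep, user-facing pagination: use a stateless cursor.
    return "search_after"


if __name__ == "__main__":
    print(choose_pagination(0, 20))
    print(choose_pagination(10_000, 20))
    print(choose_pagination(0, 1000, batch_export=True))
```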

Overall, avoid deep pagination in Elasticsearch whenever possible; instead, limit queries to the top‑N results or export data for offline analysis.
