
Elasticsearch Pagination: From/Size, Deep Paging Issues, Scroll, Search After, PIT and Best Practices

This article explains Elasticsearch pagination mechanisms—including from/size, deep paging drawbacks, scroll, scroll‑scan, sliced scroll, search_after and point‑in‑time—detailing their inner workings, performance trade‑offs, configuration limits, and practical recommendations for handling large result sets.

Sohu Tech Products

Introduction

Elasticsearch is a near real-time distributed search and analytics engine. As with relational databases, deep pagination is expensive and should be avoided where possible. This article covers the pagination techniques Elasticsearch offers and the trade-offs of each.

From/Size Parameters

By default a search returns the top 10 hits. Pagination can be performed using the from and size parameters.

from defines the number of hits to skip (default 0).

size defines the maximum number of hits to return.

Example query:

POST /my_index/_search
{
    "query": { "match_all": {} },
    "from": 100,
    "size": 10
}

This request returns 10 documents starting from the 101st hit.
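As a small illustration of the arithmetic, a page number can be translated into from and size as below. This is a minimal sketch; the helper name page_to_from_size and the 1-based page convention are assumptions made for the example, not part of Elasticsearch.

```python
def page_to_from_size(page: int, page_size: int = 10) -> dict:
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Skip every hit that belongs to the earlier pages.
    return {"from": (page - 1) * page_size, "size": page_size}


# Page 11 with 10 hits per page skips the first 100 hits,
# which matches the example request above.
print(page_to_from_size(11, 10))  # {'from': 100, 'size': 10}
```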

Query and Fetch Phases

Elasticsearch executes a search in two stages:

Query phase – determines which documents match.

Fetch phase – retrieves the actual document source.

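During the query phase the coordinating node broadcasts the request to one copy of every shard. Each shard runs the query locally and builds a priority queue of from + size entries, returning only document IDs and sort values. The coordinating node merges these into its own queue of from + size entries to establish the global order. The fetch phase then retrieves the source only for the size documents that are actually returned.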
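The merge step can be imitated in a few lines of plain Python. This is only a toy model under stated assumptions: shards are in-memory lists of (score, doc_id) pairs, and the function names shard_top_hits and coordinate are invented for the illustration. A real cluster exchanges document IDs and sort values and then runs a separate fetch phase.

```python
import heapq


def shard_top_hits(scored_docs, from_, size):
    """Query phase on one shard: keep only the best from+size (score, doc_id) pairs."""
    return heapq.nlargest(from_ + size, scored_docs)


def coordinate(shards, from_, size):
    """Merge the per-shard candidates, then drop the first `from_` entries."""
    candidates = []
    for scored_docs in shards:
        candidates.extend(shard_top_hits(scored_docs, from_, size))
    merged = heapq.nlargest(from_ + size, candidates)
    return merged[from_:from_ + size]


# Two hypothetical shards with three scored documents each.
shards = [
    [(0.9, "a1"), (0.4, "a2"), (0.7, "a3")],
    [(0.8, "b1"), (0.6, "b2"), (0.1, "b3")],
]
print(coordinate(shards, from_=2, size=2))  # [(0.7, 'a3'), (0.6, 'b2')]
```

Note that every shard has to contribute from + size candidates even though only size hits survive the merge.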

Deep Pagination Problems

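When from is large, every shard must build and return from + size entries, and the coordinating node must merge number_of_shards × (from + size) of them only to discard all but size. CPU, memory, I/O and network usage therefore grow with page depth (roughly in proportion to from + size on each shard, not exponentially). To protect the cluster, index.max_result_window (default 10 000) caps from + size, and requests beyond it are rejected. The limit can be raised if needed, at the cost of more heap and time per request: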

PUT _settings
{
    "index": { "max_result_window": "10000000" }
}
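To make the cost concrete, the number of entries the coordinating node has to merge can be estimated with simple arithmetic. The sketch below assumes an index with 5 shards, which is an illustrative figure only, and the function names are hypothetical.

```python
def coordinator_entries(num_shards: int, from_: int, size: int) -> int:
    """Upper bound on the entries the coordinating node merges for one page."""
    return num_shards * (from_ + size)


def within_result_window(from_: int, size: int, max_result_window: int = 10_000) -> bool:
    """True when from + size does not exceed index.max_result_window."""
    return from_ + size <= max_result_window


print(coordinator_entries(5, 0, 10))      # 50
print(coordinator_entries(5, 9_990, 10))  # 50000
print(within_result_window(9_990, 10))    # True
print(within_result_window(10_000, 10))   # False
```

The first page and the thousandth page both return 10 hits, but the second request moves a thousand times more entries across the network.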

Official Deep‑Paging Solutions

Scroll

Scroll works like a cursor in relational databases and is suited for batch processing (e.g., mass messaging). It creates a snapshot of the index at the start of the scroll and returns a _scroll_id, which subsequent requests use to fetch the next batch. The scroll=1m parameter keeps the search context alive for one minute, and the timer is renewed by each scroll request.

POST /twitter/_search?scroll=1m
{
    "size": 100,
    "query": { "match": { "title": "elasticsearch" } }
}

Subsequent fetch:

POST /_search/scroll
{
    "scroll": "1m",
    "scroll_id": "<_scroll_id returned by the previous request>"
}

Drawbacks: the search context and its snapshot hold server resources until the scroll expires or is cleared, and results do not reflect changes made after the scroll started.
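The overall scroll loop can be sketched as follows. To keep the example runnable without a cluster, the three callables are injected and then backed by an in-memory stand-in; in a real client they would wrap the initial search with scroll=1m, the follow-up scroll request, and the clear-scroll API (DELETE /_search/scroll). The names scroll_all, start_search, next_batch and clear_scroll are assumptions made for this sketch, not a library API.

```python
from typing import Callable, Iterator


def scroll_all(
    start_search: Callable[[], dict],
    next_batch: Callable[[str], dict],
    clear_scroll: Callable[[str], None],
) -> Iterator[dict]:
    """Yield every hit of a scroll, then release the search context."""
    response = start_search()
    scroll_id = response["_scroll_id"]
    try:
        # An empty hits array signals that the scroll is exhausted.
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = next_batch(scroll_id)
            # The ID may change between calls, so always use the latest one.
            scroll_id = response["_scroll_id"]
    finally:
        clear_scroll(scroll_id)


# In-memory stand-in for a cluster (hypothetical data and scroll ID).
docs = [{"_id": str(i)} for i in range(5)]
state = {"pos": 0, "cleared": False}


def _batch(n=2):
    hits = docs[state["pos"]:state["pos"] + n]
    state["pos"] += n
    return {"_scroll_id": "fake-scroll-id", "hits": {"hits": hits}}


def _clear(scroll_id):
    state["cleared"] = True


ids = [hit["_id"] for hit in scroll_all(_batch, lambda _sid: _batch(), _clear)]
print(ids)  # ['0', '1', '2', '3', '4']
```

Clearing the scroll in a finally block frees the search context as soon as the export finishes instead of waiting for the keep-alive to expire.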

Scroll Scan

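Scroll Scan disables sorting for higher performance. In Elasticsearch 1.x and 2.0 it is enabled with search_type=scan, and size applies per shard, so each batch can contain up to size × number_of_shards hits. The scan search type was deprecated in 2.1 and removed in 5.0; on later versions a regular scroll sorted by _doc gives the same benefit. The legacy form looks like this: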

POST /my_index/my_type/_search?search_type=scan&scroll=1m&size=50
{ "query": { "match_all": {} } }

Sliced Scroll

Sliced scroll splits a scroll request into multiple parallel slices, speeding up large data extraction.

POST /index/_search?scroll=1m
{
    "query": { "match_all": {} },
    "slice": { "id": 0, "max": 5 }
}
POST /index/_search?scroll=1m
{
    "query": { "match_all": {} },
    "slice": { "id": 1, "max": 5 }
}
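Writing one request per slice by hand gets repetitive, so the bodies can be generated. The sketch below only builds the request bodies; sending them (for example from a pool of worker threads, one scroll per slice) is left out. The function name sliced_scroll_bodies is hypothetical.

```python
import json


def sliced_scroll_bodies(query: dict, max_slices: int) -> list:
    """Build one scroll request body per slice, with ids 0 .. max_slices - 1."""
    if max_slices < 2:
        # A single slice is just a plain scroll, so slicing needs at least 2.
        raise ValueError("max_slices must be at least 2")
    return [
        {"query": query, "slice": {"id": slice_id, "max": max_slices}}
        for slice_id in range(max_slices)
    ]


bodies = sliced_scroll_bodies({"match_all": {}}, 5)
print(len(bodies))  # 5
print(json.dumps(bodies[1]))
# {"query": {"match_all": {}}, "slice": {"id": 1, "max": 5}}
```

Each slice is an independent scroll with its own _scroll_id, so every worker runs its own fetch loop and clears its own scroll.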

Search After

Introduced in ES 5, search_after provides a stateless cursor using the sort values of the last hit of the previous page.

POST twitter/_search
{
    "size": 10,
    "query": { "match": { "title": "es" } },
    "sort": [ { "date": "asc" }, { "_id": "desc" } ]
}

Use the returned sort array for the next request:

GET twitter/_search
{
    "size": 10,
    "query": { "match": { "title": "es" } },
    "search_after": [124648691, "624812"],
    "sort": [ { "date": "asc" }, { "_id": "desc" } ]
}
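Deriving the next request from a response is mechanical and can be sketched in Python. The response below is a hand-written stand-in shaped like a search response, with hypothetical hits; next_page_body is a helper name invented for this example.

```python
import copy


def next_page_body(previous_body: dict, response: dict):
    """Return the request body for the following page, or None when no hits remain."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    body = copy.deepcopy(previous_body)
    # search_after replaces offset paging, so drop any `from` value.
    body.pop("from", None)
    # The cursor is the sort array of the last hit on the current page.
    body["search_after"] = hits[-1]["sort"]
    return body


first_body = {
    "size": 10,
    "query": {"match": {"title": "es"}},
    "sort": [{"date": "asc"}, {"_id": "desc"}],
}
fake_response = {"hits": {"hits": [
    {"_id": "624811", "sort": [124648690, "624811"]},
    {"_id": "624812", "sort": [124648691, "624812"]},
]}}
print(next_page_body(first_body, fake_response)["search_after"])
# [124648691, '624812']
```

The query and sort must stay identical from page to page; only search_after changes.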

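Advantages: no search context is kept on the server, results reflect the current state of the index, and the cost does not grow with page depth. Disadvantages: the sort must include a unique tiebreaker field so that pages neither overlap nor skip hits, and jumping to an arbitrary page is not possible because each request needs the sort values from the previous page.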

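Point-In-Time (PIT) with Search After (ES 7.10+)

From ES 7.10 onward, using PIT with search_after is recommended for deep pagination. A PIT is a lightweight view of the index state at the moment it is opened, so successive search_after pages see consistent data even while the index keeps changing.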

POST /my-index-000001/_pit?keep_alive=1m

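The response contains an id. Pass it in the pit object of subsequent search requests, which are sent to /_search without an index name because the PIT already identifies the target indices.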
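A search body that combines a PIT with search_after could be assembled as below. This is a sketch under assumptions: the helper name pit_search_body and the placeholder PIT ID are invented for the example, and the query and sort are borrowed from the search_after section.

```python
def pit_search_body(pit_id, query, sort, size=10, search_after=None, keep_alive="1m"):
    """Build a search body bound to a PIT; send it to /_search without an index name."""
    body = {
        "size": size,
        "query": query,
        # keep_alive extends the lifetime of the PIT on every request.
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


# "PIT_ID_PLACEHOLDER" stands for the id returned when the PIT was opened.
first = pit_search_body(
    "PIT_ID_PLACEHOLDER",
    {"match": {"title": "es"}},
    [{"date": "asc"}, {"_id": "desc"}],
)
second = pit_search_body(
    "PIT_ID_PLACEHOLDER",
    {"match": {"title": "es"}},
    [{"date": "asc"}, {"_id": "desc"}],
    search_after=[124648691, "624812"],
)
print(first["pit"])            # {'id': 'PIT_ID_PLACEHOLDER', 'keep_alive': '1m'}
print(second["search_after"])  # [124648691, '624812']
```

Each search response carries a pit_id that may differ from the one sent, so the next request should use the most recent value. When paging is finished, the PIT can be closed with DELETE /_pit to release its resources early.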

Performance Comparison

| Pagination Method | Performance | Pros | Cons | Use Case |
| --- | --- | --- | --- | --- |
| from + size | Low | Simple, flexible | Deep-paging cost | Small data sets (<10k) |
| scroll | Medium | Solves deep-paging, good for bulk export | Snapshot overhead, scroll_id management | Mass data export |
| search_after | High | Best performance, reflects real-time changes | Complex implementation, needs unique sort field, not for large jumps | Large-scale real-time pagination |

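Backward Paging (Previous Page)

Elasticsearch has no native API for paging backward with search_after. It can be simulated by reversing the sort order and passing the sort values of the first hit of the current page as search_after; the hits returned then need to be reversed again to restore the display order.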
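The reversal trick described above can be sketched as follows. The code assumes every sort clause uses the short form {"field": "asc"} or {"field": "desc"}; the helper names and the sample sort values are hypothetical.

```python
def reverse_sort(sort):
    """Flip asc/desc on every clause of a short-form sort list."""
    flipped = []
    for clause in sort:
        (field, order), = clause.items()
        flipped.append({field: "desc" if order == "asc" else "asc"})
    return flipped


def previous_page_body(current_body, first_hit_sort_values):
    """Build the request that fetches the page before the current one."""
    body = dict(current_body)
    body["sort"] = reverse_sort(current_body["sort"])
    # Start just before the first hit of the current page.
    body["search_after"] = first_hit_sort_values
    return body


def restore_order(hits):
    """Hits come back in reversed order, so flip them before display."""
    return list(reversed(hits))


current = {
    "size": 10,
    "query": {"match": {"title": "es"}},
    "sort": [{"date": "asc"}, {"_id": "desc"}],
}
prev_body = previous_page_body(current, [124648692, "624813"])
print(prev_body["sort"])           # [{'date': 'desc'}, {'_id': 'asc'}]
print(prev_body["search_after"])   # [124648692, '624813']
print(restore_order(["c", "b", "a"]))  # ['a', 'b', 'c']
```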

Conclusion

If the total result window is under 10 000 or only top‑N results are needed, use from/size .

For large data sets and batch jobs, use scroll (sorted by _doc when the order does not matter; scroll-scan only exists on versions before 5.0).

For large data sets with real-time, high-concurrency queries, prefer search_after (optionally with PIT in ES 7.10+).

Personal Thoughts

Both scroll and search_after rely on cursor‑like mechanisms to avoid deep‑paging costs, but they are compromises: scroll requires maintaining a snapshot and scroll_id , while search_after cannot jump arbitrarily and may produce inconsistent results if the index changes between pages.
