Improving Elasticsearch Query Performance for Billion‑Scale Datasets
To boost Elasticsearch query speed on billions of records: allocate sufficient memory for the filesystem cache, store only searchable fields in the index, separate hot and cold data, warm up the cache, avoid complex joins, and replace deep pagination with the Scroll API or `search_after`. Together these changes can bring query latency down to the millisecond range.
A common interview question is how to improve Elasticsearch query efficiency on billions of records.
Elasticsearch's raw performance at this scale is limited: the first query against a large index may take 5-10 seconds because segment data must be read from disk, while repeated queries are much faster once that data sits in cache.
The single most important optimization is to maximize the OS filesystem cache: give each node enough free memory that its index segment files stay memory-resident, aiming for cache capacity of at least half the on-disk data volume.
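The "half the data in cache" rule of thumb can be turned into a back-of-the-envelope sizing check. The numbers below (2 TiB of index data per node) are purely illustrative, and `min_cache_bytes` is a hypothetical helper, not an Elasticsearch API:

```python
def min_cache_bytes(total_index_bytes: int, target_ratio: float = 0.5) -> int:
    """Filesystem-cache size needed to keep `target_ratio` of the
    on-disk index data memory-resident (the rule of thumb above)."""
    return int(total_index_bytes * target_ratio)

# Example: a node holding ~2 TiB of segment files would need ~1 TiB of
# free memory for the filesystem cache -- usually a signal to shrink the
# index (see the next section) rather than to buy that much RAM.
per_node_index = 2 * 1024**4          # 2 TiB of segment files
cache_needed = min_cache_bytes(per_node_index)
print(cache_needed // 1024**3)        # prints 1024 (GiB)
```

In practice this arithmetic argues for the next optimization: instead of growing the cache to match the index, shrink the index to fit the cache.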
Store only the fields you actually search on in Elasticsearch and keep the full records in a secondary store such as HBase, fetching them by id after the search; this shrinks the index and raises the cache hit rate.
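A minimal sketch of this "small index + secondary store" pattern. Both stores are simulated with plain dicts so the example is self-contained; in production the first would be an Elasticsearch index and the second an HBase table keyed by the same id, and the field names are invented for illustration:

```python
search_index = {  # ES side: only the fields users actually filter on
    "order-1": {"user_id": 42, "status": "paid"},
    "order-2": {"user_id": 7, "status": "shipped"},
}
full_store = {    # HBase side: complete documents, keyed by the same id
    "order-1": {"user_id": 42, "status": "paid", "amount": 99, "address": "..."},
    "order-2": {"user_id": 7, "status": "shipped", "amount": 15, "address": "..."},
}

def search_orders(status: str) -> list:
    # Step 1: the search touches only the compact, cache-friendly index.
    ids = [doc_id for doc_id, fields in search_index.items()
           if fields["status"] == status]
    # Step 2: hydrate the full records from the secondary store by id.
    return [full_store[i] for i in ids]
```

The index stays small enough to live in the filesystem cache, and the by-key lookups into the secondary store are cheap regardless of record size.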
Warm up the cache by periodically re-querying hot data so it stays in the filesystem cache, and split hot and cold data into separate indices so that scans over cold data cannot evict the hot working set.
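The warm-up idea can be sketched as a background task that re-issues the hottest queries on a schedule. The cache here is a simulated set, and `hot_queries` is an assumed list of known-hot queries, not anything Elasticsearch provides:

```python
hot_queries = ["status:paid", "status:shipped"]  # assumed hot query set
cache = set()  # simulated filesystem cache of query working sets

def run_query(query: str) -> bool:
    """Return True on a cache hit; querying also pulls data into cache."""
    hit = query in cache
    cache.add(query)
    return hit

def warm_up() -> None:
    # In production this would run from a scheduler (e.g. every few
    # minutes) so hot segments never fall out of the filesystem cache.
    for query in hot_queries:
        run_query(query)

warm_up()
# Real user queries arriving afterwards are served from cache:
assert all(run_query(q) for q in hot_queries)
```

Hot/cold index separation complements this: the warm-up only has to keep the hot index resident, and cold-index scans touch different files entirely.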
Design documents to avoid complex joins; perform joins in the application layer before indexing.
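An application-layer join before indexing looks like the sketch below: instead of asking Elasticsearch to join orders with users at query time, the user fields are embedded into each order document at write time. The data and field names are illustrative:

```python
users = {42: {"name": "Alice", "tier": "gold"}}  # source-of-truth user table
orders = [{"order_id": "order-1", "user_id": 42, "amount": 99}]

def denormalize(order: dict) -> dict:
    """Join in the application layer, producing a self-contained document."""
    user = users[order["user_id"]]
    return {**order, "user_name": user["name"], "user_tier": user["tier"]}

docs = [denormalize(o) for o in orders]
# Each doc can now be filtered by user_tier directly, with no join at
# query time -- e.g. "all orders by gold-tier users" is a flat filter.
```

The trade-off is that user updates must be propagated to the denormalized documents, so this fits data where reads vastly outnumber writes.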
Deep pagination is costly because every page forces the cluster to collect and discard all preceding hits; avoid it by limiting page depth, or use the Scroll API or `search_after` for efficient forward-only pagination.
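The `search_after` idea can be shown with a simulated dataset: instead of a `from`/`size` offset, each page passes the sort key of its last hit as the cursor for the next one, so no page has to re-collect earlier results. The in-memory list stands in for a sorted Elasticsearch index:

```python
# Ten documents, pre-sorted by timestamp (the sort field).
docs = [{"id": i, "ts": 1000 + i} for i in range(10)]

def search_after(last_ts: int, size: int = 3) -> list:
    """Equivalent of: sort by ts asc, search_after=[last_ts], size=size."""
    return [d for d in docs if d["ts"] > last_ts][:size]

collected, last = [], -1
while True:
    hits = search_after(last)
    if not hits:
        break
    collected.extend(hits)
    last = hits[-1]["ts"]   # cursor: sort key of the last hit on this page
```

Unlike `from`/`size`, each call does the same amount of work no matter how deep the pagination goes, which is why it stays fast on billion-document indices.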
These practices together can reduce query times from several seconds to tens of milliseconds even with terabytes of data.
Architecture Digest
Focused on Java backend development, covering application architecture at top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.