Optimizing Elasticsearch Search Performance with Index Sorting

By defining index sorting on the publish_time field when creating the Elasticsearch index, the team cut a roughly two-second full-scan sorted query to about 50 ms. Pre-ordered storage greatly speeds up sorting over large result sets, at a modest cost in write throughput.

DeWu Technology

May 8, 2023

Background: Since 2020, the community's content annotation service has stored a secondary index in Elasticsearch to support complex backend searches. The index grew to billions of documents, and search response times grew with it.

Initial query: a simple request for the 10 most recent documents, sorted by publish_time in descending order. With no filter, every document in the index matches and has to be considered for the sort, which resulted in high latency.

GET /content-alias/_search
{
  "track_total_hits": true,
  "sort": [
    { "publish_time": { "order": "desc" } }
  ],
  "size": 10
}

First mitigation: added a time range filter to limit the result set, reducing latency to ~200 ms, but sorting large result sets remained slow.
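The time-range mitigation can be sketched as a small request-body builder. This is an illustration only: the article does not show the filtered query, so the bool/filter shape, the 7-day window, and the assumption that publish_time holds epoch milliseconds are all hypothetical.

```python
import json


def build_recent_query(start_ms, end_ms, size=10):
    """Build a search body for the newest documents inside a bounded time window."""
    return {
        "track_total_hits": True,  # carried over from the original request
        "query": {
            "bool": {
                # A filter clause narrows the candidate set without scoring.
                "filter": [
                    {"range": {"publish_time": {"gte": start_ms, "lt": end_ms}}}
                ]
            }
        },
        "sort": [{"publish_time": {"order": "desc"}}],
        "size": size,
    }


# Hypothetical window: the 7 days before an arbitrary end timestamp
# (assumes publish_time is stored as epoch milliseconds).
END_MS = 1_683_504_000_000
WEEK_MS = 7 * 24 * 60 * 60 * 1000
recent_body = build_recent_query(END_MS - WEEK_MS, END_MS)
print(json.dumps(recent_body, indent=2))
```

The narrower the window, the fewer documents have to be sorted, which is also why this approach stops helping once the window itself contains a very large number of documents.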

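Investigating Elasticsearch internals showed how the pieces interact: Lucene's inverted index finds the matching documents, doc values supply each document's sort value from column-oriented storage, and every segment is searched separately. Doc values make reading sort values fast, but documents inside a segment are normally stored in indexing order, so every matching document still has to be compared. When the result set is huge, the sorting cost remains high.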

Solution: enable Index Sorting at index creation so that documents within each segment are stored on disk already ordered by the sort field. When the search sort matches the index sort, Elasticsearch can terminate early in each segment and return the top N hits without visiting every matching document.
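To make the difference concrete, here is a toy simulation in plain Python. It is not Lucene code: segments are modeled as lists of publish_time values, and the segment count and sizes are arbitrary assumptions. It only compares how many values each strategy has to inspect to produce the same top 10.

```python
import heapq
import random


def top_n_unsorted(segment, n):
    """Segment in arbitrary order: every value must be inspected."""
    visited = 0
    heap = []  # min-heap holding the n largest values seen so far
    for value in segment:
        visited += 1
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True), visited


def top_n_presorted(segment_desc, n):
    """Segment stored in descending order: the first n values are the answer."""
    top = segment_desc[:n]
    return top, len(top)


def merge_segments(per_segment_tops, n):
    """Merge each segment's local top n into the global top n."""
    return heapq.nlargest(n, (v for top in per_segment_tops for v in top))


random.seed(7)
N = 10
segments = [[random.randrange(10**9) for _ in range(50_000)] for _ in range(4)]

# Without index sorting: scan every value in every segment.
slow = [top_n_unsorted(seg, N) for seg in segments]
slow_hits = merge_segments([top for top, _ in slow], N)
slow_visited = sum(visited for _, visited in slow)

# With index sorting: the sort cost is paid once, at write time.
sorted_segments = [sorted(seg, reverse=True) for seg in segments]
fast = [top_n_presorted(seg, N) for seg in sorted_segments]
fast_hits = merge_segments([top for top, _ in fast], N)
fast_visited = sum(visited for _, visited in fast)

print(slow_hits == fast_hits)       # both strategies return the same top 10
print(slow_visited, fast_visited)   # 200000 values inspected versus 40
```

The call to sorted() in the second half stands in for the extra work done when segments are written and merged, which is where the write-throughput cost of Index Sorting comes from.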

PUT /content
{
  "settings": {
    "index": {
      "sort.field": "publish_time",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "properties": {
      "content_id": { "type": "long" },
      "publish_time": { "type": "long" }
      // ...
    }
  }
}

After enabling Index Sorting, query latency dropped from ~2000 ms to ~50 ms for the same request, and slow‑query incidents disappeared. Benchmarks also showed a modest impact on write throughput.
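When reproducing this result, one detail is worth checking. The Elasticsearch documentation on index sorting states that early termination applies when the search sort matches the index sort and total-hit tracking is turned off. A request shaped that way might look like the sketch below; the builder function is hypothetical, and the alias name is taken from the query shown earlier.

```python
import json


def build_top_n_query(size=10):
    """Search body whose sort matches the index sort (publish_time, desc)."""
    return {
        # Skip exact hit counting so each segment can stop after `size` docs.
        "track_total_hits": False,
        "sort": [{"publish_time": {"order": "desc"}}],
        "size": size,
    }


top_n_body = build_top_n_query()
print("GET /content-alias/_search")
print(json.dumps(top_n_body, indent=2))
```

If an exact total is needed for pagination, it can be fetched separately and less often, so the hot path keeps the early-terminating form.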

Additional recommendations: avoid requesting total hit count when not needed, use custom routing to limit shard scans, store low‑cardinality fields as keyword, and preprocess data with ingest pipelines.
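The routing, keyword, and ingest-pipeline recommendations can be sketched as request fragments. Everything named here is hypothetical: the article does not give the routing key, the field names, or the pipeline contents, so status, user-42, and the lowercase step are placeholders chosen for illustration.

```python
import json
from urllib.parse import urlencode


def search_path(index, routing=None):
    """Search URL path; a routing value limits the search to the shard(s) for that key."""
    path = f"/{index}/_search"
    if routing is None:
        return path
    return f"{path}?{urlencode({'routing': routing})}"


# Mapping fragment: a low-cardinality field stored as keyword
# ("status" is a placeholder field name).
keyword_mapping = {"properties": {"status": {"type": "keyword"}}}

# Body for PUT /_ingest/pipeline/<name>: normalize the field before indexing
# so queries do not have to compensate at search time.
normalize_pipeline = {
    "description": "Normalize status values at ingest time",
    "processors": [
        {"trim": {"field": "status"}},
        {"lowercase": {"field": "status"}},
    ],
}

print(search_path("content-alias", routing="user-42"))
print(json.dumps(keyword_mapping))
print(json.dumps(normalize_pipeline, indent=2))
```

Custom routing only narrows a search if documents were indexed with the same routing value, so the key has to be chosen before data is written.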

Conclusion: For scenarios with large result sets that are sorted on a single field such as publish_time, Index Sorting is an effective optimization, though it increases write cost and must be defined at index creation.
