Optimizing Elasticsearch Search Performance with Index Sorting
By defining index sorting on the publish_time field at index creation, the team cut a multi-second full-scan query to roughly 50 ms. Storing documents pre-ordered by the sort field dramatically speeds up sorts over large result sets, at the cost of a modest reduction in write throughput.
Background: Since 2020, the community's content annotation service has stored a secondary index in Elasticsearch to support complex backend searches. The index grew to billions of documents, and search response times rose with it.
Initial query: a simple request to fetch the latest 10 documents sorted by publish_time. It forced a scan of the entire index and produced high latency.
GET /content-alias/_search
{
  "track_total_hits": true,
  "sort": [
    { "publish_time": { "order": "desc" } }
  ],
  "size": 10
}

First mitigation: added a time range filter to limit the result set, reducing latency to ~200 ms, but sorting large result sets remained slow.
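The time range mitigation might look roughly like the sketch below, which builds the request body as a Python dict. The helper name latest_in_window, the 7-day window, and the assumption that publish_time holds epoch milliseconds are illustrative; the original only says the field is a long.

```python
from datetime import datetime, timedelta, timezone


def latest_in_window(now, days=7, size=10):
    """Build a search body for the newest `size` docs published in the last `days` days."""
    # Assumption: publish_time stores epoch milliseconds (the mapping only says "long").
    upper_ms = int(now.timestamp() * 1000)
    lower_ms = int((now - timedelta(days=days)).timestamp() * 1000)
    return {
        "query": {
            "bool": {
                # A filter clause restricts the matching set without scoring it.
                "filter": [
                    {"range": {"publish_time": {"gte": lower_ms, "lte": upper_ms}}}
                ]
            }
        },
        "sort": [{"publish_time": {"order": "desc"}}],
        "size": size,
    }


body = latest_in_window(datetime(2024, 1, 8, tzinfo=timezone.utc))
```

The narrower the window, the fewer documents have to be sorted, which is why this helps but does not remove the underlying cost when the window still matches a very large set.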
Investigation of Elasticsearch internals showed that sorting depends on Lucene's inverted index, doc values, and segment structure. Doc values provide column-oriented storage that makes reading a field's value for each document fast, but every matching document must still be visited and compared, so the sorting cost stays high when the result set is huge.
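To see why a huge result set is expensive even with a fast column store, here is a simplified model of top-N selection over an unsorted column of values. It is a conceptual sketch, not Lucene's actual collector: the list stands in for a doc values column and the list index stands in for the document ID.

```python
import heapq


def top_n_full_scan(publish_times, n):
    """Return (doc ids of the n largest values, number of docs visited)."""
    heap = []  # min-heap of (value, doc_id) holding the best n seen so far
    visited = 0
    for doc_id, value in enumerate(publish_times):
        visited += 1  # every matching document has to be examined
        if len(heap) < n:
            heapq.heappush(heap, (value, doc_id))
        elif value > heap[0][0]:
            heapq.heapreplace(heap, (value, doc_id))
    top = [doc_id for _, doc_id in sorted(heap, reverse=True)]
    return top, visited


column = [50, 10, 90, 30, 70, 20]
top, visited = top_n_full_scan(column, 2)
print(top, visited)  # [2, 4] 6
```

Only two documents are returned, yet all six are visited. The work grows with the number of matching documents, not with the page size.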
Solution: enable Index Sorting at index creation so that documents within each segment are stored on disk already ordered by the sort field. When a search sorts the same way, Elasticsearch can terminate early, collecting the top N hits from the start of each segment instead of comparing every matching document.
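The effect of pre-ordered storage can be modeled in a few lines. In this simplified sketch each inner list stands in for one segment whose values are already sorted descending; it illustrates the idea of early termination and is not Elasticsearch's implementation. According to the Elasticsearch documentation, early termination applies when the search sort matches the index sort, and it benefits from not tracking the exact total hit count.

```python
import heapq
from itertools import islice


def top_n_sorted_segments(segments, n):
    """Return (n largest values, docs visited) for segments sorted descending."""
    visited = 0
    candidates = []
    for segment in segments:
        # Only the first n docs of a pre-sorted segment can be in the top n,
        # so the rest of the segment is never read.
        head = list(islice(segment, n))
        visited += len(head)
        candidates.extend(head)
    return heapq.nlargest(n, candidates), visited


segments = [[90, 70, 50, 10], [95, 60, 20], [80, 40, 30, 5, 1]]
top, visited = top_n_sorted_segments(segments, 2)
print(top, visited)  # [95, 90] 6
```

Here 6 of 12 documents are visited, and that number depends only on the page size and the segment count, not on how many documents match.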
PUT /content
{
  "settings": {
    "index": {
      "sort.field": "publish_time",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "properties": {
      "content_id": { "type": "long" },
      "publish_time": { "type": "long" }
      // ...
    }
  }
}

After enabling Index Sorting, query latency dropped from ~2000 ms to ~50 ms for the same request, and slow-query incidents disappeared. Benchmarks also showed a modest impact on write throughput.
Additional recommendations: avoid requesting total hit count when not needed, use custom routing to limit shard scans, store low‑cardinality fields as keyword, and preprocess data with ingest pipelines.
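Two of these recommendations can be expressed directly in the search request. The sketch below assembles the request as plain Python data rather than calling a client library. The helper name build_search_request and the routing value are hypothetical, while track_total_hits and the routing query parameter are standard parts of the search API. Routing only narrows the search if documents were indexed with the same routing value.

```python
def build_search_request(routing_key=None, size=10):
    """Describe a search that skips exact hit counting and optionally targets one routing value."""
    body = {
        "track_total_hits": False,  # do not count every matching document
        "sort": [{"publish_time": {"order": "desc"}}],
        "size": size,
    }
    params = {}
    if routing_key is not None:
        # Sent as ?routing=... so only the shard(s) for this key are searched.
        params["routing"] = str(routing_key)
    return {
        "method": "GET",
        "path": "/content-alias/_search",
        "params": params,
        "body": body,
    }


request = build_search_request(routing_key="user-42")  # hypothetical routing value
```

Whether routing is practical depends on the data having a natural partition key, which the original does not specify.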
Conclusion: For scenarios with large result sets that are always sorted on the same single field, such as publish_time here, Index Sorting is an effective optimization. It increases write cost, and it must be defined at index creation, so an existing index has to be reindexed into a new one to adopt it.
DeWu Technology