Authoritative Guide to Elasticsearch Search Performance Tuning (Part 1)
This guide explains how to tune Elasticsearch 5.x search performance by modeling documents with flattened, nested, or parent‑child structures, managing global ordinals, allocating JVM heap under 50 % of RAM, monitoring garbage collection, and choosing SSD storage and ample file‑system cache.
Elasticsearch 5.x introduced many improvements and is part of the Elastic Stack. This article is the first part of an authoritative guide to search performance tuning, covering document modeling, memory allocation, file‑system cache, garbage collection, hardware, and more.
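Document modeling – Lucene has no concept of inner objects, so Elasticsearch flattens object fields into simple lists of field names and values. The indexing request below stores an array of user objects, and the JSON after it shows the flattened form that is actually indexed.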
curl -XPUT 'localhost:9200/my_index/my_type/1?pretty' -H 'Content-Type: application/json' -d '{ "group" : "fans", "user" : [ { "first" : "John", "last" : "Smith" }, { "first" : "Alice", "last" : "White" } ] }'
Internally, the document is flattened to:
{ "group" : "fans", "user.first" : [ "alice", "john" ], "user.last" : [ "smith", "white" ] }
The association between "alice" and "white" is lost. When you need to preserve relationships between objects in an array, use the nested type, as in the mapping and queries that follow.
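To see what flattening costs before switching to the nested type, the sketch below searches the flattened document for a user named Alice Smith, a person who does not exist in the indexed data. Because user.first and user.last are stored as independent value lists, the bool query is expected to match anyway. The ES_URL variable and the helper name search_cross_object are assumptions for illustration; the index and field names follow the indexing request above.

```shell
#!/usr/bin/env bash
# Base URL is an assumption; point it at your own cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Both conditions hold for the flattened document,
# even though no single user object is "Alice Smith".
CROSS_OBJECT_QUERY='{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" } },
        { "match": { "user.last": "Smith" } }
      ]
    }
  }
}'

# Hypothetical helper: run the query against my_index.
search_cross_object() {
  curl -s -XGET "${ES_URL}/my_index/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "${CROSS_OBJECT_QUERY}"
}
```

Calling search_cross_object against an index that uses the default (non-nested) mapping should return document 1, which is the false positive that the nested type is meant to prevent.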
curl -XPUT 'ES_HOST:ES_PORT/my_index?pretty' -H 'Content-Type: application/json' -d '{ "mappings": { "my_type": { "properties": { "user": { "type": "nested" } } } } }'
curl -XPUT 'ES_HOST:ES_PORT/my_index/my_type/1?pretty' -H 'Content-Type: application/json' -d '{ "group" : "fans", "user" : [ { "first" : "John", "last" : "Smith" }, { "first" : "Alice", "last" : "White" } ] }'
curl -XGET 'ES_HOST:ES_PORT/my_index/_search?pretty' -H 'Content-Type: application/json' -d '{ "query": { "nested": { "path": "user", "query": { "bool": { "must": [ { "match": { "user.first": "Alice" }}, { "match": { "user.last": "Smith" }} ] } } } } }'
This query returns no hits, because Alice and Smith do not belong to the same nested object.
The parent‑child relationship can also be used to model loosely coupled entities. The mapping and requests below create a parent and a child document and then run a has_child query.
curl -XPUT 'ES_HOST:ES_PORT/my_index?pretty' -H 'Content-Type: application/json' -d '{ "mappings": { "my_parent": {}, "my_child": { "_parent": { "type": "my_parent" } } } }'
curl -XPUT 'ES_HOST:ES_PORT/my_index/my_parent/1?pretty' -H 'Content-Type: application/json' -d '{ "text": "This is a parent document" }'
curl -XPUT 'ES_HOST:ES_PORT/my_index/my_child/2?parent=1&pretty' -H 'Content-Type: application/json' -d '{ "text": "This is a child document" }'
curl -XGET 'ES_HOST:ES_PORT/my_index/my_parent/_search?pretty' -H 'Content-Type: application/json' -d '{ "query": { "has_child": { "type": "my_child", "query": { "match": { "text": "child document" } } } } }'
Global ordinals and delayed building – Parent‑child joins rely on global ordinals. By default they are built lazily on the first query after a refresh. Setting eager_global_ordinals in the mapping forces eager construction.
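curl -XPUT 'ES_HOST:ES_PORT/company?pretty' -H 'Content-Type: application/json' -d '{ "mappings": { "branch": {}, "employee": { "_parent": { "type": "branch", "fielddata": { "loading": "eager_global_ordinals" } } } } }'
The fielddata.loading form shown here is the older, pre-5.0 syntax; in 5.x mappings the same behavior is expressed as "eager_global_ordinals": true.
With eager building, global ordinals are rebuilt on every refresh, so when many parent documents exist, increasing refresh_interval reduces how often that CPU cost is paid.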
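As a minimal sketch of the refresh_interval adjustment, the helpers below build an index settings body and send it to the update settings endpoint. The ES_URL variable, the helper names, and the 30s value in the usage note are assumptions for illustration, not recommendations from the original article.

```shell
#!/usr/bin/env bash
# Base URL is an assumption; point it at your own cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the settings body for a given refresh interval (e.g. "30s").
refresh_settings_body() {
  printf '{ "index": { "refresh_interval": "%s" } }' "$1"
}

# Hypothetical helper: apply the interval to an existing index.
# $1 = index name, $2 = interval
set_refresh_interval() {
  curl -s -XPUT "${ES_URL}/$1/_settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(refresh_settings_body "$2")"
}
```

For example, set_refresh_interval company 30s would make the company index refresh every 30 seconds, at the price of new documents taking longer to become searchable.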
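JVM heap sizing – Elasticsearch runs on the JVM; the heap should be less than 50 % of the available RAM, leaving the rest for the file‑system cache, and should not exceed 32 GB, so that the JVM can keep using compressed object pointers. The heap can be set through an environment variable, either exported or passed on the command line.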
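export ES_HEAP_SIZE=10g
ES_HEAP_SIZE="10g" ./bin/elasticsearch
Note that ES_HEAP_SIZE is honored only by releases before 5.0; on 5.x, set -Xms and -Xmx in config/jvm.options or through ES_JAVA_OPTS.
Verify the setting with curl -XGET 'http://ES_HOST:9200/_cat/nodes?h=heap.max'.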
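The sizing rule above can be sketched as a small calculation: take half of the machine's RAM as an upper bound and cap it below 32 GB. The function names and the 31 GB cap are assumptions chosen for illustration; total_ram_mb reads /proc/meminfo and therefore works on Linux only.

```shell
#!/usr/bin/env bash
# Suggest an upper bound for the heap in MB: half of RAM,
# capped at 31 GB (an assumed cap that stays under 32 GB).
# $1 = total RAM in MB
recommend_heap_mb() {
  total_mb=$1
  cap_mb=$((31 * 1024))
  half_mb=$((total_mb / 2))
  if [ "$half_mb" -gt "$cap_mb" ]; then
    echo "$cap_mb"
  else
    echo "$half_mb"
  fi
}

# Read total RAM in MB from /proc/meminfo (Linux only).
total_ram_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}
```

On a machine with 16 GB of RAM, recommend_heap_mb 16384 prints 8192; on a 128 GB machine the cap applies and it prints 31744. The result can be passed to the heap setting with an m suffix, for example "$(recommend_heap_mb "$(total_ram_mb)")m".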
Garbage collection – Monitor GC frequency and pause times; an oversized heap can increase GC pauses, potentially causing node disconnections.
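One way to watch GC frequency and pause times is the node stats API, which reports collection_count and collection_time_in_millis for each collector under jvm.gc.collectors. The sketch below fetches those statistics and derives a rough average time per collection; the ES_URL variable and the helper names are assumptions for illustration, and the average is only a coarse indicator because concurrent collectors do not pause the application for their whole run.

```shell
#!/usr/bin/env bash
# Base URL is an assumption; point it at your own cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Fetch JVM statistics, which include per-collector GC counts and times.
fetch_jvm_stats() {
  curl -s -XGET "${ES_URL}/_nodes/stats/jvm?pretty"
}

# Average time per collection in ms (integer division, rounded down).
# $1 = collection_time_in_millis, $2 = collection_count
avg_gc_pause_ms() {
  if [ "$2" -eq 0 ]; then
    echo 0
  else
    echo $(( $1 / $2 ))
  fi
}
```

For instance, a collector that reports 40 collections taking 1200 ms in total gives avg_gc_pause_ms 1200 40, which prints 30. Sampling the counters at intervals and comparing the deltas shows how often collections run.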
Hardware considerations – Use SSDs instead of HDDs, avoid remote file systems (NFS, SMB, EBS) for index storage, and allocate sufficient RAM for the file‑system cache (at least half of the memory).
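To check which kind of disk backs a data path on Linux, the kernel exposes a rotational flag per block device under /sys/block. The helper below is a sketch under that assumption; the function name and the optional second argument (an alternative sysfs root, useful for testing) are hypothetical, and virtualized or RAID-backed devices may report a value that does not reflect the physical medium.

```shell
#!/usr/bin/env bash
# Report whether a block device is rotational (HDD) or not (SSD).
# $1 = device name such as "sda"
# $2 = optional sysfs root, defaults to /sys/block
disk_kind() {
  flag_file="${2:-/sys/block}/$1/queue/rotational"
  if [ ! -r "$flag_file" ]; then
    echo "unknown"
    return 1
  fi
  if [ "$(cat "$flag_file")" = "1" ]; then
    echo "hdd"
  else
    echo "ssd"
  fi
}
```

Running disk_kind sda prints hdd, ssd, or unknown for the device named sda. The amount of memory currently used by the file‑system cache can be read separately with free -m.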
The article concludes with a link to the original English blog post and a note that the content is copyrighted.
vivo Internet Technology