Elasticsearch Index Performance Optimization (Part 2)
To maximize Elasticsearch bulk-indexing speed, temporarily disable refreshes and replicas, tune merge throttling and scheduler threads, enlarge translog and index buffer thresholds, and adjust indexing and bulk thread-pool sizes, then restore defaults after the load completes.
This article, translated from the QBox official blog, is the second part of a three‑part series on maximizing Elasticsearch indexing performance. It focuses on configuration settings that improve indexing throughput and reduce management overhead.
Refresh Interval
The index.refresh_interval setting controls how often Elasticsearch refreshes a shard so that newly indexed documents become searchable. The default is 1s. Increasing this interval (or setting it to -1 to disable refreshes temporarily) avoids the cost of frequent refresh operations and can dramatically boost bulk indexing speed. Example:
curl -XPUT 'localhost:9200/my_index/_settings' -d '{ "index" : { "refresh_interval" : "-1" } }'
After bulk indexing, the setting should be restored, e.g.:
curl -XPUT 'localhost:9200/my_index/_settings' -d '{ "index" : { "refresh_interval" : "1s" } }'
Replica Count
During large bulk imports, setting index.number_of_replicas to 0 avoids the overhead of indexing documents on replica shards. Once the import finishes, replicas can be re‑enabled.
curl -XPUT 'localhost:9200/my_index/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'
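Once the import finishes, replicas can be restored through the same settings endpoint. A sketch, assuming the index should carry one replica (adjust the count to your own topology):

```shell
# Re-enable replication after the bulk import completes
# (assumes 1 replica is the desired steady-state count)
curl -XPUT 'localhost:9200/my_index/_settings' -d '{ "index" : { "number_of_replicas" : 1 } }'
```

Elasticsearch then copies the existing segments to the replica shards in one pass, which is cheaper than having replicas re-index every document during the import.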
Segment Merging and Throttling
Elasticsearch merges Lucene segments in the background, which is I/O‑intensive. The default merge throttling rate is 20 MB/s for HDDs; for SSDs a higher limit (100–200 MB/s) is advisable. Throttling can be disabled entirely with:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "transient" : { "indices.store.throttle.type" : "none" } }'
After the bulk operation, restore the setting to merge:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "transient" : { "indices.store.throttle.type" : "merge" } }'
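Rather than disabling throttling outright, SSD-backed nodes can simply raise the limit via indices.store.throttle.max_bytes_per_sec. A sketch, assuming a 100 MB/s budget:

```shell
# Raise the merge throttle for SSDs instead of disabling it entirely
curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "transient" : { "indices.store.throttle.max_bytes_per_sec" : "100mb" } }'
```
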
Merge Scheduler Thread Count
The index.merge.scheduler.max_thread_count setting controls how many threads may run merges concurrently. The default is max(1, min(4, availableProcessors / 2)). For HDD-based nodes it is often set to 1 (allowing three threads in total). Example for a single index:
curl -XPUT 'localhost:9200/my_index/_settings' -d '{ "index.merge.scheduler.max_thread_count" : 1 }'
For all indices:
curl -XPUT 'localhost:9200/_settings' -d '{ "index.merge.scheduler.max_thread_count" : 1 }'
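The default formula can be sketched as a small shell function. merge_threads is a hypothetical helper name used only for illustration; the processor count is passed in explicitly rather than detected:

```shell
# max(1, min(4, processors / 2)) -- the default value of
# index.merge.scheduler.max_thread_count
merge_threads() {
  local procs=$1
  local half=$(( procs / 2 ))
  local capped=$(( half < 4 ? half : 4 ))   # never more than 4
  echo $(( capped > 1 ? capped : 1 ))        # never fewer than 1
}

merge_threads 16   # prints 4: capped at four threads
merge_threads 4    # prints 2: half the cores
merge_threads 1    # prints 1: never below one
```

On a typical 8-core node this yields 4 merge threads, which is why HDD-based nodes, where concurrent merges thrash the disk, usually override it down to 1.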
Translog (Transaction Log) Management
Flushing the translog forces a Lucene commit, which is expensive. Settings that influence flush behavior include:
index.translog.flush_threshold_size (default 512 MB)
index.translog.flush_threshold_ops (default unlimited)
index.translog.flush_threshold_period (default 30 min)
index.translog.interval (default 5 s)
Increasing flush_threshold_size (e.g., to 1 GB) reduces the frequency of flushes and can improve indexing throughput, provided sufficient heap memory is available.
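For example, assuming the node has heap to spare, the threshold can be raised to 1 GB on a single index:

```shell
# Flush the translog less often by raising the size threshold to 1 GB
curl -XPUT 'localhost:9200/my_index/_settings' -d '{ "index" : { "translog.flush_threshold_size" : "1gb" } }'
```
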
Index Buffer Size
The index buffer stores newly indexed documents before they are written to disk. Its size is controlled by static settings that must be configured on every data node:
indices.memory.index_buffer_size – default 10 % of JVM heap
indices.memory.min_index_buffer_size – default 48 MB
indices.memory.max_index_buffer_size – unlimited by default
Increasing the buffer size can be beneficial for heavy indexing workloads.
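Because these are static node-level settings, they belong in elasticsearch.yml on each data node rather than in a settings API call. A sketch, assuming a heavy-indexing node where dedicating 20 % of the heap to the index buffer is acceptable (the values are illustrative, not recommendations):

```yaml
# elasticsearch.yml -- static settings, read on node startup
indices.memory.index_buffer_size: 20%
indices.memory.min_index_buffer_size: 96mb
```

Note that the buffer is shared across all active shards on the node, so the more shards a node hosts, the less each one receives.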
Thread Pools for Indexing and Bulk Operations
Elasticsearch provides dedicated thread pools:
index – fixed size, default = number of CPU cores, queue size 200
bulk – fixed size, default = number of CPU cores, queue size 50
Adjusting these pools (and the index.index_concurrency setting that limits concurrent indexing per shard) can further improve performance, especially on nodes dedicated to a single shard.
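Thread-pool sizes in this era of Elasticsearch are likewise static settings configured in elasticsearch.yml. A sketch, assuming we want deeper queues to absorb indexing bursts (the numbers are illustrative, not recommendations):

```yaml
# elasticsearch.yml -- illustrative thread-pool overrides
threadpool.index.queue_size: 300
threadpool.bulk.queue_size: 100
```

Deeper queues trade memory and latency for fewer rejected requests; raising the pool size itself beyond the core count rarely helps, since the pools are CPU-bound.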
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.