
Elasticsearch Deployment Best Practices: Memory, CPU, Sharding, Replicas, Hot/Warm Architecture, Node Roles, Monitoring and Troubleshooting

This article presents practical best‑practice guidelines for configuring Elasticsearch in production, covering heap memory sizing, CPU considerations, shard and replica planning, hot‑warm node architecture, node role settings, common pitfalls, monitoring APIs, and troubleshooting tips.

Sohu Tech Products

1. Memory

Elasticsearch and Lucene run on the JVM, so heap size must be set carefully. A larger heap provides more cache for filters and improves query performance, but heaps above roughly 32 GB lose compressed object pointers and suffer longer garbage‑collection pauses. The recommendation is to allocate up to 50 % of available RAM (capped at 31 GB) to the heap and leave the rest for the OS file cache.

Typical mis‑configuration: setting the heap equal to total host memory, which starves the OS cache used by Lucene for immutable segment files (inverted index, doc values). This degrades performance.
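The sizing rule above can be sketched as a small helper (a minimal sketch; the function name and the 31 GB cap mirror the guideline in this article, not any Elasticsearch API):

```python
def recommended_heap_gb(total_ram_gb: float) -> float:
    """Heap sizing rule of thumb: half of RAM, capped at 31 GB so the
    JVM keeps compressed object pointers; the remainder is left to the
    OS file cache that Lucene relies on for immutable segment files."""
    return min(total_ram_gb / 2, 31.0)

# A 64 GB host gets a 31 GB heap, leaving the other 33 GB to the page cache.
print(recommended_heap_gb(64))  # 31.0
print(recommended_heap_gb(16))  # 8.0
```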

Configure the heap via the Xms and Xmx settings in jvm.options or via startup parameters, e.g.:

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms16g
-Xmx16g

Or:

ES_JAVA_OPTS="-Xms10g -Xmx10g" ./bin/elasticsearch

2. CPU

Complex queries and heavy writes consume significant CPU; choose appropriate query types and write strategies. Elasticsearch maintains multiple thread pools per node and sizes them automatically based on the detected processor count, so manual tuning of pool sizes is generally not recommended.

3. Shard Count

Shards are the unit of data distribution. Too many small shards increase coordination overhead, while too few large shards can limit parallelism. Recommended shard size is 30‑50 GB. Changing the primary shard count requires re‑indexing.
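As a back‑of‑the‑envelope check of the 30‑50 GB guideline, one can estimate a primary shard count from the expected index size (a hypothetical helper, using 40 GB as the midpoint target):

```python
import math

def suggested_primary_shards(index_size_gb: float, target_shard_gb: float = 40.0) -> int:
    """Estimate how many primary shards keep each shard near the
    recommended 30-50 GB range (40 GB midpoint by default)."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

print(suggested_primary_shards(200))  # 5 shards of ~40 GB each
print(suggested_primary_shards(10))   # 1 -- never below one shard
```

Because the primary shard count is fixed at index creation, size for expected growth rather than current volume.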

3.1 Many Small Shards vs Few Large Shards

Many small shards respond quickly individually, but the coordinating node must merge results from more shards, which can raise overall query latency. Under high concurrency, the per‑shard overhead of many small shards can also reduce throughput, whereas a few large shards limit parallelism.

3.2 Shard Number Guidelines

Allocate sufficient resources to master nodes when shard count is high. Primary shard count is defined at index creation and can only be changed by recreating the index and reindexing data.
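Concretely, changing the primary shard count means creating a new index with the desired settings and copying the data across with the _reindex API (the index names here are hypothetical):

```
PUT /logs-v2
{
  "settings": {
    "number_of_shards": 10
  }
}

POST _reindex
{
  "source": { "index": "logs-v1" },
  "dest":   { "index": "logs-v2" }
}
```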

4. Replicas

Replicas provide high availability and can improve query performance by serving reads. The default replica count is 1; increase it only when required, since each additional replica adds a full copy of the index to storage and may not yield a noticeable performance gain.
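Unlike the primary shard count, the replica count can be changed at any time on a live index (the index name is hypothetical):

```
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
```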

5. Hot‑Warm Architecture

Separate hot (frequently accessed) and warm/cold (infrequently accessed) data onto different node types. Hot nodes use SSDs and high‑end hardware; warm/cold nodes can use spinning disks. Use Curator or ILM to move indices between tiers.
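A common way to implement the tiers is a custom node attribute plus index-level allocation filtering (the box_type attribute value and index name below are conventional examples, not built-ins):

```
# elasticsearch.yml on a hot node
node.attr.box_type: hot

# elasticsearch.yml on a warm node
node.attr.box_type: warm
```

An aging index is then migrated by requiring the warm attribute:

```
PUT /logs-2024.06/_settings
{
  "index.routing.allocation.require.box_type": "warm"
}
```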

5.1 Hot Nodes

Hot nodes store recent, frequently accessed data; at least three are recommended for high availability.

5.2 Warm/Cold Nodes

Warm/cold nodes store older, mostly read‑only data; again, at least three are recommended for high availability.

6. Node Role Assignment

Elasticsearch nodes can be master, data, or coordinating. Master nodes manage cluster state; data nodes store and search data; coordinating nodes route requests.

Configuration examples (elasticsearch.yml; these boolean flags apply to releases before 7.9, which replaced them with the node.roles setting):

# Dedicated master-eligible node
node.master: true
node.data: false

# Dedicated data node
node.master: false
node.data: true

# Coordinating-only node
node.master: false
node.data: false

7. Troubleshooting Tips

Monitor host resources: CPU, memory, disk I/O. High heap usage (>75 %) leads to increased GC pauses; near 100 % can cause aggressive GC and severe latency.

Non‑heap memory growth can exhaust page cache and cause OOM. Monitor disk I/O and adjust shard size, merge policy, or add SSDs/nodes.

Configure sensible alerts for search latency, enable slow‑query logging, and set appropriate refresh intervals.
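Slow‑query logging is enabled per index by setting thresholds (the index name and values here are illustrative; tune them to your latency targets):

```
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "2s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}
```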

Example settings to limit expensive queries (ES 7.7+):

PUT _cluster/settings
{
  "transient": {
    "search.default_search_timeout": "50s",
    "search.allow_expensive_queries": false
  }
}

Disable wildcard deletions:

PUT /_cluster/settings
{
  "persistent": {
    "action.destructive_requires_name": true
  }
}

8. Common Monitoring APIs

Cluster health: GET _cluster/health?pretty

Index list: GET _cat/indices?pretty&v

Node info: GET _nodes?pretty

Master node: GET _cat/master?pretty&v

Index and shard stats: GET _stats?pretty

Node JVM/IO stats: GET _nodes/stats?pretty

9. Summary

Elasticsearch ships with sensible defaults for beginners, but production deployments require careful tuning of heap, CPU, shard count, replica settings, node roles, and monitoring to meet performance and reliability goals.

Follow the recommendations above and consult the official documentation for optimal configuration.

Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
