Elasticsearch Cluster Capacity Planning, Index Configuration, and Performance Optimization
This guide outlines practical capacity‑planning, index‑design, and write‑performance tuning for Tencent Cloud Elasticsearch clusters, covering compute and storage sizing, optimal shard counts, rollover strategies, bulk API settings, health monitoring, and common troubleshooting steps to ensure stable, high‑throughput search services.
With the rapid growth of Tencent Cloud Elasticsearch (ES) services, many users encounter stability and availability issues caused by insufficient early‑stage cluster planning. This article shares practical experience from daily ES operations, covering capacity assessment, index configuration principles, write‑performance tuning, and common troubleshooting methods.
1. Cluster Scale Evaluation
The evaluation focuses on three dimensions:
Compute resources: assess CPU and memory per node. Typical configurations such as 2C8G support ~5k docs/s write throughput, while 32C64G can handle ~50k docs/s.
Storage resources: choose an appropriate disk type (SSD for hot nodes, high-performance cloud disks for warm nodes) and capacity. Mounting multiple disks can increase throughput by ~2.8× compared to a single disk.
Node count: prefer fewer high-spec nodes (e.g., 3×32C64G) over many low-spec nodes for better stability and easier scaling.
Two vertical‑scaling methods are commonly used:
Rolling restart: restart nodes one by one, which may affect stability.
Data migration: add high-spec nodes, migrate data, then remove the old nodes (longer and costlier, but safer).
2. Index Configuration Assessment
Plan index rollover (daily for small logs, hourly for large volumes) to control index size and improve search performance.
The default primary shard count is 5 in ES 6.x and earlier (1 from ES 7.0 onward); adjust it based on data volume and query patterns.
Guidelines:
When the total shard count exceeds 100k, index creation can take minutes and cause write drops. Mitigations include pre-creating indices, using fixed mappings, and employing hot-warm-cold tiering with ILM and COS snapshots.
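The rollover and tiering ideas above can be sketched as an ILM policy. This is a minimal illustration, not Tencent Cloud's exact configuration: the policy name, phase timings, and the `temperature` allocation attribute are assumptions — the attribute name depends on how your hot and warm nodes are actually tagged.

```bash
# Illustrative ILM policy: roll over hot indices, relocate to warm nodes, delete after 30 days
curl -X PUT "localhost:9200/_ilm/policy/logs-policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_size": "50gb" }
        }
      },
      "warm": {
        "min_age": "3d",
        "actions": {
          "allocate": { "require": { "temperature": "warm" } }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}'
```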
3. Write‑Performance Optimizations
Let ES auto‑generate doc_id to avoid extra existence checks.
Pre‑create indices with fixed mappings to reduce master‑node metadata updates.
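Pre-creating an index with an explicit, fixed mapping avoids per-document mapping updates on the master node. A minimal sketch — the index, shard counts, and field names are illustrative:

```bash
# Pre-create the index with strict dynamic mapping so writes never mutate the mapping
curl -X PUT "localhost:9200/logs-2024.01.01" -H 'Content-Type: application/json' -d'
{
  "settings": { "number_of_shards": 3, "number_of_replicas": 1 },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "timestamp": { "type": "date" },
      "message":   { "type": "text" }
    }
  }
}'
```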
Increase refresh_interval (e.g., to 30 s) for non‑real‑time workloads.
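The refresh interval is a dynamic index setting, so it can be changed on a live index (index name is illustrative):

```bash
# Refresh every 30s instead of the default 1s; trades freshness for write throughput
curl -X PUT "localhost:9200/logs-2024.01.01/_settings" -H 'Content-Type: application/json' -d'
{ "index": { "refresh_interval": "30s" } }'
```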
During bulk migrations, set replica count to 0 and restore after data transfer.
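Replica count is likewise a dynamic setting; a sketch of the disable/restore pattern (index name illustrative):

```bash
# Drop replicas before the bulk load to halve the write amplification...
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{ "index": { "number_of_replicas": 0 } }'

# ...then restore them once the migration finishes
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{ "index": { "number_of_replicas": 1 } }'
```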
Use the Bulk API with batch size ≈ 10 MB (≈ 10 000 docs) to reduce network overhead.
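A minimal Bulk API sketch (index name and documents are illustrative). The body is NDJSON — one action line followed by one document line per operation — and omitting `_id` lets ES auto-generate it, as recommended above:

```bash
# Two indexing operations in one request; note the required trailing newline
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{ "index": { "_index": "my-index" } }
{ "message": "first doc" }
{ "index": { "_index": "my-index" } }
{ "message": "second doc" }
'
```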
Apply custom routing to direct bulk requests to fewer shards:
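Routing can be set per action in the bulk metadata so all documents sharing a routing key land on the same shard. A sketch with an illustrative routing value:

```bash
# All docs with routing "user-123" hash to a single shard
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{ "index": { "_index": "my-index", "routing": "user-123" } }
{ "message": "routed doc" }
'
```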
Prefer SSD disks and mount multiple disks for higher I/O throughput.
Freeze old indices to release memory; frozen indices remain searchable but slower:
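On ES 6.6 through 7.x this is done with the freeze API (it was removed in 8.0, where frozen searchable snapshots replace it); the index name is illustrative:

```bash
# Frozen indices drop their in-memory structures and load them lazily per search
curl -X POST "localhost:9200/logs-2023.01/_freeze"
```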
Search frozen indices with ignore_throttled=false when needed:
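Because frozen indices are search-throttled, they are skipped by default; passing `ignore_throttled=false` includes them (index name illustrative):

```bash
curl -X GET "localhost:9200/logs-2023.01/_search?ignore_throttled=false" -H 'Content-Type: application/json' -d'
{ "query": { "match_all": {} } }'
```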
4. Routine Operational Checks
Cluster health (Green/Yellow/Red) can be queried via:
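Either the cluster health API or the human-readable cat API works:

```bash
# JSON status, shard counts, and relocation/initialization stats
curl -X GET "localhost:9200/_cluster/health?pretty"

# One-line tabular summary with column headers
curl -X GET "localhost:9200/_cat/health?v"
```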
Pending tasks indicate metadata update bottlenecks. Example:
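Pending cluster-state tasks can be listed via:

```bash
# Tabular view: task priority, source, and time in queue
curl -X GET "localhost:9200/_cat/pending_tasks?v"
```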
Explain unassigned shards:
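With no request body, the allocation explain API reports on the first unassigned shard it finds:

```bash
# Explains why a shard is unassigned and why each node rejected it
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
```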
List red indices:
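The cat indices API can filter by health directly:

```bash
# Only indices whose health is red (i.e., with unassigned primary shards)
curl -X GET "localhost:9200/_cat/indices?v&health=red"
```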
Common unassigned-shard causes and fixes: a node left the cluster (shards recover once it rejoins), disk usage crossed the high watermark (free up space or adjust the watermark settings), allocation retries were exhausted after repeated failures (retry allocation manually), or the replica count exceeds the number of available nodes (lower number_of_replicas).
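One frequent fix: when a shard stays unassigned because its allocation retries were exhausted, allocation can be retried manually:

```bash
# Re-attempts allocation of shards that failed their allowed retries
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
```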
Conclusion
By following the capacity‑assessment guidelines, proper index design, and the performance‑tuning recommendations above, ES clusters can achieve high stability and throughput while reducing operational overhead. The troubleshooting checklist helps quickly locate and resolve common issues, ensuring a reliable search service for Tencent Cloud customers.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.