Elasticsearch Cluster Planning and Optimization for Large-Scale Time-Series Data
This article explains how to design and tune an Elasticsearch cluster for massive time‑series workloads, covering node role separation, storage strategies, routing, index splitting, hot‑warm architecture, cross‑cluster search and other performance‑boosting techniques.
The project described processes 10‑12 billion daily events (≈400 billion records per year, ~200 TB) using Elasticsearch 5.3.3, requiring careful storage and real‑time query design.
Cluster Planning: Separate Master, Data, and Coordinating-Only node roles.

Master nodes manage cluster state and should not handle indexing or query I/O:

```
node.master: true
node.data: false
node.ingest: false
```

Data nodes store and query the data:

```
node.master: false
node.data: true
node.ingest: false
```

Coordinating-Only nodes act as pure request routers and result aggregators:

```
node.master: false
node.data: false
node.ingest: false
search.remote.connect: false
```
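To make the Coordinating-Only role concrete, here is a minimal scatter-gather sketch of what such a node does: fan a query out to the data nodes, then merge each shard's local top hits into one globally sorted result. The shard contents, scores, and document IDs below are fabricated purely for illustration.

```python
# Toy model of the coordinating node's scatter-gather phase.
# Real Elasticsearch scoring and transport are far richer; this only
# shows the fan-out/merge shape of the request path.
import heapq

def search_shard(shard_docs, query_term, size):
    """Each data node returns its local top-`size` hits as (score, doc_id)."""
    hits = [(score, doc_id) for doc_id, (text, score) in shard_docs.items()
            if query_term in text]
    return heapq.nlargest(size, hits)

def coordinate(shards, query_term, size=3):
    """The coordinating node merges per-shard top hits into a global top-`size`."""
    all_hits = []
    for shard in shards:
        all_hits.extend(search_shard(shard, query_term, size))
    return heapq.nlargest(size, all_hits)

# Two hypothetical shards: doc_id -> (text, relevance score)
shards = [
    {"doc1": ("error timeout", 2.1), "doc2": ("ok", 0.5)},
    {"doc3": ("error disk full", 3.0), "doc4": ("error retry", 1.2)},
]
print(coordinate(shards, "error"))  # [(3.0, 'doc3'), (2.1, 'doc1'), (1.2, 'doc4')]
```

Because this merge work is CPU- and memory-bound, offloading it to dedicated coordinating nodes keeps data nodes free for I/O.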
Routing: Use a time-based routing key to limit shard scans. The default shard calculation is:

```
shard_num = hash(_routing) % num_primary_shards
```

Specifying a routing value at query time (e.g., `routing=user1`) restricts the search to the shard(s) that value maps to instead of fanning out to every shard, dramatically improving latency. When a single routing value would concentrate too much data on one shard, use `index.routing_partition_size` so each routing value spreads over a small partition of shards rather than exactly one:

```
shard_num = (hash(_routing) + hash(_id) % routing_partition_size) % num_primary_shards
```

With partitioned routing, also mark routing as mandatory in the index mapping: `"_routing": {"required": true}`.
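The two formulas above can be simulated locally to see why routed queries are cheap. Note the hash function here is a stand-in: Elasticsearch actually uses murmur3, while this sketch uses `zlib.crc32` only to get a deterministic non-negative integer.

```python
# Sketch of Elasticsearch's shard-routing arithmetic.
# Stand-in hash: ES uses murmur3; crc32 is used here purely for illustration.
import zlib

def es_hash(value: str) -> int:
    return zlib.crc32(value.encode("utf-8"))

def shard_for_doc(routing: str, num_primary_shards: int) -> int:
    """Default rule: shard_num = hash(_routing) % num_primary_shards."""
    return es_hash(routing) % num_primary_shards

def shard_with_partition(routing: str, doc_id: str,
                         routing_partition_size: int,
                         num_primary_shards: int) -> int:
    """Partitioned rule: one routing value spreads over a small shard subset."""
    return (es_hash(routing) + es_hash(doc_id) % routing_partition_size) \
        % num_primary_shards

# Every doc sharing a routing value lands on the same shard, so a routed
# query touches exactly one shard out of five:
print(len({shard_for_doc("2017-06-01", 5) for _ in range(100)}))  # 1

# With routing_partition_size=2, the same routing value spreads its docs
# over at most 2 of the 5 shards, easing hot spots:
print(len({shard_with_partition("2017-06-01", f"doc-{i}", 2, 5)
           for i in range(100)}))
```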
Index Splitting: Large indices are split horizontally into shards, but each shard is capped at roughly 2 billion documents (Lucene's per-shard limit), so a single ever-growing index is not viable. Use the Rollover API together with an index template to keep shard sizes manageable (e.g., 5 primary shards per index).

Example template:

```
PUT _template/template_1
{
  "index_patterns": ["log-*"],
  "order": 0,
  "settings": {"number_of_shards": 5},
  "aliases": {"alias1": {}}
}
```

Rollover example:

```
PUT /logs-000001
{
  "aliases": {"logs_write": {}}
}

POST /logs_write/_rollover
{
  "conditions": {"max_age": "7d", "max_docs": 1000}
}
```
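The rollover mechanics reduce to two rules: roll when any condition is met, and increment the zero-padded counter in the index name. A small local simulation of that logic (names mirror the article's `logs-000001`/`logs_write` example; this is not a client against a real cluster):

```python
# Local simulation of the _rollover decision and index naming.
from datetime import timedelta

def should_rollover(index_age: timedelta, doc_count: int,
                    max_age: timedelta, max_docs: int) -> bool:
    """Roll over if ANY configured condition is satisfied (ES semantics)."""
    return index_age >= max_age or doc_count >= max_docs

def next_index_name(current: str) -> str:
    """logs-000001 -> logs-000002, matching the API's zero-padded counter."""
    prefix, counter = current.rsplit("-", 1)
    return f"{prefix}-{int(counter) + 1:0{len(counter)}d}"

# 2-day-old index with 1500 docs vs. conditions {max_age: 7d, max_docs: 1000}:
print(should_rollover(timedelta(days=2), 1500, timedelta(days=7), 1000))  # True
print(next_index_name("logs-000001"))  # logs-000002
```

In production the write alias (`logs_write`) always points at the newest index, so clients never need to know the current counter value.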
Hot-Warm Architecture: Tag Data nodes as hot or warm, either at startup (`./bin/elasticsearch -Enode.attr.box_type=hot`) or in elasticsearch.yml (`node.attr.box_type: hot`). Hot nodes get fast CPUs and SSDs for recent, write-heavy indices; warm nodes use cheaper hardware for historical, read-mostly indices. Apply allocation filters in the index templates:

```
PUT _template/active-logs
{
  "template": "active-logs-*",
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "routing.allocation.include.box_type": "hot",
    "routing.allocation.total_shards_per_node": 2
  },
  "aliases": {"active-logs": {}}
}

PUT _template/inactive-logs
{
  "template": "inactive-logs-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "routing.allocation.include.box_type": "warm",
    "codec": "best_compression"
  }
}
```

After rollover, mark the old index read-only and relocate it to warm nodes:

```
PUT active-logs-1/_settings
{
  "index.blocks.write": true,
  "index.routing.allocation.include.box_type": "warm"
}
```

Optionally shrink the warm index (to reduce shard count) and force-merge it (to reduce segment overhead).
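The allocation filter's effect can be modeled in a few lines: given each node's `box_type` attribute and an index's `include` filter, only matching nodes are eligible to host the index's shards. Node names and attributes below are invented for illustration.

```python
# Toy model of shard allocation filtering with node attributes.
def eligible_nodes(nodes: dict, include_filters: dict) -> list:
    """A node qualifies only if it matches every include filter."""
    return [name for name, attrs in nodes.items()
            if all(attrs.get(k) == v for k, v in include_filters.items())]

cluster = {
    "node-1": {"box_type": "hot"},
    "node-2": {"box_type": "hot"},
    "node-3": {"box_type": "warm"},
    "node-4": {"box_type": "warm"},
}

# active-logs-* template: "routing.allocation.include.box_type": "hot"
print(eligible_nodes(cluster, {"box_type": "hot"}))   # ['node-1', 'node-2']

# After rollover, flipping the setting to "warm" drains the old index's
# shards onto the warm tier automatically:
print(eligible_nodes(cluster, {"box_type": "warm"}))  # ['node-3', 'node-4']
```

Changing the filter on a live index is what triggers the relocation: the master re-evaluates eligibility and moves shards off nodes that no longer match.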
Other Optimizations: Use shard allocation filtering to isolate heavy-traffic indices: label nodes with custom attributes (e.g., `node.attr.zone: zone_a`) and pin specific indices to those nodes. For multi-data-center deployments, consider Tribe nodes or Cross-Cluster Search. Remote clusters are registered via `PUT _cluster/settings`, after which a remote index is queried with the `cluster:index` syntax:

```
POST /cluster_one:decision/_search
{
  "query": {"match_all": {}}
}
```
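A minimal registration for the remote cluster referenced above might look like the following. The alias `cluster_one` comes from the article's query example, but the seed address is a placeholder; note that on the 5.x line used here the setting lives under `search.remote` (later versions renamed it `cluster.remote`).

```
PUT _cluster/settings
{
  "persistent": {
    "search.remote": {
      "cluster_one": {
        "seeds": ["127.0.0.1:9300"]
      }
    }
  }
}
```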
Conclusion : Combining node role separation, routing, index splitting, hot‑warm tiers, and cross‑cluster capabilities yields a scalable, high‑performance Elasticsearch solution for massive time‑series data.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.