Comprehensive Guide to Elasticsearch Index Design, Settings, and Mapping
This article provides a detailed guide on Elasticsearch index design, covering index settings, shard and replica planning, mapping strategies, complex types, lifecycle management, template usage, and practical best‑practice recommendations for large‑scale log data clusters.
1. What Is ES Index Design
Elasticsearch (ES) is a distributed, RESTful search and analytics engine that differs from traditional relational databases by offering flexible data modeling and multi‑dimensional search capabilities. Proper index design involves pre‑defining settings and mappings to ensure stability and optimal resource utilization.
This guide answers the following questions:
What is ES index design?
Why plan ahead, and what problems arise from poor design?
What are the key points to consider when designing an ES index?
How should shard and replica counts be set?
How can an unpredictable index size be handled?
How should mappings be configured correctly?
1.1 ES Index Design Overview
A complete ES index design typically addresses storage evaluation, shard count, read/write patterns, replica count, field types, and performance‑related parameters.
2. Index Settings Design
Settings focus on storage evaluation, shard sizing, recovery behavior, and performance tuning.
2.1 Storage Evaluation
Before creating an index, assess business type (write‑heavy vs. read‑heavy), storage period, and shard configuration.
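Business Type – Write‑Heavy (e.g., logs)
For write‑heavy data such as logs, the index is rolled over as it grows, which produces a numbered series of indices: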
log_appcode-2021-10-01-000001
log_appcode-2021-10-01-000002
log_appcode-2021-10-01-000003
Business Type – Read‑Heavy
For query‑intensive indices, increase shard count to improve parallelism.
2.1.1 Rollover Operation Practice
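The steps in this section rely on the rollover naming convention: when the current index name ends in a hyphen followed by a number, the new index gets that number incremented and zero-padded to six digits. The following is a minimal sketch of that rule only; the helper name `next_rollover_name` is hypothetical and nothing is sent to a cluster.

```python
import re


def next_rollover_name(index_name: str) -> str:
    """Return the name a rollover would give the next index in the series."""
    match = re.fullmatch(r"(.+)-(\d+)", index_name)
    if match is None:
        raise ValueError(f"{index_name!r} does not end in '-<number>'")
    prefix, counter = match.groups()
    # The counter is always padded to six digits, whatever the old width was.
    return f"{prefix}-{int(counter) + 1:06d}"


print(next_rollover_name("log_xxx-20211020-000001"))
```

Names that do not follow the pattern raise an error here; on a real cluster, such an index needs an explicit target name in the rollover request.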
a. Create an index with an alias:
PUT log_xxx-20211020-000001
{
"aliases": {
"log_xxx-alias": {
"is_write_index": true
}
}
}
b. Bulk insert data via the alias:
PUT log_xxx-alias/_bulk
{ "index": { "_id": 1 } }
{ "name": "zhangsan" }
{ "index": { "_id": 2 } }
{ "name": "lisi" }
...
c. Trigger rollover when conditions are met:
POST log_xxx-alias/_rollover
{
"conditions": {
"max_age": "7d",
"max_docs": 5,
"max_size": "5gb"
}
}
2.1.2 Shard and Replica Settings
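Typical recommendations: keep each shard at 20‑40 GB and avoid exceeding 50 GB; set the primary shard count close to the data node count so that load spreads evenly; use at least one replica for high availability. The primary shard count is fixed when the index is created (changing it later requires the shrink or split API, or a reindex), whereas the replica count can be updated at any time. Set both when creating the index:
PUT /my_index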
{
"settings": {
"index.number_of_shards": "10",
"index.number_of_replicas": "1"
}
}
2.2 Index Recovery Settings
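Adjust index.unassigned.node_left.delayed_timeout to postpone the reallocation of replica shards after a node leaves the cluster (the default is 1m), so that a node which restarts quickly does not trigger needless shard copying. For example, to wait 5 minutes: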
PUT /my_index/_settings
{
"settings": {
"index.unassigned.node_left.delayed_timeout": "5m"
}
}
2.3 Performance Optimization Settings
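Key parameters: index.routing.allocation.total_shards_per_node (caps how many shards of this index, primaries and replicas combined, a single node may hold) and index.refresh_interval (controls how often newly indexed documents become searchable; the default is 1s). Example for a log cluster: total_shards_per_node: 1, refresh_interval: 120s.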
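Both parameters are dynamic settings, so they can be changed on an existing index through the update-settings API. The sketch below only builds the request body for the log-cluster values quoted above; the helper name `build_perf_settings` is hypothetical, and sending the request is left to whatever HTTP client is in use.

```python
import json


def build_perf_settings(total_shards_per_node: int, refresh_interval: str) -> dict:
    """Build a body for PUT /<index>/_settings using flat setting keys."""
    return {
        "settings": {
            "index.routing.allocation.total_shards_per_node": total_shards_per_node,
            "index.refresh_interval": refresh_interval,
        }
    }


body = build_perf_settings(1, "120s")
print(json.dumps(body, indent=2))
```

A longer refresh interval trades search freshness for indexing throughput, which is usually acceptable for logs.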
3. Index Mapping Design
Mappings define how documents and fields are stored and indexed. They can be static or dynamic.
3.1 Mapping Overview
Static mapping offers precise control; dynamic mapping automatically creates field types on ingest.
3.2 Mapping Design Process
Count business fields.
Determine field types and whether they need to be searchable.
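Identify fields required for sorting or aggregation. Prefer keyword, numeric, or date types for these, since they use on‑disk doc values; enabling fielddata on text fields also works but loads the values onto the JVM heap, so treat it as a last resort.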
Decide if a field should be stored separately (store).
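The design process above can be sketched as a small translation from a field inventory to a mapping body. `FieldSpec` and `build_mapping` are hypothetical names, and the defaults are assumptions. In particular, for a text field that must be sorted or aggregated, this sketch adds a keyword sub-field named `raw` instead of enabling fielddata.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    name: str
    es_type: str                # e.g. "keyword", "text", "long", "date"
    searchable: bool = True     # step 2: does the field need to be searchable?
    aggregatable: bool = False  # step 3: is it used for sorting or aggregation?
    stored: bool = False        # step 4: should it be stored separately?


def build_mapping(fields):
    """Translate the field inventory (step 1) into a mapping body."""
    properties = {}
    for spec in fields:
        prop = {"type": spec.es_type}
        if not spec.searchable:
            prop["index"] = False
        if spec.es_type == "text":
            if spec.aggregatable:
                # Assumption: a keyword sub-field replaces fielddata on text.
                prop["fields"] = {"raw": {"type": "keyword", "ignore_above": 256}}
        elif not spec.aggregatable:
            # Doc values only serve sorting, aggregations, and scripts.
            prop["doc_values"] = False
        if spec.stored:
            prop["store"] = True
        properties[spec.name] = prop
    return {"mappings": {"properties": properties}}


example = build_mapping([
    FieldSpec("app_code", "keyword", aggregatable=True),
    FieldSpec("content", "text", aggregatable=True),
    FieldSpec("send_time", "long", searchable=False, stored=True),
])
print(example)
```

Turning off doc values for every non-aggregated field is an aggressive default chosen for illustration; it saves disk but rules out sorting on that field later without a reindex.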
3.3 Key Mapping Parameters
Important attributes include type, index, norms, and ignore_above. See the official ES mapping reference for a full list.
3.4 Mapping Caveats
Fields cannot be deleted.
Field types cannot be changed; reindexing is required for modifications.
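Because a field type cannot be changed in place, a common workaround is to create a new index with the corrected mapping, copy the data with the reindex API, and then switch an alias. The sketch below only lists those three REST calls as data; the helper name, index names, and alias are hypothetical, and it assumes clients reach the index through an alias.

```python
def reindex_plan(old_index, new_index, alias, new_properties):
    """Return the (method, path, body) calls for a mapping change via reindex."""
    return [
        # 1. Create the target index with the corrected mapping.
        ("PUT", f"/{new_index}", {"mappings": {"properties": new_properties}}),
        # 2. Copy all documents from the old index into the new one.
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
        # 3. Move the alias in one atomic request so clients never see a gap.
        ("POST", "/_aliases", {
            "actions": [
                {"remove": {"index": old_index, "alias": alias}},
                {"add": {"index": new_index, "alias": alias}},
            ]
        }),
    ]


plan = reindex_plan("orders_v1", "orders_v2", "orders", {"price": {"type": "double"}})
for method, path, _body in plan:
    print(method, path)
```

The old index can be deleted once the new one has been verified.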
4. Complex Types Design
4.1 Wide‑Table Pattern
Store many related attributes in a single index to avoid joins, useful for OLAP‑style analytics.
4.2 Nested Type
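The nested type indexes each object in an array as a separate hidden document, so the objects can be queried independently of one another. The cost is on the write side: updating any single object reindexes the whole parent document.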
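The difference from a plain object array can be illustrated without a cluster. The sketch below is a simulation, not Elasticsearch code: it imitates how an ordinary object array is flattened into multi-valued fields, then compares a match against that flattened form with a per-object match, which is what the nested type preserves. All function names and the sample data are made up for illustration.

```python
def flatten(field, objects):
    """Imitate default object indexing: one multi-valued field per key."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def match_flattened(flat, field, criteria):
    """Every criterion may be satisfied by a different object."""
    return all(value in flat.get(f"{field}.{key}", []) for key, value in criteria.items())


def match_nested(objects, criteria):
    """All criteria must hold within the same object."""
    return any(all(obj.get(k) == v for k, v in criteria.items()) for obj in objects)


users = [{"first": "John", "last": "Smith"}, {"first": "Alice", "last": "White"}]
criteria = {"first": "Alice", "last": "Smith"}

print(match_flattened(flatten("user", users), "user", criteria))  # True: false positive
print(match_nested(users, criteria))                              # False

# Declaring the field as nested in the mapping keeps each object separate.
nested_mapping = {"mappings": {"properties": {"user": {"type": "nested"}}}}
```

No single user is named Alice Smith, yet the flattened form matches because the two values come from different objects.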
4.3 Parent‑Child (Join) Type
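The join field models a parent‑child relationship between documents in the same index. It replaces nested in scenarios with frequent child updates, because a child document can be updated without reindexing its parent; the parent and its children must reside on the same shard.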
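A minimal sketch of what a join setup involves: a mapping that declares the relation, and child documents indexed with the parent ID as the routing value so that both land on the same shard. The helper names, the index name `qa`, the field name `relation`, and the `question`/`answer` relation are all hypothetical, and only request data is built.

```python
from urllib.parse import quote


def join_mapping(field, parent, child):
    """Mapping body declaring a single parent-to-child relation."""
    return {"mappings": {"properties": {field: {"type": "join", "relations": {parent: child}}}}}


def child_index_request(index, doc_id, parent_id, field, child, source):
    """Build the (method, path, body) call that indexes one child document."""
    body = dict(source)
    body[field] = {"name": child, "parent": parent_id}
    # Routing by parent ID places the child on the parent's shard.
    path = f"/{index}/_doc/{quote(doc_id, safe='')}?routing={quote(parent_id, safe='')}"
    return ("PUT", path, body)


mapping = join_mapping("relation", "question", "answer")
request = child_index_request("qa", "2", "1", "relation", "answer", {"text": "Use an alias."})
print(request[0], request[1])
```

Parent documents are indexed with the same field set to the parent name only, for example `{"relation": "question"}`.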
5. Practical Index Design (Log Cluster Case Study)
This section describes how a production log cluster handles lifecycle management (apply, maintain, clean), weekly shard evaluation, template usage, and automation via Airflow.
5.1 Index Lifecycle Management
Weekly index creation, approval‑driven storage‑period extensions, and scheduled cleanup scripts with throttling.
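The cleanup step might look roughly like the sketch below. It assumes index names follow the `<prefix>-YYYYMMDD-NNNNNN` pattern used in the rollover example and treats the date in the name as the index age, which is a simplification. The function names, batch size, and pause length are hypothetical, and `delete_fn` stands in for whatever call actually deletes an index.

```python
import re
import time
from datetime import date, datetime, timedelta

NAME_RE = re.compile(r"^.+-(?P<day>\d{8})-\d{6}$")


def expired_indices(names, today, retention_days):
    """Return the indices whose embedded date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        match = NAME_RE.match(name)
        if match is None:
            continue  # leave indices outside the naming convention untouched
        created = datetime.strptime(match.group("day"), "%Y%m%d").date()
        if created < cutoff:
            expired.append(name)
    return sorted(expired)


def delete_throttled(names, delete_fn, batch_size=5, pause_seconds=1.0):
    """Delete in small batches, pausing between them to limit cluster load."""
    for start in range(0, len(names), batch_size):
        for name in names[start:start + batch_size]:
            delete_fn(name)
        if start + batch_size < len(names):
            time.sleep(pause_seconds)


names = ["log_xxx-20211020-000001", "log_xxx-20211027-000002", ".kibana_1"]
old = expired_indices(names, today=date(2021, 11, 1), retention_days=7)
deleted = []
delete_throttled(old, deleted.append, batch_size=1, pause_seconds=0)
print(deleted)
```

An approved storage‑period extension would map to a larger `retention_days` for the affected index prefix.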
5.2 Settings Design in Production
Weekly calculation of shard count based on current storage size (target 40 GB per shard) and node count, followed by adjustments to total_shards_per_node and refresh_interval.
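The weekly evaluation can be sketched as follows. The article does not give the exact production formula, so the rounding rules here are assumptions: primaries are the storage size divided by the 40 GB target, rounded up, and total_shards_per_node is the smallest value that still lets every shard copy (primaries plus replicas) be placed. The function name `plan_shards` is hypothetical.

```python
import math


def plan_shards(index_size_gb, data_nodes, replicas=1, target_shard_gb=40):
    """Suggest shard-related settings for next week's index."""
    primaries = max(1, math.ceil(index_size_gb / target_shard_gb))
    # total_shards_per_node counts replicas as well as primaries.
    total_copies = primaries * (1 + replicas)
    per_node = math.ceil(total_copies / data_nodes)
    return {
        "number_of_shards": primaries,
        "number_of_replicas": replicas,
        "total_shards_per_node": per_node,
    }


print(plan_shards(index_size_gb=800, data_nodes=40))
```

Using the tightest possible total_shards_per_node keeps shards evenly spread, but it can leave shards unassigned when a node drops out, so some headroom may be worth adding.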
5.3 Template Management
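Default templates provide common settings (e.g., 20 shards, 60 s refresh) and shared field definitions; custom templates override defaults for specific index patterns. With legacy index templates such as the one below, the template with the higher order value takes precedence when several match the same index name.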
{
"order": 1,
"index_patterns": ["log_*"],
"settings": {
"index": {
"number_of_shards": "20",
"refresh_interval": "60s",
"number_of_replicas": "1",
"routing": {"allocation": {"total_shards_per_node": "1"}}
}
},
"mappings": {
"properties": {
"@timestamp": {"type": "date"},
"send_time": {"type": "long", "index": false},
"app_code": {"type": "keyword", "index": false},
"content": {"type": "text", "norms": false}
}
}
}
6. Best‑Practice Recommendations
Shard size: 20‑40 GB; a single shard also has a hard limit of about 2.1 billion documents.
Shard count: align with node count for balanced load.
Replica count: at least 1 for production; 0 only for reindexing.
total_shards_per_node: 1 when the index's shard copies (primaries plus replicas) do not exceed the node count, otherwise scale proportionally.
refresh_interval: 1 s for near‑real‑time, 30‑60 s for log‑heavy workloads.
Use templates to centralize common settings and mappings.
Prefer static mappings when the data model is known; supplement with dynamic templates for unknown fields.
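The last recommendation can be sketched as a mapping that declares the known fields statically and adds one dynamic template for everything else. The helper name `build_log_mapping`, the template name `strings_as_keywords`, and the `ignore_above` value of 256 are assumptions made for illustration.

```python
import json


def build_log_mapping(static_properties):
    """Static fields plus a dynamic template for unknown string fields."""
    return {
        "mappings": {
            "dynamic_templates": [
                {
                    "strings_as_keywords": {
                        # Applies to any new field whose JSON value is a string.
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword", "ignore_above": 256},
                    }
                }
            ],
            "properties": static_properties,
        }
    }


log_mapping = build_log_mapping({
    "@timestamp": {"type": "date"},
    "content": {"type": "text", "norms": False},
})
print(json.dumps(log_mapping, indent=2))
```

Mapping unknown strings as keyword rather than the default text‑plus‑keyword pair keeps unexpected fields cheap to index while leaving them filterable.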
7. Summary
Effective ES index design requires early planning of shard and replica strategy, careful mapping definition (static preferred, dynamic as fallback), and automated lifecycle management to maintain performance and stability at scale.
8. References
https://www.elastic.co/guide/cn/elasticsearch/guide/current/index-management.html
https://www.elastic.co/guide/cn/elasticsearch/guide/current/mapping-intro.html
https://www.elastic.co/guide/en/elasticsearch/reference/7.7/mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/7.7/indices-rollover-index.html
https://blog.csdn.net/laoyang360/article/details/83717399
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.