Comprehensive Guide to Elasticsearch Index Design, Settings, and Mapping

This article provides a detailed guide on Elasticsearch index design, covering index settings, shard and replica planning, mapping strategies, complex types, lifecycle management, template usage, and practical best‑practice recommendations for large‑scale log data clusters.

Qunar Tech Salon

Nov 24, 2021

1. What Is ES Index Design

Elasticsearch (ES) is a distributed, RESTful search and analytics engine that differs from traditional relational databases by offering flexible data modeling and multi‑dimensional search capabilities. Proper index design involves pre‑defining settings and mappings to ensure stability and optimal resource utilization.

What is ES index design?

Why plan ahead and what problems arise from poor design?

Key points to consider when designing an ES index?

How to set shard and replica numbers?

How to handle unpredictable index size?

How to configure mappings correctly?

1.1 ES Index Design Overview

A complete ES index design typically addresses storage evaluation, shard count, read/write patterns, replica count, field types, and performance‑related parameters.

2. Index Settings Design

Settings focus on storage evaluation, shard sizing, recovery behavior, and performance tuning.

2.1 Storage Evaluation

Before creating an index, assess business type (write‑heavy vs. read‑heavy), storage period, and shard configuration.

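Business Type – Write‑Heavy (e.g., logs)

For write‑heavy indices whose size is hard to predict, roll over to a new index as data grows, which produces a numbered series such as:

log_appcode-2021-10-01-000001
log_appcode-2021-10-01-000002
log_appcode-2021-10-01-000003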

Business Type – Read‑Heavy

For query‑intensive indices, increase shard count to improve parallelism.

2.1.1 Rollover Operation Practice

a. Create an index with an alias:

PUT log_xxx-20211020-000001
{
  "aliases": {
    "log_xxx-alias": {
      "is_write_index": true
    }
  }
}

b. Bulk insert data via the alias:

PUT log_xxx-alias/_bulk
{ "index": { "_id": 1 } }
{ "name": "zhangsan" }
{ "index": { "_id": 2 } }
{ "name": "lisi" }
...

c. Trigger rollover when conditions are met:

POST log_xxx-alias/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 5,
    "max_size": "5gb"
  }
}
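
Before running a real rollover, it can help to check whether the conditions would match. The sketch below reuses the alias and conditions from step c and adds the dry_run query parameter of the rollover API, which evaluates the conditions without creating a new index. The follow-up GET is an optional extra step that lists the indices behind the alias and shows which one carries is_write_index.

```console
# Evaluate the rollover conditions without actually rolling over
POST log_xxx-alias/_rollover?dry_run
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 5,
    "max_size": "5gb"
  }
}

# Show the indices behind the alias and which one is the write index
GET _alias/log_xxx-alias
```

A rollover proceeds as soon as any one of the listed conditions is met, so the three values act as alternatives rather than a combined requirement.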

2.1.2 Shard and Replica Settings

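Typical recommendations: keep each shard at 20‑40 GB and avoid exceeding 50 GB; keep the shard count close to the node count; use at least 1 replica for high availability. Note that number_of_shards is fixed when the index is created and cannot be changed through the _settings API afterwards, while number_of_replicas can be adjusted later, so both are set here at creation time:

PUT /my_index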
{
  "settings": {
    "index.number_of_shards": "10",
    "index.number_of_replicas": "1"
  }
}
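
Because number_of_replicas is a dynamic setting, it can be changed on a live index, whereas number_of_shards cannot. A minimal sketch, reusing the my_index name from the example above; the replica value 2 and the selected _cat columns are illustrative choices, not values from the original article.

```console
# Raise the replica count on an existing index (dynamic setting)
PUT /my_index/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}

# Check per-shard store size against the 20-40 GB guideline
GET _cat/shards/my_index?v&h=index,shard,prirep,store,node
```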

2.2 Index Recovery Settings

Adjust index.unassigned.node_left.delayed_timeout to postpone shard allocation after node loss, e.g., 5 minutes:

PUT /my_index/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}

2.3 Performance Optimization Settings

Key parameters: index.routing.allocation.total_shards_per_node (limits shards per node) and index.refresh_interval (controls refresh frequency). Example for a log cluster: total_shards_per_node: 1, refresh_interval: 120s.
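
Both parameters are dynamic index settings, so they can be applied to an existing index. The sketch below uses the log-cluster values quoted above; my_index is a placeholder index name.

```console
# Log-cluster values from the text; my_index is a placeholder
PUT /my_index/_settings
{
  "index.routing.allocation.total_shards_per_node": 1,
  "index.refresh_interval": "120s"
}
```

Keep in mind that total_shards_per_node counts primary and replica shards together, so a limit that is too low for the number of available nodes can leave shards unassigned.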

3. Index Mapping Design

Mappings define how documents and fields are stored and indexed. They can be static or dynamic.

3.1 Mapping Overview

Static mapping offers precise control; dynamic mapping automatically creates field types on ingest.
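
The difference can be sketched with a small example. The index name static_mapping_demo and its fields are hypothetical; setting "dynamic" to "strict" is one way to enforce a purely static mapping, since documents containing undeclared fields are then rejected instead of extending the mapping.

```console
# Static mapping: fields are declared up front, unknown fields are rejected
PUT /static_mapping_demo
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "app_code": { "type": "keyword" },
      "content":  { "type": "text" }
    }
  }
}

# Rejected, because "level" is not declared in the mapping
PUT /static_mapping_demo/_doc/1
{
  "app_code": "demo",
  "level": "INFO"
}
```

With the default ("dynamic": true), the same document would be accepted and a mapping for level would be created automatically.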

3.2 Mapping Design Process

Count business fields.

Determine field types and whether they need to be searchable.

Identify fields required for sorting or aggregation (prefer keyword or numeric types, which use doc_values; enabling fielddata on text fields also works but consumes heap memory).

Decide whether a field should be stored separately (store).
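
The outcome of these steps might look like the sketch below. The index and field names are hypothetical: title is searchable as text and gets a keyword sub-field for sorting and aggregation, while summary is stored separately so it can be fetched without loading the full _source.

```console
PUT /mapping_process_demo
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": { "raw": { "type": "keyword" } }
      },
      "created_at": { "type": "date" },
      "summary": { "type": "text", "store": true }
    }
  }
}

# Sort on the keyword sub-field and return only the stored field
GET /mapping_process_demo/_search
{
  "sort": [ { "title.raw": "asc" } ],
  "stored_fields": [ "summary" ]
}
```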

3.3 Key Mapping Parameters

Important attributes include type, index, norms, and ignore_above. See the official ES mapping reference for a full list.
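
A minimal sketch of these four attributes in use; the index name, field names, and the 256-character limit are illustrative assumptions.

```console
PUT /mapping_params_demo
{
  "mappings": {
    "properties": {
      # ignore_above: strings longer than 256 characters are not indexed
      "request_id": { "type": "keyword", "ignore_above": 256 },
      # index=false: the field cannot be searched, which saves index space
      "payload_hash": { "type": "keyword", "index": false },
      # norms=false: skips scoring normalization data for this text field
      "message": { "type": "text", "norms": false }
    }
  }
}
```

Disabling norms suits log-style text fields that are filtered rather than ranked by relevance.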

3.4 Mapping Caveats

Existing fields cannot be deleted from a mapping.

Field types cannot be changed in place; to modify one, create a new index with the corrected mapping and reindex the data into it.
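
The reindexing route can be sketched as follows. The index names my_index and my_index_v2 and the status field are placeholders; the example assumes the goal is to change status to keyword.

```console
# 1. Create a new index with the corrected mapping
PUT /my_index_v2
{
  "mappings": {
    "properties": {
      "status": { "type": "keyword" }
    }
  }
}

# 2. Copy the documents from the old index into the new one
POST _reindex
{
  "source": { "index": "my_index" },
  "dest":   { "index": "my_index_v2" }
}
```

If clients access the index through an alias, the alias can then be pointed at the new index without changing client configuration.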

4. Complex Types Design

4.1 Wide‑Table Pattern

Store many related attributes in a single index to avoid joins, useful for OLAP‑style analytics.

4.2 Nested Type

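The nested type indexes each object in an array as a separate hidden document, so the objects can be queried independently of one another. The cost is that updating any nested object requires reindexing the whole parent document.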
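
A minimal sketch of a nested mapping and a matching query. The orders_demo index and its items field are hypothetical; the query matches only when a single item satisfies both conditions.

```console
PUT /orders_demo
{
  "mappings": {
    "properties": {
      "items": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "qty":  { "type": "integer" }
        }
      }
    }
  }
}

# Both conditions must hold within the same nested object
GET /orders_demo/_search
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "must": [
            { "term":  { "items.name": "book" } },
            { "range": { "items.qty": { "gte": 2 } } }
          ]
        }
      }
    }
  }
}
```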

4.3 Parent‑Child (Join) Type

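An alternative to nested when child documents are updated frequently: parents and children are separate documents linked by a join field, so a child can be updated without reindexing its parent.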
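
A minimal sketch of a join field. The qa_demo index, the relation field name, and the question/answer relation are illustrative assumptions. A child document must be routed to the same shard as its parent, hence the routing parameter.

```console
PUT /qa_demo
{
  "mappings": {
    "properties": {
      "relation": {
        "type": "join",
        "relations": { "question": "answer" }
      }
    }
  }
}

# Parent document
PUT /qa_demo/_doc/1
{
  "text": "How many shards should an index have?",
  "relation": { "name": "question" }
}

# Child document, routed to the parent's shard
PUT /qa_demo/_doc/2?routing=1
{
  "text": "Aim for 20-40 GB per shard.",
  "relation": { "name": "answer", "parent": "1" }
}
```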

5. Practical Index Design (Log Cluster Case Study)

This section describes lifecycle management (apply‑maintain‑clean), weekly shard evaluation, template usage, and automation via Airflow.

5.1 Index Lifecycle Management

Indices are created weekly, storage‑period extensions go through an approval process, and scheduled cleanup scripts run with throttling.

5.2 Settings Design in Production

The shard count is recalculated weekly from the current storage size (target 40 GB per shard) and the node count; total_shards_per_node and refresh_interval are then adjusted accordingly.
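
The weekly calculation can be illustrated with a worked example. The formula below is an assumption about how such a calculation might look, not the authors' actual script, and the inputs (index size S = 600 GB, r = 1 replica, N = 30 data nodes) are hypothetical. It relies on the fact that total_shards_per_node counts primary and replica shards together.

```latex
\begin{aligned}
n_{\text{primary}} &= \left\lceil \frac{S}{40\ \text{GB}} \right\rceil
                    = \left\lceil \frac{600\ \text{GB}}{40\ \text{GB}} \right\rceil = 15 \\
\text{total\_shards\_per\_node} &= \left\lceil \frac{n_{\text{primary}} \times (1 + r)}{N} \right\rceil
                    = \left\lceil \frac{15 \times 2}{30} \right\rceil = 1
\end{aligned}
```

With fewer nodes, for example N = 10, the same index would need a limit of 3 shards per node for all 30 shard copies to be allocated.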

5.3 Template Management

Default templates provide common settings (e.g., 20 shards, 60 s refresh) and shared field definitions; custom templates override defaults for specific index patterns.

{
  "order": 1,
  "index_patterns": ["log_*"],
  "settings": {
    "index": {
      "number_of_shards": "20",
      "refresh_interval": "60s",
      "number_of_replicas": "1",
      "routing": {"allocation": {"total_shards_per_node": "1"}}
    }
  },
  "mappings": {
    "properties": {
      "@timestamp": {"type": "date"},
      "send_time": {"type": "long", "index": false},
      "app_code": {"type": "keyword", "index": false},
      "content": {"type": "text", "norms": false}
    }
  }
}
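
The body above uses the order field, which belongs to the legacy index template API, so the sketch below assumes templates are registered with PUT _template. The template name log_custom_example, the log_bigapp-* pattern, and the 120s value are hypothetical. Because its order is higher than that of the default template, its settings take precedence for matching indices, while everything it does not set is still inherited from the default.

```console
# Hypothetical custom template: a higher order overrides the default template
PUT _template/log_custom_example
{
  "order": 2,
  "index_patterns": ["log_bigapp-*"],
  "settings": {
    "index": {
      "refresh_interval": "120s"
    }
  }
}
```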

6. Best‑Practice Recommendations

Shard size: 20‑40 GB; a single shard can hold at most about 2.1 billion documents.

Shard count: align with node count for balanced load.

Replica count: at least 1 for production; 0 only for reindexing.

total_shards_per_node: 1 when shards ≤ nodes, otherwise scale proportionally.

refresh_interval: 1 s for near‑real‑time, 30‑60 s for log‑heavy workloads.

Use templates to centralize common settings and mappings.

Prefer static mappings when the data model is known; supplement with dynamic templates for unknown fields.
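
Combining a static mapping with a dynamic template might look like the sketch below. The index name and the template name strings_as_keywords are hypothetical. Known fields are declared explicitly, and any new string field is mapped as keyword instead of the default text-plus-keyword pair.

```console
PUT /dynamic_template_demo
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword", "ignore_above": 256 }
        }
      }
    ],
    "properties": {
      "@timestamp": { "type": "date" },
      "content":    { "type": "text", "norms": false }
    }
  }
}
```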

7. Summary

Effective ES index design requires early planning of shard and replica strategy, careful mapping definition (static preferred, dynamic as fallback), and automated lifecycle management to maintain performance and stability at scale.

8. References

https://www.elastic.co/guide/cn/elasticsearch/guide/current/index-management.html

https://www.elastic.co/guide/cn/elasticsearch/guide/current/mapping-intro.html

https://www.elastic.co/guide/en/elasticsearch/reference/7.7/mapping.html

https://www.elastic.co/guide/en/elasticsearch/reference/7.7/indices-rollover-index.html

https://blog.csdn.net/laoyang360/article/details/83717399

