
Understanding Elasticsearch Cluster and Data Layer Architecture

This article explains Elasticsearch’s cluster and data layer architecture, covering nodes, indices, shards, replicas, and deployment models, and weighs the trade‑offs of different distributed system designs to show how Elasticsearch achieves scalability, reliability, and query performance.

Art of Distributed System Architecture Design

Elasticsearch Cluster Architecture

Elasticsearch is a widely used open‑source search and analytics system, popular in three main areas: search (as a newer alternative to Solr), JSON document storage (offering better read/write performance than MongoDB and richer geo and mixed queries), and time‑series data analysis for logs and monitoring.

Key concepts include:

Node: a running Elasticsearch instance, typically a process on a machine.

Index: a logical construct containing mapping and inverted/forward index files, which may be distributed across one or many machines.

Shard: a partition of an index to support large data volumes; each shard is managed by a node. Shards can be primary or replica.

Replica: a copy of a primary shard. Replicas provide redundancy against node failure and serve read traffic; because a write is replicated before it is acknowledged, replicas stay closely consistent with their primary.

The diagram below illustrates a typical deployment.

Example indices:

Index 1: three primary shards (P1‑P3) on three different nodes, no replicas.

Index 2: two primary shards (P1‑P2) each with a replica (R1‑R2) on separate nodes; primary and replica of the same shard never share a node, providing fault tolerance.
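The placement rule in the example can be sketched as a toy allocator. This is an illustration of the invariant only, not Elasticsearch's actual allocation algorithm; the names `allocate` and the node labels are made up for this sketch.

```python
# Toy shard allocator illustrating the rule above: a replica is never
# placed on the node that holds its primary shard.

def allocate(num_shards, num_replicas, nodes):
    """Return {shard_id: {"primary": node, "replicas": [nodes]}}."""
    placement = {}
    for shard in range(num_shards):
        primary = nodes[shard % len(nodes)]            # spread primaries round-robin
        candidates = [n for n in nodes if n != primary]
        replicas = [candidates[(shard + i) % len(candidates)]
                    for i in range(num_replicas)]
        placement[shard] = {"primary": primary, "replicas": replicas}
    return placement

# Index 2 from the example: two primaries, one replica each, three nodes.
plan = allocate(num_shards=2, num_replicas=1, nodes=["node1", "node2", "node3"])
for shard, p in plan.items():
    assert p["primary"] not in p["replicas"]           # fault-tolerance invariant
```

If the node holding a primary fails, a copy survives on some other node, which is exactly what makes the failover described next possible.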

Index Process

When indexing a document, it is routed to the primary shard, indexed there, then replicated to its replica shards before the operation is acknowledged.
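The routing step itself is a simple modulo over a hash of the routing key (by default the document id). A minimal sketch, using `crc32` as a stand-in for the murmur3 hash Elasticsearch actually uses:

```python
# Sketch of document routing: shard = hash(_routing) % number_of_primary_shards.
# _routing defaults to the document id; crc32 stands in for murmur3 here.
import zlib

def route(doc_id, num_primary_shards, routing=None):
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards

# Every write and lookup for the same id lands on the same shard.
assert route("user-42", 3) == route("user-42", 3)
```

Because the shard count appears in the modulo, the number of primary shards is fixed at index creation time; changing it would re-route every existing document.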

If a primary or replica shard fails, the missing shard is recovered by copying data from an existing replica, during which the system operates in a degraded state until failover completes.

Replicas also improve read scalability by distributing query load.

Role Deployment Models

Elasticsearch supports two deployment styles:

Mixed deployment (default): Data and Transport roles coexist on the same node. Requests are routed randomly, and each node handles both data storage and request forwarding. This model is simple to start but can cause resource contention and limits cluster size due to full mesh connections.

Tiered deployment: Separate Transport nodes (for request routing and result merging) from Data nodes (for storage and computation). This isolates roles, reduces inter‑node connections, allows larger clusters, supports hot upgrades, and improves fault isolation.
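A tiered layout like this can be expressed in `elasticsearch.yml` using the modern `node.roles` setting (older versions used boolean flags such as `node.data`); the two fragments below would go on different machines:

```yaml
# Coordinating-only ("Transport") node: routes requests and merges
# results, but holds no shard data. An empty roles list makes a node
# coordinating-only.
node.roles: []
```

```yaml
# Data node: stores shards and runs search/aggregation work locally.
node.roles: [ data ]
```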

The choice depends on workload and scale requirements.

Elasticsearch Data Layer Architecture

Indices and metadata are stored on the local file system, which supports several store types for loading index files (niofs, mmapfs, simplefs, smb). The default discussed here is mmapfs, which maps index files directly into memory for the best read performance.

Because data resides locally, node failures can cause data loss; replicas mitigate this risk.

Replica configuration determines how many copies of each shard exist. For example, a replica count of 2 yields one primary shard and two replica shards, distributed across different machines or racks.
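The arithmetic is simple: each primary shard gets `number_of_replicas` copies, so the total shard count is primaries times one plus the replica setting. A small sketch:

```python
# Total shards an index occupies cluster-wide: each of the num_primaries
# primary shards has num_replicas replica copies alongside it.

def total_shards(num_primaries, num_replicas):
    return num_primaries * (1 + num_replicas)

assert total_shards(3, 2) == 9   # replica count 2: one primary + two copies per shard
assert total_shards(2, 1) == 4   # Index 2 from the earlier example
```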

Replica benefits:

Ensures service availability when a replica is down.

Protects data reliability; without replicas, a primary node failure would lose all shard data.

Boosts query capacity; adding N replicas gives N + 1 queryable copies of each shard, multiplying read throughput roughly in proportion.

However, replicas also introduce drawbacks:

Additional resource cost; unused replica capacity wastes compute.

Write performance penalty, as each write must be applied to the primary and then to all replicas.

Slow dynamic scaling; adding replicas requires full data copy, which can be time‑consuming.

Two broader distributed system architectures are discussed:

1. Local File‑System Based Distributed System

Each shard (primary + replica) resides on local disks. If a node fails, the replica is promoted to primary and a new replica is created on another node, requiring full data copy (e.g., 200 GB over a 1 Gbps link takes ~1600 s). Redundancy increases resource consumption.
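The recovery estimate above is straightforward bandwidth arithmetic, which this sketch makes explicit (it ignores disk and protocol overhead, so real recoveries take longer):

```python
# Back-of-envelope time to re-create a lost replica by copying the full
# shard over the network, as in the 200 GB / 1 Gbps example above.

def recovery_seconds(shard_gb, link_gbps):
    gigabits = shard_gb * 8        # gigabytes -> gigabits
    return gigabits / link_gbps

assert recovery_seconds(200, 1) == 1600.0   # ~27 minutes of degraded operation
```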

Variants include dual‑cluster backups.

2. Distributed File‑System (Shared Storage) Based Distributed System

This model separates storage and compute: shards contain only computation logic and reference files on a shared distributed file system (e.g., HDFS). Nodes handle computation while storage is centralized, enabling elastic scaling of each layer independently.

Advantages:

Fine‑grained resource scaling for storage and compute.

Reduced waste and lower overall cost.

Better hotspot handling; compute can be re‑balanced without moving data.

Drawback:

Potential performance gap when accessing remote distributed storage versus local disks, though modern user‑space protocols narrow this difference.

Systems like HBase and Solr adopt this architecture.

Summary

Both architectures have distinct strengths and weaknesses; selecting the appropriate design depends on specific workload characteristics and scalability goals.

Tags: distributed systems, Search Engine, Elasticsearch, Sharding, data-architecture, Replica
Written by

Art of Distributed System Architecture Design

Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.
