Understanding Near Real‑Time Search and Core Architecture of ElasticSearch
This article explains how ElasticSearch achieves near real‑time search by using immutable inverted indexes, segment merging, shard distribution, and a translog for durability, while also offering practical guidance on how to study the system effectively.
ElasticSearch provides near real‑time search: a newly indexed document becomes searchable shortly after it is written (by default within about one second, the refresh interval), rather than instantly as true real‑time search would require.
The challenge of near real‑time search lies in persisting data, building inverted indexes quickly, and handling updates and deletions without sacrificing performance.
ElasticSearch relies on immutable data structures: an inverted index segment is never modified once it has been written. New documents are collected in an in-memory buffer, and each refresh writes them out as a new immutable segment, much as functional programming avoids mutable state.
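The idea can be sketched in a few lines of Python. This is a toy model, not Lucene's data structures: `TinyIndex`, `build_segment`, and the whitespace tokenizer are hypothetical names chosen for illustration, and the sketch assumes documents sit in a buffer until a refresh turns them into a read-only segment.

```python
from collections import defaultdict
from types import MappingProxyType


def build_segment(docs):
    """Build a read-only inverted index: term -> sorted tuple of doc ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            postings[term].add(doc_id)
    frozen = {term: tuple(sorted(ids)) for term, ids in postings.items()}
    # MappingProxyType gives a read-only view, so the segment cannot be changed.
    return MappingProxyType(frozen)


class TinyIndex:
    def __init__(self):
        self.buffer = {}    # indexed but not yet searchable
        self.segments = []  # searchable, never modified in place

    def index(self, doc_id, text):
        self.buffer[doc_id] = text

    def refresh(self):
        """Turn the buffer into a new segment, making its documents searchable."""
        if self.buffer:
            self.segments.append(build_segment(self.buffer))
            self.buffer = {}

    def search(self, term):
        hits = set()
        for segment in self.segments:
            hits.update(segment.get(term.lower(), ()))
        return sorted(hits)


idx = TinyIndex()
idx.index(1, "near real time search")
before_refresh = idx.search("search")  # document is still only in the buffer
idx.refresh()
after_refresh = idx.search("search")   # document is now in a segment
```

The gap between `before_refresh` and `after_refresh` is the "near" in near real‑time: a document is visible to search only once the buffer holding it has been turned into a segment.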
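When new data arrives, ElasticSearch builds a new segment (an incremental save) instead of rewriting existing ones. Updates and deletions are handled with logical deletion markers (del): the old version of a document is marked as deleted and, for an update, the new version is written to a new segment. Each segment is managed as a Lucene Segment, and the collection of segments forms an Index.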
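A minimal sketch of logical deletion, under the assumption that each segment keeps its documents untouched and tracks deletions in a separate set. `Segment`, `SegmentedIndex`, and their methods are hypothetical names, not Lucene or ElasticSearch APIs.

```python
class Segment:
    def __init__(self, docs):
        self.docs = dict(docs)  # written once, never changed afterwards
        self.deleted = set()    # stand-in for the per-segment deletion markers


class SegmentedIndex:
    def __init__(self):
        self.segments = []

    def delete(self, doc_id):
        # Logical delete: only a marker is added, the stored document stays.
        for segment in self.segments:
            if doc_id in segment.docs:
                segment.deleted.add(doc_id)

    def add_segment(self, docs):
        # An update is "mark the old version deleted, write the new one".
        for doc_id in docs:
            self.delete(doc_id)
        self.segments.append(Segment(docs))

    def get(self, doc_id):
        for segment in self.segments:
            if doc_id in segment.docs and doc_id not in segment.deleted:
                return segment.docs[doc_id]
        return None


index = SegmentedIndex()
index.add_segment({1: "v1", 2: "other"})
index.add_segment({1: "v2"})  # update of document 1
index.delete(2)               # deletion of document 2
```

After these calls the first segment still physically contains both original documents; readers simply skip anything listed in `deleted`.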
Data is sharded across nodes, and each shard is a complete Lucene index. Replica shards synchronize from their primary shard, so each piece of data is stored on more than one node.
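How a document ends up on a particular shard can be illustrated with the commonly cited simplified rule "hash of the routing value modulo the number of primary shards". The sketch below is an assumption-laden illustration: `route_to_shard` is a hypothetical helper, and `zlib.crc32` is only a stand-in for the hash function ElasticSearch actually uses.

```python
import zlib


def route_to_shard(routing_value: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (by default the document id)."""
    # crc32 is a stand-in chosen for illustration, not ElasticSearch's real hash.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


doc_ids = ["user-1", "user-2", "user-3", "user-4"]
placement = {doc_id: route_to_shard(doc_id, 3) for doc_id in doc_ids}
```

Because the result depends on the shard count, changing that count would send existing ids to different shards, which is one reason the number of primary shards is normally fixed when an index is created.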
Disk I/O is mitigated by first writing new segments to the filesystem cache, where they are already searchable, and periodically flushing them to disk. A background merge thread consolidates small segments into larger ones and drops documents that were marked as deleted; because merging runs in the background, searches are not blocked while it happens.
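The effect of a merge can be sketched as follows. This is a simplified model, assuming each segment is a pair of (documents, ids marked as deleted) and that later segments hold newer versions; `merge_segments` is a hypothetical function, and real merge policies also decide which segments to merge and when.

```python
def merge_segments(segments):
    """Merge (docs, deleted_ids) pairs into one segment without deleted docs."""
    merged = {}
    for docs, deleted in segments:  # oldest segment first
        for doc_id, doc in docs.items():
            if doc_id not in deleted:
                merged[doc_id] = doc
    # The new segment starts with no deletion markers.
    return merged, frozenset()


small_segments = [
    ({1: "v1", 2: "b"}, {1}),  # document 1 was later updated
    ({1: "v2"}, set()),        # newer version of document 1
    ({3: "c"}, {3}),           # document 3 was deleted
]
merged_docs, merged_deleted = merge_segments(small_segments)
```

Three small segments become one, and the obsolete copies of documents 1 and 3 are physically gone only at this point.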
To prevent data loss, ElasticSearch uses a write‑ahead log called the translog: every buffered document is also written to the translog, which is replayed after a crash to recover operations that had not yet been committed. Periodic flushes commit segments to disk.
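The write-ahead pattern behind this can be sketched with a plain append-only file. `TinyStore` and its JSON-lines log format are hypothetical and only illustrate the technique; the `committed` dictionary stands in for segments that a flush has made durable, and the sketch assumes the log can be emptied once a flush has completed.

```python
import json
import os
import tempfile


class TinyStore:
    def __init__(self, translog_path):
        self.translog_path = translog_path
        self.buffer = {}     # in memory, lost on a crash
        self.committed = {}  # stand-in for segments committed by a flush
        self.replay()

    def index(self, doc_id, doc):
        # Write to the log first, and force it to disk, before buffering.
        with open(self.translog_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"id": doc_id, "doc": doc}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.buffer[doc_id] = doc

    def flush(self):
        # Commit buffered documents, then empty the log (assumption of this sketch).
        self.committed.update(self.buffer)
        self.buffer.clear()
        open(self.translog_path, "w", encoding="utf-8").close()

    def replay(self):
        # Rebuild the buffer from operations logged before a crash.
        if not os.path.exists(self.translog_path):
            return
        with open(self.translog_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.buffer[entry["id"]] = entry["doc"]


path = os.path.join(tempfile.mkdtemp(), "translog.jsonl")
store = TinyStore(path)
store.index("1", {"title": "hello"})

# Simulate a crash: the old object is discarded and a new one reads the log.
recovered = TinyStore(path)
recovered_buffer = dict(recovered.buffer)
recovered.flush()
```

The document indexed before the simulated crash reappears in `recovered_buffer` because it was logged before it was buffered.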
The article concludes with advice on learning ElasticSearch: focus on the design principles in sections 2.1 and 2.2 after gaining basic distributed‑system experience, and explore source code for deeper insight.
Top Architect