
Understanding Near Real‑Time Search and Core Architecture of ElasticSearch

This article explains how ElasticSearch achieves near real‑time search by using immutable inverted indexes, segment merging, shard distribution, and a translog for durability, while also offering practical guidance on how to study the system effectively.

Top Architect

Nov 7, 2021


ElasticSearch provides near real‑time search: newly indexed documents become searchable shortly after they are written (about one second under the default refresh interval), rather than instantaneously as in true real‑time search.

The challenge of near real‑time search lies in persisting data, building inverted indexes quickly, and handling updates and deletions without sacrificing performance.

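ElasticSearch relies on immutable data structures: once an inverted index segment has been written, it is never modified in place. Newly inserted documents are collected in an in‑memory buffer and then written out as a new immutable segment, an approach that resembles functional programming's avoidance of mutable state.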

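When new data arrives, ElasticSearch writes it to a new segment (an incremental save) instead of rewriting existing ones. Because segments cannot be changed, a deletion is recorded as a logical deletion marker (del), and an update is handled as a deletion of the old version followed by the insertion of a new one. Each segment is a Lucene segment, and the collection of segments forms a Lucene index.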
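The segment-plus-deletion-marker idea can be sketched in a few lines of Python. This is a toy model, not Lucene's implementation: `TinyIndex`, its sequence numbers, and the whitespace tokenizer are all hypothetical, and deletion markers take effect immediately here for simplicity.

```python
from types import MappingProxyType


class TinyIndex:
    """Toy model: append-only immutable segments plus a set of deletion markers."""

    def __init__(self):
        self.segments = []    # read-only mappings: term -> frozenset of sequence numbers
        self.buffer = []      # (seq, text) pairs that are not yet searchable
        self.deleted = set()  # deletion markers: sequence numbers that are logically deleted
        self.latest = {}      # doc_id -> sequence number of its newest version
        self.ids = {}         # sequence number -> doc_id
        self._next_seq = 0

    def index(self, doc_id, text):
        # An update never touches an old segment: mark the old version, add a new one.
        if doc_id in self.latest:
            self.deleted.add(self.latest[doc_id])
        seq = self._next_seq
        self._next_seq += 1
        self.latest[doc_id] = seq
        self.ids[seq] = doc_id
        self.buffer.append((seq, text))

    def delete(self, doc_id):
        seq = self.latest.pop(doc_id, None)
        if seq is not None:
            self.deleted.add(seq)

    def refresh(self):
        # Turn the buffer into one new immutable segment; existing segments are untouched.
        if not self.buffer:
            return
        postings = {}
        for seq, text in self.buffer:
            for term in text.lower().split():
                postings.setdefault(term, set()).add(seq)
        frozen = {term: frozenset(seqs) for term, seqs in postings.items()}
        self.segments.append(MappingProxyType(frozen))
        self.buffer = []

    def search(self, term):
        # Query every segment, then filter out anything carrying a deletion marker.
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term.lower(), frozenset()) - self.deleted
        return sorted(self.ids[seq] for seq in hits)
```

In this sketch a document is invisible to `search` until `refresh` has produced a segment for it, which is the "near" in near real‑time.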

Data is sharded across nodes, and each shard is a complete Lucene index. Replica shards are kept in sync with their primary shard, so every piece of data is stored on more than one node and survives the loss of a single node.
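A minimal sketch of the sharding idea follows, assuming the common "hash the routing key, take it modulo the number of primary shards" scheme. `route_to_shard` and `place_shards` are hypothetical helpers; CRC32 stands in for ElasticSearch's real hash function, and the round‑robin placement is an illustration rather than the actual allocation algorithm.

```python
import zlib


def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    """Pick a primary shard deterministically from the routing key (CRC32 is a stand-in hash)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def place_shards(num_shards: int, num_replicas: int, nodes: list) -> dict:
    """Round-robin placement: shard -> [primary node, replica nodes...], all on distinct nodes."""
    copies = num_replicas + 1
    if copies > len(nodes):
        raise ValueError("need at least one node per copy of a shard")
    return {
        shard: [nodes[(shard + copy) % len(nodes)] for copy in range(copies)]
        for shard in range(num_shards)
    }


if __name__ == "__main__":
    print(route_to_shard("user-42", 5))
    print(place_shards(3, 1, ["node-1", "node-2", "node-3"]))
```

Because the shard is derived from the key and the shard count, the same document always lands on the same shard, which is also why the number of primary shards cannot be changed freely after an index is created.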

Disk I/O cost is reduced by first writing new segments to the filesystem cache, where they are already searchable, and only periodically flushing them to disk. A background merge thread consolidates small segments into larger ones and drops documents that carry a deletion marker; because merging runs in the background, it does not block searches.
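Merging can be pictured as building one new segment from several old ones while skipping deleted documents. The sketch below is a simplified assumption: segments are plain dicts from term to a frozenset of document numbers, and `merge_segments` is a hypothetical name, not a Lucene API.

```python
def merge_segments(segments, deleted):
    """Combine several segments into one new segment, dropping deleted documents."""
    merged = {}
    for segment in segments:
        for term, doc_ids in segment.items():
            live = doc_ids - deleted  # skip documents that carry a deletion marker
            if live:
                merged.setdefault(term, set()).update(live)
    # The result is a fresh segment; the inputs are left untouched until they are swapped out.
    return {term: frozenset(doc_ids) for term, doc_ids in merged.items()}


if __name__ == "__main__":
    small_1 = {"fox": frozenset({1}), "quick": frozenset({1, 2})}
    small_2 = {"fox": frozenset({3}), "dog": frozenset({4})}
    print(merge_segments([small_1, small_2], deleted={2, 4}))
```

Since the inputs are never mutated, searches can keep using the old segments until the merged one is ready, at which point the old ones can be discarded together with their deletion markers.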

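To prevent data loss, ElasticSearch uses a write‑ahead log called the translog. Every document written to the in‑memory buffer is also appended to the translog. After a crash, operations that had not yet been committed are replayed from the translog; a periodic flush commits segments to disk, after which the corresponding translog entries are no longer needed.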
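The write‑ahead pattern can be illustrated with a small file‑backed log. This is a sketch under stated assumptions: `Translog` and `DurableBuffer` are hypothetical classes, the log is a JSON‑lines file, and "committed" is just an in‑memory list standing in for segments that have been synced to disk.

```python
import json
import os


class Translog:
    """Append-only log of operations, stored as one JSON object per line."""

    def __init__(self, path):
        self.path = path

    def append(self, op):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the entry durable before acknowledging the write

    def replay(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def truncate(self):
        open(self.path, "w", encoding="utf-8").close()


class DurableBuffer:
    """In-memory buffer whose uncommitted contents can be rebuilt from the translog."""

    def __init__(self, translog, committed=None):
        self.translog = translog
        self.committed = list(committed or [])  # stand-in for segments already on disk
        self.buffer = translog.replay()         # recovery: replay uncommitted operations

    def index(self, doc):
        self.translog.append(doc)  # log first, then buffer
        self.buffer.append(doc)

    def flush(self):
        # Commit the buffered documents, then drop the log entries that covered them.
        self.committed.extend(self.buffer)
        self.buffer = []
        self.translog.truncate()
```

Constructing a new `DurableBuffer` on the same log path simulates a restart: anything indexed after the last `flush` reappears in the buffer.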

The article concludes with advice on learning ElasticSearch: focus on the design principles in sections 2.1 and 2.2 after gaining basic distributed‑system experience, and explore source code for deeper insight.


