Master ElasticSearch: Core Concepts, Architecture, and Search Workflow Explained

This article provides a comprehensive overview of ElasticSearch, covering its definition, core components such as indexes, shards and replicas, the analysis pipeline, inverted index mechanics, and the two‑stage search process that enables scalable, fault‑tolerant full‑text search in big‑data environments.

Mike Chen's Internet Architecture

Jul 1, 2025

ElasticSearch Overview

ElasticSearch is a distributed full‑text search engine built on Apache Lucene, widely used in big‑data scenarios for fast, scalable search and analytics.

Core Components

Index: a collection of documents with similar characteristics. Each index has a mapping (its schema) and is stored as inverted-index files, which may reside on one node or be spread across many.

Type: a logical grouping of similar documents within an index, loosely analogous to a table in a relational database. Mapping types were deprecated starting in the 6.x releases and removed in 8.0, so current versions effectively hold a single document type per index.

Document: the basic searchable unit, represented as JSON, similar to a row.

Field: the smallest unit within a document, comparable to a column.

Shard: a partition of an index that enables horizontal scaling; each shard is a self-contained Lucene index.

There are two shard types: primary shards and replica shards. A replica is a copy of a primary shard kept on a different node; replicas provide redundancy and let queries be load-balanced across copies.
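As a rough illustration of how documents spread across primary shards, the sketch below routes a document ID to a shard by hashing it and taking the remainder modulo the number of primaries. This is a simplified model: ElasticSearch uses a Murmur3 hash of the routing value, so `zlib.crc32` here is only a stand-in, and `route_to_shard` is a hypothetical helper name, not part of any client library.

```python
import zlib


def route_to_shard(routing_value: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (by default, the document ID).

    Simplified model: shard = hash(routing) % number_of_primary_shards.
    crc32 is a stand-in for the Murmur3 hash the real engine uses.
    """
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # The same ID always lands on the same shard for a fixed shard count.
    for doc_id in ["user-1", "user-2", "user-3", "user-4"]:
        print(doc_id, "-> shard", route_to_shard(doc_id, num_primary_shards=3))
```

Because the result depends on the primary shard count, changing that count would send existing IDs to different shards. This is one reason the number of primary shards is fixed when an index is created, while the number of replicas can be adjusted later.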

ElasticSearch Workflow

Search consists of two stages:

1. Query Phase

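The client sends a request to a coordinating node, which forwards it to one copy (primary or replica) of every shard in the target index.

Each shard executes the query locally and builds a priority queue of its top matches, sized from + size, returning only document IDs and sort values.

The coordinating node merges and sorts the results from all shards, then applies pagination to select the final page of hits.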

2. Fetch Phase

The coordinating node requests the full document source for the selected document IDs from the shards that hold them, then returns the assembled results to the client.
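The two phases can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not how the engine is implemented: each "shard" is a plain dictionary with precomputed scores, and the names `query_phase` and `search` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical in-memory shard: doc ID -> (relevance score, source document).
Shard = Dict[str, Tuple[float, dict]]


def query_phase(shard: Shard, shard_id: int, from_: int, size: int) -> List[Tuple[float, int, str]]:
    """Return lightweight (score, shard_id, doc_id) entries, best first.

    A shard cannot know which of its hits survive the global merge,
    so it returns its top (from_ + size) candidates.
    """
    entries = ((score, shard_id, doc_id) for doc_id, (score, _) in shard.items())
    return heapq.nlargest(from_ + size, entries)


def search(shards: List[Shard], from_: int = 0, size: int = 10) -> List[dict]:
    # Query phase: scatter to every shard and gather the candidates.
    candidates: List[Tuple[float, int, str]] = []
    for shard_id, shard in enumerate(shards):
        candidates.extend(query_phase(shard, shard_id, from_, size))

    # Coordinating node: global sort by score, then pagination.
    candidates.sort(key=lambda c: c[0], reverse=True)
    page = candidates[from_:from_ + size]

    # Fetch phase: load full documents only for the hits on this page.
    return [shards[shard_id][doc_id][1] for _, shard_id, doc_id in page]


if __name__ == "__main__":
    shards = [
        {"a": (0.9, {"title": "A"}), "b": (0.2, {"title": "B"})},
        {"c": (0.7, {"title": "C"}), "d": (0.5, {"title": "D"})},
    ]
    print(search(shards, from_=0, size=2))  # [{'title': 'A'}, {'title': 'C'}]
```

Note that every shard has to produce from + size candidates even though only size documents are fetched in the end, which is why deep pagination gets more expensive as the offset grows.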

Text Analysis and Inverted Index

ElasticSearch uses analyzers composed of character filters, tokenizers, and token filters to turn raw text into terms stored in an inverted index.

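For example, a character filter that strips HTML tags would reduce the following markup to its text content before tokenization:

<div>
<span>mikechen's Internet Architecture</span>
</div>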

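Built-in analyzers include Standard, Simple, Stop, Whitespace, Keyword, and Pattern, plus language-specific analyzers. Each one pairs a tokenizer, which splits the text into individual tokens, with zero or more filters.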

Token filters further process tokens (e.g., lower‑casing, stop‑word removal).
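The three analyzer stages can be approximated in a few lines. This is a minimal sketch, assuming a regex-based tag stripper and tokenizer and a tiny illustrative stop-word list; the built-in analyzers use more careful rules (for example, Unicode-aware word segmentation), and the function names here are hypothetical.

```python
import re
from typing import List

STOP_WORDS = {"a", "an", "and", "of", "the", "to"}  # illustrative subset only


def strip_html(text: str) -> str:
    """Character filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> List[str]:
    """Tokenizer: emit each run of word characters as one token."""
    return re.findall(r"\w+", text)


def analyze(text: str) -> List[str]:
    """Run character filter -> tokenizer -> token filters, in that order."""
    tokens = tokenize(strip_html(text))
    tokens = [t.lower() for t in tokens]                # lowercase token filter
    return [t for t in tokens if t not in STOP_WORDS]   # stop-word token filter


if __name__ == "__main__":
    print(analyze("<div><span>The Architecture of Search</span></div>"))
    # ['architecture', 'search']
```

The order matters: the character filter works on the raw string, the tokenizer produces tokens, and the token filters only ever see tokens. The same analysis is normally applied to query text as well, so that query terms match the terms stored in the index.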

Inverted Index

The inverted index maps terms to the list of document IDs containing those terms, enabling rapid full‑text search across massive data sets.
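A toy version of this structure fits in a few lines. The sketch below is an assumption-laden simplification: it lowercases and splits on whitespace instead of running a real analyzer, and it stores only document IDs, whereas Lucene's postings also carry data such as term frequencies and positions. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each term to the set of document IDs that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return sorted IDs of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    docs = {
        1: "distributed search engine",
        2: "full text search",
        3: "distributed storage",
    }
    index = build_inverted_index(docs)
    print(search_all_terms(index, "search"))              # [1, 2]
    print(search_all_terms(index, "distributed search"))  # [1]
```

A lookup touches only the postings of the query terms rather than scanning every document, which is what keeps term queries fast as the collection grows.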

Search Process Summary

A search runs in two phases, query then fetch. Spreading that work across shards and replicas is what gives ElasticSearch its high throughput, low latency, and fault tolerance.
