Master ElasticSearch: Core Concepts, Architecture, and Search Workflow Explained

This article provides a comprehensive overview of ElasticSearch, covering its definition, core components such as indexes, shards and replicas, the analysis pipeline, inverted index mechanics, and the two‑stage search process that enables scalable, fault‑tolerant full‑text search in big‑data environments.

Mike Chen's Internet Architecture

ElasticSearch Overview

ElasticSearch is a distributed full‑text search engine built on Apache Lucene, widely used in big‑data scenarios for fast, scalable search and analytics.

ElasticSearch architecture diagram

Core Components

Index : a collection of documents with similar characteristics, containing mapping and inverted‑index files; data may reside on one or many nodes.

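Type : logical grouping of similar documents within an index, analogous to a table in relational databases. Mapping types were deprecated in Elasticsearch 6.x and removed in later versions, so a current index holds a single document type.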

Document : the basic searchable unit, represented in JSON, similar to a row.

Field : the smallest unit within a document, comparable to a column.

Shard : a partition of an index that enables horizontal scaling; each shard is a self‑contained Lucene index.

There are two shard types: Primary Shard and Replica Shard . A replica is a copy of a primary shard; replicas provide redundancy if a node fails and let queries be load‑balanced across copies.
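To make the shard idea concrete, here is a minimal sketch of how a document could be assigned to a primary shard. Elasticsearch routes on roughly `hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. The function names below are hypothetical, and `zlib.crc32` is only a stand-in for the Murmur3-based hash Elasticsearch uses internally.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a primary shard number."""
    # Assumption: crc32 stands in for Elasticsearch's internal hash function.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def shard_copies(shard: int, num_replicas: int) -> list[str]:
    """List every copy of one shard: the primary plus its replicas."""
    replicas = [f"shard-{shard}-replica-{i}" for i in range(1, num_replicas + 1)]
    return [f"shard-{shard}-primary"] + replicas


if __name__ == "__main__":
    shard = pick_shard("doc-42", num_primary_shards=5)
    # Any of these copies can serve a search; writes go to the primary first.
    print(shard, shard_copies(shard, num_replicas=1))
```

Because the shard number depends on the primary shard count, that count cannot simply be changed on an existing index without re-routing its documents.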

Shard diagram

ElasticSearch Workflow

Search workflow diagram

Search consists of two stages:

1. Query Phase

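The client sends a request to a coordinating node, which forwards it to one copy (primary or replica) of every shard in the target index.

Each shard executes the query locally and builds a priority queue of its top matching documents (from + size entries), returning only document IDs and sort values.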

The coordinating node merges, sorts, and paginates the results from all shards.

2. Fetch Phase

The coordinating node requests the full document source for the document IDs selected in the query phase from the shards that hold them, then returns the assembled results to the client.
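The two phases can be sketched with a small in-memory simulation. Everything here is illustrative: `SHARDS`, the scores, and the function names are made up for the example and are not Elasticsearch APIs. The point is that each shard returns only lightweight hits, and full documents are loaded only for the final page.

```python
import heapq

# Hypothetical in-memory "shards": each maps doc ID -> (score, source document).
SHARDS = [
    {"a1": (0.9, {"title": "intro to search"}), "a2": (0.2, {"title": "misc"})},
    {"b1": (0.7, {"title": "search at scale"}), "b2": (0.5, {"title": "sharding"})},
]


def query_phase(shard_id, shard, size):
    """Each shard returns only its top `size` hits as (score, shard_id, doc_id)."""
    hits = [(score, shard_id, doc_id) for doc_id, (score, _) in shard.items()]
    return heapq.nlargest(size, hits)


def fetch_phase(hits):
    """Load the full source only for the globally selected hits."""
    return [SHARDS[shard_id][doc_id][1] for _, shard_id, doc_id in hits]


def search(size, from_=0):
    """Coordinating-node logic: scatter, merge, paginate, then fetch."""
    # Every shard must return from_ + size hits so the global page is correct.
    per_shard = [query_phase(i, s, from_ + size) for i, s in enumerate(SHARDS)]
    merged = heapq.nlargest(from_ + size, (h for hits in per_shard for h in hits))
    page = merged[from_:from_ + size]
    return fetch_phase(page)


if __name__ == "__main__":
    print(search(size=2))
    # [{'title': 'intro to search'}, {'title': 'search at scale'}]
```

The sketch also shows why deep pagination is costly: every shard has to produce `from_ + size` candidates even though only `size` documents are finally fetched.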

Text Analysis and Inverted Index

ElasticSearch uses analyzers to turn raw text into the terms stored in an inverted index. An analyzer is composed of zero or more character filters, exactly one tokenizer, and zero or more token filters, applied in that order.

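Example: a character filter that strips HTML tags (for instance the built‑in html_strip filter) reduces the following input to its text content:

<div>
<span>mikechen's Internet Architecture</span>
</div>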

Built‑in analyzers include Standard, Simple, Stop, Whitespace, Keyword, Pattern, and language‑specific analyzers.

Token filters further process tokens (e.g., lower‑casing, stop‑word removal).
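The three stages can be illustrated with a toy pipeline in plain Python. This is a sketch of the concept only, not Elasticsearch's implementation: the function names, the regular expression, and the stop-word list are assumptions chosen for the example.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and"}  # illustrative list, not Elasticsearch's


def strip_html(text: str) -> str:
    """Character filter: replace each HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenize(text: str) -> list[str]:
    """Tokenizer: split the filtered text on runs of whitespace."""
    return text.split()


def lowercase(tokens: list[str]) -> list[str]:
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Token filter: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text: str) -> list[str]:
    """Run character filter -> tokenizer -> token filters, in that order."""
    tokens = whitespace_tokenize(strip_html(text))
    for token_filter in (lowercase, remove_stop_words):
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    print(analyze("<div><span>The Architecture of Search</span></div>"))
    # ['architecture', 'search']
```

Filter order matters: lower-casing runs before stop-word removal here so that "The" is matched against the lowercase stop list.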

Inverted Index

The inverted index maps terms to the list of document IDs containing those terms, enabling rapid full‑text search across massive data sets.
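As a minimal sketch of the data structure, the following builds a term-to-document-IDs mapping from a few sample documents and answers a multi-term query by intersecting posting lists. The sample documents and function names are invented for illustration; a real Lucene index also stores term frequencies, positions, and compressed postings.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """Return IDs of documents containing every query term (AND semantics)."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


if __name__ == "__main__":
    docs = {
        1: "fast distributed search",
        2: "distributed storage",
        3: "fast search engine",
    }
    index = build_inverted_index(docs)
    print(index["distributed"])             # [1, 2]
    print(search_all(index, "fast search"))  # [1, 3]
```

A lookup touches only the posting lists of the query terms, which is why search time depends on the terms queried rather than on scanning every document.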

Inverted index illustration

Search Process Summary

Search runs in two phases, query then fetch. Spreading the work across shards and their replicas is what provides high throughput, low latency, and fault tolerance.

Tags: big data, Elasticsearch, sharding, Inverted Index, Analyzers, distributed search