How Elasticsearch Delivers Lightning‑Fast Search with Inverted Indexes and Compression
This article explains how Elasticsearch uses inverted indexes, term dictionaries, and compression techniques such as Frame of Reference and Roaring Bitmaps to achieve rapid search while minimizing memory and disk usage. It closes with practical indexing tips for production use.
Search
Imagine searching a collection of ancient poems for verses containing the character "前". A traditional relational database would require a sequential scan using a SQL query like
<code>select name from poems where content like '%前%';</code>, which forces a full table scan, because a LIKE pattern with a leading wildcard cannot use a B‑tree index. This is slow and inefficient for large datasets.
Search Engine Principles
Search engines follow four main steps: crawling and stop‑word filtering, tokenization to extract keywords, building an inverted index from those keywords, and finally matching user queries against the index.
The core of Elasticsearch’s speed lies in its inverted index implementation, which is provided by the underlying Lucene library.
Inverted Index
After building the inverted index, a query for "前" can instantly locate matching poems. The index consists of three key concepts:
term: the keyword extracted from a document (called a "term" in ES).
postings list: the sorted list of IDs of the documents that contain a given term.
segment: each shard stores its data in multiple segments; each segment holds up to 2^31 documents, identified by unique integer IDs.
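As a rough sketch (in Python, with whitespace tokenization standing in for a real analyzer and made-up documents), building an inverted index looks like this:

```python
# Minimal sketch of an inverted index: map every term to the sorted
# postings list of document IDs that contain it. Whitespace
# tokenization stands in for a real analyzer.
def build_inverted_index(docs):
    index = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.split()):          # dedupe terms within a doc
            index.setdefault(term, []).append(doc_id)
    return index  # doc IDs arrive in increasing order, so lists stay sorted

docs = [
    "bright moon before my bed",
    "frost upon the ground",
    "raise my head to the bright moon",
]
index = build_inverted_index(docs)
print(index["moon"])   # -> [0, 2]
print(index["frost"])  # -> [1]
```

A query for a term now costs one dictionary lookup instead of a scan over every document.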
Terms are stored in a term dictionary, which is sorted so it can be binary‑searched, much like a B‑tree index in MySQL. Because the full dictionary is too large to keep in memory, a compact term index (implemented as a Finite State Transducer, FST) maps term prefixes to offsets in the on‑disk dictionary, enabling rapid lookup.
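A toy illustration of the idea (plain binary search over a sorted term list, with a simple first‑character table standing in for the FST; the terms are made up):

```python
import bisect

# Toy term dictionary: sorted terms support O(log n) binary search,
# and a tiny prefix table (a stand-in for the FST term index) lets a
# lookup jump straight to the right region of the dictionary.
terms = sorted(["apple", "apply", "banana", "band", "cherry"])
prefix_offsets = {t[0]: bisect.bisect_left(terms, t[0]) for t in terms}

def lookup(term):
    start = prefix_offsets.get(term[0], 0)        # jump via the prefix index
    i = bisect.bisect_left(terms, term, lo=start)
    return i if i < len(terms) and terms[i] == term else -1

print(lookup("band"))   # -> 3 (position in the dictionary)
print(lookup("grape"))  # -> -1 (not present)
```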
Techniques for Optimizing Postings Lists
Postings lists can become large, so Elasticsearch applies several optimizations:
Compression using Frame of Reference (FOR): postings are stored as sorted integer arrays and delta‑encoded, so consecutive document IDs are replaced by small gaps that need far fewer bits than full integers.
Further block‑level compression: postings are cut into blocks (typically 256 documents each), and each block stores a header recording the bit width needed for its deltas.
Use of Roaring Bitmaps for filter caches, enabling fast bitmap‑based AND/OR operations for frequent filter queries.
FOR compression replaces fixed‑width integers with a variable per‑block width of 1‑64 bits, chosen to fit the block's largest delta, dramatically shrinking disk usage.
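The delta‑plus‑bit‑width scheme can be sketched like this (the doc IDs are illustrative; a real implementation then bit‑packs the deltas, which this sketch omits):

```python
# Sketch of Frame-of-Reference (FOR) compression: store the gaps
# between consecutive doc IDs, plus the bit width needed by the
# largest gap in the block.
def for_encode(postings):
    deltas = [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]
    bit_width = max(d.bit_length() for d in deltas)
    return bit_width, deltas

def for_decode(deltas):
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)   # prefix sums restore the original IDs
    return out

postings = [73, 300, 302, 332, 343, 372]
width, deltas = for_encode(postings)
print(width, deltas)   # -> 8 [73, 227, 2, 30, 11, 29]
assert for_decode(deltas) == postings
```

Here six doc IDs that would take 6 × 32 bits as plain integers fit in 6 × 8 bits plus one small header.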
Joint Queries
When multiple postings lists must be intersected, Elasticsearch first selects the shortest list and then uses a skip‑list algorithm to efficiently find common document IDs, skipping over non‑matching entries and avoiding unnecessary decompression of blocks.
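A simplified version of that intersection (binary search stands in for the skip pointers, and the lists are made up):

```python
import bisect

# Sketch of joint-query intersection: iterate the shortest postings
# list and binary-search ("skip") into the longer ones, instead of
# scanning every entry of every list.
def contains(postings, doc_id):
    i = bisect.bisect_left(postings, doc_id)
    return i < len(postings) and postings[i] == doc_id

def intersect(*lists):
    ordered = sorted(lists, key=len)              # shortest list drives the loop
    return [doc for doc in ordered[0]
            if all(contains(lst, doc) for lst in ordered[1:])]

a = [2, 5, 9, 14]
b = [1, 2, 3, 5, 8, 9, 13, 14, 21]
c = [2, 9, 14, 30]
print(intersect(a, b, c))  # -> [2, 9, 14]
```

Driving the loop from the shortest list bounds the number of probes, which is why Elasticsearch picks it first.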
Summary
Elasticsearch achieves fast search by:
Using an inverted index (term → dictionary → postings list) with FST‑compressed term dictionaries.
Applying FOR compression to postings lists to minimize disk space.
Caching filter results with Roaring Bitmaps for high‑frequency queries.
Leveraging skip‑list based intersection for joint queries when filter caches are unavailable.
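The Roaring Bitmap layout mentioned above can be sketched as follows (the 4096‑value threshold matches the published format; the container types are simplified and the doc IDs are made up):

```python
# Sketch of the Roaring Bitmap layout: split each 32-bit doc ID into a
# 16-bit "high" chunk key and a 16-bit "low" value. Chunks with more
# than 4096 values become bitmap containers; sparse chunks stay as
# plain sorted arrays.
def roaring_containers(doc_ids, threshold=4096):
    chunks = {}
    for doc_id in sorted(doc_ids):
        chunks.setdefault(doc_id >> 16, []).append(doc_id & 0xFFFF)
    return {high: ("bitmap", set(lows)) if len(lows) > threshold else ("array", lows)
            for high, lows in chunks.items()}

containers = roaring_containers([3, 70000, 70001, 200000])
print(containers)
# -> {0: ('array', [3]), 1: ('array', [4464, 4465]), 3: ('array', [3392])}
```

AND/OR between two such bitmaps only has to touch chunks whose keys match, which is what makes cached filters cheap to combine.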
Elasticsearch Indexing Guidelines
Explicitly disable indexing for fields that do not need it, as ES indexes everything by default.
For string fields that do not require full‑text analysis, map them as not_analyzed (in Elasticsearch 5.0 and later, the keyword type).
Avoid highly random IDs (e.g., UUIDs); use monotonically increasing or patterned IDs to improve query performance.
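Applied to a hypothetical poems index, these guidelines translate into a mapping roughly like the following (shown as a Python dict; the field names are made up, and the keyword type replaces not_analyzed in Elasticsearch 5.0+):

```python
import json

# Hedged sketch of a mapping that applies the guidelines above (the
# index and field names are made up). "keyword" skips analysis, and
# "index": false disables indexing for a field entirely.
mapping = {
    "mappings": {
        "properties": {
            "content":  {"type": "text"},                  # analyzed full-text field
            "author":   {"type": "keyword"},               # exact match, no analysis
            "raw_blob": {"type": "text", "index": False},  # never searched: skip indexing
        }
    }
}
print(json.dumps(mapping, indent=2))  # body for e.g. PUT /poems
```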
Understanding these mechanisms helps you compare Elasticsearch's indexing strategy with traditional databases like MySQL: B‑tree indexes balance reads and writes, while ES accepts immutable segments and heavier indexing costs to optimize read‑heavy search workloads.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and will accompany you throughout your operations career as we grow together.