Databases 7 min read

Understanding LSM-Tree Architecture and Its Applications in Big Data Systems

The article explains the Log-Structured Merge-Tree (LSM) architecture, its core components, advantages and disadvantages, and demonstrates how it is employed in big‑data platforms such as HBase and Apache Druid to achieve high‑throughput writes and scalable query processing.

Top Architecture Tech Stack
Top Architecture Tech Stack
Top Architecture Tech Stack
Understanding LSM-Tree Architecture and Its Applications in Big Data Systems

LSM Architecture

Overview

LSM ( Log-Structured Merge-Tree ) is a write‑optimized data structure used in key‑value stores and data systems; all write operations (insert, update, delete) are first recorded in memory and later flushed to disk in batches, reducing I/O and improving write performance.

Components

The main components of an LSM‑Tree are MemTable (in‑memory table), immutable SSTable (disk table), the merge process, compaction, Bloom filter, and the Write‑Ahead Log ( WAL ) which ensures durability.

MemTable: temporarily stores newly written data in memory.

SSTable: when the MemTable is full, data are flushed as immutable disk tables.

Merge: periodically combines multiple SSTables to keep their number and size manageable.

Compaction: compresses data during merge to optimize storage space.

Bloom Filter: quickly checks whether a key may exist in an SSTable, reducing unnecessary disk reads.

WAL: logs writes before they reach the MemTable, enabling recovery after crashes.

Advantages and Disadvantages

Advantages include high write throughput, batch writes, and delayed merge/compaction, making LSM‑Tree ideal for write‑intensive workloads. Disadvantages are slower reads, space amplification, and write amplification.

LSM in the Big Data Domain

HBase Application of LSM‑Tree

In HBase, data are first written to the HLog (commit log), then to the MemStore (equivalent to a MemTable), and finally flushed to disk as a storeFile (equivalent to an SSTable). This LSM‑Tree‑based flow enables fast ingestion, though reads may need to query both the MemStore and multiple SSTables.

To mitigate read latency, HBase performs periodic compaction of small SSTables into larger ones and employs Bloom filters for faster key existence checks.

Apache Druid Application of LSM‑Tree

Druid adopts an LSM‑like architecture for its real‑time nodes. Incoming data are first loaded into an in‑memory buffer (similar to a MemTable). When conditions are met, the buffer is flushed to disk as a segment, which is immediately loaded into off‑heap memory for fast querying.

Real‑time nodes periodically merge segments from the same time window into larger segments (segment merge), analogous to LSM‑Tree compaction, and upload the merged segments to DeepStorage.

The coordinator node assigns historical nodes to download and load these segments, after which query nodes retrieve data from both real‑time and historical nodes and merge the results for the user.

Big DataCompactionLSM TreedatabasesStorage
Top Architecture Tech Stack
Written by

Top Architecture Tech Stack

Sharing Java and Python tech insights, with occasional practical development tool tips.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.