Databases 11 min read

Understanding LevelDB: Architecture, Interfaces, and New Features

LevelDB, Google's high-performance key‑value store built on LSM trees, uses an in‑memory skip‑list, immutable memtables, and sstable files organized in multi‑level compaction, offering interfaces for creation, reads, writes, snapshots, and new features like fuzzy search and JSON storage, all explained with diagrams.

Taobao Frontend Technology
Taobao Frontend Technology
Taobao Frontend Technology
Understanding LevelDB: Architecture, Interfaces, and New Features

LevelDB Introduction

LevelDB is a key‑value non‑relational storage system developed by Google, a typical implementation of LSM (Log‑Structured‑Merge Tree). LSM works by first recording read/write operations to an Op log file, then operating on an in‑memory database; when a checkpoint is reached, data is flushed to disk and the Op log is deleted, after which new in‑memory structures are created.

LevelDB uses an in‑memory cache; writes are first stored in memory using a skip‑list structure, and only flushed to disk at checkpoints, ensuring high efficiency.

Overall Architecture

As shown, LevelDB consists of:

Write(k,v) – external interface

Op log – operation log file

memtable – in‑memory data structure

Immutable memtable – data waiting to be flushed

sstable – on‑disk storage structure

manifest – metadata list including configuration and file list

current – list of currently used files

The overall structure is clear, compact, and easy to understand.

External Interfaces

<code>class DB {
    virtual ~DB();
    static Status Open(const Options& options, const std::string& name, DB** dbptr);
    virtual Status Put(const WriteOptions& options, const Slice& key, const Slice& value) = 0;
    virtual Status Delete(const WriteOptions& options, const Slice& key) = 0;
    virtual Status Write(const WriteOptions& options, WriteBatch* updates) = 0;
    virtual Status Get(const ReadOptions& options, const Slice& key, std::string* value) = 0;
    virtual Iterator* NewIterator(const ReadOptions& options) = 0;
    virtual const Snapshot* GetSnapshot() = 0;
    virtual void ReleaseSnapshot(const Snapshot* snapshot) = 0;
};
</code>

Overall interfaces are grouped as:

Database creation and deletion

Database opening

Read, write, and delete operations

Batch write operations

Iteration

Snapshot acquisition and release

Op log Structure Analysis

LevelDB stores Op logs as memory‑mapped files for efficiency. Each Op log is split into 32 KB blocks, and each block contains:

CRC32 checksum for data integrity

Length – size of the Op log data

Log Type – FULL, FIRST, MIDDLE, or LAST, indicating how the log spans blocks

Data – the actual payload

memtable Structure Analysis

memtable is the in‑memory storage structure of LevelDB, implemented as a skip list:

Skip list is a probabilistic data structure that can replace balanced trees. It maintains multiple levels of linked lists; level 0 contains all entries, higher levels contain fewer entries. When inserting a value, a random level k is chosen, and the value is inserted into levels 0 through k‑1.

Each entry stores:

key size and key

value size and value

sequence number (to indicate recency)

type (kTypeValue for valid data, kTypeDeletion for deleted entries)

sstable Structure Analysis

sstable is the on‑disk storage structure, each file up to 2 MB and organized in levels:

level 0 – up to 4 sstables

level 1 – up to 10 MB per sstable

level 2 – up to 100 MB per sstable

level 3+ – each level up to ten times the size of the previous level

When a level exceeds its size limit, a compaction merges files into the next level, which is the origin of the name "LevelDB".

Data Block – stores actual key‑value pairs

Meta Block – stores index/filter information for fast key lookup

Meta Index Block – stores offsets and sizes of Meta Blocks

Index Block – stores offsets and sizes of Data Blocks

Footer – stores offsets and sizes of Meta Index Block and Index Block (secondary index)

Data Block and Meta Block share the same internal format (illustrated below):

Key points:

The first entry in a block (called a restart) is stored uncompressed and recorded at the end of the block; a restart is inserted every k entries.

Each restart serves as an index entry.

Other entries store the length of the shared prefix with the previous key (shared bytes), the length of the unshared suffix (unshared bytes), the key delta, and the value length/value. Compression (Snappy) is optional.

By default, Snappy compression is applied.

Meta Block uses a Bloom filter to quickly determine whether a key may exist in a Data Block. The Bloom filter hashes each key k times and sets the corresponding bits; a lookup checks those bits, and if any are zero the key is definitely absent.

Version Management

LevelDB manages versions with a simple sequence number mechanism:

Each Op log file name contains a unique sequence number; a new Op log increments this number, and the highest number represents the newest file.

Each key‑value pair also carries a sequence number; the larger the number, the newer the version, allowing the latest value to be identified.

New Features

Fuzzy query support – keys can be matched using "*" and "?" patterns.

JSON format data storage – values can be stored as JSON, enabling queries on JSON fields.

Conclusion

LevelDB is compact, efficient, and easy to understand, making it an excellent key‑value storage system.

Note: Images are sourced from the internet.

Image credit: https://unsplash.com/photos/9wwF-VmSOrY By @eberhard grossgasteiger
CompactionLSM TreeDatabase Architecturebloom filterLevelDBSSTableskip listkey-value store
Taobao Frontend Technology
Written by

Taobao Frontend Technology

The frontend landscape is constantly evolving, with rapid innovations across familiar languages. Like us, your understanding of the frontend is continually refreshed. Join us on Taobao, a vibrant, all‑encompassing platform, to uncover limitless potential.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.