Understanding LevelDB: Architecture, Interfaces, and New Features
LevelDB, Google's high-performance key‑value store built on LSM trees, uses an in‑memory skip‑list, immutable memtables, and sstable files organized in multi‑level compaction, offering interfaces for creation, reads, writes, snapshots, and new features like fuzzy search and JSON storage, all explained with diagrams.
LevelDB Introduction
LevelDB is a key‑value non‑relational storage system developed by Google, a typical implementation of LSM (Log‑Structured‑Merge Tree). LSM works by first recording read/write operations to an Op log file, then operating on an in‑memory database; when a checkpoint is reached, data is flushed to disk and the Op log is deleted, after which new in‑memory structures are created.
LevelDB uses an in‑memory cache; writes are first stored in memory using a skip‑list structure, and only flushed to disk at checkpoints, ensuring high efficiency.
Overall Architecture
As shown, LevelDB consists of:
Write(k,v) – external interface
Op log – operation log file
memtable – in‑memory data structure
Immutable memtable – data waiting to be flushed
sstable – on‑disk storage structure
manifest – metadata list including configuration and file list
current – list of currently used files
The overall structure is clear, compact, and easy to understand.
External Interfaces
<code>class DB {
virtual ~DB();
static Status Open(const Options& options, const std::string& name, DB** dbptr);
virtual Status Put(const WriteOptions& options, const Slice& key, const Slice& value) = 0;
virtual Status Delete(const WriteOptions& options, const Slice& key) = 0;
virtual Status Write(const WriteOptions& options, WriteBatch* updates) = 0;
virtual Status Get(const ReadOptions& options, const Slice& key, std::string* value) = 0;
virtual Iterator* NewIterator(const ReadOptions& options) = 0;
virtual const Snapshot* GetSnapshot() = 0;
virtual void ReleaseSnapshot(const Snapshot* snapshot) = 0;
};
</code>Overall interfaces are grouped as:
Database creation and deletion
Database opening
Read, write, and delete operations
Batch write operations
Iteration
Snapshot acquisition and release
Op log Structure Analysis
LevelDB stores Op logs as memory‑mapped files for efficiency. Each Op log is split into 32 KB blocks, and each block contains:
CRC32 checksum for data integrity
Length – size of the Op log data
Log Type – FULL, FIRST, MIDDLE, or LAST, indicating how the log spans blocks
Data – the actual payload
memtable Structure Analysis
memtable is the in‑memory storage structure of LevelDB, implemented as a skip list:
Skip list is a probabilistic data structure that can replace balanced trees. It maintains multiple levels of linked lists; level 0 contains all entries, higher levels contain fewer entries. When inserting a value, a random level k is chosen, and the value is inserted into levels 0 through k‑1.
Each entry stores:
key size and key
value size and value
sequence number (to indicate recency)
type (kTypeValue for valid data, kTypeDeletion for deleted entries)
sstable Structure Analysis
sstable is the on‑disk storage structure, each file up to 2 MB and organized in levels:
level 0 – up to 4 sstables
level 1 – up to 10 MB per sstable
level 2 – up to 100 MB per sstable
level 3+ – each level up to ten times the size of the previous level
When a level exceeds its size limit, a compaction merges files into the next level, which is the origin of the name "LevelDB".
Data Block – stores actual key‑value pairs
Meta Block – stores index/filter information for fast key lookup
Meta Index Block – stores offsets and sizes of Meta Blocks
Index Block – stores offsets and sizes of Data Blocks
Footer – stores offsets and sizes of Meta Index Block and Index Block (secondary index)
Data Block and Meta Block share the same internal format (illustrated below):
Key points:
The first entry in a block (called a restart) is stored uncompressed and recorded at the end of the block; a restart is inserted every k entries.
Each restart serves as an index entry.
Other entries store the length of the shared prefix with the previous key (shared bytes), the length of the unshared suffix (unshared bytes), the key delta, and the value length/value. Compression (Snappy) is optional.
By default, Snappy compression is applied.
Meta Block uses a Bloom filter to quickly determine whether a key may exist in a Data Block. The Bloom filter hashes each key k times and sets the corresponding bits; a lookup checks those bits, and if any are zero the key is definitely absent.
Version Management
LevelDB manages versions with a simple sequence number mechanism:
Each Op log file name contains a unique sequence number; a new Op log increments this number, and the highest number represents the newest file.
Each key‑value pair also carries a sequence number; the larger the number, the newer the version, allowing the latest value to be identified.
New Features
Fuzzy query support – keys can be matched using "*" and "?" patterns.
JSON format data storage – values can be stored as JSON, enabling queries on JSON fields.
Conclusion
LevelDB is compact, efficient, and easy to understand, making it an excellent key‑value storage system.
Note: Images are sourced from the internet.
Image credit: https://unsplash.com/photos/9wwF-VmSOrY By @eberhard grossgasteiger
Taobao Frontend Technology
The frontend landscape is constantly evolving, with rapid innovations across familiar languages. Like us, your understanding of the frontend is continually refreshed. Join us on Taobao, a vibrant, all‑encompassing platform, to uncover limitless potential.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.