Understanding Prometheus Local Storage (TSDB) and Its Architecture
This article explains Prometheus's built‑in time‑series database (TSDB), covering its concepts, storage configuration, block structure, write‑ahead log, mmap reads, inverted indexing, compaction, and remote storage integration for scalable monitoring.
1. Introduction
Prometheus ships with a local on‑disk storage engine, its built‑in TSDB (time‑series database). This article focuses on how that TSDB writes, stores, and reads data.
2. Local Storage (TSDB)
What is a time‑series database? A time‑series database stores data points indexed by time. It is typically used for monitoring data that is collected periodically, such as metrics scraped at a fixed interval.
Characteristics
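Vertical writes, horizontal reads: each scrape appends one new sample to many series at roughly the same timestamp, while a query usually reads a few series across a time range.
Individual data points are small, but they are written continuously and in very large volumes.
Access is concentrated on hot data: recent samples are queried far more often than old ones.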
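The write and read patterns can be illustrated with a toy in‑memory model. This is not Prometheus code. `store`, `scrape`, and `query_range` are hypothetical names, and a real TSDB uses far more compact structures.

```python
from collections import defaultdict

# Toy store: series key -> list of (timestamp_ms, value), appended in time order.
store: dict[str, list[tuple[int, float]]] = defaultdict(list)


def scrape(ts_ms: int, values: dict[str, float]) -> None:
    """Vertical write: one sample for every scraped series at the same timestamp."""
    for series, value in values.items():
        store[series].append((ts_ms, value))


def query_range(series: str, start_ms: int, end_ms: int) -> list[tuple[int, float]]:
    """Horizontal read: all samples of one series inside [start_ms, end_ms]."""
    return [(t, v) for t, v in store.get(series, []) if start_ms <= t <= end_ms]


# Three scrapes, 15 s apart, each touching two series.
for i in range(3):
    scrape(i * 15_000, {'up{job="a"}': 1.0, 'up{job="b"}': float(i % 2)})

print(query_range('up{job="b"}', 0, 15_000))  # [(0, 0.0), (15000, 1.0)]
```

Each write touches every series once. Each read walks one series through time. This mismatch is what a TSDB's on‑disk layout has to reconcile.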
Storage Configuration
--storage.tsdb.path: data directory (default data/).
--storage.tsdb.retention.time: how long samples are kept (default 15d).
--storage.tsdb.wal-compression: compresses the write‑ahead log (enabled by default since v2.20.0).
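For readers who launch Prometheus from a script, here is a small sketch that assembles these flags into a command line. `build_prometheus_args` is a hypothetical helper, and the directory and retention values in the usage line are examples only.

```python
import shlex


def build_prometheus_args(data_dir: str = "data/",
                          retention: str = "15d",
                          wal_compression: bool = True) -> list[str]:
    """Hypothetical helper: assemble the storage flags described above."""
    args = [
        "prometheus",
        f"--storage.tsdb.path={data_dir}",
        f"--storage.tsdb.retention.time={retention}",
    ]
    # WAL compression is already the default since v2.20.0. Passing the flag only
    # makes the choice explicit; when False, the flag is simply omitted and the
    # server's default applies.
    if wal_compression:
        args.append("--storage.tsdb.wal-compression")
    return args


print(shlex.join(build_prometheus_args("/var/lib/prometheus", "30d")))
# prometheus --storage.tsdb.path=/var/lib/prometheus --storage.tsdb.retention.time=30d --storage.tsdb.wal-compression
```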
Directory Structure
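Data on disk is organized into blocks. Each block is an independent directory, named by a ULID. It contains a chunks subdirectory with the sample data, an index file, a meta.json metadata file, and a tombstones file that records deletions. The wal directory holds the write‑ahead log. Example layout (in this abbreviated example, some blocks list only meta.json):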
./data
├── 01BKGV7JBM69T2G1BGBGM6KB12
│ └── meta.json
├── 01BKGTZQ1SYQJTR4PB43C8PD98
│ ├── chunks
│ │ └── 000001
│ ├── tombstones
│ ├── index
│ └── meta.json
├── 01BKGTZQ1HHWHV8FBJXW1Y3W0K
│ └── meta.json
├── 01BKGV7JC0RY8A6MACW02A2PJD
│ ├── chunks
│ │ └── 000001
│ ├── tombstones
│ ├── index
│ └── meta.json
└── wal
  ├── 00000002
  └── checkpoint.000001
Storage Principle (Write)
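Incoming samples are first written to an in‑memory head block, and samples are grouped into blocks that each cover two hours. Once a two‑hour window is complete, it is flushed to disk as an immutable block. A write‑ahead log (WAL) protects the in‑memory data: after a crash, Prometheus replays the WAL on restart to rebuild the head block.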
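The write path can be sketched as a toy head block backed by a write‑ahead log. This illustrates the principle only and is not Prometheus's implementation. `ToyHead` is hypothetical, and it logs JSON lines, whereas the real WAL uses binary segment files under wal/.

```python
import json
import os
import tempfile


class ToyHead:
    """In-memory head block whose appends are logged before being applied."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.series: dict[str, list[tuple[int, float]]] = {}
        self._replay()  # rebuild memory from the log left by a previous run

    def append(self, series: str, ts_ms: int, value: float) -> None:
        # 1. Durably log the sample first (flush + fsync) ...
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps([series, ts_ms, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. ... then apply it in memory, so a crash after step 1 loses nothing.
        self.series.setdefault(series, []).append((ts_ms, value))

    def _replay(self) -> None:
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                series, ts_ms, value = json.loads(line)
                self.series.setdefault(series, []).append((ts_ms, value))


wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
head = ToyHead(wal_path)
head.append('up{job="a"}', 0, 1.0)
head.append('up{job="a"}', 15_000, 1.0)

restarted = ToyHead(wal_path)  # simulated crash and restart: memory rebuilt from the log
print(restarted.series)  # {'up{job="a"}': [(0, 1.0), (15000, 1.0)]}
```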
Partitioning data by time into blocks has several advantages:
Efficient range queries by reading only relevant blocks.
Sequential large‑file writes reduce write amplification.
Recent two‑hour data stays in memory for fast access.
Old data removal is cheap—just delete a directory.
mmap (Read)
Prometheus uses mmap to map large compressed files into virtual memory, loading data into physical memory only when accessed, which reduces file‑handle usage and leverages the OS page cache.
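Python's standard `mmap` module exposes the same operating‑system facility, so a short sketch can show the access pattern. This is not Prometheus code. The file name 000001 only mirrors the chunk files in the layout above, and its contents are dummy bytes.

```python
import mmap
import os
import tempfile

# Write a stand-in "chunk file": 1 MiB of zero bytes followed by a marker.
path = os.path.join(tempfile.mkdtemp(), "000001")
with open(path, "wb") as f:
    f.write(b"\x00" * (1 << 20))
    f.write(b"sample-bytes")

with open(path, "rb") as f:
    # Map the whole file read-only; nothing is copied into the process yet.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Slicing touches only the pages holding these bytes. The kernel
        # faults them in on demand and keeps them in its page cache.
        tail = mm[len(mm) - 12:]

print(tail)  # b'sample-bytes'
```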
Indexing
Prometheus employs an inverted index similar to full‑text search: each time series is treated as a small document, with metric names and labels acting as terms.
Example metric requests_total{path="/status", method="GET", instance="10.0.0.1:80"} yields the terms:
__name__="requests_total"
path="/status"
method="GET"
instance="10.0.0.1:80"
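A toy version of this index maps each label pair (term) to a sorted postings list of series IDs. A query with several equality matchers then intersects those lists. The sketch mirrors Prometheus in storing the metric name as the `__name__` label. The series IDs, the extra series, and the `select` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical series IDs and their label sets.
series = {
    1: {"__name__": "requests_total", "path": "/status", "method": "GET", "instance": "10.0.0.1:80"},
    2: {"__name__": "requests_total", "path": "/", "method": "POST", "instance": "10.0.0.1:80"},
    3: {"__name__": "requests_total", "path": "/status", "method": "GET", "instance": "10.0.0.2:80"},
    4: {"__name__": "errors_total", "path": "/status", "method": "GET", "instance": "10.0.0.1:80"},
}

# Inverted index: (label, value) term -> sorted list of series IDs (postings list).
postings: dict[tuple[str, str], list[int]] = defaultdict(list)
for sid in sorted(series):
    for pair in series[sid].items():
        postings[pair].append(sid)


def intersect(a: list[int], b: list[int]) -> list[int]:
    """Merge-intersect two sorted postings lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def select(matchers: dict[str, str]) -> list[int]:
    """IDs of series matching every equality matcher, e.g. requests_total{path="/status"}."""
    lists = [postings.get(pair, []) for pair in matchers.items()]
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


print(select({"__name__": "requests_total", "path": "/status"}))  # [1, 3]
```

Keeping postings sorted is what makes the intersection a single linear merge rather than a nested search.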
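Tip: Avoid labels whose values are unbounded or highly dynamic, such as user IDs, request IDs, or timestamps. Every distinct combination of label values creates a new series, which enlarges the index files and increases memory usage.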
Compaction
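Compaction runs in the background and covers three jobs: merging blocks, deleting expired data, and rewriting chunks. The initial two‑hour blocks are merged into larger blocks, each spanning up to 10% of the retention time or 31 days, whichever is smaller. Merging reduces the number of files a query has to open. Deletions are first recorded in tombstone files instead of being removed from the chunks immediately, and the space is reclaimed when compaction rewrites the affected block.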
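The merge and tombstone steps can be sketched for a single series. The sketch assumes each block holds time‑sorted, non‑overlapping (timestamp, value) pairs, and it treats tombstones as inclusive time ranges. `compact` is a hypothetical helper, not Prometheus's compactor, which also rewrites the index and handles many series per block.

```python
import heapq

Sample = tuple[int, float]  # (timestamp_ms, value)


def compact(blocks: list[list[Sample]], tombstones: list[tuple[int, int]]) -> list[Sample]:
    """Merge time-sorted blocks into one sorted list, dropping samples whose
    timestamps fall inside a tombstoned [start, end] range."""
    merged: list[Sample] = []
    for ts, value in heapq.merge(*blocks, key=lambda s: s[0]):
        if any(start <= ts <= end for start, end in tombstones):
            continue  # marked deleted earlier; physically dropped only now
        merged.append((ts, value))
    return merged


block_a = [(0, 1.0), (15_000, 2.0), (30_000, 3.0)]
block_b = [(45_000, 4.0), (60_000, 5.0)]
print(compact([block_a, block_b], tombstones=[(15_000, 30_000)]))
# [(0, 1.0), (45000, 4.0), (60000, 5.0)]
```

Deferring the physical deletion to this rewrite is why a delete is cheap when issued: it only appends a range to the tombstone file.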
3. Remote Storage
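Local storage is limited to the scalability and durability of a single node. To go beyond that, Prometheus integrates with remote storage systems through two HTTP protocols: one for writing samples to a remote endpoint and one for reading samples back. Both send snappy‑compressed protocol buffer payloads. At the time of writing, the API was still experimental and might move to gRPC over HTTP/2 in the future.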
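To show what travels over the wire, here is a sketch that wraps an already‑encoded remote‑write body in an HTTP request without sending it. The header values follow the remote‑write specification as I understand it and should be checked against the version your receiver implements. The URL is made up, and the body is a placeholder for a snappy‑compressed protobuf WriteRequest, which would need third‑party protobuf and snappy libraries to produce.

```python
import urllib.request


def build_remote_write_request(url: str, payload: bytes) -> urllib.request.Request:
    """Hypothetical helper: POST an already snappy-compressed protobuf body
    with the content headers used by the remote-write protocol."""
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/x-protobuf",
            "Content-Encoding": "snappy",
            "X-Prometheus-Remote-Write-Version": "0.1.0",
        },
    )


req = build_remote_write_request(
    "http://remote-storage.example:9201/write",  # made-up endpoint
    b"<snappy-compressed protobuf>",             # placeholder body
)
# urllib normalizes header names with str.capitalize(), hence "Content-encoding".
print(req.get_method(), req.get_header("Content-encoding"))  # POST snappy
```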
Aikesheng Open Source Community