Unlocking Prometheus: How TSDB Powers Scalable Monitoring and Fast Queries
This article demystifies Prometheus by explaining its core concepts, daily monitoring queries, the role of its TSDB storage engine, how series, label, and time indexes enable fast time‑series queries, and how pre‑computed recording rules boost performance for dashboards and alerts.
Background
The unknown is often intimidating. The author felt the same way when first encountering Prometheus, whose many concepts can overwhelm beginners.
Concepts: Instance, Job, Metric, Metric Name, Metric Label, Metric Value, Metric Type (Counter, Gauge, Histogram, Summary), DataType (Instant Vector, Range Vector, Scalar, String), Operator, Function
As Jack Ma said, Alibaba is a data company rather than a retail company; similarly, Prometheus is fundamentally a data‑driven monitoring system.
Daily Monitoring
Assume we need to monitor the request volume of each API on WebServerA. The dimensions include job (service name), instance (IP), handler (API name), method, code (status), and value (request count).
Example SQL‑like queries:
<code>SELECT * FROM http_requests_total WHERE code="200" AND method="put" AND created_at BETWEEN 1495435700 AND 1495435710;</code>
<code>SELECT * FROM http_requests_total WHERE handler="prometheus" AND method="post" AND created_at BETWEEN 1495435700 AND 1495435710;</code>
<code>SELECT * FROM http_requests_total WHERE handler LIKE "query%" AND instance="10.59.8.110" AND created_at BETWEEN 1495435700 AND 1495435710;</code>
These examples show that daily monitoring mainly consists of dimension-based queries combined with a time range. Suppose we monitor 100 services, each with 10 instances, 20 APIs, and 4 methods. That is 80,000 time series. Collecting a sample every 30 seconds and retaining 60 days of data gives 172,800 samples per series, or about 13.8 billion rows in total, which is impractical for a relational database. Therefore Prometheus uses a Time-Series Database (TSDB) as its storage engine.
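The 13.8 billion figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers from the scenario above; the constant names are chosen here for illustration, and the status-code dimension is left out, as it is in the original estimate.

```python
# Back-of-the-envelope sample count for the monitoring scenario.
SERVICES = 100
INSTANCES_PER_SERVICE = 10
APIS_PER_INSTANCE = 20
METHODS_PER_API = 4
SCRAPE_INTERVAL_SECONDS = 30
RETENTION_DAYS = 60

# One time series per (service, instance, API, method) combination.
series_count = SERVICES * INSTANCES_PER_SERVICE * APIS_PER_INSTANCE * METHODS_PER_API

# Samples collected for a single series over the retention window.
samples_per_series = RETENTION_DAYS * 24 * 60 * 60 // SCRAPE_INTERVAL_SECONDS

total_samples = series_count * samples_per_series

print(f"{series_count:,} series")
print(f"{samples_per_series:,} samples per series")
print(f"{total_samples:,} samples in total")
```

Adding the status code as a further dimension would multiply the series count again, so the estimate is a lower bound for this scenario.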
Storage Engine
A TSDB suits Prometheus as a storage engine because monitoring data has the following characteristics:
Massive data volume
Mostly write operations
Writes are almost sequential, ordered by time
Writes rarely modify old data; data is written shortly after collection
Deletion is block‑based, removing whole time ranges
Data size usually exceeds memory; only a small, irregular subset is cached
Reads are typically sequential, in ascending or descending time order
High‑concurrency reads are common
How does TSDB achieve this?
<code>"labels": [{"latency":"500"}]
"samples":[{"timestamp":1473305798,"value":0.9}]
</code>
Raw data consists of two parts: labels (the monitoring dimensions) and samples (a timestamp and a value). The metric name and its labels together uniquely identify a time series, which is represented by a series_id; the samples store the actual measurements.
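To make the idea of series identity concrete, here is a minimal sketch of how a metric name and a label set could be turned into a stable identifier. The functions `series_key` and `series_id` and the choice of SHA-256 are assumptions made for illustration; they are not how Prometheus itself computes series references.

```python
import hashlib


def series_key(metric_name, labels):
    """Build a canonical string for a metric name plus its labels."""
    # Sort the labels so the same label set always yields the same key,
    # regardless of the order in which the labels were supplied.
    pairs = ",".join(f'{name}="{value}"' for name, value in sorted(labels.items()))
    return f"{metric_name}{{{pairs}}}"


def series_id(metric_name, labels):
    """Derive a 64-bit integer id from the canonical key (illustrative only)."""
    digest = hashlib.sha256(series_key(metric_name, labels).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


print(series_key("server", {"latency": "500"}))
print(series_id("server", {"latency": "500"}) == series_id("server", {"latency": "500"}))
print(series_id("server", {"latency": "500"}) == series_id("server", {"latency": "300"}))
```

Samples can then be appended under the identifier without repeating the full label set for every data point.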
<code>series
│ server{latency="500"}
│ server{latency="300"}
│ server{}
</code>
TSDB stores each series' values under a timeseries key and builds three indexes (Series, Label Index, and Time Index) to accelerate the common queries that combine label filters with a time range.
Series
A series entry has two parts. One part stores all of the series' label key-value pairs in lexical order; the other stores an index of time windows that point to data blocks, so a query can skip records outside its time range.
Label Index
Each label is stored under an index:label: key, containing a list of all its values and references to the starting position of the corresponding series.
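The label index behaves like an inverted index: each label value points to the series that carry it, and a multi-label query intersects those lists. The following is a minimal in-memory sketch under that assumption; the `LabelIndex` class and its methods are hypothetical and do not reflect the on-disk format.

```python
from collections import defaultdict


class LabelIndex:
    """Inverted index from (label name, label value) to series ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, series_id, labels):
        # Register the series under every label pair it carries.
        for name, value in labels.items():
            self._postings[(name, value)].add(series_id)

    def lookup(self, **matchers):
        """Return the ids of series that match every name=value matcher."""
        sets = [self._postings.get((n, v), set()) for n, v in matchers.items()]
        if not sets:
            return set()
        return set.intersection(*sets)


index = LabelIndex()
index.add(1, {"handler": "query", "method": "put", "code": "200"})
index.add(2, {"handler": "query", "method": "post", "code": "200"})
index.add(3, {"handler": "prometheus", "method": "post", "code": "500"})

print(index.lookup(code="200", method="put"))
print(index.lookup(method="post"))
```

Only the series ids returned by the lookup need to be read from storage, which is what makes dimension-based filters cheap.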
Time Index
Data is stored under an index:timeseries:: key that points to the file containing the data for a specific time interval.
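A time index lets a query touch only the files whose time window overlaps the requested range. The sketch below assumes fixed two-hour windows and made-up block paths; `Block` and `blocks_for_range` are illustrative names, not part of any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    min_time: int  # inclusive start of the window, in seconds
    max_time: int  # exclusive end of the window, in seconds
    path: str      # hypothetical location of the data file


def blocks_for_range(blocks, start, end):
    """Return blocks whose [min_time, max_time) window overlaps [start, end]."""
    return [b for b in blocks if b.min_time <= end and start < b.max_time]


blocks = [
    Block(0, 7200, "block-0"),
    Block(7200, 14400, "block-1"),
    Block(14400, 21600, "block-2"),
]

# A range that straddles a window boundary needs two blocks.
print([b.path for b in blocks_for_range(blocks, 7000, 7300)])
# A range inside a single window needs only one.
print([b.path for b in blocks_for_range(blocks, 8000, 9000)])
```

The same layout explains why deletion is block-based: dropping expired data amounts to removing whole blocks whose window has fallen out of the retention period.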
Data Computation
The powerful storage engine enables Prometheus to perform matrix operations on metric series using basic operators and a rich set of functions, effectively turning the monitoring system into a data‑warehouse‑plus‑computation platform.
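As a rough illustration of what an aggregation operator does, the sketch below sums the values of an instant vector grouped by one label. It models an instant vector as a list of (labels, value) pairs; `sum_by` is a hypothetical helper written for this example, not a Prometheus function.

```python
from collections import defaultdict


def sum_by(instant_vector, label):
    """Sum sample values, grouping series by the value of one label."""
    totals = defaultdict(float)
    for labels, value in instant_vector:
        # Series that lack the label are grouped under the empty string.
        totals[labels.get(label, "")] += value
    return dict(totals)


vector = [
    ({"handler": "query", "method": "put"}, 3.0),
    ({"handler": "query", "method": "post"}, 2.0),
    ({"handler": "prometheus", "method": "post"}, 5.0),
]

print(sum_by(vector, "handler"))
print(sum_by(vector, "method"))
```

Each input series contributes one value at a single timestamp, and the result is a smaller vector with one entry per distinct label value.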
One Calculation, Many Queries
Because such computation can be resource‑intensive, pre‑computing results (recording rules) is often faster than evaluating the original expression each time, especially for dashboards and alerting rules that repeatedly query the same expression.
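The idea of computing once and querying many times can be sketched as follows. The `RecordedSeries` class and `expensive_expression` function are stand-ins invented for this example; in Prometheus itself, recording rules are declared in rule files rather than written as code.

```python
class RecordedSeries:
    """Evaluate an expensive expression once per interval and store the result."""

    def __init__(self, expression):
        self._expression = expression  # zero-argument callable
        self._samples = []             # list of (timestamp, value) pairs
        self.evaluations = 0

    def evaluate(self, timestamp):
        # Called once per evaluation interval, independent of query load.
        self.evaluations += 1
        self._samples.append((timestamp, self._expression()))

    def query(self, start, end):
        # Readers get stored samples; the expression is not re-evaluated.
        return [(t, v) for t, v in self._samples if start <= t <= end]


def expensive_expression():
    # Stand-in for a costly aggregation over many raw series.
    return sum(range(1000))


recorded = RecordedSeries(expensive_expression)
for ts in (0, 30, 60):
    recorded.evaluate(ts)

# One hundred dashboard or alert reads reuse the three stored samples.
for _ in range(100):
    window = recorded.query(0, 60)

print(recorded.evaluations)
print(window)
```

The cost of the expression scales with the evaluation interval rather than with the number of dashboards and alerts that read the result.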
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.