A Visual Guide to Prometheus: Architecture, Metrics Collection, Exporters, PromQL, and Alerting
This article visually explains Prometheus by covering its architecture, key features, metric collection methods, exporter role, PromQL query language, and alerting mechanisms, helping readers understand how to monitor cloud‑native systems effectively in production.
This article uses diagrams to dissect the principles of Prometheus.
1. What Is Prometheus?
Most readers are familiar with the ELK Stack (Elasticsearch, Logstash, and Kibana, often paired with Filebeat as the log shipper) for log collection and retrieval.
Prometheus plays a comparable role for metrics, but it is not designed for massive log storage or long-term retention (the default retention period is 15 days). Its strengths are showing recent trend data and providing a powerful alerting mechanism. Below is the Prometheus architecture diagram:
Prometheus pulls real‑time time‑series data from applications and uses a robust rule engine to help you identify the information needed for monitoring.
As a metric-based system, it is not suitable for storing events or logs; it focuses on trend monitoring. When you need exact per-event records, use ELK or another log solution.
Prometheus Features
Open‑source monitoring tool.
Based on a time‑series database (TSDB) implemented in Go.
Originally developed at SoundCloud and inspired by Google's Borgmon.
Multi‑dimensional labeling.
Pull‑based data collection.
Supports both white‑box and black‑box monitoring, DevOps‑friendly.
Metrics & Alert model (not logging/tracing).
Rich community ecosystem with many exporters.
High single‑node performance: can ingest millions of time‑series and support thousands of targets.
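To make the multi-dimensional labeling feature concrete, here is a minimal sketch of the idea that a time series is identified by a metric name plus a set of key/value labels. The `TinyStore` class, the `http_requests_total` metric, and its labels are hypothetical names chosen for illustration; the real Prometheus TSDB is far more involved.

```python
from collections import defaultdict


def series_key(name, labels):
    """Identity of a series: metric name plus its sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


class TinyStore:
    """Toy in-memory store (illustrative only, not how Prometheus stores data)."""

    def __init__(self):
        self._series = defaultdict(list)  # key -> list of (timestamp, value)

    def append(self, name, labels, timestamp, value):
        self._series[series_key(name, labels)].append((timestamp, value))

    def select(self, name, **matchers):
        """Return series with this name whose labels equal every matcher."""
        out = {}
        for (n, label_pairs), samples in self._series.items():
            labels = dict(label_pairs)
            if n == name and all(labels.get(k) == v for k, v in matchers.items()):
                out[(n, label_pairs)] = samples
        return out


store = TinyStore()
store.append("http_requests_total", {"method": "GET", "status": "200"}, 1000, 10.0)
store.append("http_requests_total", {"method": "GET", "status": "500"}, 1000, 1.0)
store.append("http_requests_total", {"method": "POST", "status": "200"}, 1000, 4.0)

# Each distinct label combination is its own series; matchers slice across them.
get_series = store.select("http_requests_total", method="GET")
print(len(get_series))  # two series share method="GET"
```

Because every label combination creates a separate series, labels with many possible values increase the number of series the server has to track.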
Prometheus Limitations
Prometheus focuses on performance and availability monitoring and is not suitable for logs, events, or tracing. It retains data for a short period (default 15 days).
2. Prometheus Metric Collection
The Prometheus Web UI shows Targets and Endpoints, indicating which services can be scraped.
Endpoint: the source from which metrics can be scraped.
Target: includes endpoint address, port status, etc.
Example scrape configuration:
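- job_name: mysqld                       # job: a logical group of targets
  static_configs:
    - targets: ['192.168.0.100:9104']    # host:port of the exporter endpoint
      labels:
        instance: mysql-exporter         # overrides the default instance label
Job: a group of similar targets.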
Instance: the exporter process running on a host.
Scraped metrics are stored as time‑series on the Prometheus server and can optionally be forwarded to external storage.
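The relationship between jobs, instances, and stored series can be sketched as follows. Prometheus attaches `job` and `instance` labels to every scraped series, with `instance` defaulting to the target address unless a label overrides it. The helper names below are hypothetical, `mysql_up` is used only as an illustrative metric name, and conflict handling is deliberately simplified.

```python
def target_labels(job_name, address, extra_labels=None):
    """Labels attached to every series scraped from one target."""
    labels = {"job": job_name, "instance": address}  # defaults
    labels.update(extra_labels or {})                # static labels may override
    return labels


def label_sample(metric_name, sample_labels, tgt_labels):
    """Merge a scraped sample's own labels with its target's labels."""
    merged = dict(sample_labels)
    merged.update(tgt_labels)  # simplification: target labels win on conflict
    return metric_name, merged


# Mirrors the scrape configuration shown earlier in this section.
labels = target_labels("mysqld", "192.168.0.100:9104", {"instance": "mysql-exporter"})
name, series_labels = label_sample("mysql_up", {}, labels)
print(name, series_labels)
```

With no `labels` block in the configuration, the `instance` label would simply be the target address, `192.168.0.100:9104`.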
3. Prometheus Collection Methods
Prometheus can collect data via direct collection (instrumented applications expose metrics) or indirect collection (using exporters for black‑box systems).
Direct collection is used by services that already expose Prometheus metrics (e.g., etcd, Kubernetes, Docker). Indirect collection relies on exporters for systems like OS, Redis, MySQL.
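As a rough illustration of direct collection, the sketch below shows an application that keeps its own counter and exposes it over HTTP at `/metrics`. It uses only the Python standard library; a real service would normally use an official client library instead. The `Counter` class, the `app_requests_total` metric, and port 8000 are assumptions made for this example.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """Thread-safe, monotonically increasing counter (hand-rolled for illustration)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0.0
        self._lock = threading.Lock()

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        with self._lock:
            self._value += amount

    def render(self):
        """Render the counter in the Prometheus text format."""
        with self._lock:
            value = self._value
        return (f"# HELP {self.name} {self.help_text}\n"
                f"# TYPE {self.name} counter\n"
                f"{self.name} {value}\n")


REQUESTS = Counter("app_requests_total", "Total requests handled by the app.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = REQUESTS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocks forever; call this from the application's entry point."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


# The application increments the counter as it does its work.
REQUESTS.inc()
REQUESTS.inc(2)
print(REQUESTS.render())
```

Calling `serve()` would make the counter scrapeable at `http://<host>:8000/metrics`, which is the address you would list under `targets` in a scrape configuration.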
4. Exporter Monitoring Programs
Exporters act as side‑cars or agents that expose metrics from black‑box systems to Prometheus.
Common exporters include node‑exporter for OS metrics and mysql‑exporter for MySQL metrics. Exporters convert collected data into a text format and provide an HTTP endpoint for Prometheus to scrape.
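The conversion step an exporter performs can be sketched as below: raw readings from a black-box system are rendered into the line-oriented text format that Prometheus scrapes. This is a simplified sketch, not the code of any real exporter; `raw_stats` and the `demo_table_rows` metric are made-up values for illustration.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, labels, value):
    """Render one sample line, e.g. name{key="value"} 42."""
    if labels:
        pairs = ",".join(
            f'{key}="{escape_label_value(str(val))}"'
            for key, val in sorted(labels.items())
        )
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


def render_gauge(name, help_text, samples):
    """Render a gauge metric family: HELP line, TYPE line, then the samples."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    lines += [format_sample(name, labels, value) for labels, value in samples]
    return "\n".join(lines) + "\n"


# Hypothetical readings an exporter might have pulled from a database.
raw_stats = {"orders": 1200, "users": 350}
samples = [({"table": table}, rows) for table, rows in raw_stats.items()]
text = render_gauge("demo_table_rows", "Rows per table (illustrative).", samples)
print(text)
```

An exporter would serve this text from an HTTP endpoint and regenerate it on every scrape, so Prometheus always reads current values.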
https://prometheus.io/docs/instrumenting/exporters/
5. PromQL
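PromQL (Prometheus Query Language) is a powerful DSL for selecting and aggregating time-series data. It plays a role similar to the one SQL plays for relational databases, although its syntax is functional rather than SQL-like.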
It enables complex aggregations, calculations, and analysis directly in the Prometheus UI, Grafana, or via API clients.
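A few typical expressions, and one way an API client might submit them, are sketched below. The `/api/v1/query` path and its `query` parameter belong to the Prometheus HTTP API. The server address `http://localhost:9090` and the `http_requests_total` metric are assumptions for illustration, and the sketch only builds URLs without sending any request.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


# Illustrative expressions; http_requests_total is a hypothetical counter.
QUERIES = {
    "per-second request rate over 5 minutes": "rate(http_requests_total[5m])",
    "request rate summed per job": "sum by (job) (rate(http_requests_total[5m]))",
    "targets currently failing their scrape": "up == 0",
}

for description, promql in QUERIES.items():
    print(description, "->", instant_query_url("http://localhost:9090", promql))
```

The same expressions can be pasted unchanged into the Prometheus UI or a Grafana panel, since all three paths evaluate PromQL on the server.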
6. Monitoring Alerts
Sending Alerts
When an alert rule fires, Prometheus sends the alert to the separate Alertmanager component, which then routes the alert to receivers such as email, DingTalk, etc.
Metrics collection and alerting are decoupled in Prometheus.
Prometheus evaluates alert rules periodically; if conditions are met, an alert is generated and sent to Alertmanager.
Alertmanager groups, routes, and delivers alerts to configured receivers.
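The division of labor described above can be sketched in a few lines: one function stands in for Prometheus evaluating a threshold rule, and another stands in for Alertmanager grouping the resulting alerts by label. This is a simplified model under stated assumptions; the alert name, labels, and threshold are hypothetical, and real features such as pending durations, silences, and inhibition are left out.

```python
from collections import defaultdict


def evaluate_rule(alert_name, samples, threshold):
    """Prometheus side: one alert per series whose value exceeds the threshold."""
    alerts = []
    for labels, value in samples:
        if value > threshold:
            alert_labels = {"alertname": alert_name, **labels}
            alerts.append({"labels": alert_labels, "value": value})
    return alerts


def group_alerts(alerts, group_by):
    """Alertmanager side: bucket alerts that share the same group_by label values."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical disk-usage ratios reported by four database hosts.
samples = [
    ({"instance": "db-1", "cluster": "prod"}, 0.95),
    ({"instance": "db-2", "cluster": "prod"}, 0.91),
    ({"instance": "db-3", "cluster": "staging"}, 0.40),
    ({"instance": "db-4", "cluster": "staging"}, 0.97),
]

alerts = evaluate_rule("HighDiskUsage", samples, threshold=0.9)
groups = group_alerts(alerts, group_by=["alertname", "cluster"])
for key, members in groups.items():
    print(key, [alert["labels"]["instance"] for alert in members])
```

Grouping means a receiver gets one notification per group rather than one per alert, which keeps a widespread failure from flooding email or chat channels.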
7. Summary
Through visual diagrams, this article introduced Prometheus’s advantages and disadvantages, metric collection, collection methods, exporters, PromQL, and alerting, providing insights for building cloud‑native monitoring solutions.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.