Comprehensive Guide to Prometheus: Architecture, Metric Collection, Querying, Exporting, and Visualization
This article provides a detailed overview of Prometheus, covering its architecture, metric exposure and scraping models, data model, metric types, configuration reload, PromQL query language, custom exporters, Grafana integration, and Alertmanager alerting, with practical code examples and best‑practice tips.
Introduction
Prometheus is an open‑source, CNCF‑graduated monitoring solution that offers metric collection, storage, visualization, and alerting for modern cloud‑native systems.
Overall Architecture
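Prometheus works as a pipeline: each monitored service (a job) exposes metrics; Prometheus collects them by scraping HTTP endpoints (the pull model) or, for short‑lived jobs, indirectly through a Pushgateway; it stores the samples in its built‑in time‑series database; and it serves a web UI and an HTTP API for querying. Alerting rules are also evaluated by Prometheus, which sends the resulting alerts to Alertmanager.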
Metric Exposure
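Each monitored service is configured as a job, and each endpoint that Prometheus scrapes for that job is an instance. Metrics can be exposed with the official client libraries, with ready‑made exporters (e.g., for MySQL or Consul), or, for short‑lived jobs, by pushing them to a Pushgateway.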
Scrape Models
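Pull model: Prometheus actively scrapes each target's /metrics endpoint at a configurable interval (scrape_interval, which defaults to 1m; the example prometheus.yml shipped with Prometheus sets it to 15s).
Push model: services push metrics to a Pushgateway, which Prometheus then scrapes like any other target.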
Service Registration
Registration can be static (hard‑coded IP/port) or dynamic (service discovery). Example static config:
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
Example dynamic config using Consul:
  - job_name: "node_export_consul"
    metrics_path: "/node_metrics"
    scheme: "http"
    consul_sd_configs:
      - server: "localhost:8500"
        services:
          - "node_exporter"
Configuration Reload
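Start Prometheus with the --web.enable-lifecycle flag, which enables the HTTP lifecycle endpoints. The configuration can then be reloaded with an HTTP POST:
prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload
Internally, the reload handler posts a request on a reload channel; the main loop consumes it, re‑reads and applies the configuration file, and reports success or failure back to the handler.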
Data Model
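Each time series is uniquely identified by its metric name and a set of key/value labels, and consists of a sequence of samples; each sample is a float64 value paired with a millisecond‑precision timestamp.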
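The same model can be expressed as Go types. This is an illustrative sketch with hypothetical names (Sample, Series, ID, Line), not Prometheus's internal storage types; label values are quoted with strconv.Quote, which only approximates the exposition format's escaping rules.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Sample is a single data point: a float64 value at a millisecond timestamp.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// Series is identified by its metric name plus label set and holds its samples.
type Series struct {
	Name    string
	Labels  map[string]string
	Samples []Sample
}

// ID renders name{label="value",...} with labels sorted by name, so the same
// name and label set always produce the same identity string.
func (s Series) ID() string {
	if len(s.Labels) == 0 {
		return s.Name
	}
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = k + "=" + strconv.Quote(s.Labels[k])
	}
	return s.Name + "{" + strings.Join(parts, ",") + "}"
}

// Line formats one sample as "<series> <value> <timestamp in ms>".
func (s Series) Line(smp Sample) string {
	return s.ID() + " " + strconv.FormatFloat(smp.Value, 'g', -1, 64) + " " +
		strconv.FormatInt(smp.TimestampMs, 10)
}

func main() {
	s := Series{
		Name:   "http_requests_total",
		Labels: map[string]string{"method": "GET", "code": "200"},
	}
	s.Samples = append(s.Samples, Sample{TimestampMs: 1700000000000, Value: 1027})
	for _, smp := range s.Samples {
		fmt.Println(s.Line(smp)) // prints: http_requests_total{code="200",method="GET"} 1027 1700000000000
	}
}
```

Changing any label value yields a different ID and therefore a different time series, which is why high‑cardinality labels (user IDs, request IDs) multiply storage cost.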
Example metric line format:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027
Metric Types
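Counter: a cumulative value that only increases (or resets to zero when the process restarts), e.g., a request count.
Gauge: a value that can go up or down, e.g., memory usage.
Histogram: counts observations into configurable buckets (plus a _sum and a _count) for distribution analysis; quantiles are estimated at query time with histogram_quantile().
Summary: quantiles pre‑computed on the client side; unlike histogram buckets, they cannot be meaningfully aggregated across instances.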
Exporters and Custom Exporter Example
Use community exporters or write a custom exporter with the Go client library:
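package main
import (
	"log"
	"net/http"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)
func main() {
	// Serve every metric in the default registry, including the Go runtime
	// and process collectors that client_golang registers by default.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
Adding a custom counter (the following snippets assume the file also imports github.com/prometheus/client_golang/prometheus and that they run inside main before ListenAndServe is called):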
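// A counter can only go up; Add panics if it is given a negative value.
myCounter := prometheus.NewCounter(prometheus.CounterOpts{
	Name: "my_counter_total",
	Help: "custom counter",
})
// MustRegister panics if a collector with the same name is already registered.
prometheus.MustRegister(myCounter)
myCounter.Add(23)
Registering a counter with labels: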
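// Each distinct combination of label values becomes its own time series.
// The name differs from my_counter_total: registering both under one name would fail.
myCounterVec := prometheus.NewCounterVec(
	prometheus.CounterOpts{Name: "my_labeled_counter_total", Help: "custom counter with labels"},
	[]string{"label1", "label2"},
)
prometheus.MustRegister(myCounterVec) // without registration the series is never exported
myCounterVec.With(prometheus.Labels{"label1": "1", "label2": "2"}).Inc()
PromQL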
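PromQL expressions evaluate to one of four types: string, scalar, instant vector (one sample per series at a single timestamp), and range vector (a range of samples per series over a time window).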
Instant Query
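go_gc_duration_seconds_count
go_gc_duration_seconds_count{instance="127.0.0.1:9600"}
go_gc_duration_seconds_count{instance=~"localhost.*"}
The first selects every series with this metric name, the second keeps only series whose instance label equals the given value, and the third uses a regular‑expression matcher (=~), which must match the entire label value.
Range Query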
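go_gc_duration_seconds_count[5m]
go_gc_duration_seconds_count[5m] offset 1d
The first returns, for each series, all samples from the last five minutes; offset 1d evaluates the same selector one day in the past.
Functions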
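rate() – per‑second average rate of increase of a counter over a range vector, with counter resets handled automatically, e.g., rate(http_requests_total[5m]).
irate() – per‑second instant rate computed from only the last two samples in the range; more responsive than rate(), but noisier.
histogram_quantile(0.5, my_histogram_bucket) – estimate a quantile (here the median) from a histogram's cumulative le buckets; in practice it is usually applied to rate(my_histogram_bucket[5m]).
sum by (label) (...) and sum without (label) (...) – aggregation that keeps only the listed labels (by) or drops the listed labels and keeps the rest (without).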
Grafana Visualization
Connect Grafana to Prometheus as a data source, create dashboards, and use PromQL queries in panels to visualize metrics.
Alerting with Alertmanager
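Define alert rules in alert_rules.yml:
groups:
  - name: simulator-alert-rule
    rules:
      - alert: HttpSimulatorDown
        expr: sum(up{job="http_srv"}) == 0
        for: 1m
        labels:
          severity: critical
In prometheus.yml, list the rule file under rule_files and point Prometheus at Alertmanager in the alerting section; receivers such as email are configured in Alertmanager's own configuration file (alertmanager.yml). When the expression starts returning a result, the alert becomes PENDING; if it keeps returning a result for the for duration (here 1m), the alert becomes FIRING and is sent to Alertmanager, which groups, routes, and dispatches the notifications.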
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.