
Comprehensive Guide to Prometheus: Architecture, Metric Collection, Querying, Exporting, and Visualization

This article provides a detailed overview of Prometheus, covering its architecture, metric exposure and scraping models, data model, metric types, configuration reload, PromQL query language, custom exporters, Grafana integration, and Alertmanager alerting, with practical code examples and best‑practice tips.

Architect

Introduction

Prometheus is an open‑source, CNCF‑graduated monitoring solution that offers metric collection, storage, visualization, and alerting for modern cloud‑native systems.

Overall Architecture

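Prometheus works as a pipeline: services expose metrics over HTTP, Prometheus scrapes them on a schedule (directly under the pull model, or through a Pushgateway for jobs that cannot be scraped), stores the samples in its built‑in time‑series database, and serves a web UI and HTTP API for querying. Alerting rules are evaluated against the same stored data.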

Metric Exposure

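Each monitored service is configured as a job, and each scrapeable endpoint of that service is an instance (target). Metrics can be exposed with the official client libraries, with community exporters (e.g., for MySQL or Consul), or through a Pushgateway for short‑lived jobs.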

Scrape Models

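Pull model: Prometheus scrapes each target's /metrics endpoint over HTTP at a configurable scrape_interval (the built‑in default is 1m; the sample prometheus.yml shipped with Prometheus sets 15s).

Push model: short‑lived or batch jobs push their metrics to a Pushgateway, which Prometheus then scrapes like any other target.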
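To make the pull model concrete, here is a minimal sketch of what a single scrape amounts to: an HTTP GET against a target's metrics endpoint. It uses only the Go standard library. The names scrapeOnce and demoScrape, the demo_requests_total metric, and the in-process test target are assumptions made for illustration; this is not how Prometheus itself is implemented.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// scrapeOnce performs a single pull: an HTTP GET against a target's metrics
// endpoint, returning the raw exposition text.
func scrapeOnce(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// demoScrape starts a throwaway in-process target (hypothetical, for the
// example only) and scrapes it once.
func demoScrape() (string, error) {
	target := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "demo_requests_total 3\n")
	}))
	defer target.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	return scrapeOnce(client, target.URL+"/metrics")
}

func main() {
	body, err := demoScrape()
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	fmt.Print(body)
}
```

A real scraper would repeat this call on a timer (one tick per scrape interval) and parse the response into samples instead of returning the text.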

Service Registration

Registration can be static (hard‑coded IP/port) or dynamic (service discovery). Example static config:

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

Example dynamic config using Consul:

scrape_configs:
  - job_name: "node_export_consul"
    metrics_path: "/node_metrics"
    scheme: "http"
    consul_sd_configs:
      - server: "localhost:8500"
        services:
          - "node_exporter"

Configuration Reload

Start Prometheus with the --web.enable-lifecycle flag, then trigger a reload with an HTTP POST to /-/reload (sending SIGHUP to the process also triggers a reload):

prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload

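The internal handler posts a signal to a reload channel, which the main loop consumes to apply the new config. If the new configuration fails to load, Prometheus reports the error and keeps running with the previous one.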
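The handler-to-main-loop handoff described above can be sketched with a channel of reply channels. This is an illustrative pattern under stated assumptions, not Prometheus's actual source: the reloader type, requestReload, run, and the apply callback are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// reloader owns the channel shared by the HTTP handler and the main loop.
type reloader struct {
	reloadCh chan chan error
}

// requestReload is what a /-/reload handler would call: it hands the main
// loop a reply channel and blocks until the reload result comes back.
func (r *reloader) requestReload() error {
	reply := make(chan error)
	r.reloadCh <- reply
	return <-reply
}

// run is the main loop: it applies the config once per request and reports
// the outcome on the reply channel, until stop is closed.
func (r *reloader) run(apply func() error, stop <-chan struct{}) {
	for {
		select {
		case reply := <-r.reloadCh:
			reply <- apply()
		case <-stop:
			return
		}
	}
}

// demoReload issues two reloads against a hypothetical apply function:
// the first succeeds, the second simulates an invalid config file.
func demoReload() (first, second error, applied int) {
	r := &reloader{reloadCh: make(chan chan error)}
	stop := make(chan struct{})
	done := make(chan struct{})
	apply := func() error {
		applied++
		if applied == 2 {
			return errors.New("invalid config")
		}
		return nil
	}
	go func() {
		r.run(apply, stop)
		close(done)
	}()

	first = r.requestReload()
	second = r.requestReload()
	close(stop)
	<-done
	return first, second, applied
}

func main() {
	first, second, applied := demoReload()
	fmt.Println("first reload error:", first)
	fmt.Println("second reload error:", second)
	fmt.Println("apply calls:", applied)
}
```

Routing the error back through the reply channel is what lets the HTTP endpoint return a failure status to the caller when the new file does not load.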

Data Model

A time series is identified by its metric name and a set of key/value labels. Each sample in the series is a float64 value paired with a millisecond‑precision timestamp.

Example metric line format:

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027
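As a rough illustration of how the data model maps onto that text format, the sketch below renders one sample as an exposition line. The sample struct and formatSample function are hypothetical, labels are sorted only to make the output deterministic, and strconv.Quote is a simplification of the format's own escaping rules (it behaves the same for plain ASCII label values).

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sample is one point of one time series: identity (name + labels) plus a
// value and an optional millisecond timestamp (0 means "omit").
type sample struct {
	name        string
	labels      map[string]string
	value       float64
	timestampMs int64
}

// formatSample renders the sample as a single text exposition line.
func formatSample(s sample) string {
	var b strings.Builder
	b.WriteString(s.name)
	if len(s.labels) > 0 {
		keys := make([]string, 0, len(s.labels))
		for k := range s.labels {
			keys = append(keys, k)
		}
		sort.Strings(keys) // map order is random; sort for stable output
		b.WriteByte('{')
		for i, k := range keys {
			if i > 0 {
				b.WriteByte(',')
			}
			b.WriteString(k + "=" + strconv.Quote(s.labels[k]))
		}
		b.WriteByte('}')
	}
	b.WriteByte(' ')
	b.WriteString(strconv.FormatFloat(s.value, 'g', -1, 64))
	if s.timestampMs != 0 {
		b.WriteByte(' ')
		b.WriteString(strconv.FormatInt(s.timestampMs, 10))
	}
	return b.String()
}

func main() {
	s := sample{
		name:   "http_requests_total",
		labels: map[string]string{"method": "GET", "code": "200"},
		value:  1027,
	}
	fmt.Println(formatSample(s))
}
```

Two samples with the same name but different label values belong to different time series, which is why the labels are part of the series identity rather than metadata.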

Metric Types

Counter : monotonically increasing (e.g., request count).

Gauge : can go up or down (e.g., memory usage).

Histogram : bucketed counts for distribution analysis.

Summary : pre‑computed quantiles.
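Histograms are the least obvious of the four types, because their buckets are cumulative: each bucket counts every observation less than or equal to its upper bound (the le label), and a final +Inf bucket counts everything. The following sketch, using only the standard library, shows that bookkeeping. The histogram type, its methods, and the demo_duration_seconds metric name are hypothetical and are not the client library's implementation.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// histogram keeps cumulative bucket counts plus a running sum and count,
// mirroring the _bucket, _sum, and _count series a histogram exposes.
type histogram struct {
	upperBounds []float64 // ascending; the last entry is +Inf
	counts      []uint64  // counts[i] = observations <= upperBounds[i]
	sum         float64
	count       uint64
}

// newHistogram expects bounds in ascending order and appends the +Inf bucket.
func newHistogram(bounds ...float64) *histogram {
	ub := append(append([]float64{}, bounds...), math.Inf(1))
	return &histogram{upperBounds: ub, counts: make([]uint64, len(ub))}
}

// observe records one value in every bucket whose bound covers it.
func (h *histogram) observe(v float64) {
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

// lines renders the histogram roughly the way it would be exposed.
func (h *histogram) lines(name string) []string {
	var out []string
	for i, ub := range h.upperBounds {
		le := strconv.FormatFloat(ub, 'g', -1, 64)
		out = append(out, fmt.Sprintf("%s_bucket{le=%q} %d", name, le, h.counts[i]))
	}
	out = append(out,
		fmt.Sprintf("%s_sum %g", name, h.sum),
		fmt.Sprintf("%s_count %d", name, h.count))
	return out
}

func main() {
	h := newHistogram(0.1, 0.5, 1)
	for _, v := range []float64{0.05, 0.3, 0.3, 2} {
		h.observe(v)
	}
	for _, l := range h.lines("demo_duration_seconds") {
		fmt.Println(l)
	}
}
```

Because the buckets are plain counters, quantiles can be estimated later on the server and aggregated across instances; a Summary computes its quantiles in the client, so they cannot be meaningfully aggregated.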

Exporters and Custom Exporter Example

Use community exporters or write a custom exporter with the Go client library:

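package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // promhttp.Handler serves every metric registered with the default registry.
    http.Handle("/metrics", promhttp.Handler())
    // ListenAndServe only returns on failure, so surface the error instead of dropping it.
    log.Fatal(http.ListenAndServe(":8080", nil))
}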

Adding a custom counter:

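// Requires the import "github.com/prometheus/client_golang/prometheus".
myCounter := prometheus.NewCounter(prometheus.CounterOpts{
    Name: "my_counter_total",
    Help: "custom counter",
})
// Register with the default registry so promhttp.Handler() exposes it.
prometheus.MustRegister(myCounter)
// Counters only go up; Add panics on a negative value.
myCounter.Add(23)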

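Creating and registering a counter with labels (it needs its own metric name, because registering a second collector as my_counter_total with a different label set fails):

// Illustrative name; it must differ from the unlabeled my_counter_total above.
myCounterVec := prometheus.NewCounterVec(
    prometheus.CounterOpts{Name: "my_labeled_counter_total", Help: "custom counter with labels"},
    []string{"label1", "label2"},
)
prometheus.MustRegister(myCounterVec)
myCounterVec.With(prometheus.Labels{"label1": "1", "label2": "2"}).Inc()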

PromQL

PromQL supports four expression types: string, scalar, instant vector, and range vector.

Instant Query

go_gc_duration_seconds_count
go_gc_duration_seconds_count{instance="127.0.0.1:9600"}
go_gc_duration_seconds_count{instance=~"localhost.*"}
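One detail of the =~ matcher that the last selector relies on: the regular expression is anchored at both ends, so it must match the whole label value. The sketch below imitates that behaviour with Go's regexp package. matchRegex and selectInstances are hypothetical helpers, and the instance values are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchRegex reports whether value matches pattern the way a PromQL =~
// matcher would: against the whole label value, not just a substring.
func matchRegex(pattern, value string) (bool, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(value), nil
}

// selectInstances returns the instance label values that match the pattern.
func selectInstances(pattern string, instances []string) ([]string, error) {
	var matched []string
	for _, inst := range instances {
		ok, err := matchRegex(pattern, inst)
		if err != nil {
			return nil, err
		}
		if ok {
			matched = append(matched, inst)
		}
	}
	return matched, nil
}

func main() {
	instances := []string{"localhost:9600", "127.0.0.1:9600", "my-localhost:9600"}
	for _, pattern := range []string{"localhost.*", "localhost", ".*:9600"} {
		matched, err := selectInstances(pattern, instances)
		if err != nil {
			fmt.Println("bad pattern:", err)
			continue
		}
		fmt.Printf("instance=~%q selects %v\n", pattern, matched)
	}
}
```

This is why the selector is written as "localhost.*" rather than "localhost": without the trailing .* the port suffix would prevent any match.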

Range Query

go_gc_duration_seconds_count[5m]
go_gc_duration_seconds_count[5m] offset 1d

Functions

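rate(v[5m]) – per‑second average rate of increase of a counter over the range, with counter resets accounted for.

irate(v[5m]) – per‑second rate computed from the last two samples in the range; it reacts quickly but is noisy, so it suits graphs of fast‑moving counters better than alert expressions.

histogram_quantile(0.5, rate(my_histogram_bucket[5m])) – estimates a quantile (here the median) from histogram buckets; the buckets are normally wrapped in rate() so the estimate reflects recent observations rather than the whole process lifetime.

sum by (label) (expr) and sum without (label) (expr) – aggregate across series, either keeping only the listed labels or dropping them.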
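The arithmetic behind rate(), irate(), and histogram_quantile() can be sketched in a few lines of Go. These are deliberately simplified approximations: the real rate() also extrapolates to the edges of the range window, and the real quantile function handles more edge cases. All type and function names here are hypothetical, and the sample data is made up.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// point is one counter sample: time in seconds and counter value.
type point struct{ t, v float64 }

// simpleRate sums the increases across the window, treating any drop as a
// counter reset, and divides by the time between first and last sample.
func simpleRate(points []point) float64 {
	if len(points) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(points); i++ {
		delta := points[i].v - points[i-1].v
		if delta < 0 { // reset: the counter restarted from zero
			delta = points[i].v
		}
		increase += delta
	}
	span := points[len(points)-1].t - points[0].t
	if span <= 0 {
		return 0
	}
	return increase / span
}

// simpleIrate applies the same calculation to the last two samples only.
func simpleIrate(points []point) float64 {
	if len(points) < 2 {
		return 0
	}
	return simpleRate(points[len(points)-2:])
}

// bucket is one cumulative histogram bucket: its le bound and its count.
type bucket struct{ upperBound, cumulativeCount float64 }

// estimateQuantile finds the bucket containing the q-th observation
// (0 <= q <= 1) and interpolates linearly inside it. Buckets must be sorted
// by bound, and the last one must be the +Inf bucket.
func estimateQuantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].cumulativeCount
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	i := sort.Search(len(buckets)-1, func(i int) bool {
		return buckets[i].cumulativeCount >= rank
	})
	if i == len(buckets)-1 { // falls in +Inf: report the highest finite bound
		return buckets[len(buckets)-2].upperBound
	}
	lower, below := 0.0, 0.0
	if i > 0 {
		lower, below = buckets[i-1].upperBound, buckets[i-1].cumulativeCount
	}
	inBucket := buckets[i].cumulativeCount - below
	if inBucket == 0 {
		return lower
	}
	return lower + (buckets[i].upperBound-lower)*(rank-below)/inBucket
}

// demoPoints contains a counter reset between t=10 and t=30.
func demoPoints() []point {
	return []point{{0, 100}, {10, 140}, {30, 20}, {50, 80}}
}

func demoBuckets() []bucket {
	return []bucket{{0.1, 1}, {0.5, 3}, {1, 3}, {math.Inf(1), 4}}
}

func main() {
	fmt.Println("rate:", simpleRate(demoPoints()))
	fmt.Println("irate:", simpleIrate(demoPoints()))
	fmt.Println("p50:", estimateQuantile(0.5, demoBuckets()))
	fmt.Println("p99:", estimateQuantile(0.99, demoBuckets()))
}
```

The p99 case shows a practical limit of bucketed quantiles: once the requested rank lands in the +Inf bucket, the estimate is capped at the largest finite bound, so bucket boundaries should cover the latencies you care about.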

Grafana Visualization

Connect Grafana to Prometheus as a data source, create dashboards, and use PromQL queries in panels to visualize metrics.
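Under the hood, a graph panel's PromQL query ends up as a request to Prometheus's HTTP API; range queries go to /api/v1/query_range with query, start, end, and step parameters. The sketch below only builds such a URL, which can be handy when checking a panel's query by hand. rangeQueryURL and demoRangeQueryURL are hypothetical helpers, and the base address, expression, and time window are assumptions for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// rangeQueryURL builds a query_range request URL for the given expression,
// time window, and resolution step. Any path already on base is replaced.
func rangeQueryURL(base, expr string, start, end time.Time, step time.Duration) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/query_range"
	q := url.Values{}
	q.Set("query", expr)
	q.Set("start", strconv.FormatInt(start.Unix(), 10)) // Unix seconds
	q.Set("end", strconv.FormatInt(end.Unix(), 10))
	q.Set("step", strconv.FormatFloat(step.Seconds(), 'f', -1, 64)) // seconds
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// demoRangeQueryURL queries one hour of data at a 15-second step.
func demoRangeQueryURL() (string, error) {
	start := time.Unix(1700000000, 0)
	return rangeQueryURL(
		"http://localhost:9090",
		"rate(http_requests_total[5m])",
		start, start.Add(time.Hour), 15*time.Second,
	)
}

func main() {
	u, err := demoRangeQueryURL()
	if err != nil {
		fmt.Println("cannot build URL:", err)
		return
	}
	fmt.Println(u)
}
```

The step parameter determines how many points come back for the window, which is why dashboards with long time ranges usually use a larger step.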

Alerting with Alertmanager

Define alert rules in alert_rules.yml:

groups:
  - name: simulator-alert-rule
    rules:
      - alert: HttpSimulatorDown
        expr: sum(up{job="http_srv"}) == 0
        for: 1m
        labels:
          severity: critical

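Point Prometheus at the rule file and at Alertmanager in prometheus.yml (the rule_files and alerting sections), and configure email or other receivers in Alertmanager's own configuration file. Once the expression has been true for the whole for duration, the alert moves from PENDING to FIRING and Prometheus sends it to Alertmanager, which groups, deduplicates, and routes the notification.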
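The effect of the for clause is easiest to see as a small state machine that is advanced on every rule evaluation. The following is a simplified sketch, not Prometheus's rule engine: alertTracker, evaluate, and demoAlert are hypothetical names, and the 30-second evaluation interval is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

type alertState string

const (
	inactive alertState = "inactive"
	pending  alertState = "pending"
	firing   alertState = "firing"
)

// alertTracker follows one alert through inactive -> pending -> firing.
type alertTracker struct {
	forDuration time.Duration
	state       alertState
	activeSince time.Time
}

func newAlertTracker(forDuration time.Duration) *alertTracker {
	return &alertTracker{forDuration: forDuration, state: inactive}
}

// evaluate is called once per rule evaluation with the outcome of the expr.
func (a *alertTracker) evaluate(now time.Time, exprTrue bool) alertState {
	if !exprTrue {
		a.state = inactive // any false evaluation resets the timer
		return a.state
	}
	if a.state == inactive {
		a.state = pending
		a.activeSince = now
	}
	if now.Sub(a.activeSince) >= a.forDuration {
		a.state = firing
	}
	return a.state
}

// demoAlert evaluates a rule with "for: 1m" every 30 seconds.
func demoAlert() []alertState {
	tracker := newAlertTracker(time.Minute)
	start := time.Unix(0, 0)
	outcomes := []bool{true, true, true, false, true}
	var states []alertState
	for i, exprTrue := range outcomes {
		now := start.Add(time.Duration(i) * 30 * time.Second)
		states = append(states, tracker.evaluate(now, exprTrue))
	}
	return states
}

func main() {
	for i, s := range demoAlert() {
		fmt.Printf("evaluation %d: %s\n", i, s)
	}
}
```

A single false evaluation sends the alert back to inactive, so a flapping condition never accumulates enough time to fire; that is the noise-suppression the for duration is meant to provide.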

