Comprehensive Guide to Prometheus: Overview, Installation, Configuration, PromQL, Exporters, Grafana Integration, and Alerting
This article provides a detailed introduction to Prometheus, covering its history, core features, installation methods, configuration file structure, PromQL basics, various exporters, Grafana visualization, alerting with Alertmanager, service discovery, and best‑practice recommendations for building a production‑grade monitoring system.
1. Overview of Prometheus
Prometheus is an open‑source monitoring and alerting system built around a time‑series database. It was started at SoundCloud in 2012 by Matt T. Proud and Julius Volz, drawing inspiration from Borgmon, the monitoring system Google built for its Borg cluster manager. The project joined the CNCF in 2016 as its second hosted project, after Kubernetes.
2. Core Characteristics
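Multi‑dimensional data model: each time series is identified by a metric name and a set of key/value labels
Operational simplicity: each server is autonomous, with no dependency on distributed storage
Scalable data collection through a decentralized, pull‑based model over HTTP
Powerful query language (PromQL) for alerting and graphing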
These traits make Prometheus both a monitoring system and a time‑series database.
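To make the multi‑dimensional data model concrete, here is a minimal Python sketch. It is a toy model, not Prometheus code. Each series is keyed by its metric name plus its label set, and a selector such as up{job="server"} keeps the series whose labels satisfy every matcher. The targets and values are invented for illustration, and only equality matchers are modeled (PromQL also supports !=, =~ and !~).

```python
from typing import Dict, FrozenSet, List, Tuple

# A series identity: metric name + an immutable set of (label, value) pairs.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build a hashable series identity; label order does not matter."""
    return (name, frozenset(labels.items()))


def select(db: Dict[SeriesKey, float], name: str,
           matchers: Dict[str, str]) -> List[SeriesKey]:
    """Return series with this metric name whose labels satisfy all equality matchers."""
    result = []
    for key in db:
        metric, labels = key
        label_map = dict(labels)
        if metric == name and all(label_map.get(k) == v for k, v in matchers.items()):
            result.append(key)
    return result


# Hypothetical latest values of the `up` metric for three targets.
db: Dict[SeriesKey, float] = {
    series_key("up", {"job": "prometheus", "instance": "localhost:9090"}): 1.0,
    series_key("up", {"job": "server", "instance": "192.168.0.107:9100"}): 1.0,
    series_key("up", {"job": "server", "instance": "192.168.0.108:9100"}): 0.0,
}

if __name__ == "__main__":
    # Similar in spirit to the PromQL selector up{job="server"}.
    for key in select(db, "up", {"job": "server"}):
        print(dict(key[1])["instance"], db[key])
```

Because a label set is part of the identity, adding a new label value (for example a new instance) creates a new series rather than updating an existing one.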
3. Installation
3.1 Binary ("out‑of‑the‑box")
$ wget https://github.com/prometheus/prometheus/releases/download/v2.4.3/prometheus-2.4.3.linux-amd64.tar.gz
$ tar xvfz prometheus-2.4.3.linux-amd64.tar.gz
$ cd prometheus-2.4.3.linux-amd64
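$ ./prometheus --version
Run the server with the sample prometheus.yml included in the release tarball. The web UI is then served at http://localhost:9090:
$ ./prometheus --config.file=prometheus.yml
3.2 Docker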
$ sudo docker run -d -p 9090:9090 prom/prometheus
To use a custom configuration, mount a host directory containing prometheus.yml at /etc/prometheus, where the image looks for it:
$ sudo docker run -d -p 9090:9090 \
  -v ~/docker/prometheus:/etc/prometheus \
  prom/prometheus
4. Configuration File (prometheus.yml)
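# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Rule files
rule_files:
  # - "first_rules.yml"
# Scrape configuration
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
The file has four top‑level blocks:
global: defaults such as scrape_interval (how often targets are scraped) and evaluation_interval (how often rules are evaluated).
alerting: the Alertmanager instances that alerts are sent to.
rule_files: the recording and alerting rule files to load.
scrape_configs: the jobs and targets to scrape.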
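Every interval in these files uses the same duration syntax: an integer followed by a unit. That covers 15s here, and 5m, 10m and 1h in the alerting configuration later on. Below is a minimal Python sketch of the single‑unit form. It assumes the units ms, s, m, h, d, w and y, with d = 24h, w = 7d and y = 365d. It is an illustration, not Prometheus's own parser. Newer Prometheus releases also accept compound values such as 1h30m, which this sketch rejects.

```python
import re

# Seconds per unit; d, w and y are fixed multiples (no calendar logic).
_UNIT_SECONDS = {
    "ms": 0.001,
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "w": 7 * 86400,
    "y": 365 * 86400,
}

# Digits followed by exactly one unit suffix, anchored at both ends.
_DURATION_RE = re.compile(r"^(\d+)(ms|s|m|h|d|w|y)$")


def parse_duration(text: str) -> float:
    """Convert a single-unit duration such as '15s' or '5m' to seconds."""
    match = _DURATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a single-unit duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    scrape = parse_duration("15s")
    # Samples written per series per day at this scrape interval.
    print(int(parse_duration("1d") / scrape))  # 5760
```

The same arithmetic is a quick way to estimate storage: at a 15s scrape interval, every series adds 5,760 samples per day.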
5. PromQL Basics
PromQL is the query language used to retrieve and aggregate metrics.
# Health of every scrape target (1 = up, 0 = down)
up
# Filter by job label
up{job="prometheus"}
# Range vector (last 5 minutes)
http_requests_total[5m]
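# Per-second average rate over the last 5 minutes (for counters)
rate(http_requests_total[5m])
# Per-second instant rate from the last two samples in the window
irate(http_requests_total[5m])
Supported aggregation operators include sum, count, avg, max, min, and topk. For example, sum by (job) (rate(http_requests_total[5m])) totals the request rate per job.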
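The difference between rate() and irate() is easiest to see on raw counter samples. The Python sketch below is a simplified model, not Prometheus's implementation, and makes the following assumptions:
- Any decrease is treated as a counter reset to zero.
- rate() is modeled as the reset‑adjusted increase from the first to the last sample, divided by the time between them.
- irate() uses only the last two samples.
Real rate() also extrapolates toward the window boundaries, which this sketch omits, so its results differ slightly from Prometheus's. The sample values are invented.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def _increase(prev: float, cur: float) -> float:
    """Increase between two counter samples; a drop is treated as a reset to zero."""
    return cur - prev if cur >= prev else cur


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase across all samples (no boundary extrapolation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    total = sum(_increase(a[1], b[1]) for a, b in zip(samples, samples[1:]))
    return total / (samples[-1][0] - samples[0][0])


def simple_irate(samples: List[Sample]) -> float:
    """Per-second increase between the last two samples only."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return _increase(v0, v1) / (t1 - t0)


# Hypothetical http_requests_total samples scraped every 15s over one minute,
# with a burst of traffic at the end.
samples: List[Sample] = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 310)]

if __name__ == "__main__":
    print(simple_rate(samples))   # 3.5  (210 requests / 60 s)
    print(simple_irate(samples))  # 8.0  (120 requests / 15 s)
```

This is why irate() reacts quickly to spikes and suits graphs of volatile counters, while the smoother rate() is generally the better choice in alert expressions.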
6. Exporters
6.1 Node Exporter (system metrics)
$ wget https://github.com/prometheus/node_exporter/releases/download/v0.16.0/node_exporter-0.16.0.linux-amd64.tar.gz
$ tar xvfz node_exporter-0.16.0.linux-amd64.tar.gz
$ cd node_exporter-0.16.0.linux-amd64
$ ./node_exporter
Metrics are exposed at http://localhost:9100/metrics. Add the target to scrape_configs:
scrape_configs:
  - job_name: 'server'
    static_configs:
      - targets: ['192.168.0.107:9100']
6.2 MySQL Exporter
$ wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.11.0/mysqld_exporter-0.11.0.linux-amd64.tar.gz
$ tar xvfz mysqld_exporter-0.11.0.linux-amd64.tar.gz
$ cd mysqld_exporter-0.11.0.linux-amd64
$ export DATA_SOURCE_NAME='root:123456@(192.168.0.107:3306)/'
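$ ./mysqld_exporter
Instead of exporting DATA_SOURCE_NAME, you can put the credentials in a .my.cnf file. The exporter reads ~/.my.cnf by default; use --config.my-cnf to point it at a different file. Metrics are served on port 9104 by default. In production, connect as a dedicated MySQL user with PROCESS, REPLICATION CLIENT and SELECT privileges (as the mysqld_exporter README recommends) rather than as root.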
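Both credential styles carry the same information. The Python sketch below renders them from one set of settings:
- the DSN, in the user:password@(host:port)/ layout used above;
- an option file with the [client] section that MySQL option files use.
The user name exporter, the password and the host are placeholders, not values the exporter requires.

```python
import configparser
import io


def build_dsn(user: str, password: str, host: str, port: int = 3306) -> str:
    """DATA_SOURCE_NAME in the user:password@(host:port)/ layout used above."""
    return f"{user}:{password}@({host}:{port})/"


def build_my_cnf(user: str, password: str, host: str, port: int = 3306) -> str:
    """Render a .my.cnf-style option file with a [client] section."""
    # interpolation=None so passwords containing '%' are written verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["client"] = {
        "user": user,
        "password": password,
        "host": host,
        "port": str(port),
    }
    buf = io.StringIO()
    # space_around_delimiters=False writes key=value, the usual option-file style.
    config.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder credentials for a hypothetical dedicated exporter account.
    print(build_dsn("exporter", "s3cret", "192.168.0.107"))
    print(build_my_cnf("exporter", "s3cret", "192.168.0.107"), end="")
```

Keeping the password in a file with restrictive permissions avoids exposing it in the environment of the exporter process.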
6.3 Nginx Exporter
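There are two common approaches:
- Instrument NGINX from Lua with the nginx‑lua‑prometheus library (prometheus.lua).
- Build NGINX with the nginx‑module‑vts (virtual host traffic status) module and run nginx‑vts‑exporter, which scrapes the module's /status/format/json endpoint and converts it to Prometheus metrics.
Newer versions of the module also serve /status/format/prometheus, which Prometheus can scrape directly without a separate exporter.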
6.4 JMX Exporter
$ wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.1/jmx_prometheus_javaagent-0.3.1.jar
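$ java -javaagent:jmx_prometheus_javaagent-0.3.1.jar=9404:config.yml -jar myapp.jar
The agent argument has the form <port>:<config file>. Here, metrics become available at http://localhost:9404/metrics, config.yml holds the exporter's rules for mapping MBeans to metrics, and myapp.jar stands for your own application.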
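Whatever the exporter, Prometheus scrapes the same plain‑text exposition format from /metrics. The format consists of optional # HELP and # TYPE comment lines, then one line per sample of the form name{label="value",...} value. The Python sketch below parses a short excerpt into (name, labels, value) tuples so the structure is visible. The excerpt is illustrative, with metric names in the style of node_exporter 0.16+ and invented values. The parser is deliberately minimal:
- It ignores an optional trailing timestamp.
- It does not handle escaped quotes or braces inside label values.

```python
import re
from typing import Dict, List, Tuple

# Metric name, optional {labels}, then the value; anything after the value
# (an optional timestamp) is ignored.
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
# label="value" pairs; escaped quotes inside values are not supported here.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse sample lines of the text exposition format, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE metadata
        match = _SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        name, label_text, value = match.groups()
        labels = dict(_LABEL_RE.findall(label_text or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative excerpt; real node_exporter output is much longer.
EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.52
# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.5
"""

if __name__ == "__main__":
    for name, labels, value in parse_exposition(EXAMPLE):
        print(name, labels, value)
```

Fetching http://localhost:9100/metrics (or port 9104 or 9404 for the other exporters) returns text of exactly this shape. That makes it a quick way to check an exporter before adding it to scrape_configs.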
7. Grafana Integration
Grafana provides rich dashboards for Prometheus data. Install via Docker:
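$ docker run -d -p 3000:3000 grafana/grafana
Open http://localhost:3000 (default login admin/admin), add a Prometheus data source, and import community dashboards such as ID 405 (Node Exporter Server Metrics). The data source URL must be reachable from wherever Grafana runs. http://localhost:9090 works when Grafana runs directly on the Prometheus host. Inside a container, however, localhost refers to the container itself, so use the host's IP address or a shared Docker network instead.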
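Grafana's Prometheus data source runs PromQL through Prometheus's HTTP API, and the same endpoint is handy for scripts. The Python sketch below builds an instant query against /api/v1/query and parses the JSON shape that endpoint returns for a vector result. It assumes a server at http://localhost:9090. The response used here is hand‑written in that format rather than captured output, so the parsing can be tried without a running server.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

PROMETHEUS_URL = "http://localhost:9090"  # assumption: adjust to your server


def query_url(expr: str, base: str = PROMETHEUS_URL) -> str:
    """URL for an instant query; the PromQL expression is URL-encoded."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each value is [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(expr: str) -> List[Tuple[Dict[str, str], float]]:
    """Send the query to a live server (requires network access)."""
    with urllib.request.urlopen(query_url(expr), timeout=5) as resp:
        return parse_vector(resp.read().decode("utf-8"))


# Hand-written response in the documented format, for offline experimentation.
SAMPLE_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "up", "instance": "localhost:9090",
                                 "job": "prometheus"},
                      "value": [1700000000.0, "1"]}]}}
"""

if __name__ == "__main__":
    print(query_url('up{job="prometheus"}'))
    print(parse_vector(SAMPLE_RESPONSE))
```

With a server running, run_query('up') returns one (labels, value) pair per scrape target.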
8. Alerting with Alertmanager
8.1 Alert Rules
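groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
      - alert: APIHighRequestLatency
        expr: api_http_request_latencies_second{quantile="0.5"} > 1
        for: 10m
        annotations:
          summary: "High request latency on {{ $labels.instance }}"
          description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
Save these rules in a file listed under rule_files in prometheus.yml. The alert lifecycle works as follows:
- While expr is true but the for duration has not yet elapsed, the alert is pending.
- Once the condition has held for the whole duration, the alert fires and Prometheus sends it to Alertmanager.
In annotations, {{ $labels.<name> }} and {{ $value }} are filled in with the alerting series' labels and current value.
8.2 Alertmanager Configuration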
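global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
Alertmanager listens on port 9093 by default. Point the alerting.alertmanagers targets in prometheus.yml at it (for example, the commented‑out alertmanager:9093 entry). The route settings control notifications as follows:
group_by: alerts that share an alertname are batched into one notification.
group_wait: how long to wait before sending the first notification for a new group.
group_interval: how long to wait before notifying about alerts added to an existing group.
repeat_interval: how often a notification for a still‑firing alert is re‑sent.
Supported receivers include email_configs, slack_configs, wechat_configs, and the generic webhook_configs.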
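The for: 5m clause on InstanceDown means the condition must be true at every rule evaluation for five minutes before the alert fires. Until then the alert is pending, and if the condition clears in between, the timer resets. Only firing alerts reach Alertmanager, which then applies the route settings above. The Python sketch below is a simplified model of that state machine for a single series, assuming one evaluation every 15 seconds (the evaluation_interval from prometheus.yml). It illustrates the semantics and is not Prometheus's implementation.

```python
from typing import List, Optional

EVALUATION_INTERVAL = 15   # seconds, matching evaluation_interval in prometheus.yml
FOR_DURATION = 5 * 60      # seconds, matching `for: 5m` in the InstanceDown rule


def alert_states(condition: List[bool],
                 interval: int = EVALUATION_INTERVAL,
                 hold: int = FOR_DURATION) -> List[str]:
    """Return the alert state after each evaluation: inactive, pending or firing."""
    states = []
    active_since: Optional[int] = None  # time the condition first became true
    for i, is_true in enumerate(condition):
        now = i * interval
        if not is_true:
            active_since = None          # condition cleared: reset the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        if now - active_since >= hold:
            states.append("firing")
        else:
            states.append("pending")
    return states


if __name__ == "__main__":
    # Hypothetical `up == 0` results: 24 failing evaluations, then recovery.
    down = [True] * 24 + [False]
    states = alert_states(down)
    print(states.index("firing") * EVALUATION_INTERVAL)  # 300: fires after 5 minutes
```

A brief blip that recovers within the for window never fires. This is the point of the clause: it filters out transient failures at the cost of a slower first notification.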
9. Service Discovery & Pushgateway
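Instead of listing static targets, Prometheus can discover them through Kubernetes, Consul, DNS, file‑based service discovery, and other mechanisms. Short‑lived batch jobs may exit before Prometheus gets a chance to scrape them. Such jobs can push their metrics to the Pushgateway, which holds the metrics and exposes them for Prometheus to scrape.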
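File‑based discovery is the simplest mechanism to try. Prometheus reads target files named in a file_sd_configs block of prometheus.yml and picks up changes to them without a restart. The Python sketch below writes such a JSON file. Each entry is an object with a targets list and an optional labels map. The file name targets.json, the env label and the addresses are placeholders; in real use, the file goes wherever your file_sd_configs entry points.

```python
import json
import os
import tempfile
from typing import Dict, List


def target_groups(hosts: List[str], port: int, labels: Dict[str, str]) -> List[dict]:
    """Build one target group: every host:port shares the same extra labels."""
    return [{"targets": [f"{host}:{port}" for host in hosts], "labels": labels}]


def write_sd_file(path: str, groups: List[dict]) -> None:
    """Write the JSON via a temp file and rename, so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(groups, tmp, indent=2)
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem


if __name__ == "__main__":
    # Hypothetical node_exporter hosts with an illustrative env label.
    groups = target_groups(["192.168.0.107", "192.168.0.108"], 9100, {"env": "demo"})
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "targets.json")
        write_sd_file(path, groups)
        with open(path) as f:
            print(f.read())
```

Writing to a temporary file and renaming it prevents Prometheus from ever seeing a half‑written target list. This matters because it re‑reads the file whenever it changes.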
10. Summary
This guide has shown how Prometheus, together with Grafana and Alertmanager, forms a complete cloud‑native monitoring stack for Kubernetes, Docker, and traditional server environments. Its pull‑based model, expressive PromQL, and large exporter ecosystem have made it one of the most widely adopted open‑source monitoring solutions.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.