
Master Prometheus: From Basics to Advanced Monitoring, Alerting, and Grafana Integration

This comprehensive guide explains Prometheus fundamentals, its ecosystem, metric collection models, configuration, PromQL querying, custom exporters, Grafana visualization, and Alertmanager setup, providing step‑by‑step instructions and code examples for effective system monitoring and alerting.

Efficient Ops

Introduction

Prometheus is an open‑source monitoring solution that collects and stores time‑series metrics, providing insight into system health.

Ecosystem

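The Prometheus ecosystem covers the whole monitoring pipeline: client SDKs and exporters expose metrics, the Prometheus server scrapes and stores them, Grafana visualizes them, and Alertmanager handles alerting.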

Metrics collection

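Each monitored service is modeled as a job, and each instance of that service that Prometheus scrapes is a target. A service exposes metrics either directly through a client SDK or through an exporter that translates the statistics of an existing system (MySQL, Consul, etc.). Pushgateway is used for short-lived jobs that may exit before Prometheus gets a chance to scrape them.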
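The job and target relationship can be sketched as a scrape configuration. The host names below are hypothetical; the ports are the usual defaults of node_exporter (9100) and the MySQL exporter (9104), and should be adjusted if the exporters are started on other ports.

```yaml
scrape_configs:
  # One job per monitored service; every instance of it is a target.
  - job_name: "node"
    static_configs:
      - targets:
          - "host-a.example.com:9100"   # hypothetical host running node_exporter
          - "host-b.example.com:9100"   # second instance of the same job
  - job_name: "mysql"
    static_configs:
      - targets:
          - "db-1.example.com:9104"     # hypothetical host running the MySQL exporter
```

Every series scraped from these targets carries a job label and an instance label, so the two node hosts can be told apart in queries.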

Pull model

Prometheus regularly pulls metrics from each target's /metrics endpoint; the interval is configured with scrape_interval.
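As a minimal sketch, the interval can be set once for all jobs and overridden for a single job. The 15s and 5s values are only illustrative assumptions.

```yaml
global:
  scrape_interval: 15s        # default interval for every job (illustrative value)

scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 5s       # per-job override of the global value
    metrics_path: /metrics    # this is the default path, shown here for clarity
    static_configs:
      - targets: ["localhost:9090"]
```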

Push model

Pushgateway lets short-lived services push their metrics to it; Prometheus then scrapes the Pushgateway like any other target.
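On the Prometheus side, the Pushgateway is just another scrape target. The sketch below assumes a Pushgateway listening on its default port 9091 on the same host.

```yaml
scrape_configs:
  - job_name: "pushgateway"
    honor_labels: true        # keep the job/instance labels supplied by the pushing service
    static_configs:
      - targets: ["localhost:9091"]   # assumed Pushgateway address
```

Without honor_labels, the labels attached by the scrape would take precedence and the pushed job and instance labels would be renamed with an exported_ prefix.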

Storage and query

Metrics are stored in an internal TSDB and queried with PromQL via the web UI or Grafana.

Alerting

Prometheus evaluates alert rules written as PromQL expressions and sends the firing alerts to Alertmanager, which groups them and routes them to receivers such as email or WeChat.
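Routing to email might look like the following alertmanager.yml sketch. The SMTP host, addresses, password, receiver name, and timing values are all placeholders chosen for illustration.

```yaml
# alertmanager.yml (sketch; every address and credential is a placeholder)
global:
  smtp_smarthost: "smtp.example.com:587"
  smtp_from: "alertmanager@example.com"
  smtp_auth_username: "alertmanager@example.com"
  smtp_auth_password: "CHANGE_ME"

route:
  receiver: "ops-email"             # default receiver for all alerts
  group_by: ["alertname", "job"]    # alerts sharing these labels are sent as one notification
  group_wait: 30s                   # wait before sending the first notification for a new group
  group_interval: 5m                # wait before notifying about new alerts in an existing group
  repeat_interval: 4h               # wait before re-sending a notification that is still firing

receivers:
  - name: "ops-email"
    email_configs:
      - to: "ops@example.com"
```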

How it works

The workflow has three parts: service registration (static or dynamic), configuration reload, and metric scraping.

Static registration

<code>scrape_configs:
  - job_name: "prometheus"            # becomes the job label on every scraped series
    static_configs:
      - targets: ["localhost:9090"]   # fixed host:port list, edited by hand
</code>

Dynamic registration

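<code>scrape_configs:
  - job_name: "node_export_consul"
    metrics_path: /node_metrics     # path scraped on each discovered target (default: /metrics)
    scheme: http
    consul_sd_configs:
      - server: localhost:8500      # address of the Consul agent to query
        services:
          - node_exporter           # only discover instances of this Consul service
</code>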

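After editing the config, reload it without restarting Prometheus: start the server with the --web.enable-lifecycle flag, then send an HTTP POST request to /-/reload. Sending SIGHUP to the Prometheus process also triggers a reload.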

Metric model

Each time series is identified by a metric name and a set of labels; each sample in the series is a timestamp and a value. There are four metric types: Counter, Gauge, Histogram, and Summary.

Counter

A counter is a monotonically increasing value, such as a request count. It only goes back to zero when the process restarts.

Gauge

A gauge is a value that can go up and down, such as current memory usage.

Histogram and Summary

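Both types track the distribution of observed values, such as request latency. A Histogram counts observations into buckets whose boundaries must be configured on the client side; quantiles are then estimated on the server with histogram_quantile. A Summary computes its quantiles on the client side.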

PromQL

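PromQL selects data as instant vectors (one sample per series at a single point in time) or range vectors (a window of samples per series). Commonly used functions include rate, irate, and histogram_quantile, and aggregation operators such as sum can be combined with by or without to choose which labels are kept.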
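These constructs can be tried in the web UI, and they can also be stored in a rule file. The fragment below is a hypothetical recording-rule file used only as a vehicle for the expressions; the metric names http_requests_total and http_request_duration_seconds_bucket, the group name, and the rule names are assumptions and do not come from this article.

```yaml
groups:
  - name: example_promql            # hypothetical group name
    rules:
      # rate: per-second average increase of a counter over a 5m range vector,
      # then summed so that only the job label is kept.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      # irate: per-second rate computed from the last two samples in the range.
      - record: instance:http_requests:irate1m
        expr: irate(http_requests_total[1m])

      # without: aggregate away the instance label and keep all other labels.
      - record: service:http_requests:rate5m
        expr: sum without (instance) (rate(http_requests_total[5m]))

      # histogram_quantile: estimated 95th percentile from histogram buckets.
      # The le label must survive the aggregation.
      - record: job:http_request_duration_seconds:p95_5m
        expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```

Each expr value can be pasted as-is into the expression box of the web UI or into a Grafana panel.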

Grafana visualization

Connect Grafana to Prometheus as a data source, create dashboards, and write PromQL queries to visualize metrics.
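Besides adding the data source through the Grafana UI, it can be declared in a provisioning file. This is a sketch that assumes Grafana's data source provisioning is in use and that Prometheus is reachable at localhost:9090; the file name is hypothetical.

```yaml
# provisioning/datasources/prometheus.yml (hypothetical file name)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend sends the queries to Prometheus
    url: http://localhost:9090    # assumed Prometheus address
    isDefault: true
```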

Alerting configuration

Define alert rules in YAML, configure Alertmanager with receivers (e.g., email), and silence alerts via the UI.
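A minimal alert rule file might look like the sketch below. The file path, group name, alert name, severity label, and the 5m duration are illustrative assumptions; the up metric is generated by Prometheus itself for every scrape.

```yaml
# rules/alerts.yml (hypothetical path)
groups:
  - name: example_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0             # up is 1 when the last scrape succeeded, 0 when it failed
        for: 5m                   # the condition must hold this long before the alert fires
        labels:
          severity: critical      # extra label that Alertmanager routes can match on
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been unreachable for more than 5 minutes."
```

For Prometheus to use it, the file has to be listed under rule_files in the main configuration, and the Alertmanager address has to be listed under alerting.alertmanagers.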

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
