How to Build a Scalable Prometheus Monitoring System with Thanos on Kubernetes
This article explains why monitoring is essential for production stability, compares white‑box and black‑box approaches, and provides a step‑by‑step guide to deploying Prometheus, configuring scrape targets, using Pushgateway and Alertmanager, and scaling the solution with Thanos in a Kubernetes environment.
Monitoring is a fundamental part of infrastructure that ensures the stability of production services; it helps detect, locate, and resolve issues through alerts and, in some cases, automated self‑healing.
White‑box vs. Black‑box Monitoring
White‑box monitoring observes internal metrics such as request count, success/failure rates, and average latency, while black‑box monitoring uses probes to verify external availability, catching problems like DNS failures that white‑box metrics miss.
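Black-box checks are commonly implemented with the Blackbox Exporter. As a sketch (the probed URL and exporter address are placeholders, and the `http_2xx` module is assumed to be defined in the exporter's `blackbox.yml`), a scrape job might look like:
<code>scrape_configs:
- job_name: blackbox-http
  metrics_path: /probe
  params:
    module: [http_2xx]            # probe module defined in blackbox.yml
  static_configs:
  - targets:
    - https://example.com         # placeholder endpoint to probe
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: blackbox-exporter:9115   # assumed exporter address</code>
The relabeling rewrites the scrape so Prometheus queries the exporter, which in turn probes the real target.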
Why Choose Prometheus
Prometheus was selected for its flexible PromQL query language, single‑binary deployment, Go‑based integration, built‑in Web UI, and rich ecosystem (Alertmanager, Pushgateway, Exporters).
Prometheus Architecture
Prometheus discovers targets via configuration, scrapes metrics from HTTP endpoints, stores them in a local TSDB, evaluates alert rules with PromQL, and sends alerts to Alertmanager, which can forward them to email or chat.
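That flow maps directly onto a minimal `prometheus.yml` (the Alertmanager address here is an assumption):
<code>global:
  scrape_interval: 15s          # how often targets are scraped
  evaluation_interval: 15s      # how often alert rules are evaluated
rule_files:
- /etc/prometheus/rules/*.yml
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["alertmanager:9093"]   # assumed Alertmanager address
scrape_configs:
- job_name: prometheus          # Prometheus scraping its own /metrics endpoint
  static_configs:
  - targets: ["localhost:9090"]</code>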
Metric Naming Conventions
Use base units (e.g., seconds, not milliseconds).
Prefix metric names with the application namespace, e.g., process_cpu_seconds_total, http_request_duration_seconds.
Use suffixes to describe units, e.g., node_memory_usage_bytes, foobar_build_info.
Metric Types
Counter: monotonically increasing (e.g., request count).
Gauge: can go up or down (e.g., CPU usage).
Histogram and Summary: capture distributions for latency or size.
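In the text exposition format that Prometheus scrapes, these types look like the following (metric names and values are illustrative):
<code># TYPE http_requests_total counter
http_requests_total{code="200"} 1027
# TYPE node_memory_usage_bytes gauge
node_memory_usage_bytes 8.1e+09
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 240
http_request_duration_seconds_bucket{le="+Inf"} 312
http_request_duration_seconds_sum 41.3
http_request_duration_seconds_count 312</code>
Note that a histogram is exposed as cumulative buckets plus a sum and a count, which is what histogram_quantile operates on.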
Time‑Series Basics
A time series is a sequence of (timestamp, value) pairs. A series without labels is single-dimensional; adding labels (e.g., host="host1") creates multi-dimensional series. Instant vectors represent a single point in time, while range vectors cover a time window.
PromQL Examples
Instant vector query:
<code>http_requests{service="web",code="200",env="test"}</code>
Instant vector result:
<code>http_requests{host="host1",service="web",code="200",env="test"} 10</code>
<code>http_requests{host="host2",service="web",code="200",env="test"} 0</code>
<code>http_requests{host="host3",service="web",code="200",env="test"} 12</code>
Range vector query:
<code>http_requests{host="host1",service="web",code="200",env="test"}[5m]</code>
Calculate the per-second rate:
<code>rate(http_requests{host="host1",service="web",code="200",env="test"}[5m])</code>
Calculate the increase over the window:
<code>increase(http_requests{host="host1",service="web",code="200",env="test"}[5m])</code>
90th percentile of a histogram:
<code>histogram_quantile(0.9, rate(employee_age_bucket_bucket[10m]))</code>
Cardinality and Storage
Each sample is held in memory and flushed to disk every two hours. High cardinality (many label combinations) multiplies the number of series and drives up memory usage, so avoid labels with unbounded values such as user IP or request ID. Adjust storage.tsdb.min-block-duration and scrape_interval to control memory pressure.
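When an application already exposes such a label, one common mitigation (sketched here with a hypothetical label name) is to drop it at scrape time with metric_relabel_configs:
<code>scrape_configs:
- job_name: webapp
  static_configs:
  - targets: ["webapp:8080"]       # placeholder target
  metric_relabel_configs:
  - regex: request_id              # hypothetical high-cardinality label
    action: labeldrop              # remove the label before ingestion</code>
Dropping the label collapses the per-request series into a bounded set before they ever reach the TSDB.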
Service Discovery and Scrape Configs
Static configs work for a few targets, but dynamic environments benefit from service discovery (Kubernetes, Consul, file‑based). Prometheus can watch files for changes and update targets automatically.
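Both styles can be sketched in one scrape configuration (file paths are placeholders):
<code>scrape_configs:
- job_name: file-discovered
  file_sd_configs:
  - files:
    - /etc/prometheus/targets/*.json   # Prometheus watches these files for changes
    refresh_interval: 5m
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod                          # discover every pod via the Kubernetes API</code>
Updating a file-based target list requires no reload; Prometheus picks up the change automatically.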
Pushgateway
For short-lived batch jobs, Pushgateway receives metrics pushed by the job so Prometheus can scrape them later. It does not expire metrics automatically, and duplicate data can appear if multiple Pushgateways sit behind a load balancer; careful label management (e.g., honor_labels: true) is required.
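A sketch of the scrape side (the Pushgateway address is assumed); a batch job would push its metrics with something like curl --data-binary @- http://pushgateway:9091/metrics/job/backup:
<code>scrape_configs:
- job_name: pushgateway
  honor_labels: true          # keep the job/instance labels pushed by the batch job
  static_configs:
  - targets: ["pushgateway:9091"]   # assumed Pushgateway address</code>
Without honor_labels: true, Prometheus would overwrite the pushed job label with "pushgateway", losing the identity of the original batch job.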
Alertmanager
Alertmanager receives alerts from Prometheus, deduplicates, groups, silences, and forwards them to notification channels such as email, WeChat, or DingTalk.
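The grouping and routing behavior is driven by alertmanager.yml; a minimal sketch (receiver name and address are placeholders, and a working email receiver would also need SMTP settings under global):
<code>route:
  group_by: [alertname, cluster]   # alerts sharing these labels are batched together
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: ops-email
receivers:
- name: ops-email
  email_configs:
  - to: ops@example.com            # placeholder address</code>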
Scaling with Thanos
Thanos adds global query, high‑availability, and long‑term storage to Prometheus. The Querier aggregates results from multiple sidecars, de‑duplicates data, and supports federation. Remote Write can send data to external stores like M3DB, InfluxDB, or OpenTSDB.
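Remote Write is configured on the Prometheus side; a sketch against M3DB (the coordinator address is an assumption for this example):
<code>remote_write:
- url: http://m3coordinator:7201/api/v1/prom/remote/write   # assumed M3 coordinator endpoint
  queue_config:
    max_samples_per_send: 1000    # tune batching to the remote store's capacity</code>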
Deploying Prometheus on Kubernetes
<code>apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  serviceName: "prometheus"
  replicas: 3
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
        thanos-store-api: "true"
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.11.1
        args:
        - --config.file=/etc/prometheus-shared/prometheus.yml
        - --web.enable-lifecycle
        - --storage.tsdb.path=/data/prometheus
        - --storage.tsdb.retention=2w
        - --storage.tsdb.min-block-duration=2h
        - --storage.tsdb.max-block-duration=2h
        - --web.enable-admin-api
        ports:
        - name: http
          containerPort: 9090
        volumeMounts:
        - name: prometheus-config-shared
          mountPath: /etc/prometheus-shared
        - name: prometheus-data
          mountPath: /data/prometheus
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: http
      - name: watch
        image: watch
        args: ["-v", "-t", "-p=/etc/prometheus-shared", "curl", "-X", "POST", "--fail", "-o", "-", "-sS", "http://localhost:9090/-/reload"]
        volumeMounts:
        - name: prometheus-config-shared
          mountPath: /etc/prometheus-shared
      - name: thanos
        image: improbable/thanos:v0.6.0
        command: ["/bin/sh", "-c"]
        args:
        - |
          PROM_ID=`echo $POD_NAME | rev | cut -d '-' -f1` /bin/thanos sidecar \
            --prometheus.url=http://localhost:9090 \
            --reloader.config-file=/etc/prometheus/prometheus.yml.tmpl \
            --reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yml
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - name: http-sidecar
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus
        - name: prometheus-config-shared
          mountPath: /etc/prometheus-shared</code>
RBAC is required for Prometheus to read Kubernetes resources:
<code>apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["services", "pods", "nodes", "nodes/proxy", "endpoints"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["prometheus-config"]
  verbs: ["get", "update", "delete"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io</code>
Deploying Thanos Components
<code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
      - name: thanos-query
        image: improbable/thanos:v0.6.0
        args:
        - query
        - --log.level=debug
        - --query.timeout=2m
        - --query.max-concurrent=20
        - --query.replica-label=replica
        - --query.auto-downsampling
        - --store=dnssrv+thanos-store-gateway.default.svc
        - --store.sd-dns-interval=30s
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: http</code>
Similar Deployments exist for Thanos Store, Thanos Ruler, Pushgateway, and Alertmanager, each exposing the necessary ports and mounting configuration via ConfigMaps.
Finally, an Ingress routes traffic to Prometheus, Thanos Query, Alertmanager, and Grafana, completing the monitoring stack.
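As an illustration using the current networking.k8s.io/v1 API (the hostname is a placeholder), routing to Thanos Query could look like:
<code>apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring
spec:
  rules:
  - host: thanos.example.com       # placeholder hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: thanos-query
            port:
              number: 10902        # Thanos Query HTTP port</code>
Analogous rules would point other hostnames or paths at Prometheus, Alertmanager, and Grafana.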
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.