How to Build a Scalable Prometheus Monitoring System with Thanos on Kubernetes
This article explains why monitoring is essential for production stability, compares white‑box and black‑box approaches, details the advantages of Prometheus, walks through its architecture, metric types, query language, high‑availability strategies with Thanos, and provides practical Kubernetes deployment manifests and configuration tips.
Monitoring is a fundamental part of infrastructure that ensures the stability of production services; it helps detect, locate, and resolve issues quickly.
White‑box monitoring observes internal metrics such as request count, success rate, and latency, while black‑box monitoring uses probes to verify external availability, complementing white‑box data.
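As a sketch of the black-box side, a Prometheus scrape job driving the blackbox_exporter might look like the following (the exporter address <code>blackbox-exporter:9115</code>, the <code>http_2xx</code> module, and the probed URL are all illustrative assumptions):
<code>scrape_configs:
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]            # probe module defined in blackbox.yml
    static_configs:
      - targets:
          - https://example.com/healthz   # external endpoint to probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target      # pass the target URL as ?target=
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115   # scrape the exporter itself</code>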
Prometheus was chosen for its flexible PromQL query language, single‑binary deployment, Go implementation that integrates easily with Go services, built‑in web UI, and rich ecosystem (Alertmanager, Pushgateway, exporters).
Prometheus scrapes targets defined in <code>scrape_configs</code>, stores samples in a local TSDB, and evaluates alert rules via PromQL. Metric names must use ASCII characters and follow naming conventions (e.g., <code>process_cpu_seconds_total</code>, <code>http_request_duration_seconds</code>).
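A minimal <code>scrape_configs</code> sketch (job names, intervals, and targets here are illustrative):
<code>scrape_configs:
  - job_name: prometheus          # Prometheus scraping itself
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: web                 # an application exposing /metrics
    static_configs:
      - targets: ["web-1:8080", "web-2:8080"]</code>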
Metrics are stored as time series of <code>(timestamp, value)</code> samples. A single‑dimensional series forms a vector; adding labels creates multi‑dimensional series. Queries return instant vectors or range vectors, which can be aggregated with functions like <code>rate()</code>, <code>increase()</code>, or <code>histogram_quantile()</code>.
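For instance (metric and label names below are illustrative):
<code># Per-second request rate over the last 5 minutes (range vector -> instant vector)
rate(http_requests_total[5m])

# Requests added over the last hour, summed per service
sum by (service) (increase(http_requests_total[1h]))

# 99th-percentile latency derived from a histogram's bucket series
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))</code>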
An example of a labeled series: <code>http_requests{host="host1",service="web",code="200",env="test"}</code>. High availability can be achieved with federation, but it has limitations: the federating Prometheus becomes another single point of failure and adds scrape lag. Thanos instead provides a global query layer that aggregates data from multiple Prometheus instances via sidecars and can ship blocks to object storage.
Thanos Querier receives requests, fans them out to the sidecars, merges the results, and executes PromQL over them, supporting deduplication across high‑availability replicas.
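For deduplication to work, each HA replica must carry a distinguishing external label that the Querier is told to ignore; a sketch (the <code>cluster</code> value is illustrative, and the <code>replica</code> label name matches the Querier's <code>--query.replica-label</code> flag):
<code># prometheus.yml on each HA replica
global:
  external_labels:
    cluster: prod          # illustrative
    replica: $(POD_NAME)   # substituted per pod, e.g. by the Thanos reloader</code>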
Prometheus supports remote read/write to external storage (e.g., M3DB, InfluxDB, OpenTSDB) for durable, scalable retention.
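A sketch of such a configuration (the M3 coordinator service name, port, and paths are assumptions to adapt to your storage backend):
<code># prometheus.yml
remote_write:
  - url: http://m3coordinator.default.svc:7201/api/v1/prom/remote/write
remote_read:
  - url: http://m3coordinator.default.svc:7201/api/v1/prom/remote/read</code>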
Service discovery (Kubernetes, Consul, file‑based) replaces static target lists, allowing dynamic scaling of monitored instances.
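A minimal Kubernetes service-discovery job, as a sketch (the annotation convention shown is a common pattern, not a Prometheus built-in):
<code>scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod               # discover every pod in the cluster
    relabel_configs:
      # Keep only pods annotated prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"</code>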
Pushgateway collects metrics from short‑lived jobs and caches them for Prometheus to scrape, but it does not expire metrics automatically and can cause duplication in load‑balanced setups.
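A short-lived job pushes metrics as plain text in the Prometheus exposition format via HTTP. The sketch below (gateway URL, job name, and metric names are illustrative) only builds and prints the payload; the commented-out request shows how it would be sent to a running gateway:

```python
# Build a Pushgateway payload in the Prometheus text exposition format.
# The gateway URL and metric names below are illustrative.

def exposition_payload(metrics: dict) -> str:
    """Render {name: value} pairs as exposition-format lines.

    Pushgateway expects the body to end with a newline.
    """
    return "".join(f"{name} {value}\n" for name, value in metrics.items())

payload = exposition_payload({
    "backup_duration_seconds": 42.5,
    "backup_last_success_timestamp": 1700000000,
})
print(payload, end="")

# To actually push (requires a reachable gateway), something like:
# import urllib.request
# req = urllib.request.Request(
#     "http://pushgateway:9091/metrics/job/backup/instance/host1",
#     data=payload.encode(), method="POST")
# urllib.request.urlopen(req)
```

Because Pushgateway caches the last pushed value indefinitely, jobs that disappear should DELETE their metric group, or stale values will keep being scraped.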
Alertmanager receives alerts from Prometheus, deduplicates, groups, silences, and forwards them to notification channels (e.g., email, WeChat, DingTalk).
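A minimal routing configuration, as a sketch (receiver name, grouping labels, timings, and the webhook URL are all illustrative):
<code># /etc/alertmanager/config.yml
route:
  group_by: ["alertname", "service"]   # batch related alerts together
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: default
receivers:
  - name: default
    webhook_configs:
      - url: http://alert-webhook.default.svc:8080/send</code>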
Deploying Prometheus on Kubernetes involves a <code>StatefulSet</code> with containers for Prometheus, a watch sidecar that reloads configuration on changes, and a Thanos sidecar. Example manifest:
<code>apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  serviceName: "prometheus"
  updateStrategy:
    type: RollingUpdate
  replicas: 3
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
        thanos-store-api: "true"
    spec:
      serviceAccountName: prometheus
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
        - name: prometheus-data
          hostPath:
            path: /data/prometheus
        - name: prometheus-config-shared
          emptyDir: {}
      containers:
        - name: prometheus
          image: prom/prometheus:v2.11.1
          args:
            - --config.file=/etc/prometheus-shared/prometheus.yml
            - --web.enable-lifecycle
            - --storage.tsdb.path=/data/prometheus
            - --storage.tsdb.retention=2w
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h
            - --web.enable-admin-api
          ports:
            - name: http
              containerPort: 9090
          volumeMounts:
            - name: prometheus-config-shared
              mountPath: /etc/prometheus-shared
            - name: prometheus-data
              mountPath: /data/prometheus
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: http
        - name: watch
          image: watch
          args: ["-v", "-t", "-p=/etc/prometheus-shared", "curl", "-X", "POST", "--fail", "-o", "-", "-sS", "http://localhost:9090/-/reload"]
          volumeMounts:
            - name: prometheus-config-shared
              mountPath: /etc/prometheus-shared
        - name: thanos
          image: improbable/thanos:v0.6.0
          command: ["/bin/sh", "-c"]
          args:
            - |
              PROM_ID=`echo $POD_NAME | rev | cut -d '-' -f1` /bin/thanos sidecar \
                --prometheus.url=http://localhost:9090 \
                --reloader.config-file=/etc/prometheus/prometheus.yml.tmpl \
                --reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yml
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          ports:
            - name: http-sidecar
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus
            - name: prometheus-config-shared
              mountPath: /etc/prometheus-shared</code>
RBAC is required for Prometheus to access Kubernetes resources:
<code>apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["services", "pods", "nodes", "nodes/proxy", "endpoints"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["prometheus-config"]
    verbs: ["get", "update", "delete"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io</code>
Thanos Querier deployment example:
<code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  labels:
    app: thanos-query
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
        - name: thanos-query
          image: improbable/thanos:v0.6.0
          args:
            - query
            - --log.level=debug
            - --query.timeout=2m
            - --query.max-concurrent=20
            - --query.replica-label=replica
            - --query.auto-downsampling
            - --store=dnssrv+thanos-store-gateway.default.svc
            - --store.sd-dns-interval=30s
          ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: http</code>
Pushgateway deployment:
<code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: pushgateway
  labels:
    app: pushgateway
spec:
  replicas: 15
  selector:
    matchLabels:
      app: pushgateway
  template:
    metadata:
      labels:
        app: pushgateway
    spec:
      containers:
        - name: pushgateway
          image: prom/pushgateway:v1.0.0
          ports:
            - name: http
              containerPort: 9091
          resources:
            limits:
              memory: 1Gi
            requests:
              memory: 512Mi</code>
Alertmanager deployment:
<code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
spec:
  replicas: 3
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
        - name: alertmanager
          image: prom/alertmanager:latest
          args:
            - --web.route-prefix=/alertmanager
            - --config.file=/etc/alertmanager/config.yml
            - --storage.path=/alertmanager
            - --cluster.listen-address=0.0.0.0:8001
            - --cluster.peer=alertmanager-peers.default:8001
          ports:
            - name: alertmanager
              containerPort: 9093
          volumeMounts:
            - name: alertmanager-config
              mountPath: /etc/alertmanager
            - name: alertmanager
              mountPath: /alertmanager
      volumes:
        - name: alertmanager-config
          configMap:
            name: alertmanager-config
        - name: alertmanager
          emptyDir: {}</code>
Ingress resources can expose Pushgateway, Prometheus, Thanos Query, Alertmanager, and Grafana via Nginx.
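A sketch of one such Ingress (host names are illustrative, and the <code>extensions/v1beta1</code> API version matches the Kubernetes era of the manifests above; newer clusters use <code>networking.k8s.io/v1</code>):
<code>apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: monitoring
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: thanos.example.com        # illustrative host
      http:
        paths:
          - path: /
            backend:
              serviceName: thanos-query
              servicePort: 10902
    - host: alertmanager.example.com  # illustrative host
      http:
        paths:
          - path: /alertmanager       # matches --web.route-prefix above
            backend:
              serviceName: alertmanager
              servicePort: 9093</code>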
Accessing the Prometheus UI shows that monitoring nodes are healthy.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.