Prometheus Monitoring in Kubernetes: Principles, Exporters, Configuration, Capacity Planning, and Best Practices
This comprehensive guide explores Prometheus as a cloud‑native monitoring solution for Kubernetes, covering core principles, exporter selection, configuration snippets, Grafana dashboard creation, capacity planning, high‑cardinality challenges, rate calculations, prediction functions, high‑availability designs, and integration with Alertmanager and other operational tools.
Prometheus, a new‑generation open‑source monitoring system, has become the de‑facto standard in cloud‑native environments; this article shares practical issues, design principles, and considerations for using Prometheus with Kubernetes.
Key Principles
Monitoring is infrastructure; avoid unnecessary metric collection that wastes resources.
Only emit alerts that can be acted upon.
Keep the architecture simple and reliable; avoid magic systems like AI‑driven auto‑remediation.
Prometheus Limitations
Metric‑based only – not suitable for logs, events, or tracing.
Default pull model; plan network topology to avoid forwarding.
No silver bullet for clustering – choose between Federate, Cortex, Thanos, etc.
Availability > consistency; occasional data loss is tolerated for successful queries.
Functions such as rate and histogram_quantile can produce unintuitive results, and long‑range queries are evaluated at a coarser step, which down‑samples the data and loses precision.
Kubernetes Exporters
cAdvisor (built‑in Kubelet)
kubelet (ports 10255/10250)
apiserver (port 6443)
scheduler, controller‑manager, etcd, Docker, kube‑proxy, kube‑state‑metrics, node‑exporter, blackbox_exporter, process‑exporter, NVIDIA exporter, custom application exporters, etc.
These exporters expose metrics for the core components, which can be visualized in Grafana dashboards.
Grafana Panels
Grafana can render dashboards for kubelet, apiserver, and other components; templates simplify multi‑level dropdowns, though template‑based alert rules are not yet supported.
All‑in‑One Exporter Collection
Two approaches are suggested: launching N exporter processes from a main process, or using Telegraf to handle multiple inputs.
Golden Metrics
Follow the four golden signals from Google SRE (latency, traffic, errors, saturation), and apply the USE (Utilization, Saturation, Errors) or RED (Rate, Errors, Duration) method to online and offline services.
Kubernetes 1.16 cAdvisor Compatibility
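In Kubernetes 1.16 the cAdvisor labels pod_name and container_name were replaced by pod and container; relabeling preserves the original _name labels for existing dashboards and rules.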
Metric Relabel Config Example
metric_relabel_configs:
- source_labels: [container]
  regex: (.+)
  target_label: container_name
  replacement: $1
  action: replace
- source_labels: [pod]
  regex: (.+)
  target_label: pod_name
  replacement: $1
  action: replace
External Cluster Scraping
When Prometheus runs outside a cluster, use bearer tokens and TLS settings; example job for cAdvisor:
- job_name: cluster-cadvisor
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: https://xx:6443
    role: node
    # Credentials used for service discovery against the API server.
    bearer_token_file: token/cluster.token
    tls_config:
      insecure_skip_verify: true
  # The scrape goes through the API server proxy, so it needs the same credentials.
  bearer_token_file: token/cluster.token
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  # Copy Kubernetes node labels onto the target.
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  # Send every scrape to the API server instead of the node itself.
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: xx:6443
    action: replace
  # Reach each node's cAdvisor endpoint through the API server proxy path.
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    action: replace
  metric_relabel_configs:
  - source_labels: [container]
    separator: ;
    regex: (.+)
    target_label: container_name
    replacement: $1
    action: replace
  - source_labels: [pod]
    separator: ;
    regex: (.+)
    target_label: pod_name
    replacement: $1
    action: replace
The original article provides a similar job for generic service endpoints.
Prometheus Time‑Zone
Prometheus stores timestamps in UTC; Grafana can convert to local time, and newer UI versions include a timezone option.
Load Balancer to ReplicaSet Metrics
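When Prometheus can reach only the load balancer and not the replicas behind it, add a sidecar proxy to the replicas or configure the load balancer to forward requests to individual backends, so that each replica can still be scraped.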
Version Selection
Prefer the latest Prometheus release (2.16 at the time of writing) for the new UI and performance improvements.
Memory Consumption
Memory usage grows with ingestion rate and retention. To reduce it, lower the series count, lengthen the scrape interval, and offload storage to a remote system such as Thanos or VictoriaMetrics.
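The average number of bytes per sample can be measured from Prometheus's own compaction metrics:
rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[1h]) / rate(prometheus_tsdb_compaction_chunk_samples_sum[1h])
Example result:
{instance="0.0.0.0:8890",job="prometheus"} 1.252747585939941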
Capacity planning formula:
disk_size = retention_time_seconds * samples_per_second * bytes_per_sample
Avoid high‑cardinality metrics and labels; the original article shows examples of the top‑cardinality metrics and labels.
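The capacity planning formula above can be turned into a quick estimate. The sketch below is only illustrative: the 15‑day retention and 100,000 samples per second are hypothetical workload figures, and 1.25 bytes per sample is the example query result above, rounded.

```python
def disk_size_bytes(retention_seconds: float,
                    samples_per_second: float,
                    bytes_per_sample: float) -> float:
    """disk_size = retention_time_seconds * samples_per_second * bytes_per_sample"""
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical workload, chosen only for illustration.
retention = 15 * 24 * 3600          # 15 days, in seconds
ingestion = 100_000                 # samples per second
bytes_per_sample = 1.25             # rounded from the example measurement

size = disk_size_bytes(retention, ingestion, bytes_per_sample)
print(f"estimated disk usage: {size / 1024**3:.1f} GiB")
```

In practice it is worth leaving headroom on top of such an estimate, since the WAL and compaction need extra space.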
Rate Calculation
rate handles counter resets automatically; use a range of at least four times the scrape interval so that the window still holds enough samples when a scrape is missed.
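How a rate calculation can tolerate a counter reset is easier to see in code. The following is a simplified sketch, not Prometheus's implementation: it treats any drop in value as a reset and, unlike the real rate function, does not extrapolate to the edges of the window. The sample data is made up, with a 30s scrape interval and a 120s window.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase over the window, treating any drop as a counter reset."""
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter reset: the process restarted and the counter began
            # again from zero, so the whole new value counts as increase.
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# The counter resets between t=30 and t=60 (130 -> 10).
window = [(0, 100), (30, 130), (60, 10), (90, 40), (120, 70)]
print(simple_rate(window))
```

With only two samples in the window, a single missed scrape would leave nothing to compute, which is one reason to prefer a wider range.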
P95 and Histogram Quantile
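P95 can be higher or lower than the average, because it describes the tail of the distribution rather than its mean. histogram_quantile estimates the value by linear interpolation inside a bucket, so the bucket boundaries determine how accurate the result is.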
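The effect of bucket design can be illustrated with a small sketch of bucket interpolation. This is a simplified reimplementation for illustration, assuming cumulative buckets sorted by upper bound with a final +Inf bucket; it skips the edge cases the real histogram_quantile handles. The request counts are invented: 90 requests under 0.1s and 10 requests at roughly 0.12s.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile by linear interpolation inside the matching bucket."""
    total = buckets[-1][1]
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls into the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper = buckets[idx][0]
    lower = buckets[idx - 1][0] if idx > 0 else 0.0
    below = buckets[idx - 1][1] if idx > 0 else 0.0
    in_bucket = buckets[idx][1] - below
    return lower + (upper - lower) * (rank - below) / in_bucket


# Same observations, two bucket layouts.
coarse = [(0.1, 90), (1.0, 100), (math.inf, 100)]
fine = [(0.1, 90), (0.15, 100), (1.0, 100), (math.inf, 100)]

print(histogram_quantile(0.95, coarse))  # interpolates across 0.1..1.0
print(histogram_quantile(0.95, fine))    # interpolates across 0.1..0.15
```

The coarse layout reports a P95 far above the slow requests that were actually observed, while the finer layout lands close to them, which is why bucket boundaries should bracket the latencies that matter.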
Slow Query Detection
Identify slow PromQL queries using prometheus_engine_query_duration_seconds and avoid large range queries with small steps.
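The cost of a range query grows with the number of evaluation steps, which a quick calculation makes concrete. The sketch below assumes the 11,000 points‑per‑series limit that the Prometheus range query API enforces; the helper names are hypothetical.

```python
import math

# Assumed limit: Prometheus rejects range queries whose range/step exceeds this.
MAX_POINTS_PER_SERIES = 11_000


def points_per_series(range_seconds: int, step_seconds: int) -> int:
    """Number of evaluation points a range query produces for each series."""
    return range_seconds // step_seconds + 1


def min_step_seconds(range_seconds: int,
                     max_points: int = MAX_POINTS_PER_SERIES) -> int:
    """Smallest whole-second step that keeps range/step within the limit."""
    return math.ceil(range_seconds / max_points)


thirty_days = 30 * 24 * 3600
print(points_per_series(thirty_days, 15))   # a 15s step over 30 days
print(min_step_seconds(thirty_days))        # step needed to stay under the limit
```

Multiplying the points per series by the number of matching series gives a rough idea of how much work a dashboard panel asks of the query engine.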
High Cardinality Issues
Labels with unbounded values (e.g., user IDs, IPs) should not be used as metric labels.
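A rough way to see why unbounded labels are dangerous is to multiply the number of distinct values per label. The sketch below gives a worst‑case series count for one metric; the label names and cardinalities are hypothetical.

```python
from math import prod
from typing import Dict


def series_count(label_cardinalities: Dict[str, int]) -> int:
    """Worst-case series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical HTTP request counter with bounded labels.
bounded = {"method": 5, "status": 6, "instance": 20}
# The same metric after adding a per-user label.
unbounded = {**bounded, "user_id": 100_000}

print(series_count(bounded))    # 600
print(series_count(unbounded))  # 60000000
```

Real series counts are usually below the worst case because not every label combination occurs, but a single unbounded label is enough to dominate the total.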
Prometheus Restart and Hot Reload
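A restart replays the WAL, which can be slow; start Prometheus with --web.enable-lifecycle so that configuration can be hot‑reloaded through an HTTP POST to /-/reload. Example reload script:
#!/bin/sh
# Usage: <script> FILE URL
# Polls FILE every 5 seconds and POSTs to URL when its content changes.
FILE=$1
URL=$2
# readlink -f resolves symlinks so the hash follows the real file.
HASH=$(md5sum "$(readlink -f "$FILE")")
while true; do
    NEW_HASH=$(md5sum "$(readlink -f "$FILE")")
    if [ "$HASH" != "$NEW_HASH" ]; then
        HASH="$NEW_HASH"
        echo "[$(date +%s)] Trigger refresh"
        curl -sSL -X POST "$URL" > /dev/null
    fi
    sleep 5
done
Run it with the configuration file path and the reload URL of Prometheus or Alertmanager as its two arguments.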
Application Metric Design
Keep metric count reasonable (e.g., < 10 k) and control label cardinality.
node‑exporter Issues
Does not monitor processes; use process‑exporter or Telegraf.
Only supports Unix; use wmi_exporter on Windows.
Prefer newer versions (0.16/0.17), which follow the Prometheus naming conventions.
Many metric names changed in those versions (e.g., node_cpu became node_cpu_seconds_total).
kube‑state‑metrics
Provides metadata for enriching cAdvisor metrics; does not expose annotations due to high cardinality.
Relabel vs Metric Relabel
relabel_configs are applied to targets before the scrape; metric_relabel_configs are applied to the scraped samples before they are stored. Example:
metric_relabel_configs:
- separator: ;
  regex: instance
  replacement: $1
  action: labeldrop
Prediction Functions
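As background for the queries in this section, predict_linear fits a simple linear regression to the samples in the range and extrapolates it forward. The sketch below is an illustrative least‑squares version, not Prometheus's code; the sample data (free memory shrinking by one unit per minute) is made up.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def predict_linear(samples: List[Sample], eval_time: float, horizon: float) -> float:
    """Fit a least-squares line and evaluate it `horizon` seconds after eval_time."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance            # this slope is what deriv reports
    intercept = mean_v - slope * mean_t
    return intercept + slope * (eval_time + horizon)


# One hour of samples, one per minute, starting at 600 and losing 1 per minute.
history = [(t, 600 - t / 60) for t in range(0, 3601, 60)]
print(predict_linear(history, eval_time=3600, horizon=2 * 3600))
```

Because the fit uses every sample in the range, a short spike at the end of the window moves the forecast less than it would with a two‑point slope.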
Use predict_linear or deriv to forecast future values, e.g., free memory two hours from now:
predict_linear(mem_free{instanceIP="100.75.155.55"}[1h], 2*3600)/1024/1024
Alert when the predicted value falls below a threshold:
rule: predict_linear(mem_free{instanceIP="100.75.155.55"}[1h], 2*3600)/1024/1024 < 10
An approximate equivalent with deriv, which uses the same regression slope but extrapolates from the current value rather than from the fitted line:
deriv(mem_free{instanceIP="100.75.155.55"}[1h]) * 2 * 3600 + mem_free{instanceIP="100.75.155.55"}
Alertmanager Wrappers
Provide UI‑based configuration for business users, integrate with internal webhook, and manage alert templates and routing.
High‑Availability Design Mistakes
Using a message queue to push metrics adds latency, synchronization issues, and removes service discovery benefits.
Prometheus‑Operator
Offers CRD‑based configuration and Grafana templates but hides low‑level details; users should understand Prometheus fundamentals.
HA Solutions
Basic HA with load balancer.
HA + remote‑write storage.
Federation (sharding).
Thanos or VictoriaMetrics for global query and deduplication.
These options cover both the storage side and the query side, using remote‑write adapters or sidecars.
Logs and Events
Logs should be collected by EFK stacks; metrics can be derived from logs using mtail or grok. Kubernetes events need persistence via event‑exporter or conversion to metrics.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.