
Prometheus Monitoring in Kubernetes: Principles, Exporters, Configuration, Capacity Planning, and Best Practices

This comprehensive guide explores Prometheus as a cloud‑native monitoring solution for Kubernetes, covering core principles, exporter selection, configuration snippets, Grafana dashboard creation, capacity planning, high‑cardinality challenges, rate calculations, prediction functions, high‑availability designs, and integration with Alertmanager and other operational tools.

Top Architect

Prometheus, a new‑generation open‑source monitoring system, has become the de‑facto standard in cloud‑native environments; this article shares practical issues, design principles, and considerations for using Prometheus with Kubernetes.

Key Principles

Monitoring is infrastructure; avoid unnecessary metric collection that wastes resources.

Only emit alerts that can be acted upon.

Keep the architecture simple and reliable; avoid magic systems like AI‑driven auto‑remediation.

Prometheus Limitations

Metric‑based only – not suitable for logs, events, or tracing.

Uses a pull model by default; plan the network topology so that Prometheus can reach its targets directly, without forwarding.

No silver bullet for clustering; choose between federation, Cortex, Thanos, etc.

Favors availability over consistency: occasional loss of samples is tolerated so that queries keep succeeding.

Functions like rate and histogram_quantile can produce unintuitive results, and long-range queries are evaluated at a coarse step, which effectively down-samples the data.

Kubernetes Exporters

cAdvisor (built into the kubelet)

kubelet (ports 10255/10250)

apiserver (port 6443)

scheduler, controller‑manager, etcd, Docker, kube‑proxy, kube‑state‑metrics, node‑exporter, blackbox_exporter, process‑exporter, NVIDIA exporter, custom application exporters, etc.

These exporters provide metrics for core components, which can be visualized in Grafana dashboards (see images in the original article).

Grafana Panels

Grafana can render dashboards for kubelet, apiserver, and other components; template variables simplify multi-level dropdowns, though alert rules on queries that use template variables are not yet supported.

All‑in‑One Exporter Collection

Two approaches are suggested: launching N exporter processes from a main process, or using Telegraf to handle multiple inputs.

Golden Metrics

Follow Google SRE's four golden signals (latency, traffic, errors, saturation) and use the USE (Utilization, Saturation, Errors) or RED (Rate, Errors, Duration) methods for online and offline services.

Kubernetes 1.16 cAdvisor Compatibility

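Starting with Kubernetes 1.16, cAdvisor metrics drop the pod_name and container_name labels in favor of pod and container. Dashboards and rules written against the old names need relabeling to restore the original _name labels.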

Metric Relabel Config Example

metric_relabel_configs:
- source_labels: [container]
  regex: (.+)
  target_label: container_name
  replacement: $1
  action: replace
- source_labels: [pod]
  regex: (.+)
  target_label: pod_name
  replacement: $1
  action: replace
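
To make the effect of these rules concrete, here is a minimal Python sketch of what a single `action: replace` rule does to one label set. The function name `relabel_replace` and the sample label values are hypothetical, and Python's `re` module only approximates the RE2 syntax Prometheus uses, so treat it as an illustration rather than a reimplementation.

```python
import re
from typing import Dict, List


def relabel_replace(labels: Dict[str, str], source_labels: List[str], regex: str,
                    target_label: str, replacement: str = "$1",
                    separator: str = ";") -> Dict[str, str]:
    """Apply one `action: replace` rule to a label set and return a new dict."""
    # Source label values are joined with the separator; missing labels are empty.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Relabel regexes are anchored, so the whole value has to match.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: the label set is left unchanged
    # Translate $1 / ${1} references into Python's \g<1> template syntax.
    template = re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


if __name__ == "__main__":
    sample = {"container": "nginx", "pod": "web-0"}  # hypothetical label set
    sample = relabel_replace(sample, ["container"], "(.+)", "container_name")
    sample = relabel_replace(sample, ["pod"], "(.+)", "pod_name")
    print(sample)
```

Because `(.+)` requires at least one character, a series without a `container` label is left untouched instead of gaining an empty `container_name`.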

External Cluster Scraping

When Prometheus runs outside the cluster, it has to authenticate to the API server with a bearer token and TLS settings and scrape node metrics through the API server proxy; example job for cAdvisor:

- job_name: cluster-cadvisor
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
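  kubernetes_sd_configs:
  - api_server: https://xx:6443
    role: node
    # Credentials used for service discovery against the API server.
    bearer_token_file: token/cluster.token
    tls_config:
      insecure_skip_verify: true
  # The scrape itself goes through the API server proxy and needs credentials too.
  bearer_token_file: token/cluster.token
  tls_config:
    insecure_skip_verify: true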
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: xx:6443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    action: replace
  metric_relabel_configs:
  - source_labels: [container]
    separator: ;
    regex: (.+)
    target_label: container_name
    replacement: $1
    action: replace
  - source_labels: [pod]
    separator: ;
    regex: (.+)
    target_label: pod_name
    replacement: $1
    action: replace

Similar job for generic service endpoints is provided in the source.

Prometheus Time‑Zone

Prometheus stores timestamps in UTC; Grafana can convert to local time, and newer UI versions include a timezone option.

Load Balancer to ReplicaSet Metrics

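If a service sits behind a load balancer and Prometheus cannot reach the backend instances directly, add a sidecar proxy next to the backends or configure the LB to forward scrape requests to specific backends.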

Version Selection

Prefer the latest Prometheus release (2.16 at the time of writing) for its new UI and performance improvements.

Memory Consumption

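Memory usage grows mainly with the number of active series and the ingestion rate, because recent samples are held in memory before being written to disk as a block; large queries add to it. Recommendations include reducing the series count, increasing scrape intervals, and moving long-term data to remote storage solutions such as Thanos or VictoriaMetrics.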

The average number of bytes per sample can be measured with:

rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[1h]) / rate(prometheus_tsdb_compaction_chunk_samples_sum[1h])

Example result:

{instance="0.0.0.0:8890",job="prometheus"} 1.252747585939941

Capacity planning formula:

disk_size = retention_time_seconds * samples_per_second * bytes_per_sample
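
As a worked example of the formula, the following Python sketch plugs in some numbers. The series count, scrape interval, and retention period are hypothetical inputs chosen only for illustration; the bytes-per-sample figure is rounded from the measurement shown earlier.

```python
def samples_per_second(active_series: int, scrape_interval_seconds: float) -> float:
    """Each active series contributes one sample per scrape interval."""
    return active_series / scrape_interval_seconds


def estimate_disk_bytes(retention_seconds: float,
                        ingested_samples_per_second: float,
                        bytes_per_sample: float) -> float:
    """disk_size = retention_time_seconds * samples_per_second * bytes_per_sample"""
    return retention_seconds * ingested_samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical workload: 1 million active series scraped every 30 seconds.
    rate = samples_per_second(active_series=1_000_000, scrape_interval_seconds=30)
    size = estimate_disk_bytes(
        retention_seconds=15 * 24 * 3600,  # 15 days of retention
        ingested_samples_per_second=rate,
        bytes_per_sample=1.25,             # rounded from the measured value
    )
    print(f"{size / 1024**3:.1f} GiB")     # roughly 50 GiB for these inputs
```

The estimate covers compacted sample data only; it is sensible to leave headroom for the WAL and for compaction overhead.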

High-cardinality metrics and labels should be avoided; the original article shows examples of the top-cardinality metrics and labels.

Rate Calculation

Rate handles counter resets; use a range at least four times the scrape interval for stability.
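
The reset handling can be sketched in a few lines of Python. This is a simplified model, not the Prometheus implementation: it divides by the time between the first and last sample and does not extrapolate to the edges of the range as rate does. The name `simple_rate` and the sample values are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across the samples in one range."""
    if len(samples) < 2:
        # With fewer than two samples in the range there is nothing to compute,
        # which is one reason to keep the range several scrape intervals wide.
        raise ValueError("need at least two samples in the range")

    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter reset: the process restarted and counting began at 0,
            # so the whole new value counts as increase.
            increase += value
        else:
            increase += value - prev
        prev = value

    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return increase / elapsed


if __name__ == "__main__":
    # The counter restarts between t=30 and t=60.
    window = [(0, 100), (30, 160), (60, 20), (90, 80)]
    print(simple_rate(window))  # (60 + 20 + 60) / 90
```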

P95 and Histogram Quantile

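P95 can be higher or lower than the average depending on how the values are distributed: a long tail pushes P95 above the average, while a few extreme outliers can pull the average above P95. histogram_quantile estimates the quantile by interpolating linearly inside a bucket, so the bucket boundaries determine how accurate the result is.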
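
The influence of bucket design is easier to see in a minimal Python sketch of the linear interpolation that histogram_quantile performs over cumulative buckets. It covers only the common case (0 < q <= 1, non-negative bucket bounds, a final +Inf bucket) and omits the edge cases the real function handles; the bucket counts are hypothetical.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets."""
    if not 0 < q <= 1:
        raise ValueError("this sketch handles 0 < q <= 1 only")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one being +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total

    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The quantile falls into +Inf: report the highest finite bound.
        return buckets[-2][0]

    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Observations are assumed to be spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


if __name__ == "__main__":
    latency = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
    print(histogram_quantile(0.95, latency))
```

With these buckets the estimate lands between 0.5 and 1.0 purely by interpolation. If every request in that bucket actually took 0.51 s, the reported P95 would still be well above it, which is why bucket boundaries should be dense around the latencies that matter.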

Slow Query Detection

Identify slow PromQL queries using prometheus_engine_query_duration_seconds and avoid large range queries with small steps.

High Cardinality Issues

Labels with unbounded values (e.g., user IDs, IPs) should not be used as metric labels.
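
A quick way to see why unbounded labels are dangerous is to compute the upper bound on series for a single metric, which is the product of the number of distinct values of each label. The Python sketch below uses hypothetical label names and counts.

```python
from math import prod
from typing import Dict


def max_series(label_value_counts: Dict[str, int]) -> int:
    """Upper bound on the series of one metric: product of per-label value counts."""
    return prod(label_value_counts.values())


if __name__ == "__main__":
    bounded = {"method": 5, "status": 10, "instance": 50}
    print(max_series(bounded))                          # 2500
    print(max_series({**bounded, "user_id": 100_000}))  # 250000000
```

A single unbounded label multiplies the series count of every other label combination, so it is usually better to move such values into logs or traces.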

Prometheus Restart and Hot Reload

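Restarting Prometheus replays the WAL, which can be slow on large instances. To apply configuration changes without a restart, start Prometheus with --web.enable-lifecycle and send an HTTP POST to the /-/reload endpoint. Example reload script: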

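#!/bin/sh
# Arguments: <config-file> <reload-url>
# Polls the config file and POSTs to the reload URL whenever its content changes.
FILE=$1
URL=$2
# readlink -f resolves symlinks so the real file is hashed.
HASH=$(md5sum "$(readlink -f "$FILE")")
while true; do
  NEW_HASH=$(md5sum "$(readlink -f "$FILE")")
  if [ "$HASH" != "$NEW_HASH" ]; then
    HASH="$NEW_HASH"
    echo "[$(date +%s)] Trigger refresh"
    curl -sSL -X POST "$URL" > /dev/null
  fi
  sleep 5
done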

Pass the configuration file path and the reload URL as the two arguments; the same script works for both Prometheus and Alertmanager.

Application Metric Design

Keep metric count reasonable (e.g., < 10 k) and control label cardinality.

node‑exporter Issues

Does not monitor processes; use process‑exporter or Telegraf.

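Only supports *nix systems; use wmi_exporter (since renamed windows_exporter) on Windows.

Prefer newer versions (0.16/0.17 or later), which follow the current metric naming conventions.

Version 0.16 renamed many metrics (e.g., node_cpu became node_cpu_seconds_total), so dashboards and rules written for older versions need updating.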

kube‑state‑metrics

Provides metadata for enriching cAdvisor metrics; does not expose annotations due to high cardinality.

Relabel vs Metric Relabel

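relabel_configs are applied to discovered targets before the scrape; metric_relabel_configs are applied to the scraped samples before they are stored. Example that drops the instance label from scraped samples: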

metric_relabel_configs:
  - separator: ;
    regex: instance
    replacement: $1
    action: labeldrop

Prediction Functions

Use predict_linear or deriv to forecast future values, e.g., free memory in MiB two hours from now, based on the last hour of samples:

predict_linear(mem_free{instanceIP="100.75.155.55"}[1h], 2*3600)/1024/1024

Alert when predicted value falls below a threshold:

rule: predict_linear(mem_free{instanceIP="100.75.155.55"}[1h], 2*3600)/1024/1024 < 10

Approximate equivalent using deriv (the second operand must be an instant vector, not a range vector):

(deriv(mem_free{instanceIP="100.75.155.55"}[1h]) * 2 * 3600 + mem_free{instanceIP="100.75.155.55"})/1024/1024
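
Both functions are built on a least-squares line fitted to the samples in the range. The Python sketch below shows the idea under simplifying assumptions: timestamps are taken relative to an explicit `now` argument, and the sample data is synthetic. The function names mirror the PromQL ones for readability only.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def linear_fit(samples: List[Sample], now: float) -> Tuple[float, float]:
    """Least-squares fit; returns (slope per second, fitted value at `now`)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [t - now for t, _ in samples]  # time relative to the evaluation instant
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("samples must have distinct timestamps")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_linear(samples: List[Sample], now: float, horizon_seconds: float) -> float:
    """Value the fitted line reaches `horizon_seconds` after `now`."""
    slope, intercept = linear_fit(samples, now)
    return intercept + slope * horizon_seconds


if __name__ == "__main__":
    # Synthetic data: free memory shrinking by 0.25 units per second for one hour.
    history = [(float(t), 5000 - 0.25 * t) for t in range(0, 3601, 60)]
    slope, _ = linear_fit(history, now=3600)
    print(slope)                                         # what deriv would report
    print(predict_linear(history, now=3600, horizon_seconds=2 * 3600))
```

The slope corresponds to what deriv returns; predict_linear starts from the fitted value at the evaluation time rather than from the latest raw sample, so the two expressions can differ slightly on noisy data.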

Alertmanager Wrappers

Provide UI‑based configuration for business users, integrate with internal webhook, and manage alert templates and routing.

High‑Availability Design Mistakes

Using a message queue to push metrics adds latency, synchronization issues, and removes service discovery benefits.

Prometheus‑Operator

Offers CRD‑based configuration and Grafana templates but hides low‑level details; users should understand Prometheus fundamentals.

HA Solutions

Basic HA with load balancer.

HA + remote‑write storage.

Federation (sharding).

Thanos or VictoriaMetrics for global query and deduplication.

These options cover both the storage side and the query side, including remote-write adapters and sidecar approaches.

Logs and Events

Logs should be collected by an EFK stack; metrics can be derived from logs using mtail or grok_exporter. Kubernetes events need persistence via event-exporter or conversion to metrics.

