Master Prometheus Monitoring for Big Data on Kubernetes: Design & Alerting
This article explains how to design and implement a Prometheus‑based monitoring system for big‑data components running on Kubernetes, covering metric exposure methods, scrape configurations, exporter deployment, and dynamic alert rule management with Alertmanager.
Design Overview
The monitoring system for big‑data platforms must reliably scrape exposed metrics, analyze them, and generate alerts. Key questions include what to monitor, how metrics are exposed, how Prometheus scrapes them, and how alert rules are dynamically configured.
Monitoring Targets
All big‑data components run as pods in a Kubernetes cluster.
Metric Exposure Methods
Directly expose Prometheus metrics (pull).
Push metrics to a pushgateway (push).
Use a custom exporter to convert other formats to Prometheus‑compatible metrics.
Some components, such as Flink on YARN, run inside YARN containers and therefore require the pushgateway approach; pushing metrics is also recommended for short-lived jobs, which may terminate before Prometheus can scrape them.
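As a sketch, Flink's Prometheus PushGateway reporter can be enabled in flink-conf.yaml roughly as follows; the host, port, and job name are placeholders, and the exact option keys vary by Flink version, so check the docs for your release:

```yaml
# flink-conf.yaml — Prometheus PushGateway reporter (values are illustrative)
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.monitoring.svc   # hypothetical service name
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: flink-job
# avoid stale series when jobs restart under a new attempt
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: true
```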
Scrape Configuration
Prometheus always pulls metrics from targets. Common scrape jobs include:
Native Job configuration.
PodMonitor (via Prometheus Operator) for pod-level metrics.
ServiceMonitor (via Prometheus Operator) for service-level metrics.
When running on Kubernetes, PodMonitor is usually the simplest choice.
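A minimal PodMonitor sketch might look like this; the names, labels, and port name are hypothetical and must match your pods and your Prometheus selectors:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: bigdata-pods            # hypothetical name
  namespace: monitoring
  labels:
    team: bigdata               # must be matched by podMonitorSelector
spec:
  namespaceSelector:
    matchNames:
    - bigdata                   # hypothetical namespace of the monitored pods
  selector:
    matchLabels:
      app: hdfs                 # hypothetical pod label
  podMetricsEndpoints:
  - port: metrics               # container port *name*, not a number
    path: /metrics
    interval: 30s
```

The annotation-based approach shown next is an alternative: pods declare their scrape settings via annotations, and a relabeling step in the scrape job reads them at discovery time.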
<code>annotations:
  prometheus.io/scrape: "true"
  prometheus.io/scheme: "http"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "19091"
</code>
The main selectors in prometheus-prometheus.yaml are serviceMonitorSelector, podMonitorSelector, ruleSelector, and alertmanagers. A kubernetes_sd_config with relabeling can discover pods dynamically and rewrite labels before scraping.
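A sketch of such a scrape job, which keeps only pods annotated with prometheus.io/scrape: "true" and rewrites the metrics path and port from the annotations shown above:

```yaml
scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # keep only pods that opt in via annotation
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # override the metrics path from the annotation, if present
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # rewrite the target address to use the annotated port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```

Custom annotation schemes such as the bigData.metrics/* annotations below map the same way: Prometheus exposes each annotation as a __meta_kubernetes_pod_annotation_* label, with dots and slashes in the annotation name replaced by underscores.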
<code>labels:
  bigData.metrics.object: pod
annotations:
  bigData.metrics/scrape: "true"
  bigData.metrics/scheme: "https"
  bigData.metrics/path: "/jmx"
  bigData.metrics/port: "29871"
  bigData.metrics/role: "hdfs-nn,common"
</code>
Alert Design
Alert Flow
Service experiences an abnormal condition.
Prometheus generates an alert.
Alertmanager receives the alert.
Alertmanager processes the alert according to configured routing, grouping, and inhibition rules, then forwards it (e.g., via webhook, SMS, email).
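On the receiving end, a webhook gets a JSON document in Alertmanager's webhook format (version "4"). A minimal sketch of parsing it, using a hand-written sample payload whose labels mirror the rules defined later in this article (the instance and mount path are illustrative):

```python
import json

# Hand-written sample in Alertmanager's webhook payload format (version "4").
payload = json.loads("""
{
  "version": "4",
  "status": "firing",
  "receiver": "test.web.hook",
  "groupLabels": {"groupId": "node-disk-usage"},
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "node-disk-usage", "instance": "node1",
                 "groupId": "node-disk-usage"},
      "annotations": {"title": "Disk warning: node node1 /data usage 91%"},
      "startsAt": "2024-01-01T00:00:00Z"
    }
  ]
}
""")

def summarize(payload: dict) -> list:
    """Return one human-readable line per firing alert in the payload."""
    lines = []
    for alert in payload.get("alerts", []):
        if alert["status"] != "firing":
            continue  # skip resolved alerts
        labels = alert["labels"]
        lines.append(f'[{labels.get("groupId", "?")}] {labels.get("instance", "?")}: '
                     f'{alert["annotations"].get("title", "")}')
    return lines

for line in summarize(payload):
    print(line)
```

A real receiver would sit behind an HTTP endpoint (the url configured in webhook_configs) and fan the summarized lines out to SMS or email.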
Dynamic Alert Configuration
Alerting consists of two parts:
alertmanager: handling strategy (receivers, routing).
alertRule: concrete alert expressions.
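One way to implement the dynamic part is to keep rule expressions as templates with ${...} placeholders (the convention used by the PrometheusRule examples that follow) and render them per instance before applying the manifest. A sketch using Python's string.Template, with illustrative values:

```python
from string import Template

# Illustrative disk-usage expression template, following the ${...}
# placeholder convention of the PrometheusRule examples in this article.
RULE_EXPR = Template(
    '100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}'
    '/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}'
)

def render_expr(path: str, threshold: int) -> str:
    """Fill the placeholders for one concrete alert instance."""
    return RULE_EXPR.substitute(path=path, thresholdValue=threshold)

print(render_expr("/data", 80))
```

The same substitution step can render the whole PrometheusRule manifest, which is then applied with kubectl (or via the Kubernetes API) so that the Prometheus Operator picks it up through ruleSelector.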
Alertmanager Example
<code>global:
  resolve_timeout: 5m
receivers:
- name: 'default'
- name: 'test.web.hook'
  webhook_configs:
  - url: 'http://alert-url'
route:
  receiver: 'default'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  group_by: [groupId, instanceId]
  routes:
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: node-disk-usage
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: kafka-topic-highstore
</code>
AlertRule Example – Disk Usage
<code>apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-disk-usage
  namespace: monitoring
spec:
  groups:
  - name: node-disk-usage
    rules:
    - alert: node-disk-usage
      expr: 100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: node-disk-usage
        userIds: super
        receivers: SMS
      annotations:
        title: "Disk warning: node {{$labels.instance}} ${path} usage {{$value}}%"
        content: "Disk warning: node {{$labels.instance}} ${path} usage {{$value}}%"
</code>
AlertRule Example – Kafka Lag
<code>apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-topic-highstore-${uniqueName}
  namespace: monitoring
spec:
  groups:
  - name: kafka-topic-highstore
    rules:
    - alert: kafka-topic-highstore-${uniqueName}
      expr: sum(kafka_consumergroup_lag{exporterType="kafka",consumergroup="${consumergroup}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: kafka-topic-highstore
        instanceId: ${uniqueName}
        userIds: super
        receivers: SMS
      annotations:
        title: "KAFKA warning: consumer group ${consumergroup} lag {{$value}}"
        content: "KAFKA warning: consumer group ${consumergroup} lag {{$value}}"
</code>
Alert Timing Example
Two nodes (node1, node2) are monitored for disk usage. Alerts are grouped by groupId, so repeated alerts follow the group_wait, group_interval, and repeat_interval semantics:
for: duration a metric must stay abnormal before the alert fires.
group_wait: initial wait after a new group is created before the first notification is sent.
group_interval: minimum interval between notifications when the group's composition changes.
repeat_interval: interval between identical notifications when the group does not change (including recovery notifications).
With the example configuration above, the first notification for a new group goes out about 30s after the alert fires; if node2 later joins the same group, an updated notification is sent after at most 5m; an unchanged firing group is re-notified every 2h.
Exporter Deployment
Exporters can run as sidecars (1:1 with the target pod) or as independent services (1:1 or 1:many). Sidecars bind the exporter lifecycle to the target, while independent deployments reduce coupling and are more flexible for multi‑node services such as Kafka.
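The sidecar layout can be sketched as a pod with two containers; the image names are hypothetical, and the port matches the bigData.metrics/port annotation shown earlier:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-nn
  labels:
    app: hdfs                             # hypothetical; what a PodMonitor would select on
spec:
  containers:
  - name: namenode                        # the monitored big-data component
    image: example/hdfs-namenode:latest   # hypothetical image
  - name: jmx-exporter                    # sidecar converting JMX to Prometheus format
    image: example/jmx-exporter:latest    # hypothetical image
    ports:
    - name: metrics
      containerPort: 29871                # matches the bigData.metrics/port annotation
```

Because both containers share the pod's network namespace, the exporter reaches the component over localhost, and its lifecycle ends with the pod's.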
Additional Tools
Use promtool to validate metric formats (e.g., ensure metric names and label names contain no dots). Port-forwarding can expose Prometheus, Grafana, and Alertmanager for external access:
<code># Prometheus UI
nohup kubectl port-forward --address 0.0.0.0 service/prometheus-k8s 19090:9090 -n monitoring &
# Grafana UI
nohup kubectl port-forward --address 0.0.0.0 service/grafana 13000:3000 -n monitoring &
# Alertmanager UI
nohup kubectl port-forward --address 0.0.0.0 service/alertmanager-main 9093:9093 -n monitoring &
</code>
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely read original technical articles. We focus on operations transformation and will accompany you throughout your operations career, growing together.