Mastering PrometheusRule: Streamline Kubernetes Alerting & Recording
This article explains how PrometheusRule, a Kubernetes custom resource, simplifies the management of alerting and recording rules by centralizing configurations, reducing restarts, avoiding conflicts, and enabling version‑controlled, modular monitoring for cloud‑native environments.
In operations, monitoring systems are the eyes of the infrastructure, and Prometheus has become the de facto monitoring tool of the cloud‑native era. Yet many teams still manage rules only by editing prometheus.yml, and so face restarts, conflicts, and scattered rule files.
What is PrometheusRule?
PrometheusRule is a custom resource whose Custom Resource Definition (CRD) is provided by the Prometheus Operator. It is designed to manage alerting and recording rules in a structured, modular way within Kubernetes.
Recording Rules
Recording rules allow pre‑computing frequently used queries and storing them as new time series.
<code>apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-node.rules
  namespace: obs-system
  labels:
    release: monitor
spec:
  groups:
    - name: kube-node.rules
      rules:
        - expr: label_replace(node_load5, "internal_ip", "$1", "instance", "(.+):.*") * on(internal_ip) group_left(node) kube_node_info
          record: node:kube_node_load5
        - expr: sum(kube_pod_container_resource_requests{resource="memory"}) by (node)
          record: node:resource_memory_requests_byte:sum</code>
Tip:
1. metadata.labels must include all label pairs required by prometheus.spec.ruleSelector, otherwise the rule won’t be applied.
2. The expr field contains the PromQL expression.
3. The record field defines the name of the resulting metric.
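The first tip is easier to follow when seen from the Prometheus side. The fragment below is a minimal sketch, not the article's actual setup: it assumes a Prometheus resource in the same obs-system namespace, and the resource name monitor is hypothetical. It shows a ruleSelector that would pick up any PrometheusRule labeled release: monitor.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: monitor              # hypothetical name, not taken from the article
  namespace: obs-system
spec:
  # Load only PrometheusRule objects that carry every label listed here.
  ruleSelector:
    matchLabels:
      release: monitor
  # Assumption for illustration: an empty selector matches rules in all
  # namespaces. Leaving this field unset limits discovery to the namespace
  # of this Prometheus object.
  ruleNamespaceSelector: {}
```

If a rule is not being picked up, comparing its metadata.labels against this selector is usually the first thing to check.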
Alerting Rules
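Alerting rules define conditions as PromQL expressions. Each time series an expression returns becomes an active alert, which fires once the condition has held for the configured duration.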
<code>apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-node
  namespace: obs-system
  labels:
    release: monitor
spec:
  groups:
    - name: kube-node
      rules:
        # 5-minute load average divided by allocatable CPUs, per node
        - alert: NodeLoad5Overload
          annotations:
            description: The 5-minute load average of node {{ $labels.node }} exceeds its number of CPUs. The current load is {{ printf "%.2f" $value }} times the allocatable CPU count
            summary: The 5-minute load average of the node is too high
          expr: sum by (node) (node:kube_node_load5) / sum by (node) (kube_node_status_allocatable{resource="cpu"}) > 1
          for: 1h
          labels:
            severity: warning
        # Requested memory as a fraction of allocatable memory, per node
        - alert: NodeRequestMemoryOverload
          annotations:
            # $value is a ratio (for example 0.85), so render it as a percentage
            description: The pods on node {{ $labels.node }} request too much memory. The current memory request rate of the node is {{ $value | humanizePercentage }}
            summary: The memory requested on the node is too high
          expr: node:resource_memory_requests_byte:sum / sum by (node) (kube_node_status_allocatable{resource="memory"}) > 0.8
          for: 1h
          labels:
            severity: warning</code>
Tip:
1. metadata.labels must match the selector used by Prometheus.
2. alert is the alert name; annotations provide description and summary.
3. expr holds the PromQL condition; for specifies how long the condition must hold before firing.
4. labels tag the alert with severity and grouping information.
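To show what the severity label is typically used for downstream, here is a hedged sketch of an Alertmanager routing fragment. The article does not describe its Alertmanager setup, so the receiver names default-receiver and ops-warning and the grouping by node are assumptions made purely for illustration.

```yaml
route:
  receiver: default-receiver        # hypothetical catch-all receiver
  group_by: ['alertname', 'node']   # batch notifications per alert name and node
  routes:
    # Alerts labeled severity: warning, such as the two rules above,
    # are sent to a dedicated receiver.
    - matchers:
        - severity="warning"
      receiver: ops-warning         # hypothetical receiver for warnings
receivers:
  # Notification integrations (email, webhook, and so on) are omitted here.
  - name: default-receiver
  - name: ops-warning
```

Keeping severity values consistent across all PrometheusRule objects is what lets a small routing tree like this one stay unchanged as rules are added.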
Conclusion
PrometheusRule does more than relocate rule definitions. It represents a modern, engineering‑focused way to configure monitoring: rules live in Kubernetes manifests instead of manually edited files, which makes the monitoring system dynamic and a powerful ally for operations.
Linux Ops Smart Journey