Mastering Kubernetes Event Monitoring: Alerts, Collection, and Analysis
This guide explains how to monitor Kubernetes events, differentiate normal and warning events, and use tools like kube-eventer and kube-event-exporter to collect, alert on, and analyze cluster events through webhook, Kafka, Logstash, and Elasticsearch, enabling comprehensive observability and troubleshooting.
Kubernetes Event Monitoring
With the rise of microservices and cloud‑native architectures, many enterprises run workloads on Kubernetes to leverage its scalability, automation, and stability.
Kubernetes itself is a complex management system, and both the platform and the applications it hosts are critical infrastructure, so observability is essential. Common monitoring methods include:
Using cAdvisor for container resource metrics such as CPU and memory.
Using kube‑state‑metrics for resource object status metrics like Deployment and Pod states.
Using metrics‑server for cluster‑wide resource data.
Using node‑exporter and other exporters for specific component metrics.
Most monitoring focuses on concrete resources (Pod, Node, etc.), but some scenarios cannot be expressed as resources, such as pod scheduling or restarts. These are represented in Kubernetes as events.
Kubernetes defines two event types:
Warning events, which occur when the system transitions to an unexpected state.
Normal events, which indicate expected state transitions.
For example, when a Pod is created it moves through Pending, Creating, NotReady, and Running states, generating Normal events. If the Pod encounters an abnormal condition such as node eviction or OOM, Warning events are emitted. Monitoring these events helps detect issues that are otherwise hard to notice.
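For instance, you can list only Warning events to focus on abnormal state transitions (a typical invocation; requires cluster access):

```shell
# List only Warning events across all namespaces
kubectl get events -A --field-selector type=Warning
```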
All events are recorded by the Kubernetes event system, stored in the API server, and persisted in etcd (default retention: one hour). They can be inspected via the API or kubectl:
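A typical invocation (adjust flags to your needs):

```shell
# List recent events across all namespaces, sorted by creation time
kubectl get events -A --sort-by=.metadata.creationTimestamp
```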
Or view events for a specific object:
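For example, the Events section of describe output shows the object's recent events (placeholders must be replaced with real names):

```shell
# Events for a specific Pod appear at the end of the describe output
kubectl describe pod <pod-name> -n <namespace>
```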
Events contain timestamp, type, involved object, reason, and message, providing a full lifecycle view of deployments, scheduling, running, and termination. Component source code defines possible event reasons, for example:
```go
package events

// Container event reason list
const (
	CreatedContainer        = "Created"
	StartedContainer        = "Started"
	FailedToCreateContainer = "Failed"
	FailedToStartContainer  = "Failed"
	KillingContainer        = "Killing"
	PreemptContainer        = "Preempting"
	BackOffStartContainer   = "BackOff"
	ExceededGracePeriod     = "ExceededGracePeriod"
)

// Pod event reason list
const (
	FailedToKillPod                = "FailedKillPod"
	FailedToCreatePodContainer     = "FailedCreatePodContainer"
	FailedToMakePodDataDirectories = "Failed"
	NetworkNotReady                = "NetworkNotReady"
)
```

Because etcd does not support complex queries, events are typically inspected manually. In practice, teams often need additional capabilities:
Alerting on abnormal events.
Querying historical events beyond the default retention.
Performing flexible statistical analysis on cluster events.
To meet these needs, we can collect Kubernetes events with dedicated tools. Two popular options are:
kube‑eventer (an Alibaba Cloud event collector).
kube‑event‑exporter (a community‑maintained exporter).
Below we demonstrate using both: kube‑eventer for alerting via webhook, and kube‑event‑exporter to forward events to Elasticsearch for storage and analysis.
Using kube-eventer for alerting
kube-eventer can send alerts to enterprise WeChat, DingTalk, or generic webhook endpoints. The example below configures a webhook to push Warning‑level events to WeChat.
```yaml
# cat kube-eventer.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: kube-eventer
  name: kube-eventer
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eventer
  template:
    metadata:
      labels:
        app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccount: kube-eventer
      containers:
        - image: registry.aliyuncs.com/acs/kube-eventer:v1.2.7-ca03be0-aliyun
          name: kube-eventer
          command:
            - "/kube-eventer"
            - "--source=kubernetes:https://kubernetes.default.svc.cluster.local"
            - "--sink=webhook:http://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=888888-888-8888-8888-d35c52ff2e0b&level=Warning&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body&custom_body_configmap_namespace=monitoring&method=POST"
          env:
            - name: TZ
              value: "Asia/Shanghai"
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
              readOnly: true
            - name: zoneinfo
              mountPath: /usr/share/zoneinfo
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 250Mi
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-eventer
rules:
  - apiGroups: [""]
    resources: ["events", "configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-eventer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-eventer
subjects:
  - kind: ServiceAccount
    name: kube-eventer
    namespace: monitoring
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-eventer
  namespace: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-webhook-body
  namespace: monitoring
data:
  content: >-
    {"msgtype": "text","text": {"content": "Cluster event alert
    Event level: {{ .Type }}
    Namespace: {{ .InvolvedObject.Namespace }}
    Object kind: {{ .InvolvedObject.Kind }}
    Object name: {{ .InvolvedObject.Name }}
    Reason: {{ .Reason }}
    Timestamp: {{ .LastTimestamp }}
    Message: {{ .Message }}"}}
```

The webhook configuration adds level=Warning so only Warning events trigger alerts. Additional filters such as namespaces, kinds, or reason can be applied to narrow the scope.
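You can verify the webhook endpoint independently of kube-eventer by posting a test message to it (substitute your own bot key):

```shell
# Send a manual test message to the WeChat Work group bot
curl -s 'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=<your-key>' \
  -H 'Content-Type: application/json' \
  -d '{"msgtype":"text","text":{"content":"test alert"}}'
```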
Using kube-event-exporter to collect cluster events
While kube-eventer provides alerting, it does not store historical events. To retain and analyze events, we use kube-event-exporter to forward them to Elasticsearch (via Kafka).
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "kafka"
          drop:
            - kind: "Service"
    receivers:
      - name: "kafka"
        kafka:
          clientId: "kubernetes"
          topic: "kubenetes-event"
          brokers:
            - "192.168.100.50:9092"
            - "192.168.100.51:9092"
            - "192.168.100.52:9092"
          compressionCodec: "snappy"
          layout: # optional
            kind: "{{ .InvolvedObject.Kind }}"
            namespace: "{{ .InvolvedObject.Namespace }}"
            name: "{{ .InvolvedObject.Name }}"
            reason: "{{ .Reason }}"
            message: "{{ .Message }}"
            type: "{{ .Type }}"
            timestamp: "{{ .GetTimestampISO8601 }}"
            cluster: "sda-pre-center"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      serviceAccountName: event-exporter
      containers:
        - name: event-exporter
          image: ghcr.io/resmoio/kubernetes-event-exporter:latest
          args:
            - -conf=/data/config.yaml
          volumeMounts:
            - name: cfg
              mountPath: /data
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
```

After deploying kube-event-exporter, events appear in the Kafka topic.
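To confirm ingestion, you can consume the topic with Kafka's console consumer (run from a host with the Kafka CLI tools; topic and broker addresses match the exporter config):

```shell
# Tail the event topic from the beginning
kafka-console-consumer.sh --bootstrap-server 192.168.100.50:9092 \
  --topic kubenetes-event --from-beginning
```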
Deploy Logstash to store events in Elasticsearch
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kube-event-logstash
  namespace: log
  labels:
    app: kube-event-logstash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-event-logstash
  template:
    metadata:
      labels:
        app: kube-event-logstash
      annotations:
        kubesphere.io/restartedAt: '2024-02-22T09:03:36.215Z'
    spec:
      volumes:
        - name: kube-event-logstash-pipeline-config
          configMap:
            name: kube-event-logstash-pipeline-config
            defaultMode: 420
      containers:
        - name: kube-event-logstash
          image: 'logstash:7.8.0'
          env:
            - name: XPACK_MONITORING_ELASTICSEARCH_HOSTS
              value: 'http://192.168.100.100:8200'
            - name: XPACK_MONITORING_ELASTICSEARCH_USERNAME
              value: jokerbai
            - name: XPACK_MONITORING_ELASTICSEARCH_PASSWORD
              value: JeA9BiAgnNRzVrp5JRVQ4vYX
            - name: PIPELINE_ID
              value: kube-event-logstash
            - name: KAFKA_SERVER
              value: '192.168.100.50:9092,192.168.100.51:9092,192.168.100.52:9092'
            - name: ES_SERVER
              value: 'http://192.168.100.100:8200'
            - name: ES_USER_NAME
              value: jokerbai
            - name: ES_USER_PASSWORD
              value: JeA9BiAgnNRzVrp5JRVQ4vYX
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: PIPELINE_BATCH_SIZE
              value: '4000'
            - name: PIPELINE_BATCH_DELAY
              value: '100'
            - name: PIPELINE_WORKERS
              value: '4'
            - name: LS_JAVA_OPTS
              value: '-Xms2g -Xmx3500m'
          resources:
            limits:
              cpu: '2'
              memory: 4Gi
            requests:
              cpu: '2'
              memory: 4Gi
          volumeMounts:
            - name: kube-event-logstash-pipeline-config
              mountPath: /usr/share/logstash/pipeline
          livenessProbe:
            tcpSocket:
              port: 9600
            initialDelaySeconds: 39
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 2
          readinessProbe:
            tcpSocket:
              port: 9600
            initialDelaySeconds: 39
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 2
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node/category
                    operator: In
                    values:
                      - app
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-event-logstash-pipeline-config
  namespace: log
data:
  logstash.conf: |-
    input {
      kafka {
        id => "kafka_plugin_id"
        bootstrap_servers => "${KAFKA_SERVER}"
        client_id => "logstash"
        group_id => "logstash"
        decorate_events => true
        topics => ["kubenetes-event"]
        codec => json {
          charset => "UTF-8"
        }
      }
    }
    output {
      elasticsearch {
        hosts => "${ES_SERVER}"
        user => "${ES_USER_NAME}"
        password => "${ES_USER_PASSWORD}"
        index => "kubernetes-event-%{+YYYY.MM}"
        manage_template => false
        template_name => "kubernetes-event"
      }
    }
```

Before deploying Logstash, create an Elasticsearch index template (e.g., via Kibana Dev Tools):
```json
PUT _template/kubernetes-event
{
  "index_patterns": ["*kubernetes-event*"],
  "settings": {
    "index": {
      "highlight": { "max_analyzed_offset": "10000000" },
      "number_of_shards": "2",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "properties": {
      "cluster": { "type": "keyword" },
      "kind": { "type": "keyword" },
      "message": { "type": "text" },
      "name": { "type": "keyword" },
      "namespace": { "type": "keyword" },
      "reason": { "type": "keyword" },
      "type": { "type": "keyword" },
      "timestamp": { "type": "keyword" }
    }
  },
  "aliases": {}
}
```

After deploying Logstash, events become searchable in Kibana, where you can build visualizations such as counting occurrences of a specific reason (e.g., Unhealthy) for the current day.
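The same kind of count can be sketched outside Kibana with a direct query against the event indices (host and credentials are the ones used in the Logstash deployment):

```shell
# Count events with reason "Unhealthy" across the kubernetes-event indices
curl -s -u jokerbai:JeA9BiAgnNRzVrp5JRVQ4vYX \
  -H 'Content-Type: application/json' \
  'http://192.168.100.100:8200/kubernetes-event-*/_count' \
  -d '{"query":{"term":{"reason":"Unhealthy"}}}'
```

Note that because the template maps timestamp as a keyword rather than a date, date-math range filters will not work on it directly.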
By collecting and analyzing Kubernetes events, teams gain deeper insight into cluster health, can set precise alerts, and perform historical analysis to improve reliability.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.