Mastering Kubernetes Event Monitoring: Alerts, Collection, and Analysis
This guide explains how to monitor Kubernetes events, differentiate normal and warning events, and use tools like kube-eventer and kube-event-exporter to collect, alert on, and analyze cluster events through webhook, Kafka, Logstash, and Elasticsearch, enabling comprehensive observability and troubleshooting.
Kubernetes Event Monitoring
With the rise of microservices and cloud‑native architectures, many enterprises run workloads on Kubernetes to leverage its scalability, automation, and stability.
Kubernetes itself is a complex management system, and both the platform and the applications it hosts are critical infrastructure, so observability is essential. Common monitoring methods include:
Using cAdvisor for container resource metrics such as CPU and memory.
Using kube‑state‑metrics for resource object status metrics like Deployment and Pod states.
Using metrics‑server for cluster‑wide resource data.
Using node‑exporter and other exporters for specific component metrics.
Most monitoring focuses on concrete resources (Pod, Node, etc.), but some scenarios cannot be expressed as resources, such as pod scheduling or restarts. These are represented in Kubernetes as events.
Kubernetes defines two event types:
Warning events, which occur when the system transitions to an unexpected state.
Normal events, which indicate expected state transitions.
For example, when a Pod is created it moves through Pending, Creating, NotReady, and Running states, generating Normal events. If the Pod encounters an abnormal condition such as node eviction or OOM, Warning events are emitted. Monitoring these events helps detect issues that are otherwise hard to notice.
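For instance, you can list only Warning events to focus on abnormal state transitions (a typical invocation; requires cluster access):

```shell
# List only Warning events across all namespaces
kubectl get events -A --field-selector type=Warning
```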
All events are recorded by the Kubernetes event system, stored in the API server, and persisted in etcd (default retention: one hour). They can be inspected via the API or kubectl:
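A typical invocation (adjust flags to your needs):

```shell
# List recent events across all namespaces, sorted by creation time
kubectl get events -A --sort-by=.metadata.creationTimestamp
```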
Or view events for a specific object:
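For example, the Events section of describe output shows the object's recent events (placeholders must be replaced with real names):

```shell
# Events for a specific Pod appear at the end of the describe output
kubectl describe pod <pod-name> -n <namespace>
```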
Events contain timestamp, type, involved object, reason, and message, providing a full lifecycle view of deployments, scheduling, running, and termination. Component source code defines possible event reasons, for example:
```go
package events

// Container event reason list
const (
	CreatedContainer        = "Created"
	StartedContainer        = "Started"
	FailedToCreateContainer = "Failed"
	FailedToStartContainer  = "Failed"
	KillingContainer        = "Killing"
	PreemptContainer        = "Preempting"
	BackOffStartContainer   = "BackOff"
	ExceededGracePeriod     = "ExceededGracePeriod"
)

// Pod event reason list
const (
	FailedToKillPod                = "FailedKillPod"
	FailedToCreatePodContainer     = "FailedCreatePodContainer"
	FailedToMakePodDataDirectories = "Failed"
	NetworkNotReady                = "NetworkNotReady"
)
```

Because etcd does not support complex queries, events are typically inspected manually. In practice, teams often need additional capabilities:
Alerting on abnormal events.
Querying historical events beyond the default retention.
Performing flexible statistical analysis on cluster events.
To meet these needs, we can collect Kubernetes events with dedicated tools. Two popular options are:
kube‑eventer (an Alibaba Cloud event collector).
kube‑event‑exporter (a community‑maintained exporter).
Below we demonstrate using both: kube‑eventer for alerting via webhook, and kube‑event‑exporter to forward events to Elasticsearch for storage and analysis.
Using kube-eventer for alerting
kube-eventer can send alerts to enterprise WeChat, DingTalk, or generic webhook endpoints. The example below configures a webhook to push Warning‑level events to WeChat.
```yaml
# cat kube-eventer.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: kube-eventer
  name: kube-eventer
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eventer
  template:
    metadata:
      labels:
        app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccount: kube-eventer
      containers:
        - image: registry.aliyuncs.com/acs/kube-eventer:v1.2.7-ca03be0-aliyun
          name: kube-eventer
          command:
            - "/kube-eventer"
            - "--source=kubernetes:https://kubernetes.default.svc.cluster.local"
            - "--sink=webhook:http://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=888888-888-8888-8888-d35c52ff2e0b&level=Warning&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body&custom_body_configmap_namespace=monitoring&method=POST"
          env:
            - name: TZ
              value: "Asia/Shanghai"
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
              readOnly: true
            - name: zoneinfo
              mountPath: /usr/share/zoneinfo
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 250Mi
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-eventer
rules:
  - apiGroups: [""]
    resources: ["events", "configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-eventer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-eventer
subjects:
  - kind: ServiceAccount
    name: kube-eventer
    namespace: monitoring
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-eventer
  namespace: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-webhook-body
  namespace: monitoring
data:
  content: >-
    {"msgtype": "text","text": {"content": "Cluster event alert
    Event level: {{ .Type }}
    Namespace: {{ .InvolvedObject.Namespace }}
    Object kind: {{ .InvolvedObject.Kind }}
    Object name: {{ .InvolvedObject.Name }}
    Reason: {{ .Reason }}
    Timestamp: {{ .LastTimestamp }}
    Message: {{ .Message }}"}}
```

The webhook configuration adds level=Warning so only Warning events trigger alerts. Additional filters such as namespaces, kinds, or reason can be applied to narrow the scope.
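You can verify the webhook endpoint independently of kube-eventer by posting a test message to it (substitute your own bot key):

```shell
# Send a manual test message to the WeChat Work group bot
curl -s 'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=<your-key>' \
  -H 'Content-Type: application/json' \
  -d '{"msgtype":"text","text":{"content":"test alert"}}'
```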
Using kube-event-exporter to collect cluster events
While kube-eventer provides alerting, it does not store historical events. To retain and analyze events, we use kube-event-exporter to forward them to Elasticsearch (via Kafka).
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "kafka"
          drop:
            - kind: "Service"
    receivers:
      - name: "kafka"
        kafka:
          clientId: "kubernetes"
          topic: "kubenetes-event"
          brokers:
            - "192.168.100.50:9092"
            - "192.168.100.51:9092"
            - "192.168.100.52:9092"
          compressionCodec: "snappy"
          layout: # optional
            kind: "{{ .InvolvedObject.Kind }}"
            namespace: "{{ .InvolvedObject.Namespace }}"
            name: "{{ .InvolvedObject.Name }}"
            reason: "{{ .Reason }}"
            message: "{{ .Message }}"
            type: "{{ .Type }}"
            timestamp: "{{ .GetTimestampISO8601 }}"
            cluster: "sda-pre-center"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      serviceAccountName: event-exporter
      containers:
        - name: event-exporter
          image: ghcr.io/resmoio/kubernetes-event-exporter:latest
          args:
            - -conf=/data/config.yaml
          volumeMounts:
            - name: cfg
              mountPath: /data
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
```

After deploying kube-event-exporter, events appear in the Kafka topic.
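To confirm ingestion, you can consume the topic with Kafka's console consumer (run from a host with the Kafka CLI tools; topic and broker addresses match the exporter config):

```shell
# Tail the event topic from the beginning
kafka-console-consumer.sh --bootstrap-server 192.168.100.50:9092 \
  --topic kubenetes-event --from-beginning
```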
Deploy Logstash to store events in Elasticsearch
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kube-event-logstash
  namespace: log
  labels:
    app: kube-event-logstash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-event-logstash
  template:
    metadata:
      labels:
        app: kube-event-logstash
      annotations:
        kubesphere.io/restartedAt: '2024-02-22T09:03:36.215Z'
    spec:
      volumes:
        - name: kube-event-logstash-pipeline-config
          configMap:
            name: kube-event-logstash-pipeline-config
            defaultMode: 420
      containers:
        - name: kube-event-logstash
          image: 'logstash:7.8.0'
          env:
            - name: XPACK_MONITORING_ELASTICSEARCH_HOSTS
              value: 'http://192.168.100.100:8200'
            - name: XPACK_MONITORING_ELASTICSEARCH_USERNAME
              value: jokerbai
            - name: XPACK_MONITORING_ELASTICSEARCH_PASSWORD
              value: JeA9BiAgnNRzVrp5JRVQ4vYX
            - name: PIPELINE_ID
              value: kube-event-logstash
            - name: KAFKA_SERVER
              value: '192.168.100.50:9092,192.168.100.51:9092,192.168.100.52:9092'
            - name: ES_SERVER
              value: 'http://192.168.100.100:8200'
            - name: ES_USER_NAME
              value: jokerbai
            - name: ES_USER_PASSWORD
              value: JeA9BiAgnNRzVrp5JRVQ4vYX
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: PIPELINE_BATCH_SIZE
              value: '4000'
            - name: PIPELINE_BATCH_DELAY
              value: '100'
            - name: PIPELINE_WORKERS
              value: '4'
            - name: LS_JAVA_OPTS
              value: '-Xms2g -Xmx3500m'
          resources:
            limits:
              cpu: '2'
              memory: 4Gi
            requests:
              cpu: '2'
              memory: 4Gi
          volumeMounts:
            - name: kube-event-logstash-pipeline-config
              mountPath: /usr/share/logstash/pipeline
          livenessProbe:
            tcpSocket:
              port: 9600
            initialDelaySeconds: 39
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 2
          readinessProbe:
            tcpSocket:
              port: 9600
            initialDelaySeconds: 39
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 2
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node/category
                    operator: In
                    values:
                      - app
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-event-logstash-pipeline-config
  namespace: log
data:
  logstash.conf: |-
    input {
      kafka {
        id => "kafka_plugin_id"
        bootstrap_servers => "${KAFKA_SERVER}"
        client_id => "logstash"
        group_id => "logstash"
        decorate_events => true
        topics => ["kubenetes-event"]
        codec => json {
          charset => "UTF-8"
        }
      }
    }
    output {
      elasticsearch {
        hosts => "${ES_SERVER}"
        user => "${ES_USER_NAME}"
        password => "${ES_USER_PASSWORD}"
        index => "kubernetes-event-%{+YYYY.MM}"
        manage_template => false
        template_name => "kubernetes-event"
      }
    }
```

Before deploying Logstash, create an Elasticsearch index template (e.g., via Kibana Dev Tools):
```json
PUT _template/kubernetes-event
{
  "index_patterns": ["*kubernetes-event*"],
  "settings": {
    "index": {
      "highlight": { "max_analyzed_offset": "10000000" },
      "number_of_shards": "2",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "properties": {
      "cluster": { "type": "keyword" },
      "kind": { "type": "keyword" },
      "message": { "type": "text" },
      "name": { "type": "keyword" },
      "namespace": { "type": "keyword" },
      "reason": { "type": "keyword" },
      "type": { "type": "keyword" },
      "timestamp": { "type": "keyword" }
    }
  },
  "aliases": {}
}
```

After deploying Logstash, events become searchable in Kibana, where you can build visualizations such as counting occurrences of a specific reason (e.g., Unhealthy) for the current day.
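The same kind of count can be sketched outside Kibana with a direct query against the event indices (host and credentials are the ones used in the Logstash deployment):

```shell
# Count events with reason "Unhealthy" across the kubernetes-event indices
curl -s -u jokerbai:JeA9BiAgnNRzVrp5JRVQ4vYX \
  -H 'Content-Type: application/json' \
  'http://192.168.100.100:8200/kubernetes-event-*/_count' \
  -d '{"query":{"term":{"reason":"Unhealthy"}}}'
```

Note that because the template maps timestamp as a keyword rather than a date, date-math range filters will not work on it directly.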
By collecting and analyzing Kubernetes events, teams gain deeper insight into cluster health, can set precise alerts, and perform historical analysis to improve reliability.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.