Design and Implementation of a Zookeeper Operator for Kubernetes
This article outlines the design, functional requirements, CRD definition, architecture, deployment, scaling, monitoring, fault‑tolerance, and upgrade strategies of a Zookeeper operator on Kubernetes, including code examples, service configurations, and integration with Prometheus and OAM standards.
Introduction
In 2018 at KubeCon, Alibaba's Chen Jun introduced the concept of a Node Operator, inspiring the development of a Zookeeper Operator to containerize NoSQL components and manage their lifecycle on Kubernetes.
Functional Requirements
The operator must provide rapid deployment, secure scaling, automated monitoring, self-healing, and visual operation capabilities.
CRD Definition
The first step is defining a declarative spec that includes node resources, monitoring components, replica count, and persistent storage.
Architecture
Deploy : Generates native resources such as StatefulSet, Service, ConfigMap, and PersistentVolume for fast Zookeeper cluster deployment.
Monitor : Creates ServiceMonitor and PrometheusRule resources to register the cluster with Prometheus and set alerting policies.
Scale : Controls scaling and rolling upgrades, ensuring a minimal number of leader re-elections during restarts.
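The Scale controller's core decision can be sketched as a pure function: compare the desired node count from the CR with the observed member IDs and emit the IDs to add or remove. This is a minimal illustration; `planScale` and its ID scheme (StatefulSet ordinals mapping to ZooKeeper server IDs 1..N) are assumptions, not the operator's actual code.

```go
package main

import "fmt"

// planScale compares the observed member IDs with the desired count and
// returns the IDs to add and to remove. StatefulSet ordinals are assumed
// to map to ZooKeeper server IDs 1..N (hypothetical convention).
func planScale(observed []int, desired int) (add, remove []int) {
	have := make(map[int]bool, len(observed))
	for _, id := range observed {
		have[id] = true
	}
	// Any ID in 1..desired that is not yet a member must be added.
	for id := 1; id <= desired; id++ {
		if !have[id] {
			add = append(add, id)
		}
	}
	// Any member with an ID above the desired count must be removed.
	for _, id := range observed {
		if id > desired {
			remove = append(remove, id)
		}
	}
	return add, remove
}

func main() {
	add, remove := planScale([]int{1, 2, 3}, 5)
	fmt.Println(add, remove) // scale out: add servers 4 and 5
}
```

In a real reconcile loop the returned IDs would drive calls to the ZooKeeper reconfiguration API and adjustments to the StatefulSet replica count.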
CRD Example
apiVersion: database.ymm-inc.com/v1beta1
kind: ZooKeeper
metadata:
  name: zookeeper-sample
spec:
  version: v3.5.6
  cluster:
    name: test
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
      limits:
        cpu: 2000m
        memory: 2Gi
    exporter:
      exporter: true
      exporterImage: harbor.ymmoa.com/monitoring/zookeeper_exporter
      exporterVersion: v3.5.6
    nodeCount: 3
    storage:
      size: 100Gi

Deployment Details
Labels applied to the StatefulSet and Service for selection and monitoring:

labels:
  app: zookeeper
  app.kubernetes.io/instance: zookeeper-sample
  component: zookeeper
  zookeeper: zookeeper-sample
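With these labels in place, a headless Service can select the pods and give each one a stable DNS name for quorum traffic. A sketch, assuming a typical layout: the Service name and the use of standard ZooKeeper ports (2181/2888/3888) are assumptions, not taken from the operator's source.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-sample-headless   # assumed name
spec:
  clusterIP: None                   # headless: stable per-pod DNS records
  selector:
    app: zookeeper
    zookeeper: zookeeper-sample
  ports:
  - name: client
    port: 2181
  - name: quorum
    port: 2888
  - name: leader-election
    port: 3888
```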
InitContainer copies the Zookeeper configuration file into the pod’s working directory.
Main Containers include the Zookeeper process, a monitoring sidecar (exporter), and an agent container for health checks.
Environment Variables such as POD_IP , POD_NAME , and ZK_SERVER_HEAP are injected from the pod spec.
Readiness Probe uses the ruok command to verify the node is ready before updating the dynamic configuration file.
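The environment injection and readiness check described above might look like the following container fragment. This is a sketch: the downward-API mappings follow standard Kubernetes conventions, while the heap default and the exact probe command are assumptions (shown here as a direct `ruok` four-letter-word check against the local client port).

```yaml
env:
- name: POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: ZK_SERVER_HEAP
  value: "1024"                    # MB; assumed default
readinessProbe:
  exec:
    command:
    - sh
    - -c
    - '[ "$(echo ruok | nc 127.0.0.1 2181)" = "imok" ]'
  initialDelaySeconds: 10
  periodSeconds: 10
```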
Monitoring Integration
ServiceMonitor registers the exporter port http-metrics with Prometheus:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: zookeeper
    component: zookeeper
spec:
  endpoints:
  - interval: 30s
    port: http-metrics
PrometheusRule creates alerting policies, e.g., sending alerts to a DingTalk robot.
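A matching PrometheusRule might look like the following. This is a sketch: the alert name, the `zk_up` metric (common in zookeeper_exporter builds), and the threshold are assumptions; the DingTalk routing itself lives in the Alertmanager configuration, not in this resource.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: zookeeper
spec:
  groups:
  - name: zookeeper.rules
    rules:
    - alert: ZookeeperDown
      expr: zk_up == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "ZooKeeper node {{ $labels.instance }} is down"
```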
Scaling and Upgrade Strategy
Scaling updates spec.cluster.nodeCount in the Zookeeper CR and triggers the operator to add or remove nodes using the Zookeeper reconfiguration API.
Rolling upgrades are performed by updating the StatefulSet with an OnDelete strategy; the operator deletes pods in a controlled order, respecting MaxUnavailable and leader election.
Partitioned rolling updates allow selective pod replacement based on an index, ensuring minimal disruption.
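The partitioned variant maps onto the StatefulSet's native RollingUpdate strategy: only pods with an ordinal greater than or equal to the partition value are replaced, so lowering the partition step by step replaces pods selectively (OnDelete, by contrast, leaves pod deletion entirely to the operator). A minimal fragment:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # only pods with ordinal >= 2 are updated
```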
Agent Sidecar API
/status – returns Zookeeper node metrics (sent/received, latency, mode, version, etc.).
/runok – checks if the node is running without errors.
/health – health check for the agent itself.
/get – retrieves the current dynamic configuration.
/add and /del – add or remove cluster members via Zookeeper reconfigure.
OAM Integration
The operator aligns with the Open Application Model (OAM) by defining reusable Components (e.g., the Zookeeper workload) and Traits (e.g., scaling and rolling-update CRDs), enabling platform-agnostic application description and management.
Conclusion
The Zookeeper operator demonstrates a cloud-native approach to managing stateful services on Kubernetes, providing deployment, scaling, monitoring, fault-tolerance, and upgrade capabilities, while offering extensibility for future features such as backup, migration, and advanced scheduling.
Manbang Technology Team