
How to Build a Kube‑on‑Kube Controller for Managing Multiple Kubernetes Clusters

This article explains the concept of kube‑on‑kube—creating a Kubernetes meta‑cluster to manage other clusters via declarative APIs—detailing its architecture, controller design, execution flow, and step‑by‑step code walkthrough including CRD definitions, Docker images, and deployment procedures.

Ops Development Stories

Here, “kube on kube” refers to building a Kubernetes meta‑cluster that governs other business Kubernetes clusters, managing cluster creation, node addition and removal through declarative APIs.

Implementation is based on the trimmed source code of kubean, with thanks to DaoCloud for open‑sourcing.

Background

As containerization coverage grows, more workloads migrate to Kubernetes. To support same‑city active‑active, diverse business requirements, and low coupling, multiple Kubernetes clusters must be deployed and maintained, making efficient, reliable data‑center management of many clusters a key challenge.

Previously, cluster deployment and scaling relied on ansible tasks: prepare an inventory and vars, then run an ansible playbook.

Overall Architecture of Kube on Kube

The kubeonkube‑controller runs inside an existing Kubernetes cluster. By applying its standard CRD resources together with built‑in Kubernetes resources, it controls the lifecycle of managed clusters (install, uninstall, upgrade, scale‑up, scale‑down, etc.). It uses ansible‑playbook as the underlying engine, simplifying deployment operations and lowering the user learning curve, while also adding operation logging on top of Ansible.

Controller overview:

Cluster Controller: watches Cluster objects; each Cluster uniquely identifies a cluster, carries node access info, cluster type, and deployment parameters, and links to all related ClusterOperation objects.

ClusterOperation Controller: watches ClusterOperation objects. When a ClusterOperation is created, the controller assembles a Job to execute the operation defined in the CRD.
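To make the two CRDs concrete, here is a minimal Go sketch of their spec types. The field names follow the sample YAML later in this article and kubean's conventions, but the structs are simplified: the real types also embed metav1.TypeMeta/ObjectMeta and carry Status sub-resources.

```go
package main

import "fmt"

// ConfigRef points at a ConfigMap or Secret in some namespace.
type ConfigRef struct {
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// ClusterSpec links a cluster to its inventory, variables, and SSH key,
// matching the hostsConfRef / varsConfRef / sshAuthRef fields in Cluster.yml.
type ClusterSpec struct {
	HostsConfRef *ConfigRef `json:"hostsConfRef"` // host inventory ConfigMap
	VarsConfRef  *ConfigRef `json:"varsConfRef"`  // deployment parameters ConfigMap
	SSHAuthRef   *ConfigRef `json:"sshAuthRef"`   // SSH private-key Secret
}

// ClusterOperationSpec describes one operation executed as a Job,
// matching the fields in ClusterOperation.yml.
type ClusterOperationSpec struct {
	Cluster    string `json:"cluster"`    // target Cluster name
	Image      string `json:"image"`      // ansible runtime image
	ActionType string `json:"actionType"` // e.g. "playbook"
	Action     string `json:"action"`     // e.g. "scale.yml"
}

func main() {
	op := ClusterOperationSpec{Cluster: "sample", ActionType: "playbook", Action: "scale.yml"}
	fmt.Printf("%s -> %s\n", op.Cluster, op.Action) // prints "sample -> scale.yml"
}
```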

kubeonkube‑controller Execution Flow

Pre‑steps:

Three resources must be created beforehand: a hosts‑conf ConfigMap (host inventory), a vars‑conf ConfigMap (configuration parameters), and an ssh‑auth Secret (SSH private key).

Cluster Controller Execution Flow Analysis:

1. A cluster admin or platform creates a Cluster CR to define the desired spec.
2. The Cluster Controller detects the change and reconciles.
3. Determine whether the Cluster already exists.
4. Check for any redundant ClusterOperation objects that need cleanup.
5. Update the Cluster status and record execution details.
6. Set the ownerReferences of hosts‑conf / vars‑conf / ssh‑auth to the current Cluster.
7. Continuously watch for new ClusterOperation tasks and record their execution.
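The flow above can be sketched as a toy reconcile function. This uses in-memory types rather than the real controller-runtime client, and the pruning rule shown (dropping finished operations as "redundant") is an assumption for illustration:

```go
package main

import "fmt"

// Toy stand-ins for the CRD objects; the real controller reads these
// through a controller-runtime client inside Reconcile().
type Cluster struct {
	Name   string
	Status string
}

type ClusterOperation struct {
	Name    string
	Cluster string
	Done    bool
}

// reconcileCluster mirrors the listed steps: match operations to the
// cluster, clean up redundant (finished) ones, and record status.
func reconcileCluster(c *Cluster, ops []ClusterOperation) []ClusterOperation {
	active := make([]ClusterOperation, 0, len(ops))
	for _, op := range ops {
		if op.Cluster != c.Name {
			continue // belongs to another cluster
		}
		if op.Done {
			continue // redundant ClusterOperation: would be cleaned up here
		}
		active = append(active, op)
	}
	c.Status = fmt.Sprintf("%d operation(s) pending", len(active))
	return active
}

func main() {
	c := &Cluster{Name: "sample"}
	ops := []ClusterOperation{
		{Name: "create", Cluster: "sample", Done: true},
		{Name: "scale", Cluster: "sample", Done: false},
	}
	remaining := reconcileCluster(c, ops)
	fmt.Println(len(remaining), c.Status) // prints "1 1 operation(s) pending"
}
```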

ClusterOperation Controller Execution Flow Analysis:

ClusterOperation objects are also referred to as ClusterOps.

Cluster admin or platform creates a ClusterOperation CR to define the operation spec. The ClusterOperation Controller detects the change, assembles a Job Pod, and executes tasks such as cluster creation or node addition. Upon completion, the Job returns a status, and both Cluster and ClusterOperation record the result and timestamps.

Source Code Development Process

Environment

<code>kubebuilder 3.10.0
go 1.20.3
</code>

Initialize project

<code>kubebuilder init --domain clay.io --owner Clay --repo kube-on-kube
kubebuilder edit --multigroup=true
kubebuilder create api --group kubeonkube --version v1alpha1 --kind Cluster
Create Resource [y/n]
y
Create Controller [y/n]
y
kubebuilder create api --group kubeonkube --version v1alpha1 --kind ClusterOperation
Create Resource [y/n]
y
Create Controller [y/n]
y
</code>

Modify the Makefile to set the envtest K8s version, then regenerate manifests:

<code>ENVTEST_K8S_VERSION = 1.18.10
</code>
<code>make manifests
go mod vendor
</code>

Define the CRD structs, then generate the clientset, informer, and lister.

<code># 1. Add hack/tools.go to install dependencies
#    (pin code-generator to the version matching your other k8s.io modules)
go get k8s.io/code-generator
go mod vendor
chmod +x vendor/k8s.io/code-generator/generate-groups.sh
# 2. Add hack/update-codegen.sh and adjust variables
MODULE=$(go list -m)
API_PKG=api
OUTPUT_PKG=generated
GROUP_VERSION=kubeonkube:v1alpha1
# 3. Add hack/verify-codegen.sh
# 4. Add the +genclient tag, doc.go, register.go
chmod +x ./hack/update-codegen.sh
./hack/update-codegen.sh
</code>

Write the reconciliation code, then run make install to install the CRDs.

<code># Configure local kubeconfig
make install
kustomize build config/crd | kubectl apply -f -
</code>

Run make run for temporary local testing.

<code>make run
</code>

Package the controller into a Docker image.

<code># Build the manager binary
FROM golang:1.20 as builder
ARG TARGETOS
ARG TARGETARCH
ENV GOPROXY="https://goproxy.cn"
WORKDIR /workspace
COPY go.mod go.mod
COPY go.sum go.sum
RUN go mod download
COPY cmd/main.go cmd/main.go
COPY api/ api/
COPY pkg/ pkg/
COPY generated/ generated/
COPY internal/ internal/
COPY vendor/ vendor/
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager cmd/main.go
# Use distroless as minimal base image
FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/manager .
USER 65532:65532
ENTRYPOINT ["/manager"]
</code>
<code># Build and push image
docker build -t wangzhichidocker/kubeonkube-controller:v0.1 .
docker push wangzhichidocker/kubeonkube-controller:v0.1
</code>

Package the ansible‑playbook runtime into a Docker image.

<code># syntax=docker/dockerfile:1
FROM ubuntu:22.04@sha256:149d67e29f765f4db62aa52161009e99e389544e25a8f43c8c89d4a445a7ca37
ENV LANG=C.UTF-8 \
    DEBIAN_FRONTEND=noninteractive \
    PYTHONDONTWRITEBYTECODE=1
WORKDIR /kubespray
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    apt-get update -q \
    && apt-get install -yq --no-install-recommends \
    curl python3 python3-pip python3-dev gcc sshpass vim rsync openssh-client \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /var/log/*
RUN --mount=type=bind,source=requirements.txt,target=requirements.txt \
    --mount=type=cache,sharing=locked,id=pipcache,mode=0777,target=/root/.cache/pip \
    pip install --no-compile --no-cache-dir -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ \
    && find /usr -type d -name '*__pycache__' -prune -exec rm -rf {} \;
COPY *.yml ./
COPY *.cfg ./
COPY roles ./roles
</code>
<code># Build and push image
docker build -t wangzhichidocker/kubeonkube:v0.1 .
docker push wangzhichidocker/kubeonkube:v0.1
</code>

Install the controller in another cluster.

<code># Generate CRD
bin/kustomize build config/crd > deploy/crd.yaml
# Generate RBAC (adjust namespace in roles/rolebinding)
bin/kustomize build config/rbac > deploy/rbac.yaml
# Generate Deployment (modify image)
bin/kustomize build config/manager > deploy/deployment.yaml
# Apply resources
kubectl apply -f crd.yaml
kubectl apply -f deployment.yaml
kubectl apply -f rbac.yaml
</code>

Test by applying Cluster and ClusterOperation YAML files.

Prepare SSHAuthSec.yml, HostsConfCM.yml, VarsConfCM.yml.

SSHAuthSec.yml

<code>kubectl -n kubeonkube create secret generic sample-ssh-auth --type='kubernetes.io/ssh-auth' --from-file=ssh-privatekey=/home/clay/.ssh/id_rsa --dry-run=client -o yaml > SSHAuthSec.yml
</code>

HostsConfCM.yml

<code>apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-hosts-conf
  namespace: kubeonkube
data:
  hosts.yml: |
    all:
      hosts:
        master01:
          ip: 10.100.xx.xx
          access_ip: 10.100.xx.xx
          ansible_host: 10.100.xx.xx
          ansible_user: root
        worker01:
          ip: 10.100.xx.xx
          access_ip: 10.100.xx.xx
          ansible_host: 10.100.xx.xx
          ansible_user: root
      children:
        kube_control_plane:
          hosts:
            master01:
        kube_node:
          hosts:
            worker01:
</code>

VarsConfCM.yml – fill variables according to your environment.

Cluster.yml

<code>apiVersion: kubeonkube.clay.io/v1alpha1
kind: Cluster
metadata:
  name: sample
  namespace: kubeonkube
spec:
  hostsConfRef:
    namespace: kubeonkube
    name: sample-hosts-conf
  varsConfRef:
    namespace: kubeonkube
    name: sample-vars-conf
  sshAuthRef:
    namespace: kubeonkube
    name: sample-ssh-auth
</code>

ClusterOperation.yml

<code>apiVersion: kubeonkube.clay.io/v1alpha1
kind: ClusterOperation
metadata:
  name: sample-node-add
  namespace: kubeonkube
spec:
  cluster: sample
  image: wangzhichidocker/kubeonkube:v0.1
  actionType: playbook
  action: scale.yml
</code>

Apply the above YAML files to execute the operation.

For the full source development history, refer to the commit log at https://github.com/clay-wangzhi/kube-on-kube.

References:

kubean: https://github.com/kubean-io/kubean

Vivo large‑scale Kubernetes automation practice: https://mp.weixin.qq.com/s/L9z1xLXUnz52etw2jDkDkw

Tags: Docker, Kubernetes, Controller, cluster management, Ansible, CRD
Written by Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
