
Implementation and Practice of Karmada-Operator at vivo: Architecture, API Design, and CI/CD

vivo built an Ansible‑based Karmada‑Operator that declaratively manages multi‑cluster deployments, etcd backup and restore, and control‑plane upgrades through custom resource definitions (CRDs), backed by CI pipelines. It addresses the limitations of existing deployment tools and aims to provide extensible, reliable, self‑healing orchestration for large‑scale Kubernetes environments.

vivo Internet Technology

Background

vivo's Internet server team migrated many services to Kubernetes. The size and number of clusters then grew rapidly, which made operations harder. After evaluating community projects, the team selected Karmada, an open‑source cloud‑native multi‑cloud container orchestration project, for its unified multi‑cluster management, cross‑cluster elasticity, native Kubernetes API usage, disaster recovery capabilities, and extensibility.

Challenges with Existing Tools

The community offers several deployment tools (karmadactl, Karmada charts, binary deployment, hack scripts), but they have drawbacks: there are several overlapping options to choose among, the scripts contain defects, there is no UI, CI testing is missing, etcd high‑availability features are insufficient, and dependency installation is complex.

Goal

The article shares vivo's practice of building a Karmada‑Operator to address these issues, covering solution selection, API design, architecture, and CI pipeline.

Operator SDK Overview

The Operator Framework provides a way to automate the management of Kubernetes‑native applications. Its Operator SDK simplifies operator development by offering high‑level APIs, scaffolding, code generation, and extensions for common use cases.

Solution Options

Option 1: Go‑based Operator – suitable for Kubernetes‑native stateful services but limited for binary deployments and external etcd.

Option 2: Ansible‑based Operator – supports both Kubernetes‑based and non‑Kubernetes binary deployments, leveraging Ansible’s SSH and K8s modules.

Option 3: Hybrid Go + Ansible – combines capabilities of both.

After evaluation, vivo chose the Ansible‑based Operator (Option 2) because it provides comparable capabilities to the Go SDK, matches production requirements, is easy to learn, and offers strong extensibility.

API Design

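The Operator defines CRDs such as KarmadaDeployment, EtcdBackup, and EtcdRestore, which allow Karmada deployment, etcd backup, and etcd restore operations to be specified declaratively. In an Ansible‑based operator, watches.yaml maps each watched resource kind to the Ansible role or playbook that the operator runs to reconcile it, so the reconcile logic itself lives in those roles and playbooks.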
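To make the kind‑to‑role mapping concrete, here is a minimal Python sketch of the dispatch that a watches file expresses. It is an illustration only: the API group, version, role names, and the spec field are hypothetical, since the article does not list them.

```python
# Hypothetical API group/version; the article does not state the real ones.
GROUP = "operator.example.io"
VERSION = "v1alpha1"

# What a watches file expresses in an Ansible-based operator:
# each watched kind is mapped to the role that reconciles it.
# The role names below are assumptions made for this sketch.
WATCHES = [
    {"group": GROUP, "version": VERSION, "kind": "KarmadaDeployment", "role": "karmada_deployment"},
    {"group": GROUP, "version": VERSION, "kind": "EtcdBackup", "role": "etcd_backup"},
    {"group": GROUP, "version": VERSION, "kind": "EtcdRestore", "role": "etcd_restore"},
]


def role_for(resource):
    """Return the role that would reconcile the given custom resource."""
    group, _, version = resource["apiVersion"].partition("/")
    wanted = (group, version, resource["kind"])
    for watch in WATCHES:
        if (watch["group"], watch["version"], watch["kind"]) == wanted:
            return watch["role"]
    raise LookupError(f"no watch registered for {resource['apiVersion']} {resource['kind']}")


# A declarative backup request; "clusterName" is a hypothetical spec field.
backup = {
    "apiVersion": f"{GROUP}/{VERSION}",
    "kind": "EtcdBackup",
    "metadata": {"name": "nightly"},
    "spec": {"clusterName": "karmada-prod"},
}
```

Calling role_for(backup) selects the backup role, which is the sense in which each CRD gets its own reconcile path.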

Architecture

The design supports both containerized and binary deployments. Containerized deployment uses only Kubernetes APIs, while binary deployment relies on SSH to manage the control plane. Member clusters are registered/unregistered via generated Ansible inventory files.

Control‑Plane Management

Standardized certificate management using OpenSSL.

External load‑balancer support for the Karmada API server.

Flexible upgrade strategies (component‑wise or full‑cluster).

Rich global variable definitions for future configuration changes.
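As an illustration of the certificate and load‑balancer items above, the following Python sketch assembles the kind of OpenSSL commands such a workflow might run. It is not vivo's implementation: the file layout, common names, and validity periods are assumptions, and the commands are only built here, not executed.

```python
import shlex


def ca_commands(out_dir, common_name="karmada-ca", days=3650, bits=2048):
    """Commands that create a self-signed CA key and certificate."""
    key, crt = f"{out_dir}/ca.key", f"{out_dir}/ca.crt"
    return [
        ["openssl", "genrsa", "-out", key, str(bits)],
        ["openssl", "req", "-x509", "-new", "-nodes", "-key", key,
         "-subj", f"/CN={common_name}", "-days", str(days), "-out", crt],
    ]


def san_line(dns_names=(), ip_addresses=()):
    """Build the subjectAltName line for an OpenSSL extension file."""
    entries = [f"DNS:{name}" for name in dns_names]
    entries += [f"IP:{ip}" for ip in ip_addresses]
    return "subjectAltName = " + ", ".join(entries)


def signed_cert_commands(name, out_dir, ext_file, common_name, days=365, bits=2048):
    """Commands that create a key and CSR, then sign the CSR with the CA."""
    key, csr, crt = (f"{out_dir}/{name}.{ext}" for ext in ("key", "csr", "crt"))
    return [
        ["openssl", "genrsa", "-out", key, str(bits)],
        ["openssl", "req", "-new", "-key", key, "-subj", f"/CN={common_name}", "-out", csr],
        ["openssl", "x509", "-req", "-in", csr, "-CA", f"{out_dir}/ca.crt",
         "-CAkey", f"{out_dir}/ca.key", "-CAcreateserial", "-out", crt,
         "-days", str(days), "-extfile", ext_file],
    ]


def render(commands):
    """Render command lists as shell-quoted lines for review or logging."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

When the API server sits behind an external load balancer, the balancer's DNS name or IP address would go into san_line so that clients connecting through it can verify the server certificate.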

etcd Cluster Management

Custom Ansible plugins provide member addition/removal, backup (e.g., to CephFS), restoration, and health checks. Separate CRDs for EtcdBackup and EtcdRestore isolate etcd operations from the main Karmada deployment.
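A backup step of this kind typically wraps etcdctl snapshot save. The Python sketch below shows how a plugin might assemble that call; the function name, the timestamped file naming, and the CephFS mount path used in the test are assumptions, not details taken from vivo's plugins.

```python
from datetime import datetime, timezone


def snapshot_save_command(endpoint, backup_dir, cacert, cert, key, now=None):
    """Build the environment and argv for an etcd v3 snapshot.

    backup_dir is assumed to be a mounted shared filesystem (for example
    a CephFS mount); the file naming scheme is hypothetical.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    snapshot = f"{backup_dir}/etcd-{stamp}.db"
    argv = [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",
        f"--cert={cert}",
        f"--key={key}",
        "snapshot", "save", snapshot,
    ]
    env = {"ETCDCTL_API": "3"}  # select the v3 API explicitly
    return env, argv, snapshot
```

Returning the snapshot path alongside the command lets the caller record it, for example in the status of the EtcdBackup resource, so that a later EtcdRestore can refer to it.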

Member Cluster Management

Dynamic inventory plugins generate Ansible inventories from the KarmadaDeployment spec, enabling concurrent registration and deregistration of member clusters via add-member and del-member roles.
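The idea can be sketched in Python as a function that compares the desired member list with the currently registered clusters and emits an inventory in Ansible's JSON inventory layout (groups with hosts, plus _meta.hostvars). The spec fields members, name, and kubeconfigSecret, and the group names, are hypothetical.

```python
import json


def build_inventory(spec, registered):
    """Derive add/remove groups from the desired and registered members."""
    desired = {member["name"]: member for member in spec.get("members", [])}
    to_add = sorted(set(desired) - set(registered))
    to_remove = sorted(set(registered) - set(desired))

    hostvars = {}
    for name in to_add:
        hostvars[name] = {
            # Member clusters are reached through their API, not SSH,
            # so tasks for these hosts run on the controller itself.
            "ansible_connection": "local",
            "kubeconfig_secret": desired[name]["kubeconfigSecret"],
        }
    for name in to_remove:
        hostvars[name] = {"ansible_connection": "local"}

    return {
        "members_to_add": {"hosts": to_add},
        "members_to_remove": {"hosts": to_remove},
        "_meta": {"hostvars": hostvars},
    }


if __name__ == "__main__":
    example_spec = {"members": [{"name": "member1", "kubeconfigSecret": "member1-kubeconfig"}]}
    print(json.dumps(build_inventory(example_spec, registered=["member0"]), indent=2))
```

Because each member cluster appears as its own inventory host, Ansible can process the hosts of a group in parallel, which is one plausible way the concurrent registration described above could be achieved.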

CI Pipeline

The CI workflow runs on a self‑hosted GitHub Runner with KubeVirt. The pipeline includes syntax checks (ansible‑lint, shellcheck, yamllint, etc.), cluster deployment tests (various Karmada install methods, join/unjoin, upgrades, etcd backup/restore), functional tests (Karmada e2e, Bookinfo demo), and performance tests (simulating 2000‑node member clusters, measuring failover time for 40k pods).
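The staged structure can be sketched as a fail‑fast runner in Python: cheap checks run first, and later, more expensive stages are skipped once one fails. The stage names mirror the categories above, but the runner and its ordering policy are assumptions for illustration, not the actual workflow definition.

```python
def run_pipeline(stages):
    """Run (name, check) pairs in order and stop at the first failure.

    Each check is a zero-argument callable returning True on success.
    Returns a list of (name, status) tuples, where status is one of
    "passed", "failed", or "skipped".
    """
    results = []
    failed = False
    for name, check in stages:
        if failed:
            results.append((name, "skipped"))
            continue
        ok = bool(check())
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


# Placeholder checks; real stages would invoke linters, deploy test
# clusters, run e2e suites, and collect performance measurements.
example_stages = [
    ("syntax", lambda: True),
    ("deployment", lambda: True),
    ("functional", lambda: False),
    ("performance", lambda: True),
]
```

With example_stages, the functional stage fails and the performance stage is reported as skipped, so the expensive large‑scale simulation never starts on a broken build.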

Summary

The Karmada‑Operator built by vivo demonstrates high extensibility, reliability, and ease of writing operational logic. It provides declarative, self‑healing management for multi‑cluster environments, though it currently lacks webhook support and a sophisticated CRD scaffolding tool. The project is open‑source and invites community contributions.
