How Vivo Built a Scalable Karmada Operator with Ansible for Multi‑Cluster Management
Vivo’s engineering team shares its practical experience building a Karmada‑Operator with the Operator SDK and Ansible, covering the project background, deployment challenges, design choices, API and architecture, etcd management, member‑cluster handling, the CI pipeline, and performance testing for robust multi‑cloud Kubernetes orchestration.
Background
Karmada is an open‑source cloud‑native multi‑cloud container orchestration project that has attracted many enterprises and is running in production. Multi‑cloud has become a foundational infrastructure for data‑center construction, driving rapid development of multi‑region disaster recovery, large‑scale multi‑cluster management, cross‑cloud elasticity, and migration scenarios.
Vivo migrated its business to Kubernetes, and the resulting rapid growth in cluster size and count increased operational difficulty. After an internally built multi‑cluster management solution still fell short, the team evaluated community projects and chose Karmada for the following reasons:
- Unified management of multiple Kubernetes clusters, reducing platform complexity.
- Cross‑cluster elastic scaling and scheduling to improve resource utilization and cut costs.
- Native Kubernetes APIs, lowering migration effort.
- Disaster recovery: a decoupled control plane and member clusters enable resource reallocation on failures.
- Extensibility: custom scheduling plugins and OpenKruise interpreter plugins can be added.
Karmada‑Operator Implementation
2.1 Operator SDK Overview
The Operator Framework provides a toolkit for building Kubernetes native applications (Operators) in an automated, scalable way. Operators simplify management of complex, stateful workloads by leveraging Kubernetes extensibility for provisioning, scaling, backup, and recovery.
Writing Operators can be challenging due to low‑level APIs, boilerplate code, and lack of modularity. The Operator SDK mitigates these challenges by offering high‑level APIs, scaffolding, code generation, and extensions for common use cases.
2.2 Solution Selection
- Option 1: Go‑based Operator – well suited for stateful services on Kubernetes, but limited for binary deployments, external etcd, and member‑cluster registration.
- Option 2: Ansible‑based Operator – supports both Kubernetes‑based and binary deployments, external etcd, and member‑cluster lifecycle management via SSH and Ansible modules.
- Option 3: Hybrid Go + Ansible Operator – combines the capabilities of Option 2 with Go‑level flexibility.
After evaluating the three options, Vivo selected the Ansible‑based Operator (Option 2) because it provides feature parity with the Go SDK, matches Karmada’s production requirements, is easy to learn for Ansible users, offers strong extensibility, and avoids the need for extensive Go code.
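With an Ansible‑based operator, the SDK maps each watched CRD to an Ansible role through a watches.yaml file. A minimal sketch for the KarmadaDeployment kind follows; the API group and role name are illustrative assumptions, not Vivo’s actual values:

```yaml
# watches.yaml — maps each watched GVK to the Ansible role that reconciles it.
# Group and role names are illustrative, not the project's real values.
- version: v1alpha1
  group: install.karmada.example.com
  kind: KarmadaDeployment
  role: karmada            # role bundled into the operator image
  reconcilePeriod: 60s     # how often to re-run reconciliation
```

The ansible‑operator binary watches the listed kinds and runs the named role with the CR’s spec fields injected as Ansible variables.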
2.3 API Design
The Operator SDK can generate a CRD named KarmadaDeployment. Additional CRDs, EtcdBackup and EtcdRestore, are defined for etcd data management. The spec fields are translated into Ansible variables, and the status field is populated by the Ansible runner or the k8s_status module.
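As an illustration, a KarmadaDeployment CR might look like the following; the field names are assumptions based on the design described above, not the exact schema:

```yaml
# Illustrative KarmadaDeployment CR — field names are assumptions.
apiVersion: install.karmada.example.com/v1alpha1
kind: KarmadaDeployment
metadata:
  name: karmada-demo
spec:
  mode: binary              # or "container" for the Kubernetes-based deployment
  etcd:
    external: true
    endpoints:
      - https://10.0.0.5:2379
  members:
    - name: member1
      kubeconfigSecret: member1-kubeconfig   # credentials for registration
```

Each spec field above would surface inside the Ansible roles as a variable, and the reconciliation result would be written back to the CR’s status.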
2.4 Architecture Design
The architecture supports both containerized and binary deployments. Containerized deployment relies solely on Kubernetes APIs, while binary deployment uses SSH to manage the Karmada control plane and member clusters. Member clusters are registered via provided kubeconfig and credentials defined in the CR.
2.5 Control Plane Management
- Standardized certificate management using OpenSSL, separating etcd and Karmada certificates.
- The karmada‑apiserver can use external load balancers instead of Kubernetes Services.
- Flexible upgrade strategies supporting component‑wise and full‑cluster upgrades.
- Rich global variable definitions enabling per‑component configuration changes.
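The separation of etcd and Karmada certificates can be sketched with plain OpenSSL: a dedicated etcd CA issues the etcd member certificates, independent of the Karmada CA. The CNs, key sizes, and validity periods below are illustrative:

```shell
# Dedicated CA for etcd, separate from the Karmada CA (names illustrative)
openssl genrsa -out etcd-ca.key 2048
openssl req -x509 -new -nodes -key etcd-ca.key -subj "/CN=etcd-ca" \
  -days 3650 -out etcd-ca.crt
# Server certificate for one etcd member, signed by the etcd CA
openssl genrsa -out etcd-server.key 2048
openssl req -new -key etcd-server.key -subj "/CN=etcd-server" -out etcd-server.csr
openssl x509 -req -in etcd-server.csr -CA etcd-ca.crt -CAkey etcd-ca.key \
  -CAcreateserial -days 3650 -out etcd-server.crt
# Confirm the member certificate chains only to the etcd CA
openssl verify -CAfile etcd-ca.crt etcd-server.crt
```

A Karmada CA would be generated the same way with its own key pair, so compromising or rotating one trust domain never touches the other.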
2.6 etcd Cluster Management
etcd is the metadata store for Karmada and must be highly available in production. The Operator provides Ansible plugins to manage etcd clusters, including adding/removing members, backup (e.g., to CephFS), recovery, and health checks.
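The backup step can be sketched as an Ansible task pair: snapshot etcd with etcdctl, then copy the snapshot to a CephFS mount. Paths, endpoints, and certificate locations here are assumptions, not Vivo’s actual layout:

```yaml
# Illustrative Ansible tasks — paths and endpoints are assumptions.
- name: Take an etcd snapshot
  command: >
    etcdctl snapshot save
    /var/backups/etcd/snapshot-{{ ansible_date_time.iso8601_basic_short }}.db
    --endpoints=https://127.0.0.1:2379
    --cacert=/etc/karmada/pki/etcd-ca.crt
    --cert=/etc/karmada/pki/etcd-server.crt
    --key=/etc/karmada/pki/etcd-server.key
  environment:
    ETCDCTL_API: "3"

- name: Copy the snapshot to the CephFS backup mount
  copy:
    src: /var/backups/etcd/
    dest: /mnt/cephfs/etcd-backups/
    remote_src: true
```

Recovery reverses the flow with etcdctl snapshot restore, which is what the EtcdRestore CRD would drive.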
2.7 Member Cluster Management
Member clusters are registered and deregistered through dynamic Ansible inventory generation based on the KarmadaDeployment spec. Two roles, add‑member and del‑member, handle join and unjoin operations, supporting concurrent processing and an optional SSH mode.
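The dynamic-inventory idea can be sketched as a small generator that turns the member list from the CR spec into Ansible’s inventory JSON. The field names (name, host, user) are illustrative, not the actual schema:

```python
import json


def build_inventory(members):
    """Build a minimal Ansible dynamic inventory from a list of member
    clusters (illustrative dicts with name/host and optional SSH user)."""
    return {
        "member_clusters": {"hosts": [m["name"] for m in members]},
        "_meta": {
            "hostvars": {
                m["name"]: {
                    "ansible_host": m["host"],
                    "ansible_user": m.get("user", "root"),  # assumed default
                }
                for m in members
            }
        },
    }


if __name__ == "__main__":
    members = [
        {"name": "member1", "host": "10.0.0.11"},
        {"name": "member2", "host": "10.0.0.12", "user": "ops"},
    ]
    # Ansible consumes this JSON when the script is used as an inventory source
    print(json.dumps(build_inventory(members), indent=2))
```

The add‑member and del‑member roles would then run against the member_clusters group, which is what makes concurrent joins and SSH-mode operation straightforward.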
CI Introduction
To improve developer experience, Vivo built a CI pipeline on GitHub self‑hosted runners and KubeVirt VMs. The pipeline runs syntax and unit tests, provisions VMs, deploys one host cluster and two member clusters, installs Karmada, and executes e2e and Bookinfo tests. Planned CI matrix tests include linting (ansible‑lint, shellcheck, yamllint, etc.), full deployment validation (karmadactl, charts, binary), member join/unjoin, Karmada upgrades, etcd backup/restore, and performance testing with 2,000‑node simulations.
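The lint stage of such a pipeline could be sketched as a GitHub Actions job; the runner labels and repository layout below are assumptions:

```yaml
# Illustrative workflow fragment — labels and paths are assumptions.
name: ci
on: [push, pull_request]
jobs:
  lint:
    runs-on: [self-hosted, linux]
    steps:
      - uses: actions/checkout@v4
      - name: Lint Ansible roles
        run: ansible-lint roles/
      - name: Lint shell scripts
        run: shellcheck scripts/*.sh
      - name: Lint YAML
        run: yamllint .
```

The deployment, join/unjoin, upgrade, and backup/restore validations would be further jobs in the same matrix, gated on the lint stage.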
Conclusion
Through community research and Vivo’s practice, the Karmada‑Operator design was finalized. The Ansible‑based Operator offers high extensibility, reliability, intuitive logic authoring, and out‑of‑the‑box functionality, providing a robust foundation for managing Karmada at scale. Remaining challenges include adding webhook support and richer CRD scaffolding. Ongoing development will continue to enhance features and stability.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.