Multi-Cluster Kubernetes: Benefits, Federation, Karmada, and Practical Tips
This article explains why organizations adopt multi‑cluster Kubernetes (high availability, hybrid‑cloud scaling, and fault isolation), outlines the preparatory work, compares Federation v1 and v2, and introduces Karmada, a CNCF project. It then shares practical non‑federated techniques for deployment, monitoring, traffic management, and migration, with code examples.
Why Multi-Cluster?
Active‑active deployment: improves availability and prevents a single‑cluster failure from becoming a total outage.
Hybrid cloud: taps public‑cloud elastic resources to absorb traffic spikes.
Fault isolation: limits the blast radius when a cluster fails.
Preparation for Multi‑Cluster
Lifecycle management (create clusters, add nodes, etc.) – see the "kube on kube" concept.
Application distribution and deployment.
Monitoring and alerting.
North‑South traffic management.
East‑West traffic management.
Application migration.
Multi‑Cluster Exploration (Federation)
Federation v1
Earliest multi‑cluster project proposed and maintained by the Kubernetes community.
<code>apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  name: nginx-us
  annotations:
    federation.kubernetes.io/replica-set-preferences: |
      {
        "rebalance": true,
        "clusters": {
          "us-east1-b": {
            "minReplicas": 2,
            "maxReplicas": 4,
            "weight": 1
          },
          "us-central1-b": {
            "minReplicas": 2,
            "maxReplicas": 4,
            "weight": 1
          }
        }
      }
</code>
All federation configuration is placed in annotations, and the creation flow mirrors native Kubernetes: configurations are submitted to the Federated API Server, which creates the corresponding resources in each member cluster.
Deprecated around Kubernetes v1.11 due to two major issues:
Annotation‑based distribution makes the API bloated and inelegant.
A fixed set of supported GVKs (Group/Version/Kind) results in poor compatibility across different cluster versions.
Federation v2
Replaces annotation‑based distribution with CRDs and controllers, avoiding intrusion into the native Kubernetes API.
Consists of two components:
admission‑webhook for admission control.
controller‑manager for handling custom resources and coordinating state across clusters.
Users generate a FederatedTypeConfig CRD via kubefedctl enable (for example, kubefedctl enable deployments.apps) and then deploy resources using the generated Federated YAML types.
Example FederatedDeployment configuration:
<code>apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: fed-deploy
  namespace: fed-ns
spec:
  template:
    {deployment-define}
  overrides:
    - clusterName: cluster-1
      clusterOverrides:
        - path: /spec/replicas
          value: 10
        - path: /spec/template/spec/containers/0/image
          value: nginx:1.17.0-alpine
  placement:
    clusters:
      - name: cluster-1
      - name: cluster-2
</code>
A Federated Type CRD includes three parts:
Template – the resource definition used to create the object in each cluster.
Placement – distribution strategy defining which clusters receive the resource.
Overrides – per‑cluster field overrides for configuration updates; clusters not listed use the template unchanged.
ReplicaSchedulingPreference (RSP) can rewrite placement and overrides based on a user‑defined scheduling file.
Example RSP file defines total replicas and per‑cluster weights and limits:
<code>apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: fed-deploy
  namespace: fed-ns
spec:
  targetKind: FederatedDeployment
  totalReplicas: 20
  clusters:
    "*":
      weight: 1
      maxReplicas: 15
    cluster-1:
      weight: 2
      minReplicas: 3
      maxReplicas: 10
</code>
RSP currently supports only FederatedDeployment and FederatedReplicaSet, and it has since been deprecated.
Karmada
Karmada builds on Federation v1 and v2, inheriting basic concepts, and is a CNCF incubating project.
It provides plug‑and‑play automation for multi‑cloud and hybrid‑cloud scenarios, offering centralized management, high availability, fault recovery, and traffic scheduling.
Resources are submitted to Karmada's own API server, which is built from the standard Kubernetes kube‑apiserver, so any native resource is supported without the annotation bloat and version‑compatibility issues of v1.
Demo deployment and propagation policies:
<code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx
          name: nginx
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
    replicaScheduling:
      replicaDivisionPreference: Weighted
      replicaSchedulingType: Divided
      weightPreference:
        staticWeightList:
          - targetCluster:
              clusterNames:
                - member1
            weight: 1
          - targetCluster:
              clusterNames:
                - member2
            weight: 1
---
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: nginx-op
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  overrideRules:
    - targetCluster:
        clusterNames:
          - member2
      overriders:
        labelsOverrider:
          - operator: add
            value:
              env: skoala-dev
          - operator: add
            value:
              env-stat: skoala-stage
          - operator: remove
            value:
              for: for
          - operator: replace
            value:
              bar: test
    - targetCluster:
        clusterNames:
          - member1
      overriders:
        annotationsOverrider:
          - operator: add
            value:
              env: skoala-stage
          - operator: remove
            value:
              bom: bom
          - operator: replace
            value:
              emma: sophia
</code>
The Policy Controller watches PropagationPolicy objects and creates a ResourceBinding for each matching resource.
The Binding Controller watches ResourceBinding objects and creates a Work object for each target cluster.
The Execution Controller watches Work objects and creates the actual resources in the member clusters.
Other Projects
Additional multi‑cluster projects include cluster‑api for multi‑cloud infrastructure, Clusterpedia for multi‑cluster search, Submariner for pod connectivity, multicluster‑ingress, Service Mesh solutions (Istio, Cilium) for cross‑cluster traffic, and storage projects for cross‑cluster storage management and migration.
Multi‑Cluster Practical Implementation (Non‑Federated)
Applications are deployed across multiple clusters through the existing automation platform, which supports container lifecycle management, Ingress lifecycle management, scaling, HPA/HPC, and rolling and gray (canary) releases.
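As one example of the per‑cluster scaling pieces, a standard HorizontalPodAutoscaler can back the HPA feature in each member cluster. This is a minimal sketch; the target name and thresholds are placeholders, not the platform's actual configuration:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-hpa            # placeholder name
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo              # placeholder workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```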
Multi‑cluster monitoring and alerting requires a few adjustments, including pointing each cluster's Prometheus at an external Alertmanager and remote‑writing metrics to VictoriaMetrics:
<code># Point Prometheus at an external Alertmanager (pre-create the svc and ep)
alerting:
  alertmanagers:
    - apiVersion: v2
      name: xxx
      namespace: xxx
      pathPrefix: /
      port: xxx
# Add cluster-unique labels
externalLabels:
  cluster: xxx
  environment: xxx
  zone: xxx
# Remote write to VictoriaMetrics; drop the replica label so VM can deduplicate
remoteWrite:
  - url: http://xxx/api/v1/write
    writeRelabelConfigs:
      - action: labeldrop
        regex: prometheus_replica
</code>
North‑South traffic is managed by an outer Nginx reverse proxy that load‑balances across the Ingress nodes based on pod and VM counts. Weight calculation: each pod is treated as one KVM with a weight of 10, so each Ingress node's weight = (pod count × 10) / (number of Ingress nodes), rounded, with a minimum of 1.
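The weight formula above can be sketched in Go (ingressNodeWeight is a hypothetical helper for illustration, not code from the platform):

```go
package main

import "fmt"

// ingressNodeWeight computes the upstream weight for each Ingress node:
// every pod is treated as one KVM with weight 10, the total is split evenly
// across the cluster's Ingress nodes, rounded, with a floor of 1.
func ingressNodeWeight(podCount, ingressNodes int) int {
	if ingressNodes <= 0 {
		return 0
	}
	w := int(float64(podCount)*10/float64(ingressNodes) + 0.5) // round half up
	if w < 1 {
		w = 1 // never drop a node out of rotation entirely
	}
	return w
}

func main() {
	fmt.Println(ingressNodeWeight(12, 3)) // 40: (12 pods × 10) / 3 nodes
	fmt.Println(ingressNodeWeight(0, 3))  // 1: the floor applies
}
```

The floor of 1 ensures every Ingress node keeps receiving some traffic even when a cluster is nearly drained.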
East‑West traffic management uses Calico BGP to build a flat network, enabling direct pod‑to‑pod communication across clusters and with VMs. Service discovery is handled by Nacos rather than CoreDNS.
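A flat network of this kind is typically configured by disabling Calico's full node‑to‑node mesh and peering with the physical network instead. A minimal sketch, where the AS numbers and peer address are placeholders rather than the actual topology:

```yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false   # peer with the fabric instead of a full mesh
  asNumber: 64512                # placeholder cluster AS number
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: tor-peer                 # hypothetical peer name
spec:
  peerIP: 10.0.0.1               # placeholder ToR switch address
  asNumber: 64513                # placeholder peer AS number
```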
The multi‑cluster application migration workflow assumes every cluster is backed up with Velero to a shared S3 bucket. If cluster A fails and cluster B lacks sufficient resources, consider downgrade strategies such as halving the replica counts of non‑critical services.
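The halving strategy can be sketched as follows (degradedReplicas is a hypothetical helper illustrating the policy, not the platform's actual code):

```go
package main

import "fmt"

// degradedReplicas returns the replica count to use when failing over into a
// cluster without enough capacity: non-critical services are halved (rounded
// up, minimum 1), while critical services keep their full replica count.
func degradedReplicas(replicas int32, critical bool) int32 {
	if critical || replicas <= 1 {
		return replicas
	}
	return (replicas + 1) / 2 // halve, rounding up
}

func main() {
	fmt.Println(degradedReplicas(10, false)) // 5
	fmt.Println(degradedReplicas(5, false))  // 3
	fmt.Println(degradedReplicas(4, true))   // 4: critical services untouched
}
```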
Sample Go migration code demonstrates handling backup download, replica calculation, resource checks, and concurrent rollout, service, and ingress migration, including cleanup of source resources.
<code>// Handlek8sMigrate processes a K8s migration request.
func Handlek8sMigrate(r migration.K8sMigrationRequest, k *K8sController, l *store.LogTask) {
	l.WriteNewLog("From %s cluster %s env %s migrate AppID %s to %s cluster",
		r.MigrationData.SrcCluster, r.MigrationData.Env, r.MigrationData.MigrateType,
		strings.Join(r.MigrationData.Appid, ", "), r.MigrationData.DestCluster)

	// Offline migration: download and extract the Velero backup
	if r.MigrationData.MigrateType == "offline" {
		var cluster string
		if r.MigrationData.Env == "xxx" || r.MigrationData.Env == "xx" {
			cluster = "xxxx"
		}
		commandStr := fmt.Sprintf("mkdir -p /tmp/backupdownload/ && cd /tmp/backupdownload/ && velero backup download $(velero backup get | grep \"%s-k8s-%s\" | head -1 | awk '{print $1}') && tar -xvf *.tar.gz", cluster, r.MigrationData.Env)
		if _, err := executeSSHCommand(commandStr); err != nil {
			fmt.Println(err)
		}
	}

	// Get k8s clients for the source and destination clusters
	srcK8sClient := k.store.GetK8sClient(r.MigrationData.SrcCluster, r.MigrationData.Env)
	destK8sClient := k.store.GetK8sClient(r.MigrationData.DestCluster, r.MigrationData.Env)

	// Calculate the total number of pods to migrate
	var podNumSum int32
	for _, appid := range r.MigrationData.Appid {
		replicas, _, err := getReplicasAndAppid(srcK8sClient, appid, r.MigrationData.Env, r.MigrationData.MigrateType)
		if err != nil {
			l.WriteNewLog(appid + " get replica count error " + err.Error())
		}
		podNumSum += replicas
	}
	l.WriteNewLog("Total pods to migrate: %d", podNumSum)

	// Check destination cluster capacity
	destRemainingData, err := getRemainingData(r)
	if err != nil {
		l.WriteFineshedLog("failed", "cannot get destination pod capacity: "+err.Error())
		return
	}
	l.WriteNewLog("Destination cluster can create %d pods", destRemainingData)
	if destRemainingData > int(podNumSum) {
		l.WriteNewLog("Destination resources sufficient, proceeding with migration")
	} else {
		l.WriteFineshedLog("failed", "Destination resources insufficient, migration aborted")
		return
	}

	// Concurrently migrate rollouts, services, and ingresses
	var wg sync.WaitGroup
	wg.Add(len(r.MigrationData.Appid))
	for _, appid := range r.MigrationData.Appid {
		go func(appid string) {
			defer wg.Done()
			l.WriteNewLog(appid + " migrating...")
			msg, err := rolloutMigrate(appid, r.MigrationData.Env, srcK8sClient, destK8sClient, r.MigrationData.MigrateType)
			if err != nil {
				l.WriteNewLog(appid + " rollout creation failed")
				return
			}
			l.WriteNewLog(appid + " " + msg)
			// Update metadata, migrate service and ingress, clean up source resources, etc.
			// (omitted for brevity)
			l.WriteNewLog(appid + " migration succeeded")
		}(appid)
	}
	wg.Wait()

	// Cleanup for offline migration
	if r.MigrationData.MigrateType == "offline" {
		if _, err := executeSSHCommand("rm -rf /tmp/backupdownload/*"); err != nil {
			fmt.Println(err)
		}
	}
	l.WriteFineshedLog("success", "Migration completed")
}
</code>
Conclusion: choose the solution that best fits your company's actual situation.
References:
Kubernetes Multi‑Cluster Management Architecture Exploration – Xu Xin Zhao, KubeSphere Maintainer
Kubernetes Multi‑Cluster Architecture Thoughts, Practices, and Exploration – Duan Meng, Mobile Cloud
Hybrid‑Cloud Kubernetes Multi‑Cluster Management and Application Deployment – Li Yu
Intelligent Cloud – Kubernetes Multi‑Cluster Management Solution kubefed Analysis
K8s Multi‑Cluster Practice Thoughts and Exploration – Vivo Internet Technology
Karmada Official Documentation
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.