
How to Safely Backup and Restore etcd in a Kubernetes Cluster

This guide explains why etcd backups are critical for Kubernetes disaster recovery. It walks through creating a snapshot, distributing it to the other etcd nodes, and scheduling daily backups with cron, then provides a step-by-step procedure for restoring the snapshot on every node and verifying that services resume correctly.


1. etcd Cluster Backup

etcd stores all Kubernetes cluster state; backing up its data is essential for disaster recovery.

Key points:

Backup can be performed on any node of the etcd cluster.

Use the v3 API (ETCDCTL_API=3) because Kubernetes 1.13+ no longer supports v2.

Example environment uses binary‑deployed k8s v1.18.6 with Calico.
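Every etcdctl v3 invocation in this guide repeats the same TLS flags, so it can be convenient to collect them in a shell variable first. This is just a convenience sketch; the certificate paths and endpoint are the ones used throughout the article:

```shell
# Common TLS options for all etcdctl v3 calls in this guide
export ETCDCTL_API=3
ETCD_FLAGS="--cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  --endpoints=https://172.16.60.231:2379"
# Usage: etcdctl $ETCD_FLAGS endpoint health
```

Leaving `$ETCD_FLAGS` unquoted at the call site is deliberate here, so the shell splits it back into individual flags.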

1) View etcd data directories

<code># etcd data directories (as configured for the etcd service)
export ETCD_DATA_DIR="/data/k8s/etcd/data"
export ETCD_WAL_DIR="/data/k8s/etcd/wal"
</code>

2) Create backup directory and take snapshot

<code># mkdir -p /data/etcd_backup_dir
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  --endpoints=https://172.16.60.231:2379 \
  snapshot save /data/etcd_backup_dir/etcd-snapshot-`date +%Y%m%d`.db
</code>
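After the save completes, the snapshot file can be sanity-checked with etcdctl's built-in `snapshot status` subcommand (the filename below assumes the snapshot was taken on 2020-08-20):

```shell
# Reports the hash, revision, total key count, and size of the snapshot file
ETCDCTL_API=3 etcdctl --write-out=table \
  snapshot status /data/etcd_backup_dir/etcd-snapshot-20200820.db
```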

Copy the snapshot to the other etcd nodes:

<code>rsync -e "ssh -p22" -avpgolr /data/etcd_backup_dir/etcd-snapshot-20200820.db root@k8s-master02:/data/etcd_backup_dir/
rsync -e "ssh -p22" -avpgolr /data/etcd_backup_dir/etcd-snapshot-20200820.db root@k8s-master03:/data/etcd_backup_dir/
</code>
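rsync normally guarantees transfer integrity, but recording a checksum next to the snapshot costs nothing and lets any node re-verify the copy later. A minimal sketch:

```shell
# On the node that took the snapshot: record a SHA-256 checksum next to it
cd /data/etcd_backup_dir
sha256sum etcd-snapshot-20200820.db > etcd-snapshot-20200820.db.sha256

# After the rsync, on each receiving node: verify the copy (prints "OK" on success)
sha256sum -c etcd-snapshot-20200820.db.sha256
```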
[Figure: etcd backup output]

Schedule daily backups with cron:

<code># chmod 755 /data/etcd_backup_dir/etcd_backup.sh
# crontab -l
0 5 * * * /bin/bash -x /data/etcd_backup_dir/etcd_backup.sh > /dev/null 2>&1
</code>
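The article does not show the contents of etcd_backup.sh itself. A minimal sketch of what such a script might contain, reusing the TLS paths and endpoint from above and adding an assumed 7-day retention policy (the retention window is not from the article):

```shell
#!/bin/bash
# etcd_backup.sh -- daily snapshot plus simple retention (hypothetical sketch)
set -euo pipefail

BACKUP_DIR=/data/etcd_backup_dir
KEEP_DAYS=7   # assumed retention window, adjust to taste

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  --endpoints=https://172.16.60.231:2379 \
  snapshot save "${BACKUP_DIR}/etcd-snapshot-$(date +%Y%m%d).db"

# Drop snapshots older than KEEP_DAYS so the directory stays bounded
find "${BACKUP_DIR}" -name 'etcd-snapshot-*.db' -mtime +"${KEEP_DAYS}" -delete
```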

2. etcd Cluster Restore

Restoration must be performed on every etcd node.

Simulate data loss:

<code># rm -rf /data/k8s/etcd/data/*
</code>

Stop services on all masters and etcd nodes:

<code># systemctl stop kube-apiserver
# systemctl stop etcd
</code>

Delete old data and WAL directories on each node:

<code># rm -rf /data/k8s/etcd/data && rm -rf /data/k8s/etcd/wal
</code>

Restore snapshot on each node (example for 172.16.60.231):

<code>ETCDCTL_API=3 etcdctl \
  --name=k8s-etcd01 \
  --endpoints="https://172.16.60.231:2379" \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-advertise-peer-urls=https://172.16.60.231:2380 \
  --initial-cluster="k8s-etcd01=https://172.16.60.231:2380,k8s-etcd02=https://172.16.60.232:2380,k8s-etcd03=https://192.168.137.233:2380" \
  --data-dir=/data/k8s/etcd/data \
  --wal-dir=/data/k8s/etcd/wal \
  snapshot restore /data/etcd_backup_dir/etcd-snapshot-20200820.db
</code>

Repeat the command on the other two nodes, adjusting the IP address and node name accordingly.
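To make the per-node repetition less error-prone, the node-specific values can be pulled into variables and the rest kept identical. One point worth knowing: `snapshot restore` operates entirely on the local file, so the TLS and `--endpoints` flags from the save step are not actually required here. A sketch, run locally on each node with that node's name and IP filled in:

```shell
# Run on each etcd node, after setting that node's own name and IP
NAME=k8s-etcd01            # k8s-etcd02 / k8s-etcd03 on the other nodes
IP=172.16.60.231           # 172.16.60.232 / 192.168.137.233 on the other nodes

ETCDCTL_API=3 etcdctl \
  --name="${NAME}" \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-advertise-peer-urls="https://${IP}:2380" \
  --initial-cluster="k8s-etcd01=https://172.16.60.231:2380,k8s-etcd02=https://172.16.60.232:2380,k8s-etcd03=https://192.168.137.233:2380" \
  --data-dir=/data/k8s/etcd/data \
  --wal-dir=/data/k8s/etcd/wal \
  snapshot restore /data/etcd_backup_dir/etcd-snapshot-20200820.db
```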

Start the etcd service on all nodes and verify cluster health; then start kube-apiserver on the master nodes and confirm the cluster status.
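Concretely, that verification step might look like the following (endpoints per the article's cluster; the kubectl commands assume a configured kubeconfig on the master):

```shell
# Start etcd on every node, then check member health across all endpoints
systemctl start etcd
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  --endpoints=https://172.16.60.231:2379,https://172.16.60.232:2379,https://192.168.137.233:2379 \
  endpoint health

# Once all endpoints report healthy, bring the control plane back
systemctl start kube-apiserver
kubectl get nodes
kubectl get pods --all-namespaces
```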

After restoration, pods gradually return to the Running state, confirming a successful recovery.

3. Summary

Backing up the etcd cluster is the key to protecting a Kubernetes cluster. Restoration requires stopping kube-apiserver, stopping etcd, restoring the snapshot on every node, starting etcd, and finally restarting kube-apiserver.

Only one etcd node needs to be backed up; the resulting snapshot is then synchronized to the other nodes.

That single snapshot is sufficient to restore the entire cluster, with each node running the restore using its own node name and peer URL.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
