How to Safely Backup and Restore etcd in a Kubernetes Cluster
This guide explains why etcd backups are critical for Kubernetes disaster recovery. It walks through creating a snapshot, distributing it to the other etcd nodes, and scheduling backups with cron, then gives a step-by-step procedure for restoring the snapshot on every node so that cluster services resume correctly.
1. etcd Cluster Backup
etcd stores all Kubernetes cluster state; backing up its data is essential for disaster recovery.
Key points:
Backup can be performed on any node of the etcd cluster.
Use the v3 API (ETCDCTL_API=3) because Kubernetes 1.13+ no longer supports v2.
Example environment uses binary‑deployed k8s v1.18.6 with Calico.
1) View etcd data directories
<code># etcd data and WAL directories used in this environment:
export ETCD_DATA_DIR="/data/k8s/etcd/data"
export ETCD_WAL_DIR="/data/k8s/etcd/wal"
</code>2) Create backup directory and take snapshot
<code># mkdir -p /data/etcd_backup_dir
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem \
--cert=/etc/etcd/cert/etcd.pem \
--key=/etc/etcd/cert/etcd-key.pem \
--endpoints=https://172.16.60.231:2379 \
snapshot save /data/etcd_backup_dir/etcd-snapshot-$(date +%Y%m%d).db
</code>Copy the snapshot to the other etcd nodes:
<code>rsync -e "ssh -p22" -avpgolr /data/etcd_backup_dir/etcd-snapshot-20200820.db root@k8s-master02:/data/etcd_backup_dir/
rsync -e "ssh -p22" -avpgolr /data/etcd_backup_dir/etcd-snapshot-20200820.db root@k8s-master03:/data/etcd_backup_dir/
</code>Schedule daily backups with cron:
<code># chmod 755 /data/etcd_backup_dir/etcd_backup.sh
# crontab -l
0 5 * * * /bin/bash -x /data/etcd_backup_dir/etcd_backup.sh > /dev/null 2>&1
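# The etcd_backup.sh referenced above is not shown; here is a minimal
# sketch of what it might contain (the endpoint, cert paths, and 7-day
# retention are illustrative assumptions based on this environment):
#!/bin/bash
BACKUP_DIR=/data/etcd_backup_dir
SNAPSHOT=${BACKUP_DIR}/etcd-snapshot-$(date +%Y%m%d).db
# Take today's snapshot from the local etcd member.
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  --endpoints=https://172.16.60.231:2379 \
  snapshot save "${SNAPSHOT}"
# Prune snapshots older than 7 days so the backup directory does not grow unbounded.
find "${BACKUP_DIR}" -name 'etcd-snapshot-*.db' -mtime +7 -delete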
</code>2. etcd Cluster Restore
Restoration must be performed on every etcd node.
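Before wiping any data, it is worth verifying the integrity of the snapshot on each node; `etcdctl snapshot status` reports the snapshot's hash, revision, key count, and size (a sketch, using the backup path from the previous section):

```shell
# Inspect the snapshot before restoring; a corrupt file fails here
# rather than partway through the restore.
ETCDCTL_API=3 etcdctl snapshot status \
  /data/etcd_backup_dir/etcd-snapshot-20200820.db -w table
```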
Simulate data loss:
<code># rm -rf /data/k8s/etcd/data/*
</code>Stop services on all masters and etcd nodes:
<code># systemctl stop kube-apiserver
# systemctl stop etcd
</code>Delete old data and WAL directories on each node:
<code># rm -rf /data/k8s/etcd/data && rm -rf /data/k8s/etcd/wal
</code>Restore snapshot on each node (example for 172.16.60.231):
<code>ETCDCTL_API=3 etcdctl \
--name=k8s-etcd01 \
--endpoints="https://172.16.60.231:2379" \
--cert=/etc/etcd/cert/etcd.pem \
--key=/etc/etcd/cert/etcd-key.pem \
--cacert=/etc/kubernetes/cert/ca.pem \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://172.16.60.231:2380 \
--initial-cluster="k8s-etcd01=https://172.16.60.231:2380,k8s-etcd02=https://172.16.60.232:2380,k8s-etcd03=https://172.16.60.233:2380" \
--data-dir=/data/k8s/etcd/data \
--wal-dir=/data/k8s/etcd/wal \
snapshot restore /data/etcd_backup_dir/etcd-snapshot-20200820.db
</code>Repeat the command on the other two nodes, adjusting the IP address and node name accordingly.
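For example, on k8s-etcd02 the command might look like this (a sketch: only `--name`, `--endpoints`, and `--initial-advertise-peer-urls` change, and the third member's address is assumed to sit on the same 172.16.60.x subnet; `--initial-cluster` must be identical on all nodes):

```shell
ETCDCTL_API=3 etcdctl \
  --name=k8s-etcd02 \
  --endpoints="https://172.16.60.232:2379" \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-advertise-peer-urls=https://172.16.60.232:2380 \
  --initial-cluster="k8s-etcd01=https://172.16.60.231:2380,k8s-etcd02=https://172.16.60.232:2380,k8s-etcd03=https://172.16.60.233:2380" \
  --data-dir=/data/k8s/etcd/data \
  --wal-dir=/data/k8s/etcd/wal \
  snapshot restore /data/etcd_backup_dir/etcd-snapshot-20200820.db
```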
Start the etcd service on every node and verify cluster health; then start kube-apiserver on each master and check cluster status.
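Concretely, the sequence might look like the following (a sketch: the endpoint list assumes the second and third members sit on the same 172.16.60.x subnet, and `kubectl get cs` still works on the v1.18 environment used here, though it is deprecated in newer releases):

```shell
# On every etcd node:
systemctl start etcd

# From any node, confirm all three members are healthy:
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  --endpoints="https://172.16.60.231:2379,https://172.16.60.232:2379,https://172.16.60.233:2379" \
  endpoint health

# On every master, bring the API server back and check cluster state:
systemctl start kube-apiserver
kubectl get cs
kubectl get nodes
kubectl get pods -A
```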
After restoration, pods gradually return to the Running state, confirming a successful recovery.
3. Summary
Backing up the etcd cluster is the key to protecting a Kubernetes cluster. Restoration requires stopping kube-apiserver, stopping etcd, restoring the snapshot on every etcd node, starting etcd, and finally restarting kube-apiserver.
Only one etcd node needs to be backed up; the snapshot is then copied to the other nodes.
A single node's snapshot is sufficient to restore the whole cluster, provided the restore command is run on every node.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles on operations and operations transformation.