Mastering etcd: From Basics to Cluster Deployment and Real‑World Use Cases
This comprehensive guide explains what etcd is, its Raft‑based architecture, key terminology, read/write flow, leader election, common scenarios such as service discovery and distributed locking, and provides step‑by‑step instructions for single‑node and multi‑node deployments with practical command examples.
1. Overview
1.1 What is etcd?
etcd is an open‑source, highly available distributed key‑value database launched by the CoreOS team in June 2013. It implements the Raft consensus algorithm and is written in Go.
1.3 Features
Simple : easy installation, HTTP API.
Secure : supports SSL certificate verification.
Fast : benchmark shows >2k reads per second per instance.
Reliable : Raft ensures data availability and consistency.
1.4 Key Terminology
Raft : consensus algorithm used by etcd.
Node : a Raft state‑machine instance.
Member : an etcd instance that manages a node and serves client requests.
Cluster : a group of members that work together.
Peer : another member in the same cluster.
Client : entity that sends HTTP requests to the cluster.
WAL : write‑ahead log for persistence.
Snapshot : point‑in‑time copy of the data to truncate the WAL.
Proxy : mode that provides reverse‑proxy services for the cluster.
Leader : node elected to handle all data commits.
Follower : node that replicates the leader’s log.
Candidate : node that starts an election when it loses contact with the leader.
Term : a monotonically increasing number identifying a leader's tenure; each new election begins a new term.
Index : log entry identifier used together with term.
1.5 Read/Write Flow
All writes go through the leader, which replicates them to followers before committing, guaranteeing strong consistency. Reads can be served by any node; a linearizable (consistent) read is confirmed through the leader, while a plain read from a follower may return slightly stale data.
1.6 Leader Election
In a three‑node cluster each node runs a randomized election timeout; the first node whose timer expires becomes a candidate, increments the term, and requests votes. The candidate that receives a majority of votes becomes the leader. If the leader fails or becomes unreachable, a new election starts.
1.7 Write Quorum
A write is considered successful when it is replicated to a quorum of nodes (Quorum = N/2 + 1). The minimum recommended cluster size is three nodes.
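The arithmetic behind this rule can be checked with a quick shell loop (illustrative only). Note that with integer division, quorum for the common odd cluster sizes works out as follows:

```shell
# Quorum = N/2 + 1 (integer division): the smallest majority.
# A cluster of N nodes tolerates N - quorum simultaneous failures.
for n in 1 3 5 7; do
  quorum=$(( n / 2 + 1 ))
  echo "size $n: quorum $quorum, tolerates $(( n - quorum )) failure(s)"
done
```

A four‑node cluster has the same quorum (3) as a five‑node cluster but tolerates only one failure, which is why odd sizes are recommended.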
2. etcd Architecture
2.1 Components
HTTP Server : handles client API requests and inter‑node communication.
Store : implements transactions, indexing, watches, and event handling.
Raft : core consensus module.
WAL : write‑ahead log for persistence; snapshots are taken to truncate the log.
2.2 Data Flow
A client request reaches the HTTP server, is processed by the Store, and if it modifies cluster state it is passed to Raft, which records the entry in the WAL and replicates it to other members before committing.
3. Common Use Cases
3.1 Service Registration & Discovery
Backend services register themselves in etcd; front‑end components query etcd to discover service endpoints, enabling load‑balancing and failover.
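A minimal sketch of this pattern, using the v2 etcdctl commands shown later in this article (the key paths and addresses are illustrative, and a running etcd is assumed):

```shell
# A backend instance registers itself under a well-known prefix with
# a TTL, then refreshes the key periodically as a heartbeat.
etcdctl set /services/web/10.0.0.5 "10.0.0.5:8080" --ttl 30

# A front-end lists the currently registered endpoints.
etcdctl ls /services/web

# If the instance dies and stops refreshing, the key expires after
# 30 s and the endpoint disappears from discovery automatically.
```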
3.2 Message Pub/Sub
etcd can act as a lightweight message broker where producers register topics and consumers watch those topics for new messages.
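As a sketch of this pattern with the watch options listed in section 5.6 (the topic path is illustrative, and a running etcd is assumed):

```shell
# Consumer: block and print every change on the topic key.
etcdctl watch --forever /topics/news &

# Producer: each set on the key is delivered to all watchers.
etcdctl set /topics/news "release 1.2 deployed"
```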
3.3 Load Balancing
Multiple identical service instances register in etcd; clients retrieve the list of healthy endpoints and distribute traffic among them.
3.4 Distributed Coordination
When a watched key disappears or changes (for example, a heartbeat key whose TTL expired), watchers are notified and can trigger health checks or recovery.
Controllers write commands to etcd; agents watching those keys start or stop services accordingly.
Services report status updates to etcd, which notifies interested parties through watches.
3.5 Distributed Lock
etcd provides a lock primitive that ensures only one contender holds the lock at a time.
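As a sketch: with the v3 API, etcdctl ships a ready‑made lock command (the lock name and guarded command are illustrative, and a running etcd on the default endpoint is assumed):

```shell
# Acquires the named lock, runs the given command while holding it,
# and releases the lock on exit. Concurrent invocations block until
# the current holder finishes or its lease expires.
ETCDCTL_API=3 etcdctl lock /my-lock -- echo "I hold the lock"
```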
3.6 Distributed Queue
Each node can create its own queue under a common prefix; consumers can watch the prefix to process tasks.
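A rough sketch of a task queue over ordered keys (etcdctl v2 has no in‑order creation command, so this uses timestamp‑suffixed keys; the prefix is illustrative, and a running etcd is assumed):

```shell
# Producer: enqueue tasks under a common prefix with increasing keys.
etcdctl set /queue/task-$(date +%s%N) "process order 42"

# Consumer: list pending tasks in key order, and watch for new ones.
etcdctl ls --sort /queue
etcdctl watch --recursive --forever /queue
```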
3.7 Monitoring & Leader Election
Application‑level leader election can be built on etcd primitives: components compete to create a well‑known key with an atomic compare‑and‑swap, the one that succeeds acts as leader, and the others watch the key and re‑run the election when it expires or is deleted.
4. Installation & Deployment
4.1 Single‑Node Installation
<code>hostnamectl set-hostname etcd-1
wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
# yum installs etcd 3.3.11; download a newer release binary if required
yum -y install etcd
systemctl enable etcd</code>
Configuration resides in /etc/etcd/etcd.conf. The default data directory is default.etcd/. Client URLs listen on port 2379 and peer URLs on port 2380.
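For reference, a minimal single‑node /etc/etcd/etcd.conf might look like the following (the member name and paths are illustrative, not mandated defaults):

```shell
ETCD_NAME="etcd-1"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://127.0.0.1:2379"
```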
4.2 Cluster Deployment
Deploy an odd number of nodes (e.g., three) for fault tolerance.
4.2.1 Host Configuration
<code>cat >> /etc/hosts <<EOF
172.16.0.8 etcd-0-8
172.16.0.14 etcd-0-14
172.16.0.17 etcd-0-17
EOF</code>
4.2.2 Install etcd on each node
<code>wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
yum -y install etcd
systemctl enable etcd
mkdir -p /data/app/etcd/
chown etcd:etcd /data/app/etcd/</code>
4.2.3 Configure each member
<code># Example for etcd-0-8
ETCD_DATA_DIR="/data/app/etcd/"
ETCD_LISTEN_PEER_URLS="http://172.16.0.8:2380"
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379,http://172.16.0.8:2379"
ETCD_NAME="etcd-0-8"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.16.0.8:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://127.0.0.1:2379,http://172.16.0.8:2379"
ETCD_INITIAL_CLUSTER="etcd-0-8=http://172.16.0.8:2380,etcd-0-14=http://172.16.0.14:2380,etcd-0-17=http://172.16.0.17:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-token"
ETCD_INITIAL_CLUSTER_STATE="new"
</code>
Start the service with systemctl start etcd on every node, then verify the cluster with etcdctl member list and etcdctl cluster-health.
5. Basic Operations
5.1 Put (set)
<code>$ etcdctl set /testdir/testkey "Hello world"
Hello world</code>
Options: --ttl, --swap-with-value, --swap-with-index.
5.2 Make (mk)
<code>$ etcdctl mk /testdir/testkey "Hello world"
Hello world</code>
Fails if the key already exists.
5.3 Delete (rm)
<code>$ etcdctl rm /testdir/testkey
PrevNode.Value: Hello</code>
Options: --dir, --recursive, --with-value, --with-index.
5.4 Update
<code>$ etcdctl update /testdir/testkey "Hello"
Hello</code>
5.5 Get
<code>$ etcdctl get /testdir/testkey
Hello world</code>
Options: --sort, --consistent.
5.6 Watch
<code>$ etcdctl watch /testdir/testkey
Hello watch</code>
Options: --forever, --after-index, --recursive.
5.7 Backup
<code>$ etcdctl backup --data-dir /var/lib/etcd --backup-dir /home/etcd_backup</code>
5.8 Member Management
<code>$ etcdctl member list
$ etcdctl member add etcd3 http://192.168.1.100:2380
$ etcdctl member remove <member-id></code>
6. Conclusion
etcd's v2 API retains only the most recent 1000 events in its watch history, which makes it unsuitable for recording large volumes of frequently changing data (heavy‑write workloads).
Typical scenarios are configuration management and service discovery, which are read‑heavy.
Compared with ZooKeeper, etcd is simpler to use but often requires auxiliary tools (e.g., registrator, confd) for full service‑discovery automation.
There is currently no official graphical UI for etcd.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.