
Mastering etcd: From Basics to Cluster Deployment and Real‑World Use Cases

This comprehensive guide explains what etcd is, its Raft‑based architecture, key terminology, read/write flow, leader election, common scenarios such as service discovery and distributed locking, and provides step‑by‑step instructions for single‑node and multi‑node deployments with practical command examples.

1. Overview

1.1 What is etcd?

etcd is an open‑source, highly available distributed key‑value database launched by the CoreOS team in June 2013. It implements the Raft consensus algorithm and is written in Go.

1.2 History

etcd was open-sourced by CoreOS in June 2013. The 2.x series stabilized the v2 API; the 3.0 release in June 2016 introduced the flat v3 key-value API with leases, transactions, and gRPC. The project joined the CNCF in 2018 and is best known as the data store behind Kubernetes.

1.3 Features

Simple: easy to install, with a straightforward HTTP API.

Secure: supports TLS/SSL certificate verification.

Fast: a single instance has been benchmarked at over 2,000 reads per second.

Reliable: the Raft algorithm keeps data available and consistent.

1.4 Key Terminology

Raft: the consensus algorithm etcd uses.

Node: an instance of the Raft state machine.

Member: an etcd instance; it manages a node and serves client requests.

Cluster: a group of members that communicate and work together.

Peer: another member in the same cluster.

Client: an entity that sends HTTP requests to the cluster.

WAL: the write-ahead log, etcd's persistence format.

Snapshot: a point-in-time copy of the data, taken so the WAL can be truncated.

Proxy: a mode in which etcd provides reverse-proxy access to the cluster.

Leader: the node elected to handle all data commits.

Follower: a node that replicates the leader's log.

Candidate: the role a follower assumes when it loses contact with the leader and starts an election.

Term: a monotonically increasing number identifying the period between two elections; each new election begins a new term.

Index: the position of an entry in the log; together with the term it uniquely identifies a log entry.

1.5 Read/Write Flow

All writes go through the leader, which replicates each entry to its followers; a write commits only once a quorum has acknowledged it, guaranteeing strong consistency. Reads can be served by any node, though a consistent read is routed through the leader to avoid returning stale data.
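As a sketch using the etcdctl commands shown later in this article (the key path is illustrative), a default read is answered by whichever member receives the request, while the --consistent option listed under the get command forces an up-to-date read:

<code># Default read: answered by the member that receives the request
etcdctl get /testdir/testkey

# Consistent read: forces the result to reflect the latest committed state
etcdctl get /testdir/testkey --consistent</code>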

1.6 Leader Election

In a three‑node cluster each node runs a randomized election timer; the first node whose timer expires requests votes from the others, and the node that receives a majority of votes becomes the leader. If the leader later fails or becomes unreachable, the remaining nodes start a new election.

1.7 Write Quorum

A write is considered successful once it has been replicated to a quorum of nodes, where Quorum = N/2 + 1 (integer division). A smaller cluster cannot survive any member failure, so the minimum recommended cluster size is three nodes.
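The arithmetic can be sketched in shell (integer division; this is a generic illustration, not an etcd command):

<code># Quorum and fault tolerance for a cluster of N members
quorum() { echo $(( $1 / 2 + 1 )); }
faults_tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

quorum 3            # 2
faults_tolerated 3  # 1
quorum 4            # 3  -- an even-sized cluster gains no extra fault tolerance
faults_tolerated 4  # 1
quorum 5            # 3
faults_tolerated 5  # 2</code>

This is why odd cluster sizes are recommended: going from three to four nodes raises the quorum without improving fault tolerance.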

2. etcd Architecture

2.1 Components

HTTP Server: handles client API requests and inter‑node communication.

Store: implements transactions, indexing, watches, and event handling.

Raft: the core consensus module.

WAL: write‑ahead log for persistence; snapshots are taken to truncate the log.

2.2 Data Flow

A client request arrives at the HTTP server and is handled by the Store; if it modifies cluster state, it is handed to the Raft module, which appends the entry to the WAL and replicates it to the other members. Once a quorum acknowledges the entry, it is committed and applied.

3. Common Use Cases

3.1 Service Registration & Discovery

Backend services register themselves in etcd; front‑end components query etcd to discover service endpoints, enabling load‑balancing and failover.
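A minimal sketch of the pattern with the v2‑style etcdctl used in this article (the /services prefix, address, and 30‑second TTL are illustrative choices):

<code># Registration: each backend writes its endpoint under a shared prefix
# with a TTL, then refreshes the key periodically as a heartbeat
etcdctl set /services/web/10.0.0.8 "10.0.0.8:8080" --ttl 30

# Discovery: front ends list the prefix to find live endpoints
etcdctl ls /services/web

# A backend that dies stops refreshing, and its key expires with the TTL</code>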

3.2 Message Pub/Sub

etcd can act as a lightweight message broker where producers register topics and consumers watch those topics for new messages.
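A rough sketch of this pattern, again with illustrative key names:

<code># Consumer: keep watching a topic prefix for new messages
etcdctl watch --recursive --forever /topics/news

# Producer (in another shell): publish by writing a new key under the topic
etcdctl set /topics/news/msg-001 "first message"</code>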

3.3 Load Balancing

Multiple identical service instances register in etcd; clients retrieve the list of healthy endpoints and distribute traffic among them.
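For example, assuming endpoints are registered under an illustrative /services/web prefix, a simple client‑side balancer can pick a random endpoint per request:

<code># List registered endpoints and pick one at random
etcdctl ls /services/web | shuf -n 1</code>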

3.4 Distributed Coordination

When a watched key disappears or its heartbeat lapses, a health check or recovery action can be triggered.

Controllers use etcd to signal services to start or stop.

Services report status updates to etcd, which notifies watching parties.

3.5 Distributed Lock

etcd provides a lock primitive that ensures only one contender holds the lock at a time.
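One way to sketch this with the commands covered below: mk fails if the key already exists, so only one contender can create the lock key, and a TTL releases the lock automatically if the holder crashes (the key name and TTL are illustrative):

<code># Acquire: succeeds for exactly one contender
etcdctl mk /locks/job-1 "holder-A" --ttl 60 && echo "lock acquired"

# Release: delete the key when the work is done
etcdctl rm /locks/job-1</code>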

3.6 Distributed Queue

Each node can create its own queue under a common prefix; consumers can watch the prefix to process tasks.
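A sketch using the v2 HTTP API, which assigns increasing, in‑order key names when you POST to a directory (the queue path and values are illustrative):

<code># Enqueue: POST creates in-order keys under the directory
curl -s http://127.0.0.1:2379/v2/keys/queue -XPOST -d value="task-1"
curl -s http://127.0.0.1:2379/v2/keys/queue -XPOST -d value="task-2"

# Consume: list the directory sorted by creation order
curl -s "http://127.0.0.1:2379/v2/keys/queue?recursive=true&sorted=true"</code>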

3.7 Monitoring & Leader Election

Cluster state can be monitored through etcd's watch mechanism, and applications can build their own leader election on top of etcd, for example by having contenders compete to create a key with a TTL and watching that key to see which instance currently leads.

4. Installation & Deployment

4.1 Single‑Node Installation

<code>hostnamectl set-hostname etcd-1
wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
# yum installs etcd 3.3.11; for a newer version, download the binary release instead
yum -y install etcd
systemctl enable etcd</code>

Configuration resides in /etc/etcd/etcd.conf. The default data directory is default.etcd/. Client URLs listen on port 2379 and peer URLs on port 2380.
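After starting the service, a quick smoke test (the key name is illustrative) confirms the instance is healthy:

<code>systemctl start etcd
etcdctl set /test/hello "world"
etcdctl get /test/hello
etcdctl cluster-health</code>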

4.2 Cluster Deployment

Deploy an odd number of nodes (e.g., three) for fault tolerance.

4.2.1 Host Configuration

<code>cat >> /etc/hosts <<EOF
172.16.0.8 etcd-0-8
172.16.0.14 etcd-0-14
172.16.0.17 etcd-0-17
EOF</code>

4.2.2 Install etcd on each node

<code>wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
yum -y install etcd
systemctl enable etcd
mkdir -p /data/app/etcd/
chown etcd:etcd /data/app/etcd/</code>

4.2.3 Configure each member

<code># Example for etcd-0-8
ETCD_DATA_DIR="/data/app/etcd/"
ETCD_LISTEN_PEER_URLS="http://172.16.0.8:2380"
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379,http://172.16.0.8:2379"
ETCD_NAME="etcd-0-8"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.16.0.8:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://127.0.0.1:2379,http://172.16.0.8:2379"
ETCD_INITIAL_CLUSTER="etcd-0-8=http://172.16.0.8:2380,etcd-0-14=http://172.16.0.14:2380,etcd-0-17=http://172.16.0.17:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-token"
ETCD_INITIAL_CLUSTER_STATE="new"
</code>

Start the service with systemctl start etcd and verify the cluster with etcdctl member list and etcdctl cluster-health.

5. Basic Operations

5.1 Put (set)

<code>$ etcdctl set /testdir/testkey "Hello world"
Hello world</code>

Options: --ttl, --swap-with-value, --swap-with-index.

5.2 Make (mk)

<code>$ etcdctl mk /testdir/testkey "Hello world"
Hello world</code>

Fails if the key already exists.

5.3 Delete (rm)

<code>$ etcdctl rm /testdir/testkey
PrevNode.Value: Hello</code>

Options: --dir, --recursive, --with-value, --with-index.

5.4 Update

<code>$ etcdctl update /testdir/testkey "Hello"
Hello</code>

5.5 Get

<code>$ etcdctl get /testdir/testkey
Hello world</code>

Options: --sort, --consistent.

5.6 Watch

<code>$ etcdctl watch /testdir/testkey
Hello watch</code>

Options: --forever, --after-index, --recursive.

5.7 Backup

<code>$ etcdctl backup --data-dir /var/lib/etcd --backup-dir /home/etcd_backup</code>
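To restore from such a backup, one approach is to start etcd against the backup directory with --force-new-cluster, which discards the old membership and forms a fresh single‑member cluster (the path is illustrative):

<code>etcd --data-dir /home/etcd_backup --force-new-cluster</code>

Additional members can then be re‑added with etcdctl member add.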

5.8 Member Management

<code>$ etcdctl member list
$ etcdctl member add etcd3 http://192.168.1.100:2380
$ etcdctl member remove <member-id></code>
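Putting these together, a typical workflow for replacing a failed member looks roughly like this (names and addresses are illustrative; note the new member must start with the cluster state set to "existing", not "new"):

<code># 1. Remove the dead member
etcdctl member remove <member-id>

# 2. Register the replacement
etcdctl member add etcd-0-20 http://172.16.0.20:2380

# 3. On the new host, configure ETCD_INITIAL_CLUSTER_STATE="existing"
#    and the full ETCD_INITIAL_CLUSTER list, then start the service
systemctl start etcd</code>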

6. Conclusion

The etcd v2 store keeps only the most recent 1,000 events in its watch history, making it unsuitable for heavy‑write workloads or as a general message store.

Typical scenarios are configuration management and service discovery, which are read‑heavy.

Compared with ZooKeeper, etcd is simpler to use but often requires auxiliary tools (e.g., registrator, confd) for full service‑discovery automation.

There is currently no official graphical UI for etcd.

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
