Why etcd Is the Backbone of Cloud‑Native Service Discovery and Coordination
This article explains what etcd is, compares it with Zookeeper, describes its architecture and core components such as WAL, snapshots and boltdb, outlines its key features, and shows how it powers service registration, watch mechanisms, cluster monitoring and leader election in cloud‑native systems.
What is etcd?
etcd is a CNCF‑hosted, distributed, strongly consistent key‑value store written in Go. It is used for shared configuration, service discovery, leader election, distributed locks and cluster monitoring, making it a fundamental building block of cloud‑native architectures.
etcd vs Zookeeper
Consistency protocol: Raft (etcd) vs ZAB, a Paxos‑style protocol (Zookeeper)
Operations: easier to operate with etcd
Project activity: active community for etcd, Zookeeper less active
API: HTTP+JSON and gRPC for etcd; Zookeeper requires its own client
Security: etcd supports TLS/HTTPS out of the box; Zookeeper gained TLS support only in later releases (3.5+)
etcd Architecture
etcd runs as a cluster; each node holds the full data set in memory and persists every change via a Write‑Ahead Log (WAL). Snapshots capture the entire data set at a point in time so the WAL can be truncated, and boltdb serves as the underlying storage engine, playing a role similar to a storage engine in MySQL.
Basic Commands
etcdctl put key test
Core Components
gRPC Server handles client requests and inter‑node communication within the cluster.
WAL (Write‑Ahead Log) records every change before it is applied, providing durability and enabling transaction‑style logging similar to MySQL’s redo log.
Snapshot periodically stores a full copy of the data to prevent the WAL from growing indefinitely.
boltdb is the embedded B+‑tree storage engine that backs each key with an index.
Key Features
Hierarchical storage of data in a file‑system‑like structure
Watch mechanism for key or prefix changes with notifications
Secure communication via SSL/TLS certificates
High performance (≈2K reads per second per instance)
Strong consistency using the Raft consensus algorithm
Revision numbers for each key to track operation order
Lease mechanism for TTL‑based automatic key expiration
Use Cases in the Engine
Services register themselves in etcd with a TTL lease; the master service registers and sends periodic heartbeats, while the scheduling service subscribes to a prefix (e.g.,
/publictest/pipeline/) to discover available workers.
s, err := NewServiceRegister(key, serviceAddress, 5)
if err != nil {
	logging.WebLog.Error(err)
}
// Keep the lease alive so the registration key does not expire
leaseRespChan, err := s.cli.KeepAlive(context.Background(), resp.ID)
The scheduling service listens for put and delete events to maintain a local list of servers, enabling custom load‑balancing and failover logic.
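The scheduler's bookkeeping can be sketched independently of etcd. In this illustrative model (the `Event` and `ServerList` names are hypothetical, not from the engine's code), put events add a worker address to the local list and delete events remove it, which is exactly what arrives on an etcd watch channel for the registration prefix.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Event mirrors the two etcd watch event types the scheduler cares about.
type Event struct {
	Type string // "PUT" or "DELETE"
	Key  string // e.g. a key under the watched registration prefix
	Val  string // worker address carried by a PUT
}

// ServerList is the scheduler's local view of registered workers.
type ServerList struct {
	mu      sync.RWMutex
	workers map[string]string
}

func NewServerList() *ServerList {
	return &ServerList{workers: map[string]string{}}
}

// Apply folds one watch event into the local list.
func (s *ServerList) Apply(e Event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch e.Type {
	case "PUT":
		s.workers[e.Key] = e.Val
	case "DELETE":
		delete(s.workers, e.Key)
	}
}

// Addrs returns the current worker addresses, e.g. for round-robin picking.
func (s *ServerList) Addrs() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]string, 0, len(s.workers))
	for _, v := range s.workers {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	list := NewServerList()
	list.Apply(Event{"PUT", "/publictest/pipeline/w1", "10.0.0.1:8080"})
	list.Apply(Event{"PUT", "/publictest/pipeline/w2", "10.0.0.2:8080"})
	list.Apply(Event{"DELETE", "/publictest/pipeline/w1", ""}) // w1's lease expired
	fmt.Println(list.Addrs()) // [10.0.0.2:8080]
}
```

Because a worker whose lease expires shows up as a delete event, the same loop handles both graceful deregistration and crash failover.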
Watch Mechanism
etcd can watch specific keys or directory prefixes and trigger callbacks when changes occur. In the engine, this drives cache expiration: keys representing cached items are watched, and when a lease expires, the corresponding cache entry is automatically cleared.
storage.Watch("cache/",
	func(id string) { /* put event: do nothing */ },
	func(id string) { CleanCache(id) })
Cluster Monitoring and Leader Election
Nodes set a lease on their heartbeat key; if the lease expires, the key disappears, allowing watchers to detect node failures instantly. Leader election is achieved via distributed locks stored in etcd; the first node to acquire the lock becomes the leader, while others become followers and monitor the leader’s key for failover.
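The "first to acquire the lock becomes leader" pattern can be modeled in‑process with a compare‑and‑swap standing in for the distributed lock. This is a simplified sketch of the election shape, not etcd's lock API: every candidate races to claim the leader slot, exactly one succeeds, and the rest become followers.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// electLeader has n candidate nodes race for a single leader slot,
// mimicking "first node to acquire the lock in etcd becomes leader".
func electLeader(n int) (leader int32, followers int32) {
	var claimed atomic.Int32
	claimed.Store(-1) // -1 means the leader slot is free
	var followerCount atomic.Int32

	var wg sync.WaitGroup
	for id := 0; id < n; id++ {
		wg.Add(1)
		go func(id int32) {
			defer wg.Done()
			// CompareAndSwap plays the role of the distributed lock:
			// only the first node to swap in its id wins the slot.
			if !claimed.CompareAndSwap(-1, id) {
				followerCount.Add(1) // losers follow and would watch the leader's key
			}
		}(int32(id))
	}
	wg.Wait()
	return claimed.Load(), followerCount.Load()
}

func main() {
	leader, followers := electLeader(5)
	// Exactly one leader is elected; the other four are followers.
	fmt.Printf("leader=%d followers=%d\n", leader, followers)
}
```

In the real system the winner's claim is a key bound to its lease, so if the leader dies the key vanishes, the followers' watch fires, and a new election round starts automatically.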
JD Cloud Developers
JD Cloud Developers is JD Technology Group's platform for technical sharing and communication among AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.