
Typical Use Cases and Implementation Details of etcd in Distributed Systems

This article introduces etcd, a highly available key-value store built on the Raft algorithm, explores its classic use cases (service discovery, message publish-subscribe, load balancing, distributed notification and coordination, locks, queues, cluster monitoring, and leader election), and compares it with ZooKeeper, highlighting its simplicity, security, and cloud-native strengths.


With the rising popularity of CoreOS, Kubernetes and other open‑source projects, the etcd component—used as a highly‑available, strongly consistent service‑discovery and configuration store—has attracted increasing attention. In the cloud‑computing era, quickly and transparently integrating services into a compute cluster, sharing configuration information across all machines, and building a highly‑available, secure, easy‑to‑deploy, low‑latency service cluster are pressing challenges that etcd helps to solve.

Classic Application Scenarios

What is etcd? Many people first think of it as a simple key‑value store, overlooking the official definition that emphasizes its role in shared configuration and service discovery.

A highly‑available key value store for shared configuration and service discovery.

Inspired by ZooKeeper and Doozer, etcd offers similar functionality but focuses on four aspects:

Simple: an HTTP+JSON API lets you use it easily with curl.

Secure: optional SSL client authentication.

Fast: each instance can handle about a thousand writes per second.

Trustworthy: implements distributed consensus using the Raft algorithm.
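To illustrate the "simple" point, here is a hedged sketch of how a v2 keys-API request could be assembled. The endpoint, port, and key name are assumptions (etcd's v2 API conventionally listens on client port 2379 and exposes keys under `/v2/keys/`); the same call maps directly onto a one-line curl command.

```python
from urllib.parse import urlencode

ETCD = "http://127.0.0.1:2379"  # assumed default client endpoint

def put_key(key, value, ttl=None):
    """Build the (method, url, body) triple for an etcd v2 set operation.

    Equivalent to: curl -X PUT {ETCD}/v2/keys/<key> -d value=<value> [-d ttl=<ttl>]
    """
    params = {"value": value}
    if ttl is not None:
        params["ttl"] = ttl  # key expires unless refreshed before the TTL lapses
    return ("PUT", f"{ETCD}/v2/keys/{key}", urlencode(params))

# Illustrative key name; any path-like string works the same way.
method, url, body = put_key("config/db_host", "10.0.0.7", ttl=60)
```

The returned triple can be sent with any HTTP client; nothing here depends on etcd-specific libraries.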

As cloud computing evolves, distributed‑system problems gain more attention. Inspired by an Alibaba middleware team article on ZooKeeper, the author summarizes several classic etcd use cases.

Note: In distributed systems, data is divided into control data and application data. etcd is intended for control data; for application data it is recommended only when the data volume is small but updates are frequent.

Scenario 1: Service Discovery

Service discovery addresses the common problem of how processes or services in a distributed cluster find each other and establish connections. It requires three pillars:

A strongly consistent, highly available service directory. etcd, based on Raft, naturally provides this.

A mechanism to register services and monitor health. Users can register services in etcd and set a TTL on the key, refreshing it periodically as a heartbeat.

A mechanism to locate and connect to services. By registering a service under a specific prefix, clients can discover it; deploying a proxy-mode etcd on each machine ensures mutual accessibility.
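The register-with-TTL pattern can be simulated locally. The toy registry below (names, addresses, and TTL values are purely illustrative, and this is not etcd's client API) mimics how a key expires once its owner stops heartbeating, so lookups under a prefix only ever return live instances:

```python
import time

class Registry:
    """Toy in-memory stand-in for an etcd service directory with TTLs."""

    def __init__(self):
        self._entries = {}  # key -> (value, expiry timestamp)

    def register(self, key, value, ttl):
        # A service sets its key with a TTL; re-registering acts as a heartbeat.
        self._entries[key] = (value, time.monotonic() + ttl)

    def lookup(self, prefix):
        # Clients discover live instances under a prefix; expired keys vanish.
        now = time.monotonic()
        return {k: v for k, (v, exp) in self._entries.items()
                if k.startswith(prefix) and exp > now}

reg = Registry()
reg.register("services/web/10.0.0.1", "10.0.0.1:8080", ttl=5)
reg.register("services/web/10.0.0.2", "10.0.0.2:8080", ttl=0.01)
time.sleep(0.05)  # the second instance misses its heartbeat window
live = reg.lookup("services/web/")  # only the still-heartbeating instance remains
```

Against a real cluster, the heartbeat would be a periodic refresh of the key's TTL rather than an in-memory timestamp.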

Figure 1 Service Discovery Diagram

Specific examples include:

Dynamic addition of services in a microservice architecture. Register a service name in etcd and store the IPs of available instances under that prefix; clients simply look up the prefix to find a usable instance.

Figure 2 Micro‑service Collaboration

Transparent multi-instance access and fault-tolerant restart in PaaS platforms. Applications often run multiple instances behind a single domain name; etcd can dynamically update DNS routing information when an instance fails or restarts.

Figure 3 Cloud Platform Multi‑Instance Transparency

Scenario 2: Publish‑Subscribe Messaging

Message publish‑subscribe is a common inter‑component communication pattern in distributed systems. A central configuration hub stores messages; subscribers receive real‑time notifications when topics change.

Centralized configuration management. Applications fetch configuration from etcd at startup and register a watcher to receive updates automatically.

Distributed search services. Cluster metadata and node status are stored in etcd, with key TTLs ensuring the information stays current.

Distributed log collection. Collectors create a directory per application or topic in etcd, store the IPs of log sources under it, and set a recursive watcher to rebalance collection tasks when nodes change.

Dynamic information retrieval via HTTP APIs. Runtime data exposed through JMX or similar interfaces can be stored in etcd directories for external access.
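The watcher mechanism behind all of these examples boils down to "setting a key notifies its subscribers." This is a minimal local sketch of those semantics (not etcd's actual watch API, which delivers events over long-polling HTTP or gRPC streams):

```python
class ConfigHub:
    """Minimal publish-subscribe store: setting a key notifies its watchers."""

    def __init__(self):
        self._data = {}
        self._watchers = {}  # key -> list of callbacks

    def watch(self, key, callback):
        self._watchers.setdefault(key, []).append(callback)

    def set(self, key, value):
        self._data[key] = value
        for cb in self._watchers.get(key, []):
            cb(key, value)  # push the change to every subscriber

seen = []
hub = ConfigHub()
hub.watch("app/log_level", lambda k, v: seen.append(v))
hub.set("app/log_level", "DEBUG")  # subscriber is notified immediately
```

The key name `app/log_level` is made up; the point is that subscribers never poll, they are pushed the new value on change.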

Figure 4 Publish‑Subscribe Messaging

Scenario 3: Load Balancing

In the earlier service‑discovery scenario, load balancing refers to soft load balancing. By deploying multiple identical service instances, traffic can be distributed across nodes, improving availability and read performance.

etcd's distributed architecture naturally supports load-balanced reads. Every core node can serve client requests, making etcd a good fit for storing small, frequently accessed data such as code tables.

Maintain a load-balancing node table in etcd. etcd can monitor the health of multiple backend nodes and route requests only to live ones, similar to how ZooKeeper assists Kafka's producer-consumer balancing.
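Once the node table yields a list of healthy instances, spreading traffic across them can be as simple as a round-robin cycle. A sketch (instance addresses are invented; a real client would re-fetch the list whenever etcd reports a membership change):

```python
import itertools

def balancer(instances):
    """Return a picker that cycles through the given live instances.

    Callers rebuild the balancer when etcd's node table changes.
    """
    ring = itertools.cycle(instances)
    return lambda: next(ring)

pick = balancer(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
targets = [pick() for _ in range(4)]  # wraps around after the third node
```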

Figure 5 Load Balancing

Scenario 4: Distributed Notification and Coordination

Similar to publish‑subscribe, this scenario uses etcd’s watcher mechanism to achieve low‑coupling notifications and coordination across systems.

Low-coupling heartbeat detection. Systems register a heartbeat key in etcd; the presence or absence of the key indicates liveness.

System scheduling. A control console updates etcd directories; registered workers receive notifications and execute the corresponding tasks.

Work-progress reporting. Sub-tasks create a temporary directory in etcd and periodically write their progress, allowing managers to monitor real-time status.

Figure 6 Distributed Coordination

Scenario 5: Distributed Locks

Because etcd guarantees strong consistency via Raft, implementing distributed locks is straightforward.

Exclusive lock. etcd provides an atomic compare-and-swap (CAS) API; setting prevExist=false guarantees that only one client succeeds in creating the lock key.

Sequenced lock. etcd's automatically ordered keys (created with a POST request) assign a unique, monotonically increasing index to each contender, establishing a global order for lock acquisition.
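The exclusive-lock idea reduces to an atomic "create only if absent" operation. The sketch below simulates that semantic locally (the `threading.Lock` stands in for the server-side atomicity etcd provides; the key name is illustrative):

```python
import threading

class LockStore:
    """Toy stand-in for etcd's create-if-absent (prevExist=false) semantics."""

    def __init__(self):
        self._keys = {}
        self._mu = threading.Lock()  # models etcd's server-side atomicity

    def create_if_absent(self, key, owner):
        # Succeeds for exactly one contender, like PUT with ?prevExist=false.
        with self._mu:
            if key in self._keys:
                return False
            self._keys[key] = owner
            return True

    def release(self, key, owner):
        # Only the current holder may delete the lock key.
        with self._mu:
            if self._keys.get(key) == owner:
                del self._keys[key]

store = LockStore()
first = store.create_if_absent("locks/job", "worker-A")   # acquires the lock
second = store.create_if_absent("locks/job", "worker-B")  # rejected
```

In practice the lock key would also carry a TTL so a crashed holder cannot block contenders forever.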

Figure 7 Distributed Lock

Scenario 6: Distributed Queues

Distributed queues follow the same ordered‑key principle as locks, providing FIFO semantics.

An interesting pattern is to execute tasks only when a certain condition is met, implemented by adding a /queue/condition node.

Condition representing queue size. Increment the condition each time a sub-task becomes ready; once the threshold is reached, the main task proceeds.

Condition indicating the presence of a task. Certain tasks must complete before others can start, much like edges in a dependency graph.

Condition as a trigger set by another task. A controller updates the condition, prompting the queue to start processing.
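The ordered-key queue plus a condition gate can be sketched as follows. All names and the threshold are illustrative; in etcd the appends would be POSTs creating in-order keys and the condition would live at a node such as /queue/condition:

```python
class ConditionQueue:
    """Toy FIFO queue: appends get increasing indices, and consumption is
    gated on a condition counter reaching a threshold."""

    def __init__(self, threshold):
        self._items = []
        self._next = 0          # mimics etcd's monotonically increasing keys
        self.condition = 0
        self.threshold = threshold

    def post(self, value):
        # Each append gets a unique, ordered index, preserving FIFO order.
        self._items.append((self._next, value))
        self._next += 1

    def signal(self):
        self.condition += 1     # e.g. one sub-task became ready

    def drain(self):
        if self.condition < self.threshold:
            return []           # condition not met: the main task waits
        return [v for _, v in self._items]

q = ConditionQueue(threshold=2)
q.post("task-1"); q.post("task-2")
blocked = q.drain()      # condition still 0, nothing runs
q.signal(); q.signal()
ready = q.drain()        # threshold reached, FIFO order preserved
```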

Figure 8 Distributed Queue

Scenario 7: Cluster Monitoring and Leader Election

Monitoring with etcd is simple and real‑time.

Watchers detect node disappearance or changes instantly.

Nodes can set a TTL key (e.g., a heartbeat every 30 seconds) to indicate liveness; missing heartbeats cause the key to expire.

This enables immediate health‑status detection and supports leader election via the distributed lock mechanism. A classic example is building a full‑text index in a search system: only the elected leader performs the heavy indexing work, then distributes results to followers.
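Combining the TTL heartbeat with the exclusive-lock pattern yields a simple election: the first node to create the leader key wins, and when its heartbeat lapses the key expires and a new leader can take over. A local sketch of that behavior (node names and TTLs are made up; this is not an etcd client):

```python
import time

class Election:
    """Toy leader election: first node to create the leader key with a TTL
    wins; if its heartbeat lapses, the key expires and another node can win."""

    def __init__(self):
        self._leader = None  # (node, expiry timestamp)

    def campaign(self, node, ttl):
        now = time.monotonic()
        if self._leader is None or self._leader[1] <= now:
            self._leader = (node, now + ttl)  # claim (or reclaim) leadership
            return True
        return False                          # a live leader already exists

e = Election()
won_a = e.campaign("node-A", ttl=0.01)  # node-A becomes leader
won_b = e.campaign("node-B", ttl=0.01)  # rejected while A's key is live
time.sleep(0.05)                         # A stops heartbeating; its key expires
won_b2 = e.campaign("node-B", ttl=60)   # B takes over
```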

Figure 9 Leader Election

Scenario 8: Why Choose etcd Over ZooKeeper?

Although ZooKeeper can implement many of the same functions, etcd offers several advantages:

Complexity. ZooKeeper is intricate to deploy and maintain; its consensus protocol, ZAB (a Paxos variant), is notoriously hard to understand, and it officially provides only Java and C client libraries.

Heavy Java dependency. Java applications tend to pull in many dependencies, whereas operators often prefer lightweight, easily maintainable clusters.

Slower development. The Apache governance model can make project evolution comparatively slow.

etcd’s advantages include:

Simplicity. Written in Go, it is easy to deploy; its HTTP API makes interaction straightforward; Raft provides an understandable consistency model.

Data persistence. Updates are persisted to disk immediately.

Security. SSL client authentication is supported out of the box.

Although still relatively young, etcd is already used in production by CoreOS, Kubernetes, CloudFoundry and other major projects, making it a compelling choice for modern cloud‑native architectures.

Tags: distributed systems, Cloud Native, service discovery, configuration management, Raft, etcd, key-value store
Written by

Art of Distributed System Architecture Design

Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.
