
How ZooKeeper Coordinates Distributed Systems: Nodes, Watchers, and Leader Election

This article explains ZooKeeper's core concepts, including ZNode data storage, node types, the watcher mechanism, session management, and the leader‑follower‑observer architecture, and shows how they enable reliable coordination and atomic operations in distributed systems.

macrozheng

The Internet era brings an explosion of information, and high concurrency drives the widespread adoption of distributed systems.

ZooKeeper is a widely used coordination service for distributed systems. Typical uses include configuration publish/subscribe, load balancing, naming services, and cluster management.

Starting with a Simple Example

Multiple applications (e.g., A and B) may read the same configuration C; when C changes, both need to be notified. A ZooKeeper server (ZServer) stores C in a ZNode, and the clients (ClientA and ClientB) connect to that server to retrieve C.

ClientA and ClientB fetching C from ZooKeeper Server

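ZNodes are organized as a tree, much like a file system: each node can hold data and can have child nodes. The whole tree is kept in memory so reads are fast, and changes are also written to a transaction log and periodic snapshots on disk so the data survives restarts.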

Clients access data using path strings like "/RootNode/C".
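The path-addressed tree can be pictured with a small in-memory model. This is only a sketch to illustrate the data structure; the `ZNode` and `ZNodeTree` classes and their methods are hypothetical names, not the ZooKeeper client API.

```python
class ZNode:
    """One node in the tree: a data payload plus named children."""

    def __init__(self, data=b""):
        self.data = data
        self.children = {}


class ZNodeTree:
    """Toy in-memory tree addressed by absolute, slash-separated paths."""

    def __init__(self):
        self.root = ZNode()

    def _walk(self, path):
        if not path.startswith("/"):
            raise ValueError("paths must be absolute")
        node = self.root
        for part in [p for p in path.split("/") if p]:
            if part not in node.children:
                raise KeyError(f"no such node: {path}")
            node = node.children[part]
        return node

    def create(self, path, data=b""):
        parent_path, _, name = path.rpartition("/")
        if not name:
            raise ValueError("cannot create the root or an empty name")
        # The parent must already exist, as in a file system.
        parent = self._walk(parent_path or "/")
        if name in parent.children:
            raise ValueError(f"node exists: {path}")
        parent.children[name] = ZNode(data)

    def get(self, path):
        return self._walk(path).data

    def get_children(self, path):
        return sorted(self._walk(path).children)
```

With this model, creating "/RootNode" and then "/RootNode/C" lets a caller read C back by its full path, while creating a node under a missing parent fails.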

Node Types

PERSISTENT : remains until explicitly deleted.

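EPHEMERAL : tied to the client session that created it and removed automatically when that session ends; an ephemeral node cannot have children.

SEQUENTIAL : a flag combined with either type above rather than a type on its own; the server appends a monotonically increasing, zero-padded 10-digit counter to the node name, which is useful for ordering.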

Combining these yields four node types: persistent, persistent‑sequential, ephemeral, and ephemeral‑sequential.
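The two flags can be sketched as follows. This is a simplified model under stated assumptions: the `Registry` class, its flat dictionary, and the string session ids are hypothetical, and the counter here simply starts at 0 for each parent path.

```python
class Registry:
    """Toy store mapping path -> (data, owning session or None)."""

    def __init__(self):
        self.nodes = {}
        self.counters = {}  # parent path -> next sequence number

    def create(self, path, data=b"", *, ephemeral=False, sequential=False,
               session=None):
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        if sequential:
            # The counter is kept per parent and appended as 10 padded digits.
            parent = path.rpartition("/")[0] or "/"
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            path = f"{path}{seq:010d}"
        if path in self.nodes:
            raise ValueError(f"node exists: {path}")
        self.nodes[path] = (data, session if ephemeral else None)
        return path

    def close_session(self, session):
        # Ending a session removes every ephemeral node it owns.
        owned = [p for p, (_, owner) in self.nodes.items() if owner == session]
        for path in owned:
            del self.nodes[path]
```

Two clients that each create an ephemeral sequential node under the same parent get distinct, ordered names, and closing one session removes only that session's node while persistent nodes stay.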

Watcher Mechanism

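Clients register a Watcher on a ZNode (e.g., "/RootNode/C"). When the node changes, the server notifies the clients that registered. A standard watch is a one-time trigger: it fires once and must be registered again to see later changes. The notification reports that a change happened rather than carrying the new data, so the client re-reads the node.

The process involves three steps: the client registers the watcher, the server records it and triggers it when the node changes, and the client runs its callback.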

Watcher registration, processing, and callback
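The register, trigger, and callback steps can be modeled in a few lines. Standard ZooKeeper watches are one-time triggers, so this sketch drops a watch once it has fired. The `WatchedStore` class and its method names are hypothetical and only model the behavior; they are not the client library.

```python
class WatchedStore:
    """Toy key-value store with one-shot watches per path."""

    def __init__(self):
        self.data = {}
        self.watches = {}  # path -> list of callbacks waiting to fire

    def get(self, path, watch=None):
        # Step 1: the client registers a watch while reading.
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # Step 2: the server pops the watches, so each fires at most once.
        for callback in self.watches.pop(path, []):
            # Step 3: the client callback receives the path, not the data.
            callback(path)
```

A client that wants continuous updates re-reads the node inside its callback and passes the callback again as the new watch.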

Version Control

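Each ZNode carries a version number (stored in its Stat object) that enables optimistic locking. A client reads the data together with its current version and passes that version back with the write. The server applies the write only if the version still matches, then increments it; otherwise it rejects the write with a bad-version error, and the client re-reads and retries.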
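The compare-then-write cycle might look like the sketch below. The `VersionedNode` class, `BadVersionError`, and `update_with_retry` are hypothetical names for illustration; treating an expected version of -1 as "match any version" mirrors the convention of ZooKeeper's setData call.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match the stored one."""


class VersionedNode:
    def __init__(self, data):
        self.data = data
        self.version = 0

    def get(self):
        return self.data, self.version

    def set(self, data, expected_version):
        # -1 skips the check; any other value must match exactly.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected {expected_version}, found {self.version}")
        self.data = data
        self.version += 1
        return self.version


def update_with_retry(node, transform, max_attempts=5):
    """Read, transform, and write back, retrying when another writer wins."""
    for _ in range(max_attempts):
        data, version = node.get()
        try:
            return node.set(transform(data), version)
        except BadVersionError:
            continue  # someone else wrote first; re-read and try again
    raise RuntimeError("gave up after repeated version conflicts")
```

The retry loop never holds a lock: a conflicting writer costs only one extra read and write attempt.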

Alternatively, each client can create an ephemeral sequential node under C; the client holding the smallest sequence number writes first, which gives FIFO ordering.
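Deciding whose turn it is comes down to comparing sequence suffixes. The sketch below assumes child names end in a 10-digit counter; the helper names and the "write-" prefix are hypothetical. Waiting on the immediate predecessor rather than on the whole list is a common pattern in lock recipes and goes beyond what the text above states.

```python
def sequence_of(name):
    """Extract the numeric suffix, assuming a 10-digit padded counter."""
    return int(name[-10:])


def whose_turn(children):
    """The child with the smallest sequence number may write now."""
    return min(children, key=sequence_of)


def predecessor(children, mine):
    """Return the node just ahead of `mine`, or None if `mine` is first."""
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(mine)
    return ordered[index - 1] if index > 0 else None
```

When the current holder finishes and its node disappears, the next smallest sequence number becomes the writer, which preserves arrival order.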

Session Management

A session represents the connection between a client and the server, with states such as Connecting, Connected, Reconnecting, Reconnected, and Closed. On the server, SessionTracker monitors sessions, clears expired ones, and removes the ephemeral nodes they own.

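Session expiration times are bucketed rather than tracked individually: the server adds the session timeout to the current time and rounds the result up to the next multiple of its check interval, which defaults to TickTime. Sessions that land in the same bucket are checked together.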
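The arithmetic can be worked through with a small function. The rounding rule used here (add the timeout, then round up to the next multiple of the check interval) is an assumption about how the three inputs combine, and the function name and the 2000 ms default interval are illustrative choices.

```python
def expiration_bucket(now_ms, session_timeout_ms, interval_ms=2000):
    """Round the raw expiry time up to the next check-interval boundary.

    Assumed rule: bucket = ((now + timeout) // interval + 1) * interval
    """
    raw_expiry = now_ms + session_timeout_ms
    return (raw_expiry // interval_ms + 1) * interval_ms
```

Under this rule, two sessions with a 4000 ms timeout that were last active at 10500 ms and 11900 ms both fall into the bucket at 16000 ms, so a single check covers them.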

Server Cluster: Leader, Follower, Observer

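A ZooKeeper cluster (ensemble) replicates data across servers to improve reliability. The Leader handles all write (transaction) requests and assigns them a single global order. Followers serve read requests, forward writes to the Leader, and vote both on proposals and in leader election. Observers also serve reads and forward writes but vote in neither, so they add read capacity without enlarging the write quorum.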

Leader election uses a voting process in which servers exchange and compare (ServerID, ZXID) pairs: the vote with the higher ZXID wins, and the higher ServerID breaks a tie. A server becomes the Leader once more than half of the voting servers back it.

Leader election example
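The comparison rule can be sketched as a single function. This is a deliberately simplified model: it assumes every reachable server already knows every other reachable server's ZXID, and it leaves out election epochs and the rounds of vote exchange that a real election involves. The `elect` function and its arguments are hypothetical.

```python
def elect(last_zxid, ensemble_size):
    """Pick a leader from the reachable voting servers.

    last_zxid: dict mapping server id -> last ZXID that server has seen.
    ensemble_size: total number of voting servers, reachable or not.
    Returns the winning server id, or None when no majority is reachable.
    """
    if len(last_zxid) <= ensemble_size // 2:
        return None  # fewer than a majority can vote, so nobody wins
    # Highest ZXID wins; the highest server id breaks ties.
    winner, _ = max(last_zxid.items(), key=lambda item: (item[1], item[0]))
    return winner
```

Preferring the highest ZXID means the new Leader is a server that has seen the most recent transactions, so it does not need to catch up from a Follower before serving writes.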

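The Leader replicates writes with ZAB (ZooKeeper Atomic Broadcast), which resembles a two‑phase commit: it sends a PROPOSAL to the Followers, waits for ACKs from a majority (a quorum) rather than from every Follower, and then sends COMMIT, after which the Followers apply the change.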

ZAB proposal broadcast
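The PROPOSAL, ACK, and COMMIT flow can be modeled with two small classes. This sketch assumes synchronous calls in place of network messages and counts the Leader's own acknowledgement toward the majority; the `Leader` and `Follower` classes are hypothetical, and recovery after a crash is out of scope.

```python
class Follower:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.pending = {}   # zxid -> data that was proposed but not committed
        self.applied = []   # committed (zxid, data) pairs in order

    def on_proposal(self, zxid, data):
        if not self.alive:
            return False    # no ACK from an unreachable follower
        self.pending[zxid] = data
        return True         # ACK

    def on_commit(self, zxid):
        if self.alive and zxid in self.pending:
            self.applied.append((zxid, self.pending.pop(zxid)))


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.next_zxid = 1
        self.applied = []

    def write(self, data):
        zxid = self.next_zxid
        self.next_zxid += 1
        # Phase 1: send PROPOSAL and count ACKs, including the leader's own.
        acks = 1 + sum(f.on_proposal(zxid, data) for f in self.followers)
        quorum = (len(self.followers) + 1) // 2 + 1
        if acks < quorum:
            return False    # not enough ACKs, so nothing is committed
        # Phase 2: send COMMIT; followers apply the change.
        self.applied.append((zxid, data))
        for follower in self.followers:
            follower.on_commit(zxid)
        return True
```

In a three-server ensemble the write still commits with one Follower down, because the Leader plus one Follower form a majority; with both Followers down it does not.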

Summary

ZooKeeper coordinates distributed systems through ZNodes, watchers, session management, version control, and a robust leader‑follower‑observer architecture, providing reliable data consistency and fault tolerance.

Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
