How ZooKeeper Coordinates Distributed Systems: Nodes, Watchers, and Leader Election
This article explains ZooKeeper's core concepts—including ZNode data storage, node types, watcher mechanisms, session management, and the leader‑follower‑observer architecture—illustrating how it enables reliable coordination and atomic operations in distributed systems.
The Internet era brings an explosion of information, and high concurrency drives the widespread adoption of distributed systems.
ZooKeeper is a widely used coordination service for distributed systems, supporting data publishing/subscription, load balancing, naming services, cluster management, and more.
Starting with a Simple Example
Multiple applications (e.g., A and B) may read the same configuration C; when C changes, both need to be notified. ZooKeeper creates a server (ZServer) to store C in a ZNode, and clients (ClientA and ClientB) connect to retrieve C.
ZooKeeper organizes ZNodes as a tree: each node can hold data and have child nodes. The tree is kept in memory for efficiency.
Clients access data using path strings like "/RootNode/C".
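The tree-and-path model can be sketched in a few lines. This is a minimal in-memory illustration, not ZooKeeper's implementation; the `ZNode` and `Tree` classes and the sample payload are hypothetical names chosen for the example.

```python
class ZNode:
    """Hypothetical stand-in for a ZNode: a data payload plus named children."""

    def __init__(self, data=b""):
        self.data = data
        self.children = {}


class Tree:
    def __init__(self):
        self.root = ZNode()

    def _walk(self, path):
        if not path.startswith("/"):
            raise ValueError("paths must be absolute")
        node = self.root
        for part in [p for p in path.split("/") if p]:
            node = node.children[part]  # KeyError if a component is missing
        return node

    def create(self, path, data=b""):
        # The parent must already exist, as with a real ZNode tree.
        parent_path, _, name = path.rpartition("/")
        parent = self._walk(parent_path or "/")
        if name in parent.children:
            raise KeyError(f"node exists: {path}")
        parent.children[name] = ZNode(data)

    def get(self, path):
        return self._walk(path).data


tree = Tree()
tree.create("/RootNode")
tree.create("/RootNode/C", b"timeout=30")
print(tree.get("/RootNode/C"))
```

Each path component is resolved by walking one level down from the root, which is why a child can only be created under a parent that already exists.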
Node Types
PERSISTENT: remains until explicitly deleted.
EPHEMERAL: tied to the client session and removed when the session ends.
SEQUENTIAL: a modifier rather than a standalone type; the server appends a monotonically increasing integer suffix to the node name, useful for ordering.
Combining these yields four node types: persistent, persistent‑sequential, ephemeral, and ephemeral‑sequential.
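The sketch below models how the two flags combine. It is an illustration only: `NodeStore` and the session identifiers are hypothetical, and the 10-digit zero-padded suffix mirrors the format ZooKeeper uses for sequential node names.

```python
import itertools
from collections import defaultdict


class NodeStore:
    """Hypothetical model of the ephemeral and sequential flags."""

    def __init__(self):
        self.nodes = {}  # path -> owning session id (None for persistent nodes)
        self.counters = defaultdict(itertools.count)  # parent path -> counter

    def create(self, path, session=None, ephemeral=False, sequential=False):
        if sequential:
            # The suffix comes from a counter kept per parent node.
            parent = path.rsplit("/", 1)[0]
            path = f"{path}{next(self.counters[parent]):010d}"
        self.nodes[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral nodes disappear together with the session that owns them.
        for path in [p for p, owner in self.nodes.items() if owner == session]:
            del self.nodes[path]


store = NodeStore()
store.create("/app")  # persistent
a = store.create("/app/worker-", session="s1", ephemeral=True, sequential=True)
b = store.create("/app/worker-", session="s2", ephemeral=True, sequential=True)
store.close_session("s1")
print(a, b, sorted(store.nodes))
```

After session `s1` closes, only the persistent `/app` node and the node owned by `s2` remain.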
Watcher Mechanism
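Clients register a Watcher on a ZNode (e.g., "/RootNode/C"). When the node changes, the server notifies the registered clients. A standard watch is a one-time trigger: once it fires, the client must register again to receive further notifications.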
The process involves three steps: client registers the watcher, server processes it, and client receives a callback.
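The three steps can be sketched with a toy store. This assumes the classic one-shot watch semantics, where a watch is discarded once it has fired; `WatchedStore` is a hypothetical class for the example, not part of any client library.

```python
class WatchedStore:
    """Hypothetical store with one-shot watches on individual paths."""

    def __init__(self):
        self.data = {}
        self.watchers = {}  # path -> list of callbacks

    def get(self, path, watch=None):
        # Step 1: the client registers a watcher while reading.
        if watch is not None:
            self.watchers.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # Step 2: the server removes the registered watchers for this path...
        for callback in self.watchers.pop(path, []):
            # Step 3: ...and each client receives its callback exactly once.
            callback(path)


events = []
store = WatchedStore()
store.set("/RootNode/C", "v1")
store.get("/RootNode/C", watch=events.append)
store.set("/RootNode/C", "v2")  # fires the watch
store.set("/RootNode/C", "v3")  # no watch left, nothing fires
print(events)
```

The callback receives only the path, so a client that wants the new value reads the node again, typically re-registering the watch in the same call.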
Version Control
ZNode version numbers (stored in the Stat object) enable optimistic locking. When multiple clients attempt to write, each reads the current version; a write succeeds only if the version matches, otherwise the client retries.
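A minimal sketch of this compare-and-set loop follows. `VersionedNode`, `BadVersion`, and `update_with_retry` are hypothetical names for the illustration; a real client would pass the expected version to the server's set-data call instead.

```python
class BadVersion(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedNode:
    def __init__(self, data):
        self.data = data
        self.version = 0

    def read(self):
        return self.data, self.version

    def write(self, data, expected_version):
        # The write is applied only if nobody else has written in between.
        if expected_version != self.version:
            raise BadVersion(f"expected {expected_version}, got {self.version}")
        self.data = data
        self.version += 1


def update_with_retry(node, transform, max_attempts=5):
    """Read, transform, and write back; re-read and retry on a version conflict."""
    for _ in range(max_attempts):
        data, version = node.read()
        try:
            node.write(transform(data), version)
            return True
        except BadVersion:
            continue
    return False


node = VersionedNode("v1")
_, stale = node.read()
node.write("v2", stale)  # succeeds, version becomes 1
try:
    node.write("v3", stale)  # stale version, rejected
    rejected = False
except BadVersion:
    rejected = True
ok = update_with_retry(node, lambda d: d + "!")
print(rejected, ok, node.read())
```

No lock is held between the read and the write; a conflict is simply detected at write time and the client tries again with fresh data.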
Alternatively, each client can create an ephemeral‑sequential node under C; the client whose node has the smallest sequence number writes first, which yields FIFO ordering.
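Picking the next writer from the children of C can be sketched as below. The child names are made up for the example, and the code assumes each name ends in the 10-digit sequence suffix that ZooKeeper appends to sequential nodes.

```python
def sequence_of(name):
    """Extract the numeric suffix from a sequential node name."""
    return int(name[-10:])


def next_writer(children):
    """Return the child with the smallest sequence number, or None if empty."""
    return min(children, key=sequence_of) if children else None


# Hypothetical children of C, listed in arbitrary order.
children = ["write-0000000012", "write-0000000007", "write-0000000009"]
first = next_writer(children)
children.remove(first)  # the first writer finishes and deletes its node
second = next_writer(children)
print(first, second)
```

Because the nodes are ephemeral, a client that crashes while waiting loses its place automatically when its session ends, so the queue cannot be blocked by a dead client.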
Session Management
A session represents the connection between a client and the server, with states such as Connecting, Connected, Reconnecting, Reconnected, and Closed. The SessionTracker on the server monitors sessions, clears expired ones, and removes their associated ephemeral nodes.
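A session's expiration time is computed from the current time plus the session timeout and then aligned to the server's check interval, which is derived from TickTime. Sessions therefore expire in batches at interval boundaries rather than at arbitrary instants.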
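The calculation can be sketched as follows. This is modelled on the rounding approach commonly described for the server's session tracker and may differ in detail between ZooKeeper versions; the function name and the sample timings are assumptions for the example.

```python
def expiration_bucket(now_ms, timeout_ms, interval_ms):
    """Advance (now + timeout) to the next multiple of the check interval."""
    return ((now_ms + timeout_ms) // interval_ms + 1) * interval_ms


TICK_MS = 2000  # assumed check interval

# Two sessions touched at different moments with different timeouts...
a = expiration_bucket(10_500, 5_000, TICK_MS)
b = expiration_bucket(11_900, 4_000, TICK_MS)
# ...land in the same bucket, so one check can expire both.
print(a, b)
```

Grouping sessions this way lets the server check one bucket per interval instead of scanning every session on every tick.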
Server Cluster: Leader, Follower, Observer
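Running ZooKeeper as a cluster improves reliability. The Leader handles all write (transaction) requests, which guarantees a single order of updates. Followers handle read requests and forward writes to the Leader. Observers also handle reads and forward writes, but they do not vote in leader election or on write proposals, so they add read capacity without enlarging the voting quorum.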
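Leader election uses a voting process: each server first votes for itself, then servers exchange and compare (ServerID, ZXID) pairs. The candidate with the highest ZXID (or the highest ServerID if ZXIDs tie) becomes the Leader once more than half of the servers vote for it.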
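The comparison rule can be sketched as below. This is a simplified model: `Vote`, `preferred`, and `elect` are hypothetical names, the exchange of votes over the network is collapsed into a single loop, and the election-epoch comparison that the real algorithm performs first is omitted.

```python
from typing import NamedTuple, Optional, Sequence


class Vote(NamedTuple):
    server_id: int
    zxid: int


def preferred(a: Vote, b: Vote) -> Vote:
    """Higher ZXID wins; ServerID breaks ties."""
    return a if (a.zxid, a.server_id) > (b.zxid, b.server_id) else b


def elect(reachable: Sequence[Vote], cluster_size: int) -> Optional[int]:
    """Return the winning ServerID, or None if no majority is reachable."""
    quorum = cluster_size // 2 + 1
    if len(reachable) < quorum:
        return None
    best = reachable[0]
    for vote in reachable[1:]:
        best = preferred(best, vote)
    return best.server_id


# Three of five servers are up; servers 2 and 3 share the latest ZXID.
votes = [Vote(1, 0x100), Vote(2, 0x102), Vote(3, 0x102)]
leader = elect(votes, cluster_size=5)
no_quorum = elect(votes[:2], cluster_size=5)
print(leader, no_quorum)
```

Preferring the highest ZXID means the new Leader is a server that has seen the most recent transactions, so it does not need to catch up before serving writes.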
The Leader replicates writes with the ZAB (ZooKeeper Atomic Broadcast) protocol, which resembles a two‑phase commit: it sends a PROPOSAL to Followers, waits for ACKs from a majority of servers, then sends COMMIT, after which Followers apply the change.
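The PROPOSAL, ACK, COMMIT flow can be sketched as a small simulation. It is a toy model under stated assumptions: `Follower` and `broadcast` are hypothetical names, messages are plain method calls, the Leader counts its own acknowledgement, and recovery after failures is not modelled.

```python
class Follower:
    """Hypothetical follower that buffers proposals until they are committed."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.pending = {}    # zxid -> proposed value
        self.committed = []  # applied (zxid, value) pairs in commit order

    def on_proposal(self, zxid, value):
        if not self.online:
            return False  # no ACK from an unreachable follower
        self.pending[zxid] = value
        return True  # ACK

    def on_commit(self, zxid):
        if self.online and zxid in self.pending:
            self.committed.append((zxid, self.pending.pop(zxid)))


def broadcast(zxid, value, followers):
    """Commit only if a majority of the ensemble (leader included) acknowledges."""
    acks = 1 + sum(f.on_proposal(zxid, value) for f in followers)
    quorum = (len(followers) + 1) // 2 + 1
    if acks < quorum:
        return False
    for f in followers:
        f.on_commit(zxid)
    return True


followers = [Follower("f1"), Follower("f2"),
             Follower("f3", online=False), Follower("f4", online=False)]
committed = broadcast(1, "C=42", followers)  # 3 of 5 acknowledge
followers[1].online = False
blocked = broadcast(2, "C=43", followers)    # only 2 of 5 acknowledge
print(committed, blocked, followers[0].committed)
```

Waiting for a majority rather than for every Follower is what lets the cluster keep accepting writes while a minority of servers is down.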
Summary
ZooKeeper coordinates distributed systems through ZNodes, watchers, session management, version control, and a robust leader‑follower‑observer architecture, providing reliable data consistency and fault tolerance.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.