Introduction to ZooKeeper: Architecture, Features, and Usage
This article introduces ZooKeeper as a distributed coordination framework. It explains the file‑system‑like data model, znode types, and watcher mechanism; surveys core features and common uses such as publish/subscribe, distributed locks, load balancing, naming services, and leader election; and walks through the ZAB consensus protocol.
1 ZooKeeper Introduction
ZooKeeper is an open‑source distributed coordination framework that provides consistency services for distributed applications and acts as a manager within the big‑data ecosystem. It encapsulates complex, error‑prone services into efficient, stable, and easy‑to‑use APIs.
If the official description feels abstract, you can think of ZooKeeper as a file system plus a watcher notification mechanism.
1.1 File System
ZooKeeper maintains a tree‑like data structure similar to a file system. Each node (called a znode) can store up to 1 MB of data by default, so ZooKeeper is not suitable for storing large payloads. Four znode types are supported:
PERSISTENT: remains until explicitly deleted, even after the creating client's session ends.
PERSISTENT_SEQUENTIAL: persistent, and ZooKeeper appends a monotonically increasing sequence number to the name.
EPHEMERAL: automatically deleted when the creating client's session ends.
EPHEMERAL_SEQUENTIAL: ephemeral, with a sequence number appended to the name.
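The sequence number ZooKeeper appends to `*_SEQUENTIAL` nodes is a 10‑digit, zero‑padded counter kept per parent node. The following is a local simulation of that naming behavior only (no ZooKeeper server involved; class and method names are illustrative):

```python
class ParentZnode:
    """Simulates the per-parent counter ZooKeeper uses for *_SEQUENTIAL names."""
    def __init__(self):
        self.next_seq = 0

    def create_sequential(self, prefix):
        # ZooKeeper appends the counter as a 10-digit zero-padded suffix.
        name = f"{prefix}{self.next_seq:010d}"
        self.next_seq += 1
        return name

parent = ParentZnode()
print(parent.create_sequential("lock-"))   # lock-0000000000
print(parent.create_sequential("lock-"))   # lock-0000000001
```

Because the counter only ever increases, sequential names give clients a cheap global ordering, which the distributed‑lock recipe below relies on.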
1.2 Watcher Notification Mechanism
The Watcher mechanism is a key feature of ZooKeeper. Clients can register watch events on nodes to be notified of data changes, node deletions, or child‑node status changes. Watchers are one‑time triggers; to achieve permanent watching, clients must re‑register after each notification.
Watcher processing consists of three steps:
1. The client registers a watcher (via getData, exists, or getChildren).
2. The server records the watcher and triggers it when the watched event occurs.
3. The server notifies the client, which invokes the registered callback.
The overall watcher flow involves a main thread creating a ZooKeeper client, which spawns a network thread and a listener thread. The listener thread receives events from the server and calls the user‑defined process() method.
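The one‑shot semantics described above can be sketched with a tiny in‑memory store. The names here (`TinyZk`, `get_data`, `set_data`) are illustrative, not the real client API; the point is that a watch fires once and the callback must re‑register to keep watching:

```python
class TinyZk:
    """In-memory stand-in for a ZooKeeper node store with one-shot watches."""
    def __init__(self):
        self.data = {}
        self.watchers = {}          # path -> list of one-shot callbacks

    def get_data(self, path, watch=None):
        if watch is not None:
            self.watchers.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set_data(self, path, value):
        self.data[path] = value
        # Fire each registered watcher exactly once, then drop it.
        for cb in self.watchers.pop(path, []):
            cb(path)

events = []
zk = TinyZk()

def on_change(path):
    events.append(path)
    zk.get_data(path, watch=on_change)    # re-register for "permanent" watching

zk.get_data("/config", watch=on_change)
zk.set_data("/config", b"v1")   # fires once; callback re-registers
zk.set_data("/config", b"v2")   # fires again only because of re-registration
print(events)                   # ['/config', '/config']
```

If `on_change` did not re‑register, the second `set_data` would notify nobody, which is exactly the pitfall the article warns about.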
1.3 ZooKeeper Features
Cluster : a leader with multiple followers.
High Availability : the cluster works as long as a majority of nodes are alive.
Global Data Consistency : every server stores the same data copy.
Ordered Updates : requests from the same client are executed in order.
Atomic Updates : a transaction either fully succeeds or fails.
Real‑time Reads : clients can read the latest data within a bounded time.
From a design pattern perspective, ZooKeeper implements the Observer pattern.
ZooKeeper provides CP consistency, unlike Spring Cloud’s Eureka which is AP.
2 Functions Provided by ZooKeeper
By combining rich data nodes with the watcher mechanism, ZooKeeper can implement core distributed‑application functions such as data publish/subscribe, load balancing, naming service, distributed coordination/notification, cluster management, master election, distributed locks, and distributed queues.
2.1 Data Publish/Subscribe
Small, frequently changing data shared by a few machines is ideal for storage in ZooKeeper.
Data Storage : store data in a ZooKeeper node.
Data Retrieval : clients read the node at startup and register a watcher for changes.
Data Change : when data changes, ZooKeeper notifies all watching clients, which then re‑read the updated value.
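The three steps above form a cycle: startup read plus watch, notification, re‑read plus re‑watch. A minimal in‑memory sketch (all class names are illustrative; a real client would use getData with a watch):

```python
class ConfigNode:
    """Stand-in for a ZooKeeper config znode with one-shot change notification."""
    def __init__(self, value):
        self.value = value
        self.subscribers = []          # callbacks registered via watches

    def read(self, watch=None):
        if watch is not None:
            self.subscribers.append(watch)
        return self.value

    def publish(self, value):          # the "Data Change" step
        self.value = value
        subs, self.subscribers = self.subscribers, []
        for cb in subs:                # one-shot notify, like ZooKeeper
            cb()

class Subscriber:
    def __init__(self, node):
        self.node = node
        # Data Retrieval: read at startup and register a watcher.
        self.cached = node.read(watch=self.on_change)

    def on_change(self):
        # Re-read the updated value and re-register the watch.
        self.cached = self.node.read(watch=self.on_change)

node = ConfigNode("timeout=30")
a, b = Subscriber(node), Subscriber(node)
node.publish("timeout=60")
print(a.cached, b.cached)   # timeout=60 timeout=60
```

Every subscriber converges on the published value without polling, which is why small, hot configuration data fits ZooKeeper well.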
2.2 Distributed Lock
ZooKeeper‑based locks typically take one of two forms: a single exclusive ephemeral node, which is prone to the "herd effect" because every waiter wakes whenever the lock is released, or ephemeral sequential nodes, where the client holding the smallest sequence number owns the lock and each waiter watches only its immediate predecessor, avoiding the herd effect.
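The sequential‑node variant can be sketched as a local simulation (no server; `SeqLock` and its methods are illustrative names). The two invariants it demonstrates are real: the lowest sequence number owns the lock, and each waiter watches only its predecessor:

```python
import bisect

class SeqLock:
    """Simulates the ephemeral-sequential lock recipe under /lock."""
    def __init__(self):
        self.next_seq = 0
        self.holders = []              # sorted sequence numbers of live nodes

    def acquire(self):
        seq = self.next_seq            # create an ephemeral sequential node
        self.next_seq += 1
        bisect.insort(self.holders, seq)
        return seq

    def owner(self):
        return self.holders[0]         # lowest sequence number holds the lock

    def watch_target(self, seq):
        # Each waiter watches only its immediate predecessor: a release
        # wakes exactly one client, not the whole herd.
        i = self.holders.index(seq)
        return None if i == 0 else self.holders[i - 1]

    def release(self, seq):            # node deleted (or session expires)
        self.holders.remove(seq)

lock = SeqLock()
a, b, c = lock.acquire(), lock.acquire(), lock.acquire()
print(lock.owner())            # 0: client a holds the lock
print(lock.watch_target(c))    # 1: c watches b, not the owner
lock.release(a)
print(lock.owner())            # 1: b acquires next
```

Because the nodes are ephemeral, a crashed client's node disappears with its session, so the lock can never be held by a dead process.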
2.3 Load Balancing
Multiple identical service instances register themselves in ZooKeeper; clients fetch the list of addresses and pick one (for example at random) to call. Compared with Nginx, ZooKeeper‑based load balancing avoids a single proxy bottleneck, but you must implement the balancing algorithm yourself.
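A minimal sketch of the client side, assuming each instance registered itself as a child znode under a service path (the `registry` dict stands in for a real `getChildren` call; path and addresses are placeholders):

```python
import random

# Stand-in for ZooKeeper: child znodes under a service path hold addresses.
registry = {
    "/services/orders": ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"],
}

def pick_instance(path, rng=random):
    # In real code: getChildren(path) plus a watch, so the list stays fresh
    # as ephemeral registration nodes appear and disappear.
    instances = registry[path]
    return rng.choice(instances)

addr = pick_instance("/services/orders")
print(addr)
```

Swapping `rng.choice` for round‑robin or weighted selection is the "custom balancing algorithm" the comparison with Nginx refers to.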
2.4 Naming Service
By creating a globally unique path in ZooKeeper, a name can be bound to a cluster address, service endpoint, or remote object.
2.5 Distributed Coordination / Notification
When a node’s value changes, ZooKeeper sends a watcher event to all clients that registered a watch on that node.
Workers can create temporary nodes with progress information; a monitor watches the parent node to obtain real‑time global progress.
2.6 Cluster Management
ZooKeeper is widely used to manage the dynamic up/down status of machines and leader election in big‑data clusters.
3 Leader Election
ZooKeeper clusters are typically deployed with an odd number of nodes (usually 3 or 5): decisions require a majority, so an even count adds no extra fault tolerance. Leader election occurs during startup and whenever the current leader fails.
3.1 Node States
LOOKING : searching for a leader.
FOLLOWING : follower state; serves read requests and forwards write requests to the leader.
LEADING : leader state, processes all write requests.
OBSERVING : non‑voting member that serves reads and syncs with the leader (observers were introduced in version 3.3.0).
3.2 Election Process
During startup, each server first votes for itself, exchanging votes of the form (myid, ZXID). A server switches its vote to the candidate with the higher ZXID, or the higher myid when ZXIDs are equal; whoever accumulates a majority becomes leader. In the classic example, a 5‑node cluster starts servers 1 through 5 in order: all ZXIDs are zero, so myid decides, and as soon as servers 1, 2, and 3 are up, server 3 holds a majority and becomes leader; servers 4 and 5 then join as followers.
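The vote comparison and the sequential‑startup example can be sketched in a few lines. This is a simplified model, not the real FastLeaderElection code (which also compares election epochs):

```python
def better_vote(a, b):
    """Prefer the vote with the higher zxid; break ties on myid.
    Votes are (zxid, myid) tuples, so plain tuple comparison works."""
    return max(a, b)

def startup_election(total, join_order):
    """Servers join one by one; a leader emerges as soon as a majority of
    the ensemble is up. At cold start all zxids are 0, so myid decides."""
    up = []
    for sid in join_order:
        up.append(sid)
        if len(up) > total // 2:      # quorum reached: election resolves
            return max(up)            # highest myid among the live servers
    return None

# 5-node cluster started in order 1..5: quorum forms at {1, 2, 3}.
print(startup_election(5, [1, 2, 3, 4, 5]))   # 3

# After a leader crash, data freshness wins: zxid 9 beats zxid 7.
print(better_vote((9, 2), (7, 5)))            # (9, 2)
```

The second case shows why a recovering cluster re‑elects the server with the newest data rather than simply the highest id.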
3.3 Split‑Brain
ZooKeeper's ZAB protocol deploys 2N+1 servers and requires a quorum of N+1 for any decision. Two disjoint quorums cannot exist in the same ensemble, so at most one leader can be active at a time, which prevents split‑brain.
4 ZAB Consensus Protocol
ZAB (ZooKeeper Atomic Broadcast) is a crash‑recovery‑oriented consensus protocol that ensures strong consistency and high availability in ZooKeeper clusters.
4.1 Overview
The leader processes all write requests; followers serve reads and forward writes to the leader while staying synchronized. The protocol has two modes: atomic broadcast (normal operation) and crash recovery (startup or leader failure).
4.2 Atomic Broadcast Mode
Leader receives a write, creates a transaction with a unique ZXID .
Leader proposes the transaction to all followers via a FIFO queue.
Followers write the proposal to disk and ACK the leader.
Leader commits once a majority ACKs, then sends a commit to followers.
Followers commit in ZXID order to preserve total order.
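The commit decision in the steps above reduces to counting ACKs against a quorum. A simplified sketch (real ZAB also persists proposals to disk and delivers them over per‑follower FIFO channels, which this model omits):

```python
def broadcast(txn, follower_acks, ensemble_size):
    """Decide whether a proposed transaction can commit.
    follower_acks: list of booleans, True if that follower ACKed
    (i.e. wrote the proposal to disk and replied)."""
    acks = 1 + sum(follower_acks)        # the leader counts toward the quorum
    quorum = ensemble_size // 2 + 1
    if acks >= quorum:
        return "COMMIT"                  # leader commits, then notifies followers
    return "PENDING"                     # not enough ACKs yet: cannot commit

# 5-node ensemble: leader + 2 follower ACKs = 3, which is a majority.
print(broadcast("create /x", [True, True, False, False], 5))   # COMMIT
# Leader + 1 follower ACK = 2 of 5: below quorum.
print(broadcast("create /y", [True, False, False, False], 5))  # PENDING
```

Committing on a majority rather than on all followers is what lets the cluster keep accepting writes while a minority of nodes is down.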
4.3 Crash Recovery
If the leader crashes, ZAB guarantees that transactions already committed by the leader are eventually committed on all servers, while uncommitted proposals are discarded. This relies on the globally increasing ZXID (epoch + xid).
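The ZXID layout mentioned above is concrete: it is a 64‑bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a counter that resets when a new epoch begins. Comparing ZXIDs as plain integers therefore orders by epoch first:

```python
def make_zxid(epoch, counter):
    """Pack epoch (high 32 bits) and counter (low 32 bits) into a 64-bit ZXID."""
    return (epoch << 32) | counter

def epoch_of(zxid):
    return zxid >> 32

old = make_zxid(epoch=1, counter=500)   # a late transaction of the old leader
new = make_zxid(epoch=2, counter=1)     # first transaction of the new leader
print(new > old)     # True: any new-epoch transaction outranks all old-epoch ones
print(epoch_of(new)) # 2
```

This is how recovery distinguishes proposals to keep from proposals to discard: anything stamped with a stale epoch is recognizably from a deposed leader.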
4.4 ZAB Properties
Reliable delivery: a committed transaction is eventually committed on every server.
Total order: all servers execute transactions in the same order.
Causal order: later transactions are applied after earlier ones.
High availability: a majority of nodes is sufficient for normal operation.
Recoverability: a restarted node can catch up to the current state.
4.5 Comparison with Paxos
Both ZAB and Paxos elect a leader that coordinates the other nodes and require a majority to commit. The main difference is intent: ZAB is designed for building a highly available primary‑backup system (ZooKeeper itself), whereas Paxos targets generic replicated state machines.
5 Miscellaneous ZooKeeper Knowledge
5.1 Common Commands
ZooKeeper can be deployed in three modes: single‑node, cluster, or pseudo‑cluster (multiple instances on one machine). Typical CLI commands include help , ls , create , get , set , stat , delete , and rmr (replaced by deleteall in ZooKeeper 3.5+).
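A sample zkCli.sh session exercising these commands (assuming a local server on the default port 2181; output abbreviated):

```
[zk: localhost:2181(CONNECTED) 0] create /app "hello"
Created /app
[zk: localhost:2181(CONNECTED) 1] ls /
[app, zookeeper]
[zk: localhost:2181(CONNECTED) 2] get /app
hello
[zk: localhost:2181(CONNECTED) 3] set /app "world"
[zk: localhost:2181(CONNECTED) 4] stat /app
[zk: localhost:2181(CONNECTED) 5] delete /app
```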
5.2 ZooKeeper Clients
Native client : asynchronous API requiring callbacks and manual watcher re‑registration.
ZkClient : wrapper that auto‑reconnects and provides persistent watchers.
Curator : Apache top‑level project offering higher‑level recipes, automatic reconnection, and simplified error handling.
ZooInspector : graphical client tool for browsing the ZooKeeper tree.
5.3 ACL Permission Control
Access Control Lists (ACLs) restrict operations on znodes. Each ACL entry combines a scheme (world, auth, digest, or ip) with the permissions it grants: CREATE, READ, WRITE, DELETE, and ADMIN.
5.4 Usage Tips
Cluster size should be odd; larger clusters increase reliability but reduce throughput.
ZooKeeper stores data in memory; avoid large payloads.
Log directories can grow; rotate or prune old files.
Default max client connections is 60; adjust maxClientCnxns as needed.
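The tips above map directly onto zoo.cfg settings. A minimal sketch of one cluster member's configuration (directory paths and hostnames are placeholders):

```
# Base time unit in milliseconds.
tickTime=2000
# Ticks a follower may take to connect and sync at startup / before being dropped.
initLimit=10
syncLimit=5
# Snapshot (and, by default, transaction log) directory.
dataDir=/var/lib/zookeeper
clientPort=2181
# Per-client-IP connection limit (60 is the default mentioned above).
maxClientCnxns=60
# Automatic purging of old snapshots and logs, addressing log-directory growth.
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
# Ensemble members: peer-communication port : leader-election port.
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Note that zoo.cfg is parsed as a Java properties file, so comments must sit on their own lines.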