Basic Introduction to Apache ZooKeeper: Architecture, Data Model, Sessions, ACLs, and Cluster Mechanisms
This article provides a comprehensive overview of Apache ZooKeeper, covering its ZAB protocol, data model hierarchy, node types, storage mechanisms, watch system, session lifecycle, ACL configurations, serialization with Jute, cluster roles, leader election, log management, and practical use cases such as distributed locks, ID generation, load balancing, and integration with frameworks like Dubbo and Kafka.
ZooKeeper Basic Introduction
Apache ZooKeeper, originally a sub‑project of Apache Hadoop, offers efficient and reliable distributed coordination services for distributed applications.
Key Features
Sequential Consistency: Updates from a client are applied in the exact order in which they were sent.
Atomicity: An update either succeeds completely or fails completely; there are no partial results.
Single View: A client sees the same view of the data regardless of the server it connects to.
Reliability: Once an update has been applied, its effects persist until another update overwrites them.
Timeliness: A client's view is guaranteed to be up to date within a bounded time. A read may briefly lag the latest committed write, so a client that needs the newest state can call sync before reading.
Data Model
ZooKeeper stores data in a hierarchical tree similar to a file system, with a fixed root node (/). Nodes (znodes) are addressed by absolute, slash-separated paths; for example, the CLI command get /work/task reads the node /work/task. Each node holds a byte array, ACL information, optional children, and a stat structure.
Node Types
Persistent: Remains after the client session ends; must be deleted explicitly.
Ephemeral: Automatically removed when the session that created it ends (is closed or expires); useful for tracking live servers (e.g., /servers/host). Ephemeral nodes cannot have children.
Sequential: A flag that can be combined with either type above. ZooKeeper appends a monotonically increasing, zero-padded 10-digit counter to the node name, enabling ordered creation (e.g., /works/task-0000000001).
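The naming rule for sequential nodes can be sketched in plain Java. This is a toy model, not ZooKeeper code: the class SequentialNameSketch and its per-instance counter are hypothetical, and only the zero-padded 10-digit suffix reflects what the server actually produces.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

public class SequentialNameSketch {
    // Hypothetical counter for one parent node; the real server maintains this itself.
    private final AtomicInteger counter = new AtomicInteger(0);

    /** Returns the prefix followed by a zero-padded 10-digit counter. */
    public String nextName(String prefix) {
        return prefix + String.format(Locale.ROOT, "%010d", counter.getAndIncrement());
    }

    /** Extracts the numeric suffix so that names can be ordered by creation. */
    public static int sequenceOf(String name) {
        return Integer.parseInt(name.substring(name.length() - 10));
    }
}
```

Calling nextName("/works/task-") twice on the same instance yields two names whose suffixes differ by one, which is what lets clients sort siblings by creation order.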
Node State Attributes
Attribute: Description
czxid: Creation transaction ID
ctime: Creation timestamp
mzxid: Last modification transaction ID
mtime: Last modification timestamp
pzxid: Last child‑modification transaction ID
cversion: Child version
version: Data version
aversion: ACL version
ephemeralOwner: Session ID of the creator (0 for persistent nodes)
dataLength: Length of the data byte array
numChildren: Number of child nodes
Storage
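Data is kept in memory for fast access, while transaction logs and snapshots are persisted on local disk. The transaction log records every committed state change in zxid order; snapshots periodically dump the in‑memory tree to disk. On restart, a server loads the most recent snapshot and replays the later log entries to rebuild its state.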
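As a rough illustration of how a snapshot and a transaction log combine into the in-memory state, the sketch below replays log entries on top of a snapshot. Everything here is a simplified assumption: RecoverySketch, Txn, and the flat path-to-string map are hypothetical, and real ZooKeeper snapshots and log records are considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RecoverySketch {
    /** One logged change: set path to data, or delete the path when data is null. */
    public static final class Txn {
        public final long zxid;
        public final String path;
        public final String data;

        public Txn(long zxid, String path, String data) {
            this.zxid = zxid;
            this.path = path;
            this.data = data;
        }
    }

    /** Copies the snapshot, then replays log entries newer than it in zxid order. */
    public static Map<String, String> recover(
            Map<String, String> snapshot, long snapshotZxid, List<Txn> log) {
        Map<String, String> tree = new TreeMap<>(snapshot);
        List<Txn> ordered = new ArrayList<>(log);
        ordered.sort((a, b) -> Long.compare(a.zxid, b.zxid));
        for (Txn t : ordered) {
            if (t.zxid <= snapshotZxid) {
                continue; // assumed to be already reflected in the snapshot
            }
            if (t.data == null) {
                tree.remove(t.path);
            } else {
                tree.put(t.path, t.data);
            }
        }
        return tree;
    }
}
```

The point of the design is that the snapshot bounds how much log has to be replayed, while the log guarantees that no committed change is lost between snapshots.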
Watch Mechanism
Clients can register a default Watcher when creating a ZooKeeper instance, or set one on a specific node via getData, exists, and getChildren. These watches are one‑time triggers: a watch fires once when the watched node changes and must be re‑registered to receive further events.
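The one-time trigger behaviour can be modelled without a running server. The class below is a hypothetical in-memory stand-in, not the ZooKeeper client API: its getData and setData only borrow the names of the real calls, and a watcher is reduced to a callback that receives the changed path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class OneShotWatchSketch {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    /** Reads a node and optionally leaves a one-time watch on it (null means no watch). */
    public String getData(String path, Consumer<String> watcher) {
        if (watcher != null) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        }
        return data.get(path);
    }

    /** Writes a node, then fires and discards every watch registered on that path. */
    public void setData(String path, String value) {
        data.put(path, value);
        List<Consumer<String>> fired = watches.remove(path);
        if (fired != null) {
            for (Consumer<String> w : fired) {
                w.accept(path);
            }
        }
    }
}
```

A caller that wants continuous notifications has to call getData with a watcher again inside its callback, which is the re-registration step mentioned above.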
new ZooKeeper(String connectString, int sessionTimeout, Watcher watcher)
Session Management
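A session consists of a session ID, a timeout, and a closing flag. Sessions move through states such as CONNECTING, CONNECTED, RECONNECTING, and CLOSED. Heartbeats (explicit pings or ordinary requests) push back the session's expiration time. To keep expiry checks cheap, the server rounds each expiration time up to the next multiple of a fixed interval and groups sessions into buckets by that rounded time, so it only has to inspect one bucket per interval.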
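The bucketing arithmetic is small enough to show directly. The sketch below follows the usual description of the rounding rule (divide by the interval, add one, multiply back); the class name SessionBucketSketch and the interval and timeout values in the usage note are illustrative assumptions, not server defaults.

```java
public class SessionBucketSketch {
    /** Rounds a timestamp up to the next bucket boundary, a multiple of intervalMs. */
    public static long roundToInterval(long timeMs, long intervalMs) {
        return (timeMs / intervalMs + 1) * intervalMs;
    }

    /** Bucket that a session lands in after a heartbeat received at nowMs. */
    public static long expirationBucket(long nowMs, long sessionTimeoutMs, long intervalMs) {
        return roundToInterval(nowMs + sessionTimeoutMs, intervalMs);
    }
}
```

With a 2000 ms interval and a 5000 ms timeout, heartbeats at 10000 ms and 10500 ms both map to the 16000 ms bucket, so the two sessions are checked together rather than individually.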
ACL (Access Control List)
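ZooKeeper ACL entries take the form scheme:id:permissions. Commonly used schemes include IP (an address or range, e.g., ip:192.168.0.11/22), Digest (a username:password pair, stored as the username followed by the BASE64‑encoded SHA‑1 hash of username:password), and World (the single id anyone, open to all). The permissions are CREATE, READ, WRITE, DELETE, and ADMIN.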
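For the Digest scheme, the SHA-1 plus BASE64 step can be reproduced with the JDK alone. This is a sketch under the assumption that the stored form is the username, a colon, and the BASE64-encoded SHA-1 hash of the full username:password string; the class DigestAclSketch and the credentials in the usage note are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class DigestAclSketch {
    /** Turns "username:password" into "username:" + BASE64(SHA-1("username:password")). */
    public static String generateDigest(String idPassword) {
        int colon = idPassword.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected username:password");
        }
        String user = idPassword.substring(0, colon);
        try {
            byte[] sha1 = MessageDigest.getInstance("SHA-1")
                    .digest(idPassword.getBytes(StandardCharsets.UTF_8));
            return user + ":" + Base64.getEncoder().encodeToString(sha1);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, generateDigest("alice:secret") returns a string that starts with alice: and can be used as the id part of a digest ACL entry, so the plain-text password never has to be stored in the ACL.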
Serialization (Jute)
ZooKeeper uses the Jute framework for serialization. Classes implement the Record interface with serialize and deserialize methods, writing and reading their fields through archive calls such as writeLong and writeString.
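To make the field-by-field idea concrete without pulling in the Jute library, the sketch below does a round trip with the JDK's DataOutputStream and DataInputStream. It is an analogy only: RecordRoundTripSketch is a hypothetical class, and the byte layout (an 8-byte long followed by a length-prefixed UTF-8 string) is an assumption chosen for illustration rather than a statement about Jute's wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class RecordRoundTripSketch {
    public final long ids;
    public final String name;

    public RecordRoundTripSketch(long ids, String name) {
        this.ids = ids;
        this.name = name;
    }

    /** Writes the fields in a fixed order: the long first, then the string. */
    public byte[] serialize() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(ids);
            byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
            out.writeInt(utf8.length); // length prefix so the reader knows where the string ends
            out.write(utf8);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the fields back in exactly the order they were written. */
    public static RecordRoundTripSketch deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            long ids = in.readLong();
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            return new RecordRoundTripSketch(ids, new String(utf8, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The key property is that both sides agree on field order and types; there is no self-describing schema in the byte stream.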
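class TestJute implements Record {
    private long ids;
    private String name;
    // Write each field to the archive in a fixed order.
    public void serialize(OutputArchive a, String tag) throws IOException { ... }
    // Read the fields back from the archive in the same order.
    public void deserialize(InputArchive a, String tag) throws IOException { ... }
}
Cluster Architecture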
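A ZooKeeper cluster consists of Leader, Follower, and Observer nodes. Transactional (write) requests are forwarded to the Leader, which broadcasts each change as a proposal and commits it once a majority of the voting servers (the Leader and Followers) have acknowledged it. Observers receive committed changes but do not vote. Leader election uses the FastLeaderElection algorithm, and the ZAB (ZooKeeper Atomic Broadcast) protocol governs recovery and broadcast.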
Leader Election Process
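Servers exchange votes containing logicClock, state, self_id, self_zxid, vote_id, and vote_zxid. Votes from an older election round (a lower logicClock) are discarded. When comparing two votes, a server prefers the candidate with the higher zxid and, if the zxids are equal, the one with the higher server ID. A candidate wins once more than half of the voting servers have voted for it.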
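The comparison and majority rules can be written down in a few lines. This is a simplified sketch and not the FastLeaderElection implementation: ElectionSketch, beats, and winner are hypothetical names, the epoch or logicClock check is left out, and votes are reduced to a map from voter ID to chosen candidate ID.

```java
import java.util.HashMap;
import java.util.Map;

public class ElectionSketch {
    /** True if the new candidate beats the current one: higher zxid first, then higher ID. */
    public static boolean beats(long newId, long newZxid, long curId, long curZxid) {
        return newZxid > curZxid || (newZxid == curZxid && newId > curId);
    }

    /** Returns the candidate backed by more than half of ensembleSize, or -1 if none. */
    public static long winner(Map<Long, Long> voteByServer, int ensembleSize) {
        Map<Long, Integer> tally = new HashMap<>();
        for (long candidate : voteByServer.values()) {
            tally.merge(candidate, 1, Integer::sum);
        }
        for (Map.Entry<Long, Integer> e : tally.entrySet()) {
            if (e.getValue() > ensembleSize / 2) {
                return e.getKey();
            }
        }
        return -1L;
    }
}
```

Preferring the higher zxid means the server with the most recent committed history is favoured, so the new leader is less likely to be missing transactions that others have already seen.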
Log Management
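ZooKeeper keeps generating transaction logs and snapshots and does not remove old ones unless told to. The bundled PurgeTxnLog utility, the server's autopurge settings (autopurge.snapRetainCount and autopurge.purgeInterval), or a Linux crontab script can be used to clean up old files and free disk space.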
Practical Use Cases
Distributed Lock: Each client creates an ephemeral sequential node under /lock; the client whose node has the smallest sequence number holds the lock. Every other client watches only the node immediately preceding its own, which avoids the thundering herd problem when the lock is released.
Distributed ID Generation: Use the counters that ZooKeeper appends to sequential nodes as unique, ordered IDs.
Load Balancing: Store each server's connection count under /servers and apply a minimum‑connection algorithm to select the target server.
Framework Integration: ZooKeeper serves as the registry for Dubbo, stores broker metadata for Kafka (in deployments that do not use KRaft mode), and supports many other distributed systems.
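For the distributed lock described above, the decision each client makes after listing the children of /lock can be sketched as follows. This is only the ordering logic under stated assumptions: LockOrderSketch and nodeToWatch are hypothetical names, the node names are assumed to end in a 10-digit sequence suffix, and creating the nodes and setting the watch through a real client are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LockOrderSketch {
    /** Parses the 10-digit sequence suffix at the end of a sequential node name. */
    public static int sequenceOf(String node) {
        return Integer.parseInt(node.substring(node.length() - 10));
    }

    /**
     * Returns null if myNode has the smallest sequence number (it holds the lock),
     * otherwise the node immediately before it, which is the only one to watch.
     */
    public static String nodeToWatch(List<String> children, String myNode) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(LockOrderSketch::sequenceOf));
        int index = sorted.indexOf(myNode);
        if (index < 0) {
            throw new IllegalArgumentException("myNode is not among the children");
        }
        return index == 0 ? null : sorted.get(index - 1);
    }
}
```

Because each waiter watches a different predecessor, releasing the lock wakes exactly one client instead of all of them.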
References
From Paxos to ZooKeeper: Distributed Consistency Principles and Practice (《从Paxos到Zookeeper 分布式一致性原理与实践》)
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.