
Why Kafka 2.8 Drops Zookeeper: Architecture, Challenges, and KIP‑500

This article explains how Kafka 2.8 removes its dependency on Zookeeper, describes Kafka's core concepts and its interaction with Zookeeper, outlines the role of the Controller, discusses operational complexities and upgrade paths with KIP‑500, and highlights the benefits of the new KRaft‑based architecture.

macrozheng
Recently the Confluent community announced that Kafka 2.8 can run without Zookeeper (shipped as an early-access feature), a major improvement for Kafka operators. Previously every Kafka deployment also required a Zookeeper ensemble; now deploying Kafka alone is enough.

1. Kafka Introduction

Apache Kafka was originally developed at LinkedIn and later donated to the Apache Software Foundation. Kafka is defined as a distributed streaming platform, offering high throughput, persistence, and horizontal scalability. Its main functions include:

Message queue: Kafka provides system decoupling, traffic shaping, buffering, and asynchronous communication.

Distributed storage: Kafka persists messages and uses replicas for failover, serving as a data store.

Real‑time data processing: Kafka Streams and Kafka Connect enable real‑time processing.

The following diagram shows Kafka's message model:

Key concepts in Kafka include:

producer and consumer: producers push messages to a topic; consumers pull messages from it.

consumer group: a set of consumers that jointly read a topic across its partitions.

broker: a server node in the Kafka cluster.

topic: a named category of messages.

partition: a physical subdivision of a topic. Each message within a partition is assigned a sequential id called its offset, and within a consumer group only one consumer reads a given partition.
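The "one consumer per partition within a group" rule above can be sketched in a few lines. This is an illustrative simulation, not Kafka's actual assignor; all names (topic, consumer ids) are made up for the example.

```python
# Illustrative sketch only: shows that within one consumer group,
# each partition ends up owned by exactly one consumer.

def assign_partitions(partitions, consumers):
    """Distribute partitions round-robin across the consumers of one group."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment

partitions = ["orders-0", "orders-1", "orders-2", "orders-3"]
consumers = ["consumer-a", "consumer-b"]
print(assign_partitions(partitions, consumers))
# -> {'consumer-a': ['orders-0', 'orders-2'], 'consumer-b': ['orders-1', 'orders-3']}
```

Note that if the group has more consumers than the topic has partitions, the surplus consumers simply receive nothing, which is exactly how Kafka behaves.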

2. Kafka and Zookeeper Relationship

Kafka's architecture relies on Zookeeper for coordination. The following diagram illustrates the overall architecture:

Zookeeper acts as a registration center for brokers, topics, and consumers.

2.1 Registration Center

Broker registration: each broker registers its IP address and port under /brokers/ids in Zookeeper. The znode is ephemeral, so it disappears automatically if the broker crashes.

Topic registration: Zookeeper keeps a node for each topic under /brokers/topics/[topic_name] and records the mapping between the topic's partitions and brokers.

Consumer registration: consumer groups register under /consumers/{group_id}, which lets Zookeeper track partition-consumer relationships and consumed offsets.
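The znode layout above can be pictured as a small in-memory tree. The path names mirror the article; the dict structure, broker addresses, and topic names are purely illustrative.

```python
# Toy in-memory model of the Zookeeper znode layout described above.

registry = {
    "/brokers/ids": {
        "0": {"host": "10.0.0.1", "port": 9092},  # ephemeral znodes:
        "1": {"host": "10.0.0.2", "port": 9092},  # removed if the broker dies
    },
    "/brokers/topics": {
        # partition id -> list of replica brokers (leader first)
        "orders": {"partitions": {"0": [0, 1], "1": [1, 0]}},
    },
    "/consumers": {
        "billing-group": {"offsets": {"orders-0": 42}},
    },
}

def broker_list(reg):
    """What a producer reads from Zookeeper to discover live brokers."""
    return [(b["host"], b["port"]) for b in reg["/brokers/ids"].values()]

print(broker_list(registry))
# -> [('10.0.0.1', 9092), ('10.0.0.2', 9092)]
```

The `broker_list` lookup is the discovery step that section 2.2 relies on: producers consult /brokers/ids rather than hard-coding broker addresses.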

2.2 Load Balancing

After broker registration, producers discover broker lists via Zookeeper, enabling dynamic load balancing. Consumer groups use topic node information to pull messages from specific partitions.

3. Controller Overview

One broker is elected as the Controller, which interacts with Zookeeper to manage metadata for all partitions and replicas. The Controller monitors partition changes, topic changes, and broker changes, and updates cluster metadata accordingly.

When a partition leader fails, the Controller elects a new leader. It also handles partition additions and ISR (in‑sync replica) changes.
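The leader-failover step can be reduced to a few lines: promote the first in-sync replica that is still on a live broker. This is a deliberately simplified sketch; real Kafka election handles more cases (preferred leaders, unclean election, and so on).

```python
# Simplified version of what the Controller does when a partition leader
# fails: promote the first ISR member whose broker is still alive.

def elect_leader(isr, live_brokers):
    for replica in isr:
        if replica in live_brokers:
            return replica
    return None  # no eligible replica: the partition goes offline

# Partition's ISR is brokers [1, 2, 3]; broker 1 (the old leader) just died.
print(elect_leader([1, 2, 3], live_brokers={2, 3}))
# -> 2
```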

The Controller processes Zookeeper watch events, timer tasks, and other events through a single LinkedBlockingQueue, consuming them one at a time and updating its metadata as needed.
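This single-threaded event loop can be sketched as follows, with Python's `queue.Queue` standing in for Java's LinkedBlockingQueue. The event kinds and metadata shape are illustrative, not Kafka's internal types.

```python
# Sketch of the Controller's event loop: events from many sources land in
# one blocking queue and are applied to cluster metadata strictly in order.
import queue

events = queue.Queue()
metadata = {"live_brokers": {0, 1, 2}}

def handle(event, md):
    kind, broker_id = event
    if kind == "broker_down":
        md["live_brokers"].discard(broker_id)
    elif kind == "broker_up":
        md["live_brokers"].add(broker_id)

for ev in [("broker_down", 2), ("broker_up", 3)]:
    events.put(ev)

while not events.empty():  # the real loop blocks on take() instead
    handle(events.get(), metadata)

print(metadata["live_brokers"])
# -> {0, 1, 3}
```

Funneling everything through one queue is what lets the Controller mutate metadata without locks: events are serialized by construction.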

4. Problems Introduced by Zookeeper

Kafka becomes a distributed system that depends on another distributed system, increasing operational complexity.

4.1 Operational Complexity

Deploying Kafka with Zookeeper requires managing two systems, and operators must be proficient with both.

4.2 Controller Failure Handling

If the Controller broker fails, a new broker is elected as Controller; it must then pull the full metadata from Zookeeper and push it to every broker. While this failover is in progress, the cluster cannot perform metadata operations such as leader elections, and the pause grows with cluster size.

4.3 Partition Bottleneck

As the number of partitions grows, Zookeeper stores more metadata, increasing load and latency, which impacts Kafka performance. A single Kafka cluster thus has a practical partition limit.

5. Upgrade Path

KIP‑500 replaces the Zookeeper-backed Controller with a Quorum Controller based on KRaft, a Raft variant built into Kafka itself. Every quorum controller stores the complete metadata log locally, and Raft replication keeps the copies consistent. If the active quorum controller fails, a standby that already holds the metadata takes over almost immediately, with no lengthy reload from an external store.
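The availability claim rests on simple quorum arithmetic: the metadata cluster stays writable as long as a strict majority of controllers are up. A minimal sketch of that rule:

```python
# Majority-quorum rule underlying Raft (and hence KRaft): progress requires
# more than half of the controller nodes to be alive.

def has_quorum(live, total):
    return live > total // 2

for live in range(4):
    print(live, has_quorum(live, 3))
# 0 False
# 1 False
# 2 True
# 3 True
```

So a 3-node quorum tolerates one failure and a 5-node quorum tolerates two, which is why quorum controller failover is fast: no single node is a metadata bottleneck.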

After the upgrade, Kafka can support millions of partitions. The KIP‑500 code has been merged into the 2.8 release branch, and Kafka 3.0 will support both Zookeeper and Quorum Controllers for gradual migration.

6. Summary

In large‑scale and cloud‑native environments, Zookeeper adds significant operational and performance overhead to Kafka. Removing Zookeeper aligns with the trend toward simpler architectures and prepares Kafka for future scalability.

Tags: distributed systems, Zookeeper, Kafka, metadata management, KRaft, KIP-500
Written by macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. The author's GitHub project "mall" has 50K+ stars.