
Understanding Kafka Core Concepts: Architecture, Messaging Models, Partitioning, Consumer Groups, and Reliability

This article provides a comprehensive overview of Kafka: its layered architecture with Zookeeper; core concepts such as topics, partitions, and consumer groups; the communication workflow; partition-selection strategies; rebalancing; reliability configuration; replica synchronization; and the reasons newer versions are moving away from Zookeeper, all explained in plain language.


What is Kafka?

Kafka is a streaming data platform that also functions as a message queue, offering both messaging capabilities and real‑time stream processing.

Its architecture can be viewed in three layers:

First layer – Zookeeper: Acts as a registry and coordination service, storing cluster metadata and managing broker registration.

Second layer – Core concepts: Includes records (messages), topics (categories), producers (senders), consumers (receivers), brokers (Kafka servers), partitions (sharding for scalability), leader/follower replicas (high availability), offsets (sequential IDs), consumer groups (multiple consumers sharing work), coordinator (assigns partitions to groups), and controller (manages the whole cluster, elected via Zookeeper).

Third layer – Storage: Persists core data as log files on disk.

Message‑Queue Models Supported by Kafka

Traditional queues support two models:

Point‑to‑point – a message is consumed by a single consumer and then deleted.

Publish‑subscribe – a message is broadcast to all consumers.

Kafka achieves both models through consumer groups: if all consumers belong to the same group, the model behaves like point‑to‑point; if each consumer uses its own group, it behaves like publish‑subscribe.
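The group-id trick above can be seen in a toy simulation (the function and names are mine, not the Kafka API): within one group each message goes to exactly one member, while every group as a whole sees every message.

```python
from collections import defaultdict
from itertools import cycle

def deliver(messages, groups):
    """Simulate consumer-group delivery: inside a group, each message goes
    to exactly one member (round-robin here for simplicity); every group
    as a whole receives every message."""
    received = defaultdict(list)
    pickers = {g: cycle(members) for g, members in groups.items()}
    for msg in messages:
        for g in groups:
            received[next(pickers[g])].append(msg)
    return dict(received)

# Point-to-point: all consumers share one group, so messages are split.
p2p = deliver(["m1", "m2", "m3", "m4"], {"g1": ["c1", "c2"]})
# Publish-subscribe: each consumer has its own group, so everyone gets all.
pubsub = deliver(["m1", "m2"], {"gA": ["c1"], "gB": ["c2"]})
```

With one shared group, c1 receives m1 and m3 while c2 receives m2 and m4; with separate groups, both consumers receive every message.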

Kafka Communication Process

When a broker starts, it registers its ID under brokers/ids in Zookeeper and watches that path for changes.

A producer specifies bootstrap.servers, creates TCP connections to the listed brokers, and obtains cluster metadata (topics, partitions, leaders, etc.).

After connecting to any broker, the client fetches metadata and then establishes connections to all relevant brokers.

The producer then sends messages. A consumer likewise starts from bootstrap.servers, first contacting a broker to locate the coordinator for its consumer group.

The consumer then connects to the coordinator, receives partition assignments, and finally connects to the leader brokers of those partitions to consume messages.
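The settings that drive this bootstrap flow boil down to a couple of client configs. A minimal sketch (host names and the group name are placeholders of mine):

```python
# Producer only needs an initial broker list; it discovers the rest of
# the cluster (topics, partitions, leaders) from the metadata response.
producer_config = {
    "bootstrap.servers": "broker1:9092,broker2:9092",  # placeholder hosts
}

# Consumer additionally names its group, so any broker it contacts can
# point it at that group's coordinator.
consumer_config = {
    "bootstrap.servers": "broker1:9092,broker2:9092",
    "group.id": "order-service",  # hypothetical group name
}
```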

How Partitions Are Chosen When Sending Messages

For messages without a key, Kafka offers two built-in strategies:

Round‑robin (sequential distribution across partitions).

Random distribution.

If a message includes a key, Kafka hashes the key and takes it modulo the number of partitions, guaranteeing that all messages with the same key land in the same partition (preserving order for that key). Custom partitioning can be implemented by providing a Partitioner implementation that overrides configure and partition.
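The hash-then-mod idea can be sketched in a few lines. Note the hash function here is a simplified stand-in: Kafka's default partitioner actually uses murmur2, but the property that matters, same key always means same partition, holds either way.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition from a message key. Kafka's default partitioner
    uses murmur2; crc32 here is a stand-in to illustrate hash-then-mod.
    Masking to a non-negative value mirrors what Kafka does before mod."""
    return (zlib.crc32(key) & 0x7FFFFFFF) % num_partitions

# The same key always maps to the same partition, preserving per-key order.
assert partition_for(b"order-42", 6) == partition_for(b"order-42", 6)
```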

Why Partitions Are Needed

Partitions spread load across multiple broker nodes, enabling horizontal scaling, higher write throughput, parallel consumption, and fault‑tolerant replication.

Consumer Groups and Rebalancing

Ideally, the number of consumers in a group matches the total number of partitions; otherwise some consumers handle multiple partitions or some partitions remain idle.

Rebalancing occurs when the number of consumers, topics, or partitions changes. The process relies on heartbeats sent to the coordinator (their frequency is controlled by heartbeat.interval.ms) and involves the following steps:

Each new consumer sends a JoinGroup request; the first becomes the group leader and receives the member list.

The leader runs the partition-assignment algorithm and sends the result to the coordinator via a SyncGroup request.

Other members also send SyncGroup requests and receive their assigned partitions.
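The JoinGroup/SyncGroup handshake above can be modeled as a toy function (all names are mine, and the assignment algorithm is simplified to round-robin): the first member to join acts as leader, computes the plan, and the coordinator relays each member's slice back.

```python
def rebalance(members, partitions):
    """Toy model of a rebalance: the first member to 'JoinGroup' becomes
    group leader, runs the assignment (round-robin here), and the
    coordinator hands each member only its own partitions via 'SyncGroup'."""
    leader = members[0]                    # first JoinGroup -> group leader
    assignment = {m: [] for m in members}  # the leader computes the plan
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return leader, assignment

leader, plan = rebalance(["c1", "c2"], ["p0", "p1", "p2"])
```

Here c1 becomes leader and ends up with p0 and p2, while c2 gets p1.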

Partition Assignment Strategies

Kafka provides three built‑in strategies:

Range: Default; partitions are sorted and allocated in contiguous ranges, which can lead to imbalance when a consumer subscribes to multiple topics.

RoundRobin: Distributes partitions evenly across consumers regardless of topic, avoiding the imbalance of the Range strategy.

Sticky: Tries to keep existing assignments stable while still balancing load, reducing connection churn.
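The Range-versus-RoundRobin difference is easiest to see with two topics. A simplified sketch (function names mine; real assignors also handle per-member subscriptions):

```python
def range_assign(consumers, topic_partitions):
    """Range: per topic, give each consumer a contiguous slice; the
    remainder goes to the first consumers. Over several topics those
    extras pile onto the same consumers, causing the imbalance."""
    out = {c: [] for c in consumers}
    for topic, n in topic_partitions.items():
        per, extra = divmod(n, len(consumers))
        start = 0
        for i, c in enumerate(consumers):
            count = per + (1 if i < extra else 0)
            out[c] += [f"{topic}-{p}" for p in range(start, start + count)]
            start += count
    return out

def roundrobin_assign(consumers, topic_partitions):
    """RoundRobin: interleave all partitions across all topics."""
    out = {c: [] for c in consumers}
    all_parts = [f"{t}-{p}" for t, n in topic_partitions.items() for p in range(n)]
    for i, part in enumerate(all_parts):
        out[consumers[i % len(consumers)]].append(part)
    return out

# Two topics with 3 partitions each: Range gives c1 four partitions and
# c2 only two, while RoundRobin gives each consumer three.
topics = {"t1": 3, "t2": 3}
```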

Ensuring Message Reliability

Reliability is addressed from three angles:

Producer side: Use asynchronous sends with callbacks, set acks=all to require all in‑sync replicas to acknowledge, configure a large retries value, and tune replication.factor and min.insync.replicas . Disable unsafe leader election with unclean.leader.election.enable=false .

Kafka internals: Replicate data across multiple brokers; the ISR set is maintained according to replica.lag.time.max.ms.

Consumer side: Disable automatic offset commits (enable.auto.commit=false) and commit manually after successful processing; optionally set auto.offset.reset=earliest so that a consumer with no committed offset starts from the beginning instead of skipping messages.
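Put together, the reliability settings discussed above look roughly like this (values are illustrative examples, not universal recommendations):

```python
# Producer-side durability knobs.
producer_config = {
    "acks": "all",             # wait for all in-sync replicas to acknowledge
    "retries": 2**31 - 1,      # a very large retry count
}

# Topic/broker-side settings (replication.factor is set per topic).
broker_config = {
    "replication.factor": 3,
    "min.insync.replicas": 2,
    "unclean.leader.election.enable": False,  # no out-of-sync leaders
}

# Consumer-side: commit offsets manually, only after processing succeeds.
consumer_config = {
    "enable.auto.commit": False,
    "auto.offset.reset": "earliest",
}
```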

Replica Synchronization Mechanism

All replicas of a partition form the AR (Assigned Replicas) set; those that are fully caught up belong to the ISR (In-Sync Replicas) set, whose membership is governed by replica.lag.time.max.ms.

The HW (High Watermark) marks the offset up to which all ISR replicas have replicated data; consumers can read only up to the HW. The LEO (Log End Offset) indicates the next offset to be written in each replica's log.

The leader writes messages, updates its LEO, and once followers have replicated up to a certain point, the leader advances the HW to the smallest LEO among ISR, making those messages consumable.
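In other words, the HW is simply the minimum LEO across the ISR. A minimal sketch (names are mine):

```python
def high_watermark(leo_by_replica, isr):
    """HW advances to the smallest LEO among in-sync replicas: every
    offset below it has been replicated by the whole ISR and is
    therefore safe to expose to consumers."""
    return min(leo_by_replica[r] for r in isr)

# Leader has written up to offset 10; the followers lag slightly,
# so only offsets below 7 are consumable.
leos = {"leader": 10, "f1": 9, "f2": 7}
hw = high_watermark(leos, isr=["leader", "f1", "f2"])  # -> 7
```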

Why Newer Kafka Versions Dropped Zookeeper

Maintaining a separate Zookeeper cluster adds operational complexity and cost. Moreover, Zookeeper is not optimized for the high‑frequency metadata updates required by Kafka (e.g., offset commits), leading to performance bottlenecks at large scale.

Why Kafka Is Fast

Sequential I/O: Messages are appended to log files, enabling fast sequential disk writes.

Page Cache & Zero‑Copy: Kafka uses memory‑mapped files (mmap) for writes and sendfile for reads, avoiding extra data copies.

Batching & Compression: Producers batch multiple records into a single request, and both producers and brokers apply compression, reducing network and storage overhead.

Tags: distributed systems, message queues, streaming, Kafka, reliability, partitioning, consumer groups
Written by

Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
