Understanding Kafka Architecture: Topics, Partitions, Consumption Model, Network and Storage

This article explains Kafka's core architecture, covering how topics and partitions are stored, the advantages of its consumption model, the internal network and threading design, and the high‑reliability distributed log storage and replication mechanisms that ensure data durability and scalability.

Laravel Tech Community

Key Questions about Kafka Architecture

1. How are Kafka topics and partitions stored internally?
2. What advantages does Kafka's consumption model have over traditional messaging systems?
3. How does Kafka achieve distributed data storage and retrieval?

[Figure: Kafka architecture diagram]

Terminology

Broker: a Kafka node; multiple brokers form a cluster.

Topic: a logical channel for categorizing messages.

Producer: client that sends messages to brokers.

Consumer: client that reads messages from brokers.

ConsumerGroup: a set of consumers sharing the same group ID; each partition is assigned to exactly one consumer in the group, so a given message is processed by only one member of that group.

Partition: a physical, ordered subdivision of a topic; each partition is an append‑only log.
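To make the ConsumerGroup definition concrete, here is a minimal sketch of dividing partitions among group members so that every partition has exactly one owner. The function name `assign_partitions` and the round‑robin strategy are illustrative assumptions; Kafka ships several assignment strategies and this is not a reproduction of any of them.

```python
from typing import Dict, Iterable, List


def assign_partitions(partitions: Iterable[int], consumers: Iterable[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: spread partitions over consumers, one owner each."""
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        # Deal partitions out like cards so no partition has two owners.
        assignment[members[i % len(members)]].append(partition)
    return assignment


assignment = assign_partitions(range(4), ["c1", "c2", "c3"])
```

With four partitions and three consumers, one consumer ends up owning two partitions. A group with more consumers than partitions would leave some members with nothing to read.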

Topic and Partition Details

Each message belongs to a topic; a topic can have many partitions. Partitions store messages as an append‑only log, assigning a monotonically increasing offset to each record. Ordering is guaranteed only within a partition.
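The append‑only behaviour can be illustrated with a toy in‑memory model. The class `PartitionLog` is a hypothetical stand‑in, not Kafka's on‑disk format; it only shows how offsets grow by one per record and how a reader addresses records by offset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionLog:
    """Toy in-memory stand-in for one partition's append-only log."""
    records: List[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        """Append a record at the end and return the offset assigned to it."""
        offset = len(self.records)  # the next offset equals the current length
        self.records.append(value)
        return offset

    def read(self, offset: int, max_records: int = 10) -> List[bytes]:
        """Return up to max_records records starting at the given offset."""
        return self.records[offset:offset + max_records]


log = PartitionLog()
offsets = [log.append(v) for v in (b"a", b"b", b"c")]
tail = log.read(1)
```

Because records are only ever added at the end, the order seen by a reader of this one log is the order of the writes. Nothing here relates the order of two different `PartitionLog` instances, which mirrors the per‑partition ordering guarantee.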

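Producer routing to partitions: without a key, messages are spread across partitions (round‑robin in older clients; since Kafka 2.4 the default partitioner sticks to one partition per batch before moving on). With a key, the key is hashed and the result modulo the partition count determines the target partition, so the same key always lands in the same partition as long as the partition count does not change.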
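A rough sketch of this routing rule follows. Two things here are assumptions made for illustration: `zlib.crc32` stands in for the hash (Kafka's Java client hashes keys with murmur2), and the keyless path uses plain round‑robin.

```python
import itertools
import zlib
from typing import Callable, Iterator, Optional


def make_partitioner(num_partitions: int) -> Callable[[Optional[bytes]], int]:
    """Hypothetical factory returning a function that picks a partition."""
    round_robin: Iterator[int] = itertools.cycle(range(num_partitions))

    def choose_partition(key: Optional[bytes]) -> int:
        if key is None:
            # No key: rotate through the partitions in turn.
            return next(round_robin)
        # Keyed: hash, then modulo the partition count.
        # crc32 is a stand-in hash, not the one Kafka uses.
        return zlib.crc32(key) % num_partitions

    return choose_partition


choose = make_partitioner(4)
keyless = [choose(None) for _ in range(5)]
same_key = {choose(b"user-42") for _ in range(3)}
```

The modulo step is why adding partitions to a topic can move existing keys to a different partition: the hash stays the same but the divisor changes.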

Consumption Model

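Kafka uses a pull‑based model: consumers poll the broker at their own pace, choose the offset to read from, and can rewind to reprocess messages. Compared with push‑based systems, where the broker sets the delivery rate, this gives the consumer control over throughput and makes recovery after a failure simpler.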
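The pull model can be sketched with a consumer that tracks its own position in a log. `PullConsumer`, `poll`, and `seek` here are hypothetical names on a toy in‑memory class, chosen to echo the usual client vocabulary; they are not the real client API.

```python
from typing import List, Sequence


class PullConsumer:
    """Toy consumer that pulls from an in-memory log at its own pace."""

    def __init__(self, log: Sequence[str], start_offset: int = 0) -> None:
        self.log = log
        self.position = start_offset  # next offset this consumer will read

    def poll(self, max_records: int = 2) -> List[str]:
        """Fetch up to max_records from the current position, then advance."""
        batch = list(self.log[self.position:self.position + max_records])
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move the read position, for example to replay earlier records."""
        self.position = offset


log = ["m0", "m1", "m2", "m3", "m4"]
consumer = PullConsumer(log)
first_batch = consumer.poll()
second_batch = consumer.poll()
consumer.seek(1)           # rewind and reprocess from offset 1
replayed = consumer.poll()
```

The log itself is never modified by reading. The only state that changes is the consumer's position, which is what makes replay cheap.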

Network Model

Client side: a single‑threaded selector handles connections, suitable for low concurrency.

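Server side: a multi‑threaded Reactor design. One Acceptor thread accepts new connections and hands them to a pool of network (Processor) threads that read requests and write responses; a separate pool of request handler threads does the actual work. Keeping socket I/O apart from request processing prevents slow requests from blocking the network threads and improves scalability.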
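The server‑side threading layout can be approximated with plain queues and threads. This is a simplified sketch: the thread counts, the tuple used as a "connection", and the upper‑casing "work" are all invented for illustration, and responses go straight to a shared queue rather than back through a network thread.

```python
import queue
import threading

NUM_PROCESSORS = 2  # hypothetical number of network threads
NUM_HANDLERS = 3    # hypothetical number of request handler threads

request_queue: queue.Queue = queue.Queue()
response_queue: queue.Queue = queue.Queue()
processor_inboxes = [queue.Queue() for _ in range(NUM_PROCESSORS)]


def processor(inbox: queue.Queue) -> None:
    """Network thread: takes requests from its connections and queues them."""
    while (item := inbox.get()) is not None:
        request_queue.put(item)


def handler() -> None:
    """Request handler thread: performs the work for one request at a time."""
    while (item := request_queue.get()) is not None:
        conn_id, payload = item
        response_queue.put((conn_id, payload.upper()))


def acceptor(connections) -> None:
    """Acceptor: hands each new connection to a processor, round-robin."""
    for i, conn in enumerate(connections):
        processor_inboxes[i % NUM_PROCESSORS].put(conn)


processors = [threading.Thread(target=processor, args=(q,)) for q in processor_inboxes]
handlers = [threading.Thread(target=handler) for _ in range(NUM_HANDLERS)]
for t in processors + handlers:
    t.start()

acceptor([(i, f"req-{i}") for i in range(5)])

# Shut down in pipeline order so no request is dropped.
for q in processor_inboxes:
    q.put(None)
for t in processors:
    t.join()
for _ in handlers:
    request_queue.put(None)
for t in handlers:
    t.join()

results = sorted(response_queue.get() for _ in range(5))
```

The point of the layout is that a handler stuck on a slow request ties up only one handler thread, while the acceptor and network threads keep servicing other connections.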

High‑Reliability Distributed Storage Model

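Messages are stored in partitioned log files, each partition consisting of multiple LogSegments (each a .log data file paired with an .index file). Kafka uses sparse indexing: the index holds an entry for only some of the records, which keeps it small enough to search quickly at the cost of a short scan in the data file.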

Reading a specific offset involves locating the correct LogSegment via binary search, then using the index to find the physical file position.
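The two‑step lookup can be sketched with `bisect`. The segment base offsets, index entries, and byte positions below are made‑up sample data, and the index stores absolute offsets for readability, which is a simplification of the real file format.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical sample data: three segments identified by their base offset.
segment_base_offsets: List[int] = [0, 1000, 2000]

# Sparse index per segment: (offset, byte position in the .log file).
sparse_index: Dict[int, List[Tuple[int, int]]] = {
    0: [(0, 0), (400, 4096), (800, 8192)],
    1000: [(1000, 0), (1040, 4096), (1080, 8192)],
    2000: [(2000, 0)],
}


def locate(target_offset: int) -> Tuple[int, int, int]:
    """Return (segment base, nearest indexed offset, file position)."""
    # Step 1: binary search for the last segment whose base <= target.
    seg_i = bisect.bisect_right(segment_base_offsets, target_offset) - 1
    if seg_i < 0:
        raise ValueError("offset is before the start of the log")
    base = segment_base_offsets[seg_i]

    # Step 2: binary search the sparse index for the last entry <= target.
    entries = sparse_index[base]
    entry_i = bisect.bisect_right([o for o, _ in entries], target_offset) - 1
    if entry_i < 0:
        return base, base, 0  # no earlier entry: scan from the file start
    indexed_offset, position = entries[entry_i]
    return base, indexed_offset, position


found = locate(1050)
```

Because the index is sparse, the result is a starting point rather than an exact hit: the reader would then scan forward in the .log file from the returned position until it reaches the requested offset.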

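Sequential reads benefit from the OS page cache. Each partition is written sequentially, but when one disk hosts a large number of partitions the writes interleave across many files and start to behave like random I/O, so a moderate number of partitions is recommended.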

Replication Mechanism

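Each partition has one leader replica and zero or more follower replicas. The ISR (In‑Sync Replicas) set contains the leader and the followers that are keeping up with it. By default only ISR members can be elected as the new leader; electing an out‑of‑sync replica requires enabling unclean.leader.election.enable and risks losing data.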

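HW (High Watermark) marks the boundary of what consumers can see: only records below the HW, which have been replicated to every ISR member, are visible. LEO (Log End Offset) is the offset that the next appended record will receive, that is, one past the last record in a replica's log. Producers can set acks to 0 (do not wait for any acknowledgement), 1 (wait for the leader to write the record), or -1, also written as all (wait for every ISR member), to control durability guarantees.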
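The relationship between LEO, HW, and acks can be expressed as a small calculation. The broker names, LEO values, and the helper `is_acknowledged` are invented for illustration. The sketch treats LEO as the next offset to be written and HW as the smallest LEO among ISR members.

```python
from typing import Dict, Set


def high_watermark(leo_by_replica: Dict[str, int], isr: Set[str]) -> int:
    """HW is the smallest LEO among the in-sync replicas."""
    return min(leo_by_replica[replica] for replica in isr)


def is_acknowledged(acks: int, offset: int, leader: str,
                    leo_by_replica: Dict[str, int], isr: Set[str]) -> bool:
    """Hypothetical check: would a record at `offset` be acked yet?"""
    if acks == 0:
        return True  # producer does not wait for the broker at all
    if acks == 1:
        return leo_by_replica[leader] > offset  # leader has written it
    if acks == -1:
        return high_watermark(leo_by_replica, isr) > offset  # all ISR have it
    raise ValueError("acks must be 0, 1, or -1")


# Sample state: b3 has fallen behind and dropped out of the ISR.
leo = {"b1": 10, "b2": 8, "b3": 5}
isr = {"b1", "b2"}

hw = high_watermark(leo, isr)
leader_only = is_acknowledged(1, 9, "b1", leo, isr)
all_isr = is_acknowledged(-1, 9, "b1", leo, isr)
```

In this sample state the record at offset 9 exists only on the leader, so a producer using acks=1 would already have its acknowledgement while one using acks=-1 would still be waiting, and consumers would see records only up to offset 7.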