Master Kafka Basics: Topics, Partitions, Producers, and Cluster Architecture
This article explains Kafka's role as a messaging system, covering core concepts such as topics, partitions, producers, consumers, messages, cluster architecture, replicas, consumer groups, controller coordination with Zookeeper, and performance optimizations like sequential writes and zero‑copy networking.
Kafka Basics
Kafka is a distributed messaging system that acts as a buffer and decouples producers from consumers, storing data on disk rather than in memory.
Message System Role
It functions like a warehouse, providing caching and decoupling capabilities for large‑scale log processing scenarios.
1. Topic
A topic is analogous to a table in a relational database; each topic holds a stream of messages.
To consume data from a specific source, you simply listen to the corresponding topic (e.g., TopicA for China Mobile).
2. Partition
Each topic is divided into multiple partitions, which are stored as directories on different brokers. Partitions improve performance by allowing parallel processing across multiple threads.
Partitions are similar to HBase regions: the topic is a logical concept, while partitions are the physical storage units distributed across servers.
A partition stored on only one broker is a single point of failure, so partitions are configured with replicas.
Partition numbering starts at 0.
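How a producer chooses a partition can be sketched as a hash of the message key modulo the partition count, which keeps all messages for the same key on the same partition. Kafka's real default partitioner murmur2-hashes the serialized key bytes; the `String.hashCode`-based version below is a simplified illustration, not Kafka's actual algorithm.

```java
public class PartitionerSketch {
    // Simplified stand-in for Kafka's default partitioner. Kafka actually
    // murmur2-hashes the serialized key; String.hashCode() is used here
    // purely for illustration.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the modulo result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition, which is what
        // preserves per-key ordering.
        System.out.println(partitionFor("user-42", 3) == partitionFor("user-42", 3));
    }
}
```

Because assignment depends only on the key and the partition count, adding partitions later reshuffles keys, which is why partition counts are usually chosen up front.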
3. Producer
Producers send messages to Kafka.
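A minimal producer configuration might look like the fragment below; the broker address is an illustrative assumption, while the property names are standard Kafka producer settings.

```properties
# Illustrative producer settings; localhost:9092 is an assumed broker address.
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
# acks=all waits for all in-sync replicas to acknowledge, trading latency for durability.
acks=all
```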
4. Consumer
Consumers read messages from Kafka.
5. Message
The data processed within Kafka is called a message.
Kafka Cluster Architecture
A topic can have multiple partitions distributed across different brokers. Early Kafka versions (<0.8) lacked replication, leading to data loss on broker failures.
Replica
Each partition can have multiple replicas for fault tolerance. One replica acts as the leader, while others are followers that synchronize from the leader.
Consumer Group
Consumers belong to a consumer group identified by group.id. Within a group, each partition is consumed by only one consumer, preventing duplicate processing.
<code>conf.setProperty("group.id", "tellYourDream")</code>
Different groups can consume the same topic independently.
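The one-consumer-per-partition rule within a group can be sketched as a simple round-robin spread of partitions over the group's members. This is a simplification: Kafka's real assignment strategies are pluggable (range, round-robin, sticky), and the consumer names here are illustrative.

```java
import java.util.*;

public class RoundRobinAssignmentSketch {
    // Spread partitions over consumers round-robin; each partition is owned
    // by exactly one consumer in the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        consumers.forEach(c -> out.put(c, new ArrayList<>()));
        for (int p = 0; p < numPartitions; p++) {
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two consumers in the same group share six partitions; no partition
        // is consumed twice within the group.
        System.out.println(assign(Arrays.asList("consumerA", "consumerB"), 6));
    }
}
```

If the group has more consumers than partitions, the extras sit idle, which is why partition count bounds a group's parallelism.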
<code>consumerA:
group.id = a
consumerB:
group.id = a
consumerC:
group.id = b
consumerD:
group.id = b</code>
Controller
The controller is the master node that coordinates the cluster together with Zookeeper.
Kafka and Zookeeper Coordination
All brokers register themselves in Zookeeper at startup, which elects a controller. The controller watches Zookeeper directories (e.g., /brokers/) to track broker registrations and manage metadata.
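The relevant Zookeeper layout looks roughly like the tree below (the broker id and topic name are illustrative; the znode paths follow Kafka's conventions):

```
/controller             # ephemeral znode holding the current controller's broker id
/brokers/ids/0          # one ephemeral znode per live broker
/brokers/topics/TopicA  # partition and replica assignment for TopicA
```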
Performance Highlights
Sequential Writes
Kafka writes data sequentially to disk, achieving near‑memory speeds because disk seeks are minimized.
Zero‑Copy
Kafka uses the Linux sendfile system call to transfer data directly from disk to the network socket, eliminating extra memory copies and context switches.
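In Java, sendfile is exposed through FileChannel.transferTo, which Kafka's broker uses when serving fetch requests. The sketch below transfers bytes channel-to-channel without pulling them through user-space buffers; it copies to a temporary file rather than a real network socket, purely for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class ZeroCopyDemo {
    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("segment", ".log");
        Path dst = Files.createTempFile("out", ".bin");
        Files.write(src, "hello kafka".getBytes(StandardCharsets.UTF_8));

        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            long pos = 0, size = in.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
        System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.UTF_8));
    }
}
```

With a real SocketChannel as the target on Linux, the JVM can delegate the transfer to sendfile, so the file bytes never enter the application's heap.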
Log Segmentation
Each partition’s log is split into segments, with each segment file capped at 1 GB by default, so individual files stay small enough to load and search efficiently.
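Segment files are named after the base offset of their first message, zero-padded to 20 digits, so finding the segment that holds a given offset is a floor lookup over the base offsets. A sketch using the base offsets from the listing below:

```java
import java.util.TreeMap;

public class SegmentLookup {
    public static void main(String[] args) {
        // Base offsets taken from the segment files listed in the article.
        TreeMap<Long, String> segments = new TreeMap<>();
        for (long base : new long[]{0L, 5367851L, 9936472L}) {
            // Segment file names are the base offset, zero-padded to 20 digits.
            segments.put(base, String.format("%020d.log", base));
        }
        // Offset 7000000 lives in the segment whose base offset is the
        // greatest one less than or equal to it.
        System.out.println(segments.floorEntry(7000000L).getValue());
    }
}
```

The paired .index files then map relative offsets to byte positions inside that segment, so a read never scans the whole log.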
<code>00000000000000000000.index
00000000000000000000.log
00000000000000000000.timeindex
00000000000005367851.index
00000000000005367851.log
00000000000005367851.timeindex
00000000000009936472.index
00000000000009936472.log
00000000000009936472.timeindex</code>
Network Design
Clients connect to an Acceptor thread, which hands new connections to a pool of Processor threads. Processors perform the network reads and writes, and a separate request-handler thread pool executes the requests and queues responses, forming a three-layer reactor model.
Conclusion
This article introduced Kafka’s core concepts, roles, and design considerations. Future updates will cover cluster deployment and deeper performance tuning.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you through your operations career.