
Master Kafka Interview Questions: Architecture, Partitioning, and Reliability Explained

This article provides a comprehensive overview of Kafka, covering its core architecture, message queue models, communication process, partition selection, consumer groups, rebalancing strategies, partition assignment algorithms, reliability guarantees, replica synchronization, and reasons for removing Zookeeper in newer versions.

Sanyou's Java Diary

Hello everyone, I am Sanyou.

Today we discuss common Kafka interview questions.

What is your understanding of Kafka?

Kafka is a streaming data platform that also provides messaging capabilities; it is often used as a message queue.

It can be viewed in three layers:

First layer: Zookeeper – the registry that manages metadata and coordinates the cluster.

Second layer: core Kafka concepts:

record – a message

topic – a category of messages

producer – sends messages

consumer – consumes messages

broker – a Kafka server

partition – a shard of a topic that enables load balancing and horizontal scaling

Leader/Follower – replicas of a partition; the Leader handles reads and writes, Followers sync from the Leader

offset – the sequential position of a message within a partition

Consumer group – a set of consumers in which each partition is read by only one consumer of the group

Coordinator – assigns partitions to consumers and handles rebalancing

Controller – a broker that manages the whole Kafka cluster, handling leader elections and topic management

Third layer: the storage layer that persists Kafka data as logs on disk.

Do you know the two message‑queue models and how Kafka supports both?

Traditional message queues support:

Point‑to‑point – a message is consumed by a single consumer and then deleted.

Publish‑subscribe – a message can be consumed by all consumers.

Kafka supports both models through Consumer Groups. If all consumers belong to the same group, the model behaves like point‑to‑point; if each consumer uses a separate group, it behaves like publish‑subscribe.
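The group rule above can be sketched as a toy delivery loop. This is an illustration only, not the Kafka API: the names GroupDeliveryDemo and deliver are made up, and "first member of the group" stands in for Kafka's real per-partition assignment.

```java
import java.util.*;

// Toy model: each consumer group receives a message exactly once.
// One shared group -> point-to-point; one group per consumer -> publish-subscribe.
public class GroupDeliveryDemo {
    // group name -> consumers in that group
    static Map<String, List<String>> groups = new LinkedHashMap<>();

    // Deliver a message once per group (to the first member, for simplicity).
    static List<String> deliver(String message) {
        List<String> receivers = new ArrayList<>();
        for (List<String> members : groups.values()) {
            receivers.add(members.get(0) + " got " + message);
        }
        return receivers;
    }

    public static void main(String[] args) {
        // Point-to-point: c1 and c2 share group "g1" -> one delivery per message
        groups.put("g1", List.of("c1", "c2"));
        System.out.println(deliver("m1").size()); // 1

        // Publish-subscribe: c3 joins its own group -> every group gets a copy
        groups.put("g2", List.of("c3"));
        System.out.println(deliver("m2").size()); // 2
    }
}
```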

What is the Kafka communication process?

When a broker starts, it registers its ID in Zookeeper under /brokers/ids; the cluster watches that path for membership changes.

A producer specifies bootstrap.servers and opens TCP connections to the listed brokers.

Through any of those connections, it requests cluster metadata (topics, partitions, replicas, leaders).

Using that metadata, it then opens TCP connections to the remaining brokers it needs to talk to.

Message sending begins.

A consumer also specifies bootstrap.servers, connects to a broker, and discovers the Coordinator broker for its group.

The consumer connects to the coordinator to obtain metadata.

It connects to the Leader broker of each partition.

Finally, it starts consuming messages.

How does Kafka choose a partition when sending a message?

Two main strategies:

Round‑robin – messages are sent to partitions in order.

Random – messages are sent to a random partition.

If a key is provided, Kafka hashes the key and mods by the number of partitions, ensuring that messages with the same key always go to the same partition, which guarantees ordering for that key.

Without a key, the default round‑robin load‑balancing is used.

Custom partitioning can be implemented by providing a Partitioner implementation that overrides configure and partition.
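The hash-and-mod rule can be sketched in a few lines. Note this is a simplification: Kafka's default partitioner hashes the serialized key with murmur2, while the sketch below uses String.hashCode purely for illustration; the class and method names are made up.

```java
// Simplified keyed partition selection: hash the key, mod by the
// partition count, so a given key always lands on the same partition.
public class KeyPartitioner {
    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result of the modulus is non-negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Same key -> same partition -> ordering preserved for that key
        System.out.println(partitionFor("order-42", 6) == partitionFor("order-42", 6)); // true
    }
}
```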

Why is partitioning important?

Without partitions, all data would reside on a single node, limiting scalability. Partitioning distributes data across multiple nodes, providing load balancing, horizontal scalability, and higher throughput for both producers and consumers, while also enabling replication for high availability.

Explain consumer groups and rebalancing.

Ideally the number of consumers in a group equals the number of partitions. With fewer consumers than partitions, some consumers handle multiple partitions; with more, some consumers sit idle.

Rebalancing occurs when the membership of a consumer group changes or when the number of topics or partitions changes.

Old versions used Zookeeper watchers; newer versions use the coordinator. The process involves:

Each new consumer sends a JoinGroup request; the first becomes the group leader.

The leader computes the partition assignment and sends a SyncGroup request to the coordinator.

Other members also send SyncGroup and receive their assignments.

What are the partition assignment strategies?

Kafka provides three strategies:

Range – the default; partitions are sorted and assigned in order, which can lead to imbalance when a consumer group subscribes to multiple topics.

RoundRobin – partitions are assigned in a round‑robin fashion across all topics, avoiding the imbalance of the Range strategy.

Sticky – aims to keep partition assignments stable across rebalances, reducing connection churn.
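The Range-vs-RoundRobin imbalance can be shown with a small simulation. This is a sketch under simplifying assumptions (all consumers subscribe to all topics, members pre-sorted); real Kafka's assignors handle per-member subscriptions and sorting.

```java
import java.util.*;

// Range assigns each topic's partitions in contiguous blocks, so the
// first consumers absorb every topic's remainder; RoundRobin deals all
// partitions out one by one, spreading the remainders.
public class AssignmentDemo {
    static Map<String, List<String>> range(List<String> consumers,
                                           Map<String, Integer> topics) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        consumers.forEach(c -> out.put(c, new ArrayList<>()));
        for (var t : topics.entrySet()) {
            int per = t.getValue() / consumers.size();
            int extra = t.getValue() % consumers.size();   // remainder goes to the head
            int p = 0;
            for (int i = 0; i < consumers.size(); i++) {
                int take = per + (i < extra ? 1 : 0);
                for (int j = 0; j < take; j++)
                    out.get(consumers.get(i)).add(t.getKey() + "-" + p++);
            }
        }
        return out;
    }

    static Map<String, List<String>> roundRobin(List<String> consumers,
                                                Map<String, Integer> topics) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        consumers.forEach(c -> out.put(c, new ArrayList<>()));
        int i = 0;
        for (var t : topics.entrySet())
            for (int p = 0; p < t.getValue(); p++)
                out.get(consumers.get(i++ % consumers.size())).add(t.getKey() + "-" + p);
        return out;
    }

    public static void main(String[] args) {
        var consumers = List.of("c0", "c1");
        var topics = new LinkedHashMap<String, Integer>();
        topics.put("A", 3);
        topics.put("B", 3);
        // Range: c0 takes the remainder of both topics -> 4 vs 2
        System.out.println(range(consumers, topics).get("c0").size());      // 4
        // RoundRobin: 6 partitions dealt evenly -> 3 vs 3
        System.out.println(roundRobin(consumers, topics).get("c0").size()); // 3
    }
}
```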

How to ensure message reliability?

Reliability is addressed from three angles:

Producer message loss – use asynchronous sends with callbacks, and configure acks=all plus a large retries value.

Kafka internal loss – increase replication.factor, set min.insync.replicas > 1, and set unclean.leader.election.enable=false.

Consumer message loss – disable automatic offset commits (enable.auto.commit=false) and commit offsets manually after processing; optionally set auto.offset.reset=earliest to avoid skipping messages.
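Collected in one place, the settings above look like this. The broker address is a placeholder and the class is illustrative; with kafka-clients on the classpath these Properties would be passed to KafkaProducer / KafkaConsumer, and the broker settings would go in server.properties.

```java
import java.util.Properties;

// Reliability-oriented settings, grouped by the three loss points.
public class ReliabilityConfig {
    static Properties producerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // placeholder address
        p.put("acks", "all");            // wait for all in-sync replicas to ack
        p.put("retries", "2147483647");  // retry (effectively) forever
        return p;
    }

    static Properties brokerProps() {
        Properties p = new Properties();
        p.put("default.replication.factor", "3");        // broker default for new topics
        p.put("min.insync.replicas", "2");               // must stay > 1
        p.put("unclean.leader.election.enable", "false"); // no out-of-sync leaders
        return p;
    }

    static Properties consumerProps() {
        Properties p = new Properties();
        p.put("enable.auto.commit", "false");  // commit offsets manually
        p.put("auto.offset.reset", "earliest");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("acks")); // all
    }
}
```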

How do replicas synchronize?

All of a partition's replicas are called the AR (Assigned Replicas); the subset that is keeping up with the Leader is the ISR (In-Sync Replicas). A replica stays in the ISR as long as it lags by no more than replica.lag.time.max.ms (default 10 s).

Two key concepts:

HW (High Watermark) – the offset up to which all ISR have replicated; messages below HW are visible to consumers.

LEO (Log End Offset) – the offset of the next message to be written.

The synchronization process updates HW based on the smallest LEO among ISR, ensuring consistency.
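The HW rule reduces to a one-line computation, sketched below (class and method names are illustrative):

```java
import java.util.*;

// The high watermark is the minimum LEO across the ISR, so consumers
// never read past an offset that some in-sync replica has not yet written.
public class HighWatermarkDemo {
    static long highWatermark(Collection<Long> isrLeos) {
        return Collections.min(isrLeos);
    }

    public static void main(String[] args) {
        // Leader has written up to offset 10; followers lag at 8 and 9
        List<Long> leos = List.of(10L, 8L, 9L);
        System.out.println(highWatermark(leos)); // 8 -> offsets 0..7 are visible
    }
}
```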

Why did newer Kafka versions drop Zookeeper?

Two main reasons:

Operational complexity – running an additional Zookeeper cluster adds deployment and maintenance cost.

Performance – Zookeeper is not suited to high-frequency metadata updates, causing latency and limiting scalability.

Why is Kafka fast?

Three reasons:

Sequential I/O – Kafka appends messages to logs, enabling fast sequential disk writes.

PageCache and zero‑copy – writes go through the OS page cache (index files are memory‑mapped), and reads use sendfile to push data from the page cache straight to the network socket, reducing copies and CPU overhead.

Batching and compression – producers batch messages and both producers and brokers compress data, reducing network and storage costs.
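The batching and compression knobs live in producer configuration; a sketch with illustrative (not recommended) values:

```java
import java.util.Properties;

// Throughput-oriented producer settings: batch more, compress batches.
public class ThroughputConfig {
    static Properties producerProps() {
        Properties p = new Properties();
        p.put("batch.size", "65536");     // max bytes buffered per partition batch
        p.put("linger.ms", "10");         // wait up to 10 ms to fill a batch
        p.put("compression.type", "lz4"); // compress whole batches on the producer
        return p;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("compression.type")); // lz4
    }
}
```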

Written by

Sanyou's Java Diary

Passionate about technology, though not great at solving problems; eager to share, never tire of learning!
