
Kafka Overview: Architecture, Core Concepts, and Comparison with Other Message Queues

This article provides a comprehensive overview of Kafka, covering its background, design goals, architecture, key terminology, message routing, consumer groups, delivery guarantees, and a comparison with other popular message queue systems such as RabbitMQ, Redis, ZeroMQ, and ActiveMQ.

Qunar Tech Salon

Background

Kafka is a distributed publish/subscribe messaging system originally developed at LinkedIn for activity streams and operational data pipelines. It is now widely used by many companies as a core data pipeline and messaging platform.

Activity stream data (page views, content accesses, searches) and operational data (CPU, I/O, request latency, logs) are typically logged to files and periodically aggregated. Modern web services require more sophisticated infrastructure to handle these workloads.

Kafka Introduction

Kafka's design goals are: message persistence with O(1) time complexity, so that access cost stays constant even with terabytes of stored data; high throughput (over 100K messages/second on commodity hardware); partitioned messaging with ordered delivery within each partition; support for both offline and real-time processing; and horizontal scalability.

Why Use a Message System?

Decoupling: A message queue introduces an implicit data‑driven interface that allows producers and consumers to evolve independently as long as they adhere to the same contract.

Redundancy: Messages are persisted until explicitly consumed, preventing data loss even when processing fails.

Scalability: Because producers and consumers are decoupled, raising the message production or consumption rate is usually a matter of adding more producers or consumers, without code changes or parameter tuning.

Flexibility & Burst Handling: Queues absorb traffic spikes, protecting critical components from overload.

Recoverability: Failure of a single consumer does not affect the whole system; unprocessed messages remain in the queue for later processing.

Ordering Guarantees: Kafka guarantees that messages within a single partition are delivered in order; it makes no ordering guarantee across partitions.

Buffering: Queues act as buffers, allowing faster producers to write while slower consumers read at their own pace.

Asynchronous Communication: Producers can fire‑and‑forget messages, letting consumers process them later.

Common Message Queue Comparison

RabbitMQ: Erlang‑based, supports many protocols (AMQP, XMPP, SMTP, STOMP), heavyweight, broker‑centric, good for routing, load‑balancing, and persistence.

Redis: Key‑value NoSQL store with lightweight MQ capabilities; excels at small payloads (<10 KB) for enqueue/dequeue performance.

ZeroMQ: Fast, broker‑less library offering advanced patterns; non‑persistent, suitable for high‑throughput scenarios but requires more custom wiring.

ActiveMQ: Apache project offering both broker and peer‑to‑peer models; relatively lightweight.

Kafka / Jafka: Kafka is an Apache project (Jafka is a separate project derived from Kafka). It is a high-performance, fully distributed system with O(1) persistence and high throughput; it integrates with Hadoop for parallel loading and supports both offline and real-time processing.

Kafka Architecture

Terminology

Broker: A server in a Kafka cluster.

Topic: Logical category of messages; physically stored across one or more brokers.

Partition: A physical slice of a topic; each partition is an ordered log.

Producer: Publishes messages to brokers.

Consumer: Reads messages from brokers.

Consumer Group: A set of consumers that share a group name; each partition is consumed by only one member of the group.

Kafka Topology

A typical cluster contains multiple producers (e.g., page-view emitters, server logs), several brokers, several consumer groups, and a ZooKeeper ensemble, which Kafka uses to manage cluster configuration, elect leaders, and rebalance consumer groups when membership changes.

Topic & Partition

Logically a topic behaves like a queue; physically it is split into multiple partitions, each stored in its own directory with log segment files and index files. Each message has a 64‑bit offset and is stored as a log entry consisting of a magic byte, CRC, and payload.

Kafka retains all messages (subject to time‑ or size‑based retention policies) rather than deleting consumed messages, enabling replay and simplifying consumer state management.
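
The offset and retention behaviour described above can be illustrated with a toy in-memory log. This is only a sketch: the `PartitionLog` class and its method names are hypothetical and do not reflect Kafka's on-disk segment format or any Kafka API. It shows that reading never deletes data, so a consumer can replay from an earlier offset, and that only the retention policy removes messages.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy append-only log for one partition (hypothetical, not Kafka's format)."""
    base_offset: int = 0                          # offset of the oldest retained message
    messages: list = field(default_factory=list)  # retained payloads, oldest first

    def append(self, payload):
        """Append a message and return the offset assigned to it."""
        offset = self.base_offset + len(self.messages)
        self.messages.append(payload)
        return offset

    def read(self, offset, max_messages=10):
        """Return up to max_messages starting at offset; nothing is deleted."""
        start = max(offset, self.base_offset) - self.base_offset
        return self.messages[start:start + max_messages]

    def enforce_retention(self, max_messages):
        """Size-based retention: drop the oldest messages beyond max_messages."""
        excess = len(self.messages) - max_messages
        if excess > 0:
            del self.messages[:excess]
            self.base_offset += excess


log = PartitionLog()
for i in range(5):
    log.append(f"event-{i}")

first_pass = log.read(0)       # a consumer reads everything once
replay = log.read(0)           # reading again from offset 0 returns the same data
log.enforce_retention(3)       # only retention removes messages
after_retention = log.read(0)  # the two oldest messages are gone
```

Because the broker keeps no per-message "consumed" flag, the only state a consumer needs is the offset it wants to read next.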

Producer Message Routing

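Producers select a partition using a configurable partitioner (e.g., an implementation of kafka.producer.Partitioner). The default can be overridden; a common custom scheme hashes the message key and takes the result modulo the number of partitions, so that all messages with the same key land in the same partition.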
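
A key-based partitioner of the kind described above might look like the following sketch. The function name `choose_partition` is hypothetical, and CRC32 is used here only because it is a stable hash available in Python's standard library; it is an assumption for illustration, not the hash that Kafka's own partitioner uses.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # A stable hash of the key, reduced modulo the partition count.
    return zlib.crc32(key) % num_partitions


keys = [b"user-1", b"user-2", b"user-1"]
assignments = [choose_partition(k, 3) for k in keys]
# The first and third messages share a key, so they get the same partition.
```

Since ordering is guaranteed only within a partition, routing by key in this way keeps all messages for one key in order relative to each other.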

Configuration example: the default number of partitions for a new topic is set with num.partitions in $KAFKA_HOME/config/server.properties.

Consumer Group

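Within a consumer group, each partition is consumed by only one consumer, but the same topic can be consumed by multiple groups simultaneously. This gives unicast semantics inside a group (each message goes to one member) and broadcast semantics across groups (every group sees every message).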

Example: for a topic with three partitions, the single consumer in group 1 receives the messages from all three partitions, while each of the three consumers in group 2 receives one distinct partition.
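
The example above can be reproduced with a small assignment sketch. The `assign_partitions` helper and the consumer names are hypothetical, and plain round-robin is an assumption chosen for simplicity; Kafka's actual assignment logic is not shown here.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Group 1 has a single consumer, so it owns every partition.
group1 = assign_partitions(partitions, ["g1-c1"])

# Group 2 has three consumers, so each owns one distinct partition.
group2 = assign_partitions(partitions, ["g2-c1", "g2-c2", "g2-c3"])

# With more consumers than partitions, the extra consumer stays idle.
group3 = assign_partitions(partitions, ["g3-c1", "g3-c2", "g3-c3", "g3-c4"])
```

Each group is assigned independently, which is why both groups receive the full topic while no partition is shared inside a group.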

Push vs. Pull

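Kafka uses push on the producer side and pull on the consumer side: producers push messages to brokers, and consumers pull messages from brokers. Pulling lets each consumer fetch at a rate that matches its own processing capacity, which avoids the overload that can occur when a broker pushes faster than a consumer can keep up.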
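
A minimal sketch of the pull model, assuming a toy in-memory broker: the `Broker` and `PullConsumer` classes and the `batch_size` parameter are hypothetical names for illustration and are not part of any Kafka client. Each consumer decides how much to fetch per poll, so a slow consumer is never handed more than it asked for.

```python
class Broker:
    """Toy broker holding a single ordered log."""

    def __init__(self):
        self.log = []

    def push(self, message):
        """Producer side: append a message to the log."""
        self.log.append(message)

    def fetch(self, offset, max_messages):
        """Consumer side: return up to max_messages starting at offset."""
        return self.log[offset:offset + max_messages]


class PullConsumer:
    """Consumer that pulls batches at a size it chooses itself."""

    def __init__(self, broker, batch_size):
        self.broker = broker
        self.batch_size = batch_size
        self.offset = 0        # next offset to fetch
        self.processed = []

    def poll(self):
        """Fetch one batch, process it, and advance the offset."""
        batch = self.broker.fetch(self.offset, self.batch_size)
        self.processed.extend(batch)
        self.offset += len(batch)
        return len(batch)


broker = Broker()
for i in range(10):
    broker.push(i)

slow = PullConsumer(broker, batch_size=2)
fast = PullConsumer(broker, batch_size=5)
slow.poll()   # the slow consumer takes two messages
fast.poll()   # the fast consumer takes five from the same log
```

Both consumers read the same log independently; the broker does not need to know how fast either of them is.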

Kafka Delivery Guarantees

Three delivery semantics are supported:

At most once: Messages may be lost but are never duplicated.

At least once: No loss, but duplicates may occur.

Exactly once: Each message is delivered once and only once (requires external coordination; not fully implemented in older versions).

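On the producer side, asynchronous fire-and-forget sends give at-most-once delivery, since a failed send is not retried; at-least-once is the default, because a producer that does not receive an acknowledgement resends the message, which can create a duplicate. On the consumer side, offsets are committed to ZooKeeper, and the order of committing and processing determines the guarantee: committing before processing gives at-most-once (a crash after the commit skips the message), while committing after processing gives at-least-once (a crash before the commit causes the message to be re-delivered).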
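
The effect of the commit point can be simulated without a broker. In this sketch the `consume` function and its parameters are hypothetical; a "crash" is modelled as returning early between the commit step and the processing step for one chosen offset, and the consumer then restarts from the last committed offset.

```python
def consume(messages, committed, commit_first, crash_at=None):
    """Process messages starting at the committed offset.

    If crash_at is set, the consumer crashes between the commit and
    process steps for that offset. Returns (processed, committed).
    """
    processed = []
    offset = committed
    while offset < len(messages):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return processed, committed   # committed, never processed
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if offset == crash_at:
                return processed, committed   # processed, never committed
            committed = offset + 1
        offset += 1
    return processed, committed


messages = ["a", "b", "c"]

# Commit before processing: a crash loses the in-flight message.
first, committed = consume(messages, 0, commit_first=True, crash_at=1)
second, _ = consume(messages, committed, commit_first=True)
at_most_once = first + second

# Commit after processing: a crash causes the message to be seen twice.
first, committed = consume(messages, 0, commit_first=False, crash_at=1)
second, _ = consume(messages, committed, commit_first=False)
at_least_once = first + second
```

In the first run message "b" is never processed, and in the second run it is processed twice, which matches the at-most-once and at-least-once definitions above.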

Author Bio

Jason Guo (郭俊): Master's graduate, works on big-data platform development, proficient with Kafka, Storm, and other distributed streaming technologies. Contact: WeChat habren, Sina Weibo 郭俊_Jason, blog http://www.jasongj.com.
