Introduction to Apache Kafka: Core Concepts, Message Delivery, Partition Storage, and Consumption
This article introduces Apache Kafka as a distributed streaming platform, explaining its three core capabilities; key concepts such as producers, topics, brokers, partitions, and consumers; and how messages are delivered, stored in partitions, and consumed by consumer groups.
Kafka, as a distributed streaming platform, is increasingly used in the big‑data field; this article provides an overview of Kafka’s essential features.
Key Capabilities
Kafka offers three major abilities: publishing and subscribing to message streams (similar to a message queue), fault‑tolerant storage of streams, and real‑time processing of streams.
It is typically applied to build real‑time data pipelines for reliable data transfer between systems, and to develop applications that transform or react to streaming data.
Related Concepts
Producer: publishes messages to Kafka.
Topic: logical classification of messages; every message belongs to a specific topic.
Broker: a Kafka server; a cluster consists of multiple brokers.
Partition: each topic is divided into one or more ordered, immutable partitions; each partition can have multiple replicas for fault tolerance.
Consumer: pulls messages from Kafka and belongs to a consumer group.
Message Delivery
Each message consists of a key, value, and timestamp.
Messages are stored in partitions; placement follows three rules, checked in order: (1) if the producer explicitly specifies a partition, that partition is used; (2) otherwise, if the message has a key, the key is hashed to select the partition, so all messages with the same key land in the same partition; (3) if there is neither an explicit partition nor a key, messages are distributed round-robin across the topic's partitions.
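The three placement rules can be sketched as a small partitioner. This is a simplified illustration, not Kafka's real partitioner (which uses a murmur2 hash of the key bytes, and sticky batching for keyless messages in recent versions); the class name and MD5-based hash here are illustrative choices.

```python
import hashlib
import itertools

class SimplePartitioner:
    """Illustrative sketch of the three partition-placement rules."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.count()  # counter for keyless messages

    def partition(self, key=None, explicit_partition=None):
        # Rule 1: an explicitly specified partition always wins.
        if explicit_partition is not None:
            return explicit_partition
        # Rule 2: a key is hashed, so equal keys map to the same partition.
        if key is not None:
            digest = hashlib.md5(key.encode()).digest()
            return int.from_bytes(digest[:4], "big") % self.num_partitions
        # Rule 3: no partition and no key -> round-robin distribution.
        return next(self._round_robin) % self.num_partitions
```

Because the key fully determines the partition, per-key ordering is preserved: all messages for, say, one user ID arrive in one partition in send order.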
The acknowledgment setting (acks) controls producer‑side reliability: acks=0 (no wait), acks=1 (wait for leader), acks=-1 or all (wait for all in‑sync replicas).
Leader and follower are per-partition roles played by brokers: each partition has one leader replica that handles all reads and writes, while follower replicas copy the leader's data. Replicas that stay caught up with the leader form the partition's ISR (In-Sync Replicas) set.
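The interaction between the acks setting and the ISR can be captured in a few lines. This is a conceptual sketch of when the broker considers a produce request complete, not actual broker code; the function name and parameters are invented for illustration.

```python
def is_acknowledged(acks, leader_written, isr_acks, isr_size):
    """Sketch of produce-request completion under each acks setting.

    acks         : 0, 1, or "all" (equivalent to -1)
    leader_written : True once the leader has persisted the record
    isr_acks     : in-sync replicas (leader included) that have the record
    isr_size     : current size of the ISR set
    """
    if acks == 0:
        return True            # fire-and-forget: no confirmation awaited
    if acks == 1:
        return leader_written  # only the leader must persist the record
    # acks="all" / -1: every replica currently in the ISR must have it
    return leader_written and isr_acks >= isr_size
```

Note that acks="all" waits only for the current ISR, not for every configured replica: a lagging follower that has fallen out of the ISR does not block acknowledgment.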
Partition Storage
Topics are logical; partitions are the actual storage units. Each partition is an ordered, immutable sequence of records, appended sequentially, and each record has a unique offset.
Within a single partition, order is guaranteed, but overall topic order is not.
Messages persist until a configured retention period expires, independent of consumption.
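The storage model above, an append-only sequence with monotonically increasing offsets and time-based retention independent of consumption, can be modeled with a toy class. This is a deliberately simplified sketch (real Kafka stores partitions as segment files on disk); the class and method names are illustrative.

```python
import time

class PartitionLog:
    """Toy append-only partition log with offsets and time-based retention."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.records = []       # list of (offset, timestamp, value)
        self.next_offset = 0    # offsets only ever increase

    def append(self, value, now=None):
        ts = time.time() if now is None else now
        offset = self.next_offset
        self.records.append((offset, ts, value))
        self.next_offset += 1
        return offset

    def read_from(self, offset):
        # Reading never deletes: any consumer can re-read from any offset.
        return [v for (o, _, v) in self.records if o >= offset]

    def expire(self, now=None):
        # Retention removes records by age, regardless of consumption.
        ts = time.time() if now is None else now
        self.records = [r for r in self.records if ts - r[1] < self.retention]
```

The key property: `expire` is driven purely by record age, so a record can outlive being read many times, or be deleted without ever having been read.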
Message Consumption
Consumers belong to a consumer group; each partition is consumed by only one consumer in the group, though a consumer may handle multiple partitions.
If there are more consumers than partitions, the extra consumers remain idle.
Consumers actively pull messages, allowing them to control offset and replay historical data.
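The one-partition-per-consumer-within-a-group rule can be sketched as a round-robin assignment. Real Kafka supports several assignment strategies (range, round-robin, sticky) negotiated during rebalancing; this simplified function only illustrates the invariant that each partition gets exactly one consumer and surplus consumers sit idle.

```python
def assign_partitions(partitions, consumers):
    """Round-robin sketch of consumer-group partition assignment."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        # Each partition goes to exactly one consumer in the group.
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

With 3 partitions and 2 consumers, one consumer handles two partitions; with 2 partitions and 3 consumers, the third consumer receives nothing and stays idle until a rebalance gives it work.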
Older Kafka versions offered low‑level and high‑level consumer APIs; the high‑level API handled partition assignment and rebalancing automatically. Newer versions unify the API while still permitting custom or automatic partition assignment.
Conclusion
Kafka provides high throughput, low latency, scalability, persistence, fault tolerance, and high concurrency, making it a powerful foundation for modern data streaming architectures.
System Architect Go