Introduction to Apache Kafka: Core Concepts, Architecture, and APIs
This article provides a comprehensive overview of Apache Kafka, covering its fundamental capabilities, typical use cases, core components, key APIs, and essential concepts such as topics, partitions, segments, brokers, producers, and consumers.
What is Kafka?
Kafka is a distributed streaming platform that offers three core capabilities: a publish‑subscribe record stream similar to a message queue, fault‑tolerant storage of those records, and real‑time processing of the streams.
Kafka Applications
Used as a messaging system
Used as a storage system
Used as a stream processor
Kafka can be used to build reliable data pipelines between systems or applications, and to process and react to streams of data in real time.
Kafka as a Messaging System
When used as a messaging system, Kafka consists of three basic components:
Producer – the client that publishes messages
Broker – the server that receives and stores messages from producers
Consumer – the client that reads messages from brokers
In large systems, many subsystems need to exchange data; a message‑passing system like Kafka simplifies and organizes this interaction.
Kafka runs on one or more servers in one or more data centers as a cluster. The cluster stores messages in logical containers called topics. Each message record contains a key, a value, and a timestamp.
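The record structure above can be sketched as a simple data class. This is a conceptual model only, not Kafka's actual client classes; the field names mirror the key/value/timestamp description in the text.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Record:
    """Conceptual model of a Kafka message record: key, value, timestamp."""
    key: str
    value: str
    # Kafka stamps records with a timestamp at produce or append time;
    # here we default to the current wall-clock time.
    timestamp: float = field(default_factory=time.time)

r = Record(key="user-42", value="clicked checkout")
```

The key is optional in real Kafka (it may be null); when present, it typically drives which partition the record lands in.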
Core APIs
Kafka provides four core APIs:
Producer API – allows applications to send message records to one or more topics
Consumer API – allows applications to subscribe to topics and process the resulting record streams
Streams API – enables applications to act as stream processors, consuming input streams from topics and producing output streams
Connector API – enables building and running connectors that link Kafka topics to external systems such as relational databases
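The Streams API's consume-transform-produce pattern can be illustrated with a minimal sketch. This is not the Kafka Streams API itself; topics are modeled as plain lists, and the function names are illustrative only.

```python
def process_stream(input_topic, transform):
    """Model of a stream processor: consume each record from an input
    topic, apply a transformation, and produce to an output topic."""
    output_topic = []
    for record in input_topic:
        output_topic.append(transform(record))
    return output_topic

events = ["page_view", "click", "purchase"]
results = process_stream(events, str.upper)
# results == ["PAGE_VIEW", "CLICK", "PURCHASE"]
```

A real Kafka Streams application does the same thing continuously over unbounded topic data, with fault tolerance and state management handled by the framework.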
Fundamental Kafka Concepts
Topic
A topic is a logical category that groups related messages, similar to a table in a database or a folder in a file system.
Partition
Each topic is divided into one or more partitions, which are physical logs stored on disk. Messages are appended to partitions, and each partition preserves order.
Note: Because a topic may contain many partitions, global ordering across the entire topic cannot be guaranteed, but ordering is preserved within each individual partition.
Partitions can be distributed across multiple servers, allowing a topic to span several machines for higher performance.
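The ordering guarantee described above can be made concrete with a toy model: a topic is a set of append-only logs, and each append returns the record's offset within its partition. This is a conceptual sketch, not Kafka's storage implementation.

```python
class Topic:
    """Toy topic: a list of partitions, each an append-only log."""
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message to one partition; return its offset there."""
        self.partitions[partition].append(message)
        return len(self.partitions[partition]) - 1

t = Topic(num_partitions=2)
t.append(0, "a1")
t.append(1, "b1")
t.append(0, "a2")
# Partition 0 preserves the order a1, a2; there is no
# defined global order between partitions 0 and 1.
```

Because each partition is an independent log, partitions can live on different servers, which is exactly what lets a topic scale beyond one machine.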
Segment
Partitions are further broken into segments, each a log file on disk. When the active segment reaches a configured size (or age) limit, Kafka closes it and rolls over to a new one, which makes old data easy to delete or compact one file at a time.
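Segmentation can be sketched as slicing a partition's log into chunks keyed by the offset of their first record (Kafka names segment files by this base offset). The sketch caps segments by record count for simplicity; real Kafka rolls segments by bytes or time.

```python
def split_into_segments(log, max_records_per_segment):
    """Break a partition log into segments, each keyed by the offset
    of its first record (its base offset)."""
    segments = {}
    for base_offset in range(0, len(log), max_records_per_segment):
        segments[base_offset] = log[base_offset:base_offset + max_records_per_segment]
    return segments

segments = split_into_segments(["m0", "m1", "m2", "m3", "m4"], 2)
# {0: ['m0', 'm1'], 2: ['m2', 'm3'], 4: ['m4']}
```

Base-offset naming lets a broker locate the segment holding any offset with a simple lookup rather than scanning the whole partition.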
Broker
A Kafka cluster consists of one or more brokers (servers). Brokers receive messages from producers, assign offsets, persist them to disk, and serve consumer read requests. For each partition, one broker acts as the leader, handling reads and writes, while other brokers replicate the partition so it can take over on failure.
Producer
The producer publishes messages to a topic. By default, messages without a key are spread across the topic's partitions, while messages that share a key are routed to the same partition; the producer can also target a specific partition explicitly when needed.
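That routing behavior can be sketched as a partitioner function. This is an approximation: real Kafka hashes keys with murmur2 and uses a sticky strategy for keyless records, whereas this sketch uses Python's built-in hash and plain round-robin.

```python
import itertools

_round_robin = itertools.count()

def choose_partition(key, num_partitions):
    """Sketch of a producer partitioner: records with the same key map
    to the same partition; keyless records are spread round-robin."""
    if key is not None:
        return hash(key) % num_partitions
    return next(_round_robin) % num_partitions

p = choose_partition("user-42", 4)  # the same key always maps to the same partition
```

Keeping all records for one key in one partition is what makes per-key ordering possible downstream.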
Consumer
The consumer reads messages from one or more topics. Within a given partition, it consumes messages in the order they were written, tracking its position with an offset, which ensures ordered consumption inside that partition.
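Offset tracking can be sketched with a toy consumer that polls one partition's log. This is a conceptual model, not the Kafka consumer client; the partition log is modeled as a plain list.

```python
class Consumer:
    """Toy consumer for a single partition: poll() returns the next
    records and advances the consumer's offset, so reads within the
    partition stay ordered."""
    def __init__(self, partition_log):
        self.log = partition_log
        self.offset = 0  # position of the next record to read

    def poll(self, max_records=1):
        records = self.log[self.offset:self.offset + max_records]
        self.offset += len(records)
        return records

c = Consumer(["m0", "m1", "m2"])
first = c.poll(2)  # ["m0", "m1"]
```

In real Kafka, the consumer periodically commits this offset back to the cluster so it can resume from where it left off after a restart.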
Source: SegmentFault https://segmentfault.com/a/1190000020718980