
Comprehensive Introduction to Apache Kafka: Concepts, Architecture, Installation, and Usage

This article provides a comprehensive guide to Apache Kafka, covering its core concepts, architecture, key APIs, topics and partitions, deployment steps, multi‑broker clustering, fault tolerance, and data integration using Kafka Connect, with detailed command‑line examples.


Kafka Overview

Kafka is a distributed streaming platform that provides publish/subscribe messaging, fault‑tolerant storage, and real‑time stream processing capabilities.

Core Concepts

Publish and Subscribe – messages are written to and read from topics.

Durable Storage – records are persisted in an append‑only log.

Stream Processing – consumers can process records as they arrive.
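The three concepts above can be sketched in a few lines of Python. This is a toy model for intuition only, not Kafka's implementation: a topic is an append-only list, producers only ever append, and each consumer tracks its own read offset.

```python
# Toy model of a Kafka topic: an append-only log plus per-consumer offsets.
# Illustrative only -- real Kafka persists the log to disk and replicates it.

class ToyTopic:
    def __init__(self):
        self.log = []                    # append-only sequence of records

    def publish(self, record):
        self.log.append(record)          # producers only ever append
        return len(self.log) - 1         # the record's offset in the log

class ToyConsumer:
    def __init__(self, topic):
        self.topic = topic
        self.offset = 0                  # each consumer tracks its own position

    def poll(self):
        records = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        return records

topic = ToyTopic()
topic.publish("first")
topic.publish("second")

c = ToyConsumer(topic)
print(c.poll())        # ['first', 'second']
topic.publish("third")
print(c.poll())        # ['third']
```

Because records are never mutated or deleted on read, many consumers can read the same topic independently, each at its own offset.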

Key Terminology

Topic: a logical category of records, possibly spanning multiple partitions.

Partition: an ordered, immutable sequence of records stored as a log file.

Broker: a server that hosts partitions and serves client requests.

Leader / Follower: each partition has one leader handling reads/writes; followers replicate the leader.
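The relationship between keys and partitions can be illustrated with a simplified partitioner. Kafka's real default partitioner uses murmur2 hashing (and sticky/round-robin assignment when the key is null); this sketch only shows the core idea that a record's key deterministically selects a partition, which preserves per-key ordering.

```python
# Simplified partitioner: same key -> same partition.
# Kafka's default actually uses murmur2 hashing, not CRC32; this is
# only to show that partition choice is a deterministic function of the key.
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

# Records sharing a key always land in the same partition.
p1 = choose_partition(b"user-42", 3)
p2 = choose_partition(b"user-42", 3)
assert p1 == p2
```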

Core APIs

Producer API: publish records to topics.

Consumer API: subscribe to topics and read records.

Streams API: build stream processing applications.

Connector API: integrate Kafka with external systems (e.g., databases).

Installation

Download the desired version from kafka.apache.org and extract it.

[root@along ~]# wget http://mirrors.shu.edu.cn/apache/kafka/2.1.0/kafka_2.11-2.1.0.tgz
[root@along ~]# tar -C /data/ -xvf kafka_2.11-2.1.0.tgz
[root@along ~]# cd /data/kafka_2.11-2.1.0/

Zookeeper Configuration

Kafka runs on the JVM and (in this version) requires Zookeeper for cluster coordination. The Kafka distribution bundles a Zookeeper server, so only Java needs to be installed separately:

[root@along ~]# yum -y install java-1.8.0

Modify config/zookeeper.properties as needed (e.g., dataDir, clientPort).
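For reference, a minimal zookeeper.properties looks like the following (these match the defaults shipped with Kafka, except that dataDir is moved off /tmp so snapshots survive a reboot; the path is illustrative):

```
dataDir=/data/zookeeper
clientPort=2181
maxClientCnxns=0
```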

Kafka Broker Configuration

Edit config/server.properties to set broker ID, listeners, log directories, and Zookeeper connection.

broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181

Starting Services

Start Zookeeper and then Kafka:

# nohup zookeeper-server-start.sh config/zookeeper.properties &
# nohup kafka-server-start.sh config/server.properties &

Basic Operations

Create a topic:

# kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic along

Send messages with the console producer (the > is the producer's prompt, not shell redirection; each line typed becomes one record):

# kafka-console-producer.sh --broker-list localhost:9092 --topic along
> This is a message

Consume messages with the console consumer:

# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic along --from-beginning
This is a message

Multi‑Broker Cluster

Copy server.properties to create server-1.properties and server-2.properties, change broker.id, listeners, and log.dirs, then start each broker.
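When all brokers run on one host, each needs a unique ID, listener port, and log directory. The edited fields in config/server-1.properties might look like this (the port and path are illustrative):

```
broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=/tmp/kafka-logs-1
```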

# nohup kafka-server-start.sh config/server-1.properties &
# nohup kafka-server-start.sh config/server-2.properties &

Verify the cluster with kafka-topics.sh --describe and test fault tolerance by killing a broker; the leader will automatically move to another replica.
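For a topic created with --replication-factor 3, the describe output shows, per partition, the current leader broker, the replica set, and the in-sync replicas (Isr). An illustrative (hypothetical) output:

```
# kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic  PartitionCount:1  ReplicationFactor:3  Configs:
    Topic: my-replicated-topic  Partition: 0  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0
```

After killing broker 1, re-running --describe would show a new leader elected from the remaining in-sync replicas.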

Kafka Connect

Kafka Connect enables importing/exporting data without custom code. Run in standalone mode with configuration files:

# connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties

Example: the source connector reads lines from test.txt into topic connect-test; the sink connector writes the topic back out to test.sink.txt.
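The two connector configuration files ship with the Kafka distribution; their contents are roughly as follows (shown for orientation, paths relative to the working directory):

```
# config/connect-file-source.properties
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test

# config/connect-file-sink.properties
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test
```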

# echo -e "foo\nbar" > test.txt
# cat test.sink.txt
foo
bar

Consume the topic directly to see the JSON payloads.

# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}


Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
