Kafka Core Concepts, Architecture, Performance Optimization, and Operational Practices
This article explains Kafka's fundamental principles, cluster architecture, data performance techniques such as zero‑copy and log segmentation, resource planning for high‑throughput scenarios, and provides practical operational commands and custom partitioning examples for reliable, high‑availability deployments.
Core Value of Message Queues
Decoupling, asynchronous processing, and traffic control are highlighted using an e‑commerce flash‑sale example, showing how Kafka can split business logic and smooth request spikes.
Kafka Core Concepts
Producers generate data, consumers read it, topics are logical groups, and partitions store data across brokers. Consumer groups ensure that each partition is consumed by only one member of the group.
Cluster Architecture
A Kafka broker is a server; topics consist of partitions, each stored as a directory on disk. Replication provides high availability, with a leader partition handling reads/writes and follower partitions synchronizing via the ISR list.
Data Performance
Writes are sequential appends to OS cache and then flushed to disk, achieving near‑memory speeds. Zero‑copy (Linux sendfile) moves data from OS cache directly to the network card, reducing CPU and context switches.
Zero‑Copy Data Flow
Without zero‑copy, each fetch would pass through user space twice: OS cache → Kafka process → socket buffer → NIC. With sendfile, the flow becomes:
Consumer sends a fetch request to Kafka.
Kafka locates the data in the OS page cache (falling back to disk on a cache miss).
The kernel copies the data straight from the page cache to the socket buffer; it never enters the Kafka process's user space.
The NIC transmits the data from the socket buffer to the consumer.
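Java exposes this kernel path through FileChannel.transferTo. A minimal sketch (the target here is a FileChannel so the demo is self‑contained; Kafka's target is a SocketChannel, which is what maps to sendfile on Linux):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("segment", ".log");
        Path dst = Files.createTempFile("out", ".bin");
        Files.write(src, "kafka zero copy".getBytes());

        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            // transferTo hands the copy to the kernel; with a SocketChannel
            // target on Linux this becomes sendfile, so the bytes never
            // enter the JVM's user-space buffers.
            long transferred = in.transferTo(0, in.size(), out);
            System.out.println(transferred);
        }
    }
}
```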
Log Segmentation and Indexing
Each partition's log is split into .log segment files (by default at most 1 GB each, log.segment.bytes), with a matching sparse .index file that gains one entry roughly every 4 KB of appended data (log.index.interval.bytes). A binary search over the index finds the nearest preceding entry, and a short forward scan of the .log locates the message's exact physical position.
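The lookup can be sketched in a few lines. This is an illustrative model, not Kafka's actual index classes; the arrays stand in for the (relative offset, file position) pairs stored in a .index file:

```java
import java.util.Arrays;

public class SparseIndexLookup {
    // Parallel arrays: every ~4 KB of appended log data adds one entry.
    static final long[] OFFSETS   = {0, 120, 255, 400, 590};
    static final long[] POSITIONS = {0, 4096, 8192, 12288, 16384};

    /** Returns the .log file position to start scanning from for targetOffset. */
    static long floorPosition(long targetOffset) {
        int i = Arrays.binarySearch(OFFSETS, targetOffset);
        if (i < 0) i = -i - 2;   // nearest index entry at or below the target
        if (i < 0) i = 0;        // target precedes the first entry
        return POSITIONS[i];     // scan the .log forward from here
    }

    public static void main(String[] args) {
        // Offset 300 falls between entries 255 and 400, so the scan
        // starts at the position recorded for offset 255.
        System.out.println(floorPosition(300));
    }
}
```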
High‑Concurrency Network Design (NIO)
Kafka uses a multi‑selector, multi‑thread, queue‑based NIO model to achieve high throughput. Diagrams illustrate three Reactor patterns and the overall three‑layer architecture.
Redundant Replicas for High Availability
Partitions have configurable replication factors; the leader handles all I/O while followers stay in sync. The ISR list tracks in‑sync replicas, and the controller monitors leader health.
Resource Planning for a 1 Billion‑Request Scenario
Assuming 50 KB per request, the system must handle an estimated peak of 5.5 × 10⁴ QPS (the daily average is only ≈1.2 × 10⁴), store ~276 TB for three days with two replicas, and use five physical servers (≈11 × 7 TB disks each). Memory sizing targets 60 GB for OS cache plus 10 GB JVM, with a recommendation of 64 GB (128 GB ideal). CPU should provide at least 16 cores (32 cores preferred), and network interfaces should be 10 Gbps for peak traffic.
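The sizing can be checked with quick arithmetic (assuming 50 KB = 50 × 1024 bytes and binary TiB; the article rounds to ~46 TB/day and ~276 TB total):

```java
public class CapacityPlan {
    public static void main(String[] args) {
        long requestsPerDay = 1_000_000_000L;
        long bytesPerRequest = 50L * 1024;  // 50 KB

        // Raw data written per day, in TiB: ~46.6
        double tibPerDay = requestsPerDay * (double) bytesPerRequest / Math.pow(1024, 4);
        System.out.printf("raw/day: %.1f TiB%n", tibPerDay);

        // Two replicas retained for three days: ~280 TiB, matching the ~276 TB estimate.
        int replicas = 2, retentionDays = 3;
        double totalTib = tibPerDay * replicas * retentionDays;
        System.out.printf("total: %.0f TiB%n", totalTib);

        // Average QPS is ~11,574; the 5.5e4 figure assumes peaks several times the average.
        double avgQps = requestsPerDay / 86_400.0;
        System.out.printf("avg QPS: %.0f%n", avgQps);
    }
}
```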
Cluster Planning Summary
Deploy five servers with SAS disks, 64 GB RAM, 16‑32 CPU cores, and 10 Gbps NICs. Use Zookeeper for metadata, place Kafka and ZK on the same machines if resources are limited, and configure replication, partitions, and broker IDs accordingly.
Operational Tools and Commands
KafkaManager provides a web UI. Common CLI tasks include:
kafka-topics.sh --create --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --replication-factor 1 --partitions 1 --topic test6
kafka-topics.sh --alter --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --topic test6 --partitions 3
To increase the replication factor, describe the desired assignment in a JSON file (test.json):
{"version":1,"partitions":[{"topic":"test6","partition":0,"replicas":[0,1,2]},{"topic":"test6","partition":1,"replicas":[0,1,2]},{"topic":"test6","partition":2,"replicas":[0,1,2]}]}
kafka-reassign-partitions.sh --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --reassignment-json-file test.json --execute
To load‑balance topics across brokers, list the topics to move (topics-to-move.json):
{"topics":[{"topic":"test01"},{"topic":"test02"}],"version":1}
kafka-reassign-partitions.sh --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --topics-to-move-json-file topics-to-move.json --broker-list "5,6" --generate
kafka-reassign-partitions.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --reassignment-json-file expand-cluster-reassignment.json --execute
kafka-reassign-partitions.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --reassignment-json-file expand-cluster-reassignment.json --verify
Custom Partitioner Example
import java.util.List;
import java.util.Map;
import java.util.Random;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

public class HotDataPartitioner implements Partitioner {
    private Random random;

    @Override
    public void configure(Map<String, ?> configs) { random = new Random(); }

    @Override
    public int partition(String topic, Object keyObj, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        String key = (String) keyObj;
        List<PartitionInfo> partitionInfoList = cluster.availablePartitionsForTopic(topic);
        int partitionCount = partitionInfoList.size();
        int hotDataPartition = partitionCount - 1;
        // Route hot keys to the last partition; spread everything else over the
        // remaining ones (assumes the topic has at least two partitions).
        return key.contains("hot_data") ? hotDataPartition : random.nextInt(partitionCount - 1);
    }

    @Override
    public void close() {}
}
Configure with:
props.put("partitioner.class", "com.zhss.HotDataPartitioner");
Producer Throughput Tuning
Key parameters: buffer.memory (default 32 MB), compression.type (e.g., lz4), batch.size (default 16 KB), linger.ms (default 0, often set to 100 ms), and max.in.flight.requests.per.connection (set to 1 to avoid reordering).
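As a sketch, these settings can be expressed as producer properties. The values below are illustrative tuning choices built from the parameters above, not universal defaults:

```java
import java.util.Properties;

public class ThroughputConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("buffer.memory", "67108864");  // 64 MB, up from the 32 MB default
        props.put("compression.type", "lz4");    // trade a little CPU for smaller batches on the wire
        props.put("batch.size", "32768");        // 32 KB, up from the 16 KB default
        props.put("linger.ms", "100");           // wait up to 100 ms to fill a batch before sending
        System.out.println(props.getProperty("compression.type"));
    }
}
```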
Error Handling and Retry Strategies
Common exceptions: LeaderNotAvailableException, NotControllerException, NetworkException. Use retries and retry.backoff.ms. Beware of duplicate messages and ordering issues; set max.in.flight.requests.per.connection=1 if needed.
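A hedged sketch of a retry‑safe producer configuration; the specific values are illustrative, not recommendations for every workload:

```java
import java.util.Properties;

public class RetryConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("retries", "3");                                // retry transient errors such as LeaderNotAvailableException
        props.put("retry.backoff.ms", "100");                     // pause between attempts so leader election can finish
        props.put("max.in.flight.requests.per.connection", "1");  // preserve ordering when a retry happens
        System.out.println(props.size());
    }
}
```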
ACK Configuration
acks=0 (fire‑and‑forget), acks=1 (leader ack), acks=-1 (all in‑sync replicas). Combine with min.insync.replicas to guarantee durability.
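A minimal sketch of the durable combination. Note that min.insync.replicas is a topic/broker setting, not a producer property; the pairing of values below is an illustrative assumption:

```java
import java.util.Properties;

public class DurabilityConfig {
    public static void main(String[] args) {
        // Producer side: require acknowledgement from all in-sync replicas.
        Properties producerProps = new Properties();
        producerProps.put("acks", "-1");  // equivalent to acks=all

        // Topic/broker side: with replication.factor=3 and min.insync.replicas=2,
        // a write succeeds only while at least two replicas are in sync, so one
        // broker can fail without acknowledged data being lost.
        Properties topicConfig = new Properties();
        topicConfig.put("min.insync.replicas", "2");

        System.out.println(producerProps.getProperty("acks"));
    }
}
```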
Consumer Group Mechanics
Group coordinators manage heartbeats, rebalance, and offset storage. Offsets are now stored in the internal __consumer_offsets topic (default 50 partitions) instead of Zookeeper.
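A group's offsets all land in one partition of __consumer_offsets, chosen by hashing the group id modulo the partition count (50 is the default offsets.topic.num.partitions). A close sketch of the mapping; Kafka's own code masks the hash's sign bit rather than using floorMod, but both yield a stable non‑negative partition:

```java
public class OffsetsPartition {
    /** Which __consumer_offsets partition holds this group's commits. */
    static int offsetsPartitionFor(String groupId, int numPartitions) {
        // floorMod avoids the Math.abs(Integer.MIN_VALUE) pitfall.
        return Math.floorMod(groupId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(offsetsPartitionFor("order-service-group", 50));
    }
}
```

This is why the coordinator for a group is the broker leading that particular partition: every commit for the group is a write to the same place.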
Offset Management Tools
java -cp KafkaOffsetMonitor-assembly-0.3.0-SNAPSHOT.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb --offsetStorage kafka --zk hadoop1:2181 --port 9004 --refresh 15.seconds --retain 2.days
Rebalance Strategies
Range, round‑robin, and sticky. Sticky tries to keep existing partition assignments while balancing load.
Broker Internals (LEO & HW)
Log End Offset (LEO) tracks the next offset to write; High Watermark (HW) marks the highest offset replicated to all in‑sync replicas and visible to consumers.
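The relationship fits in one line: the leader's HW is the minimum LEO across the ISR. A toy check with illustrative values:

```java
import java.util.Arrays;

public class HighWatermark {
    public static void main(String[] args) {
        // LEOs of the leader and two in-sync followers.
        long[] isrLeos = {105, 103, 101};
        // HW = min(LEO over ISR): consumers can read offsets below this.
        long hw = Arrays.stream(isrLeos).min().getAsLong();
        System.out.println(hw);
    }
}
```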
Controller Responsibilities
Manages broker registration, topic creation, partition reassignment, and leader election via Zookeeper paths such as /controller, /brokers/ids, and /admin/reassign_partitions.
Delayed Operations and Time Wheel
Kafka uses a custom time‑wheel scheduler for delayed tasks (e.g., producer acks timeout, follower fetch delays) achieving O(1) insertion and removal.
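A minimal single‑level wheel conveys the O(1) property; Kafka's real implementation is hierarchical, with an overflow wheel for delays longer than one revolution, which this sketch omits:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Minimal single-level time wheel: O(1) insert and removal, tasks fire as the wheel advances. */
public class TimeWheel {
    private final Deque<Runnable>[] buckets;
    private int currentTick = 0;

    @SuppressWarnings("unchecked")
    TimeWheel(int size) {
        buckets = new Deque[size];
        for (int i = 0; i < size; i++) buckets[i] = new ArrayDeque<>();
    }

    /** Schedule a task delayTicks from now (delay must be smaller than the wheel here). */
    void schedule(Runnable task, int delayTicks) {
        buckets[(currentTick + delayTicks) % buckets.length].addLast(task);
    }

    /** Advance one tick and run everything that became due. */
    void tick() {
        currentTick = (currentTick + 1) % buckets.length;
        Deque<Runnable> due = buckets[currentTick];
        while (!due.isEmpty()) due.pollFirst().run();
    }

    public static void main(String[] args) {
        TimeWheel wheel = new TimeWheel(8);
        // Models, e.g., a delayed producer-ack timeout three ticks away.
        wheel.schedule(() -> System.out.println("ack timeout fired"), 3);
        for (int i = 0; i < 3; i++) wheel.tick();  // fires on the third tick
    }
}
```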