Kafka Practical Guide: Concepts, Architecture, Configuration, Monitoring, and Management
This article provides a comprehensive overview of Kafka, covering its basic concepts, architecture, deployment, configuration, monitoring, producer and consumer settings, offset management, high availability, replication, leader election, and practical tips for deployment, tuning, and troubleshooting in production environments.
The article begins with a background on migrating a centralized log monitoring system to Kafka, explaining why Kafka is a suitable choice for data buffering and transport.
It introduces Kafka's core concepts: a broker is a server in the cluster; a topic categorizes messages; a message (record) consists of a header and payload, with fields such as CRC32, magic, attributes, timestamp, key, and payload; a partition is an ordered, append-only log stored in a directory named ${topicName}-${partitionId}; and a log segment is one file of that log, which may have accompanying index files (.index, .timeindex).
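The record fields and the partition directory naming described above can be sketched as a plain data holder. This class is illustrative only, not part of any Kafka client API; the field names follow the article's list for the legacy (pre-0.11) message format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LegacyRecord:
    """Illustrative model of the legacy Kafka message fields
    listed above; not an actual client-library class."""
    crc32: int            # checksum over the record bytes
    magic: int            # message format version
    attributes: int       # compression codec / timestamp type bits
    timestamp: int        # milliseconds since epoch
    key: Optional[bytes]  # may be absent (None)
    payload: bytes

record = LegacyRecord(crc32=0, magic=1, attributes=0,
                      timestamp=1700000000000,
                      key=b"user-42", payload=b"hello")

# Partition directory naming, e.g. "web-logs-0":
topic, partition = "web-logs", 0
partition_dir = f"{topic}-{partition}"
```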
Producer configuration details are covered, highlighting important properties like request.required.acks (acks=0/1/-1), message.send.max.retries, retry.backoff.ms, queue.buffering.max.ms, batch.num.messages, and request.timeout.ms, which control reliability, batching, and latency.
Consumer settings are explained, including group.id, client.id, bootstrap.servers, deserializers, fetch sizes, and offset commit options. Both automatic (enable.auto.commit=true) and manual (enable.auto.commit=false with commitSync() or commitAsync()) commit modes are described.
The article discusses ISR (In‑Sync Replicas), replication factors, and leader election, noting that only the leader handles reads and writes while followers replicate data for high availability. It also explains the role of Zookeeper in storing metadata such as broker info, topic configurations, and consumer offsets.
Client examples in Python using confluent_kafka are provided for both producer and consumer, showing how to instantiate the client, produce messages, and poll for consumption.
Deployment and configuration guidance includes key server.properties settings (e.g., broker.id, log.dirs, log.segment.bytes, log.retention.hours, zookeeper.connect) and Zookeeper configuration (dataDir, clientPort, server list). Startup commands are shown with JMX exposure: JMX_PORT=8999 bin/kafka-server-start.sh -daemon config/server.properties.
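A minimal sketch pulling these keys together; paths, hostnames, and values are illustrative, not tuning advice:

```properties
# server.properties (illustrative values)
broker.id=0
log.dirs=/data/kafka-logs
# roll a new segment file at 1 GiB
log.segment.bytes=1073741824
# keep data for 7 days
log.retention.hours=168
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181

# zookeeper.properties (illustrative values)
dataDir=/data/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

Each broker.id must be unique in the cluster, and every ZooKeeper node needs a matching myid file under dataDir.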
For monitoring and management, tools such as Yahoo’s Kafka Manager, Burrow, and Kafka Offset Monitor are introduced, describing their capabilities for cluster inspection, offset lag checking, and topic administration.
Finally, the article outlines common challenges: ordering guarantees limited to a single partition, potential duplicate or missed consumption due to offset handling, limits on the number of topics/partitions, manual rebalancing of partition load, and the fact that follower replicas do not serve read traffic.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.