
Kafka High‑Reliability Architecture, Storage Mechanisms, and Performance Benchmark

This article explains Kafka's distributed architecture, its topic‑partition storage model, replication and ISR mechanisms, leader election, delivery guarantees, and configuration for high reliability. It then presents extensive benchmark results showing how replication factor, acks settings, and partition count affect throughput and latency.

Architecture Digest

1. Overview

Kafka, originally developed by LinkedIn and now part of Apache, is a Scala‑based distributed messaging system known for horizontal scalability and high throughput. Compared with traditional messaging systems, Kafka is designed as a distributed system, provides high publish/subscribe throughput, supports multiple subscribers with automatic consumer rebalancing, and persists messages to disk for batch consumption such as ETL and real‑time applications.

2. Kafka Architecture

A typical Kafka deployment consists of multiple producers, brokers, consumer groups, and a ZooKeeper ensemble. Producers push messages to brokers, consumers pull messages from brokers, and ZooKeeper manages cluster metadata, leader election, and consumer‑group rebalancing.

Broker: A Kafka node that stores messages; one or more brokers form a Kafka cluster.

Topic: Logical category for messages; each message must specify a topic.

Producer: Client that sends messages to a broker.

Consumer: Client that reads messages from a broker.

ConsumerGroup: A set of consumers sharing a subscription; only one consumer in a group processes a given message.

Partition: Physical subdivision of a topic; each partition is an ordered log.

2.1 Topic & Partition

Each topic is split into multiple partitions, which are append‑only log files. Messages are appended to the tail of a partition and identified by a long offset. Proper partitioning distributes load evenly across brokers, improving horizontal scalability.

# The default number of log partitions per topic. More partitions allow greater parallelism for consumption, but also increase the number of files across brokers.
num.partitions=3
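To see how keyed messages spread across partitions, here is a minimal sketch of keyed partition assignment. Kafka's default partitioner hashes the message key (with murmur2) modulo the partition count; the simple polynomial hash below is a stand‑in for illustration only, not Kafka's actual hash.

```python
# Sketch: assigning a keyed message to one of num.partitions=3 partitions.
NUM_PARTITIONS = 3

def stable_hash(key: bytes) -> int:
    # Simple deterministic hash; Kafka itself uses murmur2.
    h = 0
    for b in key:
        h = (h * 31 + b) & 0x7FFFFFFF
    return h

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    return stable_hash(key) % num_partitions

# Messages with the same key always land in the same partition,
# which preserves per-key ordering within that partition.
assert partition_for(b"user-42") == partition_for(b"user-42")
print(partition_for(b"user-42"))
```

Because assignment depends only on the key, adding partitions later changes the mapping, which is one reason partition count is usually chosen up front.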

3. High‑Reliability Storage Analysis

3.1 Kafka File Storage Mechanism

Messages are stored per‑partition in a directory structure under log.dirs. Each partition contains multiple segment files; each segment consists of an .index file (offset metadata) and a .log file (the actual messages). Offsets are used to locate messages within segments.

00000000000000000000.index
00000000000000000000.log
00000000000000170410.index
00000000000000170410.log
00000000000000239430.index
00000000000000239430.log
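Because each segment file is named after the base offset of its first message, locating the segment for a given offset reduces to a binary search over the sorted base offsets. A minimal sketch, using the three segments listed above:

```python
# Sketch: find which segment file contains a given offset.
import bisect

# Base offsets parsed from the segment file names above.
segment_base_offsets = [0, 170410, 239430]

def segment_for(offset: int) -> int:
    """Return the base offset of the segment containing `offset`."""
    i = bisect.bisect_right(segment_base_offsets, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the earliest retained segment")
    return segment_base_offsets[i]

print(segment_for(170417))  # -> 170410: look in 00000000000000170410.log
```

Within the chosen segment, the .index file then maps a relative offset to a byte position in the .log file, so only a small range of the log needs to be scanned.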

3.2 Replication Principle and Synchronization

Each partition has N replicas (N = replication.factor, default 1). One replica is the leader; the others are followers that replicate the leader's log. The leader appends messages and advances its Log End Offset (LEO). The High Watermark (HW) is the smallest LEO among ISR members and marks the point up to which consumers can read.
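The HW rule can be stated in one line of code. A sketch with hypothetical LEO values for a 3‑replica partition (the replica names and offsets are invented for illustration):

```python
# Sketch: the High Watermark is the minimum LEO across the ISR.
leo = {"replica-0": 120, "replica-1": 118, "replica-2": 95}
isr = {"replica-0", "replica-1"}  # replica-2 has fallen out of the ISR

hw = min(leo[r] for r in isr)
print(hw)  # -> 118: consumers may read messages below offset 118
```

Note that replica-2's lagging LEO of 95 does not hold the HW back once it leaves the ISR, which is exactly how ISR membership trades reliability against latency.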

3.3 In‑Sync Replicas (ISR)

ISR is the subset of replicas that are fully caught up with the leader. Parameters such as replica.lag.time.max.ms control when a follower is removed from ISR. The size of ISR influences both reliability and throughput.

3.4 Data Reliability and Persistence Guarantees

Producers control reliability via request.required.acks:

1 (default): Ack after leader writes; data may be lost if leader crashes.

0: No ack; highest throughput, lowest reliability.

-1 (all): Ack after all ISR replicas write; highest reliability.

When acks=-1, min.insync.replicas should also be set (e.g., ≥2) so that a write succeeds only after a minimum number of ISR replicas have acknowledged it.
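Collected as one illustrative fragment, the highest‑reliability combination described above looks like this (request.required.acks is the 0.8/0.10‑era Scala producer name; the Java producer calls the same setting acks, and min.insync.replicas is a broker/topic setting, not a producer one):

```properties
# Producer side: wait until all ISR replicas have the write
request.required.acks=-1

# Broker/topic side: a write fails unless at least 2 ISR replicas remain
min.insync.replicas=2
```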

3.5 High Watermark (HW) Details

HW is the minimum LEO among ISR members. It prevents data loss and log divergence when a leader fails and a new leader is elected: surviving replicas truncate their logs to HW before fetching from the new leader, so consumers never observe messages that were not committed to the whole ISR.
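The truncation step is simple to sketch. Below, offsets are modeled as list indices and the message contents are invented for illustration:

```python
# Sketch: after a leader change, a replica truncates its log to the HW,
# discarding entries that were never committed to the full ISR.
def truncate_to_hw(log: list, hw: int) -> list:
    """Keep only entries with offset < hw (offsets are list indices here)."""
    return log[:hw]

follower_log = ["m0", "m1", "m2", "m3", "m4"]  # LEO = 5
hw = 3  # only offsets 0..2 were replicated to every ISR member
print(truncate_to_hw(follower_log, hw))  # -> ['m0', 'm1', 'm2']
```

The discarded entries (m3, m4) were acknowledged by at most the old leader, so dropping them keeps all replicas consistent at the cost of redelivery by the producer.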

3.6 Leader Election

Kafka uses a custom leader election algorithm similar to Microsoft's PacificA. Only replicas in ISR are eligible to become leader (unless unclean.leader.election.enable=true), balancing availability and consistency.

3.7 Producer Sending Modes

The producer.type parameter selects sync (default) or async sending. Async mode batches messages for higher throughput but increases the risk of data loss if the producer crashes before a buffered batch is sent. Related async parameters include queue.buffering.max.ms, queue.buffering.max.messages, queue.enqueue.timeout.ms, and batch.num.messages.
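The async batching behaviour can be sketched as an accumulator that flushes either when batch.num.messages is reached or when queue.buffering.max.ms elapses. The parameter names follow the text; the class and flush logic are illustrative, not Kafka's actual implementation.

```python
# Sketch of async-producer batching: flush on batch size or on elapsed time.
import time

class AsyncBatcher:
    def __init__(self, batch_num_messages=200, buffering_max_ms=5000, send=print):
        self.batch_num_messages = batch_num_messages
        self.buffering_max_ms = buffering_max_ms
        self.send = send            # stands in for the network send
        self.queue = []
        self.oldest_ms = None

    def produce(self, msg, now_ms=None):
        now_ms = now_ms if now_ms is not None else time.time() * 1000
        if not self.queue:
            self.oldest_ms = now_ms
        self.queue.append(msg)
        if (len(self.queue) >= self.batch_num_messages
                or now_ms - self.oldest_ms >= self.buffering_max_ms):
            self.flush()

    def flush(self):
        if self.queue:
            self.send(list(self.queue))  # one batched request to the broker
            self.queue.clear()

sent = []
b = AsyncBatcher(batch_num_messages=3, send=sent.append)
for i in range(7):
    b.produce(f"m{i}", now_ms=0)
print(sent)        # -> [['m0', 'm1', 'm2'], ['m3', 'm4', 'm5']]
print(b.queue)     # -> ['m6']: still buffered, lost if the producer crashes
```

The unsent m6 in the buffer is precisely the data-loss window that async mode introduces.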

4. High‑Reliability Usage Analysis

4.1 Delivery Guarantees

Kafka can provide three guarantees:

At‑most‑once: messages may be lost but never duplicated.

At‑least‑once: messages are never lost but may be duplicated.

Exactly‑once: each message is processed exactly once (requires additional deduplication logic).
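The first two guarantees differ only in whether the consumer commits its offset before or after processing. A simulated crash between the two operations (the `run` helper and log contents are invented for illustration) shows a lost message in one mode and a duplicate in the other:

```python
# Sketch: at-most-once (commit, then process) vs at-least-once
# (process, then commit), with a crash between the two operations.
def run(log, committed, commit_first, crash_between_at=None):
    processed = []
    pos = committed
    while pos < len(log):
        if commit_first:
            committed = pos + 1          # at-most-once: commit first
        else:
            processed.append(log[pos])   # at-least-once: process first
        if pos == crash_between_at:
            return committed, processed  # crash mid-step
        if commit_first:
            processed.append(log[pos])
        else:
            committed = pos + 1
        pos += 1
    return committed, processed

log = ["a", "b", "c"]

# At-most-once: offset 1 was committed but never processed -> "b" is lost.
c, p1 = run(log, 0, commit_first=True, crash_between_at=1)
_, p2 = run(log, c, commit_first=True)
print(p1 + p2)  # -> ['a', 'c']

# At-least-once: "b" was processed but not committed -> processed twice.
c, p1 = run(log, 0, commit_first=False, crash_between_at=1)
_, p2 = run(log, c, commit_first=False)
print(p1 + p2)  # -> ['a', 'b', 'b', 'c']
```

Exactly‑once is the at‑least‑once pattern plus deduplication of the repeated "b", which is the subject of the next section.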

4.2 Message Deduplication

Kafka itself does not provide built‑in deduplication. Applications can use globally unique identifiers (GUID) or external systems like Redis to achieve idempotence.
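A minimal sketch of consumer‑side deduplication with a GUID per message. The text mentions Redis for the seen‑set; a plain Python set stands in here (a real deployment would use something like Redis SETNX with an expiry so the set does not grow without bound).

```python
# Sketch: drop redelivered messages by tracking a GUID per message.
import uuid

seen_guids = set()
results = []

def handle(message: dict) -> bool:
    """Process the message once; return False if it was a duplicate."""
    guid = message["guid"]
    if guid in seen_guids:
        return False                 # already processed; drop the redelivery
    seen_guids.add(guid)
    results.append(message["payload"])
    return True

msg = {"guid": str(uuid.uuid4()), "payload": "order-created"}
assert handle(msg) is True           # first delivery is processed
assert handle(msg) is False          # at-least-once redelivery is dropped
print(results)  # -> ['order-created']
```

Combined with at‑least‑once delivery, this idempotent handler yields exactly‑once processing from the application's point of view.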

4.3 High‑Reliability Configuration Recommendations

Topic: replication.factor ≥ 3, 2 ≤ min.insync.replicas ≤ replication.factor

Broker: set unclean.leader.election.enable=false

Producer: request.required.acks=-1 and producer.type=sync
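Collected as one illustrative configuration (note that replication.factor is normally set per topic at creation time, or via the broker's default.replication.factor, rather than as a literal property; the producer names are the 0.8/0.10‑era Scala producer's):

```properties
# Topic-level
replication.factor=3
min.insync.replicas=2

# Broker-level
unclean.leader.election.enable=false

# Producer-level
request.required.acks=-1
producer.type=sync
```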

5. Benchmark

5.1 Test Environment

Four Kafka brokers (24 CPU cores, 62 GB RAM, 4 Gbps network, 1089 GB disk) running Kafka 0.10.1.0 with JVM options -Xmx8G -Xms8G …. Client machine: 24 CPU cores, 3 GB RAM, 1 Gbps network.

5.2 Scenario 1 – Impact of Replicas, min.insync.replicas, and acks on TPS

Single producer, sync mode, 1 KB messages, 12 partitions. Varying replicas (1/2/4), min.insync.replicas (1/2/4), and acks (-1/1/0). Results show:

TPS order: acks=0 > acks=1 > acks=-1.

More replicas reduce TPS.

min.insync.replicas does not affect TPS when acks=-1.

5.3 Scenario 2 – Fixed 1 Partition, Vary Replicas and min.insync.replicas

With acks=-1, increasing replicas lowers TPS slightly; min.insync.replicas has negligible impact.

5.4 Scenario 3 – Fixed 1 Partition, Vary Replicas and acks

TPS decreases as replicas increase and as acks moves from 0 to 1 to -1.

5.5 Scenario 4 – Varying Partition Count

With 2 replicas, min.insync.replicas=2, and acks=-1, increasing partitions improves TPS up to a point, after which TPS plateaus or slightly drops.

5.6 Scenario 5 – Broker Failures

Killing two of the four brokers while producing with acks=-1 and min.insync.replicas=2 reduces TPS but production continues; killing three brokers halts production until enough brokers recover. A high retries setting can cause duplicate messages to be persisted.

5.7 Scenario 6 – Latency Measurements

With 12 partitions, 4 replicas, and acks=-1, producer latency averaged ≈1.7 ms (max ≈157 ms); consumer latency averaged ≈1.6 ms (max ≈288 ms).

5.8 Summary of Findings

When acks=-1, TPS is limited by the number of ISR replicas; more replicas → lower TPS.

acks=0 yields the highest TPS, followed by acks=1, then acks=-1.

min.insync.replicas does not affect TPS.

Increasing partition count improves TPS up to a saturation point.

With acks=-1 and min.insync.replicas≥1, all successfully acknowledged messages are safely persisted.

Author Information

The article is authored by the VMS (Message Middleware) team of Vipshop, part of the infrastructure department, focusing on research of messaging middleware such as RabbitMQ and Kafka.
