Understanding Kafka Replication: Mechanism, Roles, ISR, and Unclean Leader Election

This article explains Apache Kafka's replication mechanism, detailing its benefits, replica definitions, leader‑follower roles, in‑sync replica (ISR) criteria, and the trade‑offs of unclean leader election, highlighting how these features affect data redundancy, scalability, and consistency in distributed systems.

IT Architects Alliance

Replication, also known as backup, refers to keeping identical copies of data on multiple network‑connected machines in a distributed system. Its primary advantages are providing data redundancy for higher availability and durability, enabling horizontal scalability for increased read throughput, and improving data locality by placing replicas closer to users.

In Kafka, a replica is a copy of a partition's append-only log. Each partition can have multiple replicas, placed on different brokers and holding the same message sequence, with one replica elected as the leader and the others acting as followers.

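The leader-based replication model works as follows: by default the leader replica handles all client read and write requests, while follower replicas asynchronously pull data from the leader and append it to their own logs. If the leader fails, the cluster controller elects a new leader from the remaining replicas. In ZooKeeper-based clusters the controller learns of the broker failure through ZooKeeper; in KRaft-based clusters the controller quorum tracks broker liveness itself.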
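The pull-based flow described above can be sketched as a toy model. This is a minimal illustration, not Kafka's implementation: the `Replica` class and the `produce` and `fetch_from_leader` helpers are hypothetical names, and each log is just an in-memory Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica: an append-only list standing in for a partition log."""
    broker_id: int
    log: list = field(default_factory=list)

    @property
    def log_end_offset(self) -> int:
        # Offset that the next appended record would receive.
        return len(self.log)


def produce(leader: Replica, record: str) -> int:
    """Append a record to the leader's log and return the offset it was given."""
    leader.log.append(record)
    return leader.log_end_offset - 1


def fetch_from_leader(follower: Replica, leader: Replica, max_records: int = 100) -> int:
    """Follower pulls the records it is missing, starting at its own log end offset."""
    start = follower.log_end_offset
    batch = leader.log[start:start + max_records]
    follower.log.extend(batch)
    return len(batch)


leader = Replica(broker_id=1)
followers = [Replica(broker_id=2), Replica(broker_id=3)]

# Clients write only to the leader.
for msg in ["a", "b", "c"]:
    produce(leader, msg)

# Only follower 2 has fetched so far; follower 3 is still lagging.
fetch_from_leader(followers[0], leader)
```

Because each follower asks for records starting at its own log end offset, a follower that was offline for a while simply resumes from where it stopped.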

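Because followers do not serve client requests by default, Kafka's replicas do not provide horizontal read scaling or improved locality out of the box (follower fetching for consumers, added in Kafka 2.4, is an opt-in exception). In return, routing all reads and writes through the leader makes "read-your-writes" consistency and monotonic reads much simpler to achieve.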
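To see why leader-only reads simplify "read-your-writes", consider what a lagging follower would return. The snippet below is a toy sketch with hypothetical names (`read_at`, `leader_log`, `lagging_follower_log`); it ignores Kafka details such as the high watermark.

```python
def read_at(log, offset):
    """Return the record at offset, or None if this replica does not have it yet."""
    return log[offset] if offset < len(log) else None


leader_log = ["v1", "v2", "v3"]
lagging_follower_log = ["v1", "v2"]  # has not fetched offset 2 yet

# A client has just written "v3", which landed at offset 2 on the leader.
written_offset = len(leader_log) - 1

# Reading from the leader always sees the client's own write.
from_leader = read_at(leader_log, written_offset)

# Reading the same offset from the lagging follower would miss it.
from_follower = read_at(lagging_follower_log, written_offset)
```

If reads could land on either replica, the client might see "v3" on one request and nothing on the next, which is exactly the non-monotonic behavior that leader-only reads avoid.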

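Kafka introduces In-Sync Replicas (ISR), a dynamic set of replicas, the leader included, that are considered synchronized with the leader. A follower stays in the ISR as long as it has caught up to the leader's log end within the configured replica.lag.time.max.ms (default 10 seconds in older releases, 30 seconds since Kafka 2.5); a follower that falls further behind is removed from the ISR and rejoins once it catches up.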
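The time-based ISR criterion can be sketched as follows. This is a simplified model under stated assumptions, not the broker's actual code: `compute_isr` and the `last_caught_up_ms` map are hypothetical, and the 10-second constant is the figure cited in the text (the shipped default differs across Kafka versions).

```python
# 10 seconds, the figure cited in the text; check your broker's actual setting.
REPLICA_LAG_TIME_MAX_MS = 10_000


def compute_isr(leader_id, last_caught_up_ms, now_ms, max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """Return the set of broker ids considered in sync.

    last_caught_up_ms maps each follower's broker id to the last time (in ms)
    at which that follower had fully caught up to the leader's log end.
    """
    isr = {leader_id}  # the leader is always a member of its own ISR
    for broker_id, caught_up_at in last_caught_up_ms.items():
        # A follower is in sync if it caught up recently enough.
        if now_ms - caught_up_at <= max_lag_ms:
            isr.add(broker_id)
    return isr


now = 60_000
isr = compute_isr(
    leader_id=1,
    last_caught_up_ms={2: 58_000, 3: 45_000},  # follower 3 is 15 s behind
    now_ms=now,
)
```

Here follower 2 caught up 2 seconds ago and stays in the ISR, while follower 3 last caught up 15 seconds ago and is dropped until it catches up again. Note that the criterion is about time since the follower was last caught up, not about how many messages it is missing.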

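When no in-sync replica is available (that is, the leader has failed and no other ISR member is alive to take over), Kafka may perform an unclean leader election, selecting a replica outside the ISR as the new leader. This behavior is controlled by the unclean.leader.election.enable setting. Enabling it improves availability but risks losing messages the out-of-sync replica never received, while disabling it preserves consistency at the cost of the partition staying offline until an ISR member returns.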
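The trade-off can be made concrete with a small election sketch. The function `elect_leader` and its parameters are hypothetical, and picking the first eligible replica in assignment order is an assumption made for illustration; only the `unclean_enabled` flag corresponds to a real setting (unclean.leader.election.enable).

```python
from typing import Iterable, Optional, Set


def elect_leader(
    assigned_replicas: Iterable[int],
    isr: Set[int],
    live_brokers: Set[int],
    unclean_enabled: bool,
) -> Optional[int]:
    """Pick a new leader, or return None if the partition must stay offline."""
    replicas = list(assigned_replicas)

    # Clean election: only a live replica that is in the ISR is eligible.
    for broker_id in replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id

    # Unclean election: fall back to any live replica, accepting possible data loss.
    if unclean_enabled:
        for broker_id in replicas:
            if broker_id in live_brokers:
                return broker_id

    # No eligible replica: the partition is unavailable until an ISR member returns.
    return None


# Broker 1 (the old leader and only ISR member) is down; broker 3 is alive but out of sync.
strict = elect_leader([1, 2, 3], isr={1}, live_brokers={3}, unclean_enabled=False)
lenient = elect_leader([1, 2, 3], isr={1}, live_brokers={3}, unclean_enabled=True)
```

With the flag off, the call returns None and the partition stays offline; with it on, broker 3 becomes leader and any records it had not yet replicated are lost.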

Overall, Kafka’s replication mechanism emphasizes high availability and data durability, trading off read scalability and locality, and provides configurable controls (ISR and unclean leader election) to balance consistency versus availability according to application needs.
