Solving Kafka Message Duplication with Idempotent Producer, Transactions, and Consumer Idempotence
This article explains why message duplication occurs in Kafka pipelines, describes the three delivery semantics, and provides three practical solutions—Kafka idempotent producer, Kafka transactions, and consumer‑side idempotence—along with configuration tips and code examples.
1. Introduction
Message duplication is a common issue in end‑to‑end pipelines, especially when retries are configured for network fluctuations or consumer failures.
Duplication Scenarios
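Producer side: transient errors such as an unavailable partition leader (LeaderNotAvailableException), a controller change (NotControllerException), or a network failure (NetworkException) trigger retries. If the broker had already written the record but its acknowledgement was lost, the retry writes the same record a second time.
Consumer side: the consumer polls a batch and processes it, but the process crashes before the offset is committed. After the restart the same batch is polled again, leading to duplicate consumption.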
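The consumer-side scenario can be reproduced with a toy model. The sketch below is not Kafka client code: RedeliverySketch, its poll and commit methods, and the message names are all hypothetical, and a restart is modeled simply as polling again from the last committed offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of one partition and one consumer group; not the Kafka client API.
class RedeliverySketch {
    private final List<String> partitionLog;
    private int committedOffset = 0; // offset the broker remembers for the group

    RedeliverySketch(List<String> partitionLog) {
        this.partitionLog = partitionLog;
    }

    /** Returns up to max records, starting at the last committed offset. */
    List<String> poll(int max) {
        int end = Math.min(committedOffset + max, partitionLog.size());
        return new ArrayList<>(partitionLog.subList(committedOffset, end));
    }

    /** Marks count records as consumed. */
    void commit(int count) {
        committedOffset += count;
    }

    public static void main(String[] args) {
        RedeliverySketch consumer = new RedeliverySketch(Arrays.asList("m0", "m1", "m2"));
        List<String> processed = new ArrayList<>();

        processed.addAll(consumer.poll(2)); // side effects happen here
        // Simulated crash: commit(2) is never called, so the committed offset stays 0.

        processed.addAll(consumer.poll(2)); // restarted consumer resumes from offset 0
        System.out.println(processed);      // prints [m0, m1, m0, m1]
    }
}
```

Committing before processing would remove the duplicates but turn the crash into message loss, which is the at-most-once trade-off.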
Message Delivery Semantics
At‑most‑once – message may be lost but never duplicated.
At‑least‑once – message is never lost but may be duplicated.
Exactly‑once – message is delivered once without loss or duplication.
How to Achieve Exactly‑once
Three approaches are presented:
Kafka idempotent producer (requires enable.idempotence=true, acks=all, and max.in.flight.requests.per.connection ≤ 5).
Kafka transactions (set transactional.id and use initTransactions, beginTransaction, and commitTransaction).
Consumer‑side idempotence (store processed message IDs in a table and use transactional writes).
Kafka Idempotent Producer Example
Properties props = new Properties();
props.put("enable.idempotence", true);
props.put("acks", "all");
props.put("max.in.flight.requests.per.connection", 5);
// ... create KafkaProducer and send records ...
Explanation of the internal mechanism: each producer obtains a unique producer ID (PID), the broker tracks a monotonically increasing sequence number per <topic, partition>, and duplicates are detected by comparing (pid, seq) with the broker's stored state.
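The (pid, seq) check can be illustrated with a small in-memory model. This is a simplified sketch, not the broker's actual implementation: BrokerDedupSketch, its append method, and the string key format are hypothetical, and a sequence gap is reported here with a plain IllegalStateException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of broker-side deduplication; not Kafka's real code.
class BrokerDedupSketch {
    // Last accepted sequence number per (producer ID, topic-partition).
    private final Map<String, Integer> lastSeq = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    /** Returns true if the record was appended, false if it was a duplicate. */
    boolean append(long pid, String topicPartition, int seq, String value) {
        String key = pid + "|" + topicPartition;
        int last = lastSeq.getOrDefault(key, -1);
        if (seq <= last) {
            return false; // already written: the producer retried after a lost ack
        }
        if (seq != last + 1) {
            throw new IllegalStateException(
                "sequence gap: got " + seq + ", expected " + (last + 1));
        }
        lastSeq.put(key, seq);
        log.add(value);
        return true;
    }

    int logSize() {
        return log.size();
    }

    public static void main(String[] args) {
        BrokerDedupSketch broker = new BrokerDedupSketch();
        broker.append(42L, "orders-0", 0, "a");
        broker.append(42L, "orders-0", 1, "b");
        boolean appended = broker.append(42L, "orders-0", 1, "b"); // retry of seq 1
        System.out.println(appended + " " + broker.logSize());     // prints false 2
    }
}
```

Because the state is keyed by PID, a producer that restarts and obtains a new PID is no longer covered by this check, which is one reason the transactional approach below assigns a stable transactional.id.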
Kafka Transaction Example
Properties props = new Properties();
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-transactional-id");
// Key/value types are assumed to be String; matching serializers must also be set in props.
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
producer.beginTransaction();
// send records
producer.commitTransaction();
The consumer must use isolation.level=read_committed to see only committed transactional records.
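On the consumer side, the read_committed requirement is a one-line setting, since the default isolation.level is read_uncommitted. The fragment below is a minimal sketch using only java.util.Properties; the broker address, group ID, and String deserializers are assumptions for illustration.

```java
import java.util.Properties;

class ReadCommittedConsumerConfig {
    /** Builds consumer properties that hide records from aborted or open transactions. */
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "demo-group");              // hypothetical group name
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("isolation.level", "read_committed");   // default is read_uncommitted
        props.put("enable.auto.commit", "false");         // commit offsets explicitly
        return props;
    }
}
```

The resulting Properties object would be passed to the KafkaConsumer constructor.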
Consumer Idempotence Example
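Give the deduplication table a unique key on the message ID, and insert the ID in the same database transaction as the business write. If the insert fails because the ID already exists, the message has been processed before: roll back the transaction and skip the message.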
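The pattern can be sketched without a database. In the hypothetical class below, a HashSet stands in for the deduplication table and its unique key, and removing the ID on failure stands in for the rollback; a real implementation would rely on the database's transaction and constraint instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class IdempotentConsumerSketch {
    // Stands in for a dedup table with a unique key on the message ID.
    private final Set<String> processedIds = new HashSet<>();

    /**
     * Applies businessWrite at most once per messageId.
     * Returns true if applied, false if the ID was already recorded.
     */
    boolean handle(String messageId, String payload, Consumer<String> businessWrite) {
        if (!processedIds.add(messageId)) {
            return false; // unique-key violation in a real table: skip the duplicate
        }
        try {
            businessWrite.accept(payload);
            return true;
        } catch (RuntimeException e) {
            processedIds.remove(messageId); // mimic rollback so a redelivery can retry
            throw e;
        }
    }

    public static void main(String[] args) {
        IdempotentConsumerSketch consumer = new IdempotentConsumerSketch();
        List<String> rows = new ArrayList<>();
        consumer.handle("msg-1", "order created", rows::add);
        consumer.handle("msg-1", "order created", rows::add); // redelivered after a crash
        System.out.println(rows.size()); // prints 1
    }
}
```

This approach depends on every message carrying a stable, unique ID that survives retries and redelivery.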
Configuration Experiments
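With enable.idempotence=true set explicitly, changing acks to 0 or 1 conflicts with idempotence, and creating the producer fails with a configuration exception (ConfigException). Setting max.in.flight.requests.per.connection greater than 5 causes the same kind of error.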
2. Practical Steps
Run Zookeeper and Kafka (2.7.1) via Docker, create a topic with 1 replica and 2 partitions, start the producer and consumer scripts, and observe the console output.
When to Use Idempotence
Use it when acks=all is already required; avoid it when performance is prioritized and acks=0 or acks=1 is acceptable.
Feel free to discuss, ask questions, or contact the author for further clarification.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.