Backend Development 11 min read

Understanding Kafka Idempotent Producer and How to Prevent Message Duplicates

This article explains why message duplication occurs in Kafka, describes the three delivery semantics, and provides practical solutions—including idempotent producers, transactions, and consumer-side idempotence—along with configuration tips and code examples to achieve exactly‑once delivery.

Top Architect
Top Architect
Top Architect
Understanding Kafka Idempotent Producer and How to Prevent Message Duplicates

Message duplication is a common issue in distributed systems, especially when producers retry after failures or consumers re‑process uncommitted offsets after a crash.

The article first outlines three delivery semantics: at most once (possible loss, no duplicates), at least once (no loss, possible duplicates), and exactly once (no loss, no duplicates), using MQTT QoS examples.

To achieve exactly‑once semantics in Kafka, three approaches are presented:

Kafka Idempotent Producer : enable idempotence on the producer side. This requires setting enable.idempotence=true , acks=all , and max.in.flight.requests.per.connection<=5 . The article shows the necessary configuration snippet: Properties props = new Properties(); props.put("enable.idempotence", true); props.put("acks", "all"); props.put("max.in.flight.requests.per.connection", 5);

Kafka Transactions : use transactional producers to overcome the single‑partition limitation of idempotent producers. Example code includes initializing a transaction, sending records, and committing or aborting based on exceptions: producer.initTransactions(); producer.beginTransaction(); producer.send(new ProducerRecord<>("Topic", "Key", "Value")); producer.commitTransaction(); // on error: producer.abortTransaction();

Consumer‑Side Idempotence : store processed message IDs in a database table and use transactions to ensure duplicate inserts are rejected, effectively making the consumer idempotent.

The article also details how Kafka tracks producer state using a globally unique pid and per‑partition sequence numbers, and how the broker determines duplicates by comparing <pid, seqNum> against its cached entries.

Practical notes include:

If acks is set to 0 or 1 , idempotence cannot be guaranteed.

Setting max.in.flight.requests.per.connection greater than 5 will cause a configuration error.

Finally, the article provides a complete Java producer example that reads lines from a file and sends them to a Kafka topic with idempotent settings, and shows how to start the consumer from the command line.

JavaConfigurationKafkaidempotenceproducerMessage Duplication
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.