Big Data 8 min read

Understanding Kafka Exactly-Once Semantics, Idempotence, and Transactions

This article explains Kafka's Exactly-Once Semantics (EOS), the role of idempotence, and how transactional support works, covering EOS semantics, producer id and sequence numbers, configuration properties, and providing Java code examples for initializing, beginning, committing, and aborting transactions.

Big Data Technology Architecture
Big Data Technology Architecture
Big Data Technology Architecture
Understanding Kafka Exactly-Once Semantics, Idempotence, and Transactions

This article introduces the principles behind Kafka's transactional capabilities, starting with Exactly-Once Semantics (EOS) which became available from Kafka 0.11.0.0. Prior versions only supported At-Least-Once and At-Most-Once delivery.

EOS is essential for strict scenarios such as processing financial transactions, where each record must be processed exactly once. While At-Least-Once can be combined with downstream idempotent processing to approximate EOS, this approach has limitations: it requires downstream idempotence, raises the implementation barrier, and does not work for Kafka Streams before version 0.11.0.0.

Kafka Idempotence

Idempotence is a producer‑side feature introduced alongside transactions in Kafka 0.11.0.0. It relies on a unique Producer ID (PID) and a monotonically increasing Sequence Number for each partition. The broker stores the highest sequence number per PID; messages with a sequence number not greater than the stored value are discarded, ensuring no duplicate writes. The relevant configuration is enable.idempotence (default false, must be set to true when using transactions).

Idempotence guarantees Exactly‑Once only for a single producer session on a single partition; it does not provide cross‑partition guarantees.

Kafka Transactions

Transactions extend idempotence by allowing atomic writes to multiple partitions. A transaction either commits all writes or aborts them, providing true EOS across partitions.

The Kafka Transaction API consists of five methods:

/**
 * Initialize transactions. Preconditions:
 * 1. The transactional.id property must be configured.
 * 2. This method ensures any previous transactions with the same transactional.id are completed.
 *    If a previous instance failed mid‑transaction, it will be aborted; if a transaction was in progress,
 *    this call will wait for its completion.
 *    It also retrieves the internal producer id and epoch used for all future transactional messages.
 */
public void initTransactions();

/** Open a new transaction */
public void beginTransaction() throws ProducerFencedException;

/** Send consumer offsets to the transaction */
public void sendOffsetsToTransaction(Map
offsets,
                                     String consumerGroupId) throws ProducerFencedException;

/** Commit the current transaction */
public void commitTransaction() throws ProducerFencedException;

/** Abort the current transaction */
public void abortTransaction() throws ProducerFencedException;

Key configuration properties include:

Set transactional.id on the producer (enables both idempotence and transactions).

Consumers must disable auto‑commit and use isolation.level=read_committed to read only committed data.

Kafka Transaction Example (Java)

Properties props = new Properties();
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("client.id", "ProducerTransactionalExample");
props.put("bootstrap.servers", "localhost:9092");
props.put("transactional.id", "test-transactional");
props.put("acks", "all");

KafkaProducer
producer = new KafkaProducer<>(props);
producer.initTransactions();

try {
    String msg = "matt test";
    producer.beginTransaction();
    producer.send(new ProducerRecord<>(topic, "0", msg));
    producer.send(new ProducerRecord<>(topic, "1", msg));
    producer.send(new ProducerRecord<>(topic, "2", msg));
    producer.commitTransaction();
} catch (ProducerFencedException e1) {
    e1.printStackTrace();
    producer.close();
} catch (KafkaException e2) {
    e2.printStackTrace();
    producer.abortTransaction();
}
producer.close();

The relationship between idempotence and transactions is that transactions require idempotence (the producer must have a transactional.id which implicitly enables enable.idempotence ), but idempotence can be used independently without transactions.

Configuration combinations:

enable.idempotence=true and no transactional.id : only idempotence.

enable.idempotence=true with transactional.id set: both idempotence and transactions.

enable.idempotence=false and no transactional.id : neither feature.

enable.idempotence=false with transactional.id set: error because PID cannot be obtained.

References:

Kafka EOS and transaction implementation: https://www.codercto.com/a/36351.html

Kafka producer transactions and idempotence: http://www.heartthinkdo.com/?p=2040#5

javaStreamingKafkaidempotenceTransactionsexactly-once
Big Data Technology Architecture
Written by

Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.