Preventing Duplicate Message Consumption in Kafka
This article explains why duplicate message consumption occurs in Kafka, outlines the underlying acknowledgment and consumer-failure issues, and presents idempotent design strategies, such as deduplication tables, with code examples to ensure reliable processing.
Duplicate consumption means a consumer receives and processes the same message multiple times, which can lead to erroneous operations such as charging a user's account more than once.
Causes of Duplicate Consumption
The main reasons are:
Message acknowledgment problems: If the consumer does not correctly send an acknowledgment after successful processing, the queue assumes the message was not handled and resends it.
Consumer failures: Crashes, network interruptions, or other faults that prevent the acknowledgment from being sent.
Message‑queue issues: Certain queues may resend unacknowledged messages after a restart or recovery.
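The acknowledgment failure mode above can be sketched with a toy broker that keeps redelivering a message until it is acknowledged. This is a minimal illustration of at-least-once semantics, not a real Kafka API; all class and method names are invented for the example.

```python
# Toy model of at-least-once redelivery: if the consumer fails before
# acknowledging, the broker delivers the same message again.
class ToyBroker:
    def __init__(self, messages):
        self.pending = list(messages)      # messages not yet acknowledged
        self.delivered_count = {}          # how many times each was delivered

    def poll(self):
        if not self.pending:
            return None
        msg = self.pending[0]              # redeliver the head until acked
        self.delivered_count[msg] = self.delivered_count.get(msg, 0) + 1
        return msg

    def ack(self, msg):
        if self.pending and self.pending[0] == msg:
            self.pending.pop(0)

broker = ToyBroker(["order-1"])
msg = broker.poll()    # consumer receives "order-1" but crashes before ack
msg = broker.poll()    # after restart, the same message arrives again
broker.ack(msg)
print(broker.delivered_count["order-1"])  # 2 -> duplicate consumption
```

Unless the consumer's processing is idempotent, that second delivery causes the erroneous double charge described earlier.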
Solution Overview
The core solution is to make the consumer logic idempotent so that processing the same request repeatedly yields the same result.
Typical approaches include using unique identifiers, converting insert operations to updates, and designing explicit idempotent operations.
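As one example of the "convert insert to update" approach, a blind INSERT keyed on a unique identifier can be turned into an upsert, so replaying the same message leaves the row unchanged. The sketch below uses SQLite's `ON CONFLICT` syntax; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")

def set_status(order_id, status):
    # Re-running this with the same arguments leaves the table unchanged,
    # so a redelivered message is harmless.
    conn.execute(
        "INSERT INTO orders (order_id, status) VALUES (?, ?) "
        "ON CONFLICT(order_id) DO UPDATE SET status = excluded.status",
        (order_id, status),
    )

set_status("A1001", "PAID")
set_status("A1001", "PAID")   # duplicate message: no second row, same state
print(conn.execute("SELECT COUNT(*), status FROM orders").fetchone())
# (1, 'PAID')
```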
One concrete method is to maintain a deduplication table that records identifiers of already processed messages.
```sql
CREATE TABLE orders (
    id INT AUTO_INCREMENT PRIMARY KEY,
    order_id VARCHAR(32) NOT NULL,
    status VARCHAR(50) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE processed_orders (
    order_id VARCHAR(32) PRIMARY KEY,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

Typical workflow:
Create a deduplication table to store processed operation IDs (e.g., message ID, order ID).
Check the deduplication table before handling an operation; if the ID exists, skip processing.
Process the operation only when the ID is absent.
Record the operation by inserting the ID into the deduplication table after successful processing.
This approach emphasizes data uniqueness checks as the fundamental safeguard against duplicate processing.
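The four-step workflow can be sketched end to end with an in-memory SQLite database standing in for the two tables above; `handle_message` and the hard-coded status are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, status TEXT)")
conn.execute("CREATE TABLE processed_orders (order_id TEXT PRIMARY KEY)")

def handle_message(order_id):
    # Step 2: check the deduplication table before handling the operation.
    seen = conn.execute(
        "SELECT 1 FROM processed_orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    if seen:
        return "skipped"           # already processed: skip
    # Step 3: perform the real work only when the ID is absent.
    conn.execute(
        "INSERT INTO orders (order_id, status) VALUES (?, 'PAID')", (order_id,)
    )
    # Step 4: record the ID after successful processing.
    conn.execute(
        "INSERT INTO processed_orders (order_id) VALUES (?)", (order_id,)
    )
    return "processed"

print(handle_message("A1001"))  # processed
print(handle_message("A1001"))  # skipped -> the order is charged only once
```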
In high‑concurrency scenarios, additional measures such as row‑level locking or optimistic concurrency control may be required to avoid race conditions when multiple requests query and insert into the deduplication table simultaneously.
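One optimistic variant of this safeguard: instead of SELECT-then-INSERT, which two concurrent consumers can both pass before either inserts, claim the ID first with a single atomic INSERT and let the primary-key constraint arbitrate. A minimal sketch, again with illustrative names and SQLite standing in for the real database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_orders (order_id TEXT PRIMARY KEY)")

def try_claim(order_id):
    try:
        # The PRIMARY KEY guarantees this insert succeeds for exactly
        # one caller, even if several consumers race on the same ID.
        conn.execute(
            "INSERT INTO processed_orders (order_id) VALUES (?)", (order_id,)
        )
        return True       # we own this ID: safe to process the message
    except sqlite3.IntegrityError:
        return False      # another consumer already claimed it: skip

print(try_claim("A1001"))  # True  -> process the message
print(try_claim("A1001"))  # False -> duplicate, skip
```

This moves the race into the database's uniqueness check, which is exactly the "data uniqueness as the fundamental safeguard" idea stated above.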
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!