
Ensuring Zero Message Loss in MQ Systems – Interview Guide and Best Practices

This article explains how to guarantee that messages are never lost when using MQ middleware such as Kafka, RabbitMQ or RocketMQ, covering the three lifecycle stages, detection methods, design‑for‑failure principles, idempotent consumption, handling message backlog, and practical interview answers.

Top Architect

Case Background

In a typical e‑commerce scenario (e.g., JD.com), a user may use loyalty points (JD Beans, 京豆) to offset part of the purchase amount. The transaction service sends a message like "deduct X points" to an MQ, and the points service consumes the message to perform the actual deduction.

This flow raises the question of how to ensure that no messages are lost during the process.

Case Analysis

Introducing an MQ provides system decoupling and traffic shaping, which improves availability and performance. However, it also brings consistency challenges, in particular message loss and duplicate consumption.

System decoupling: An MQ isolates upstream and downstream services, allowing them to evolve independently and degrade gracefully.

Traffic shaping: An MQ can smooth burst traffic (e.g., flash sales) by buffering requests until consumers can process them.
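To make the traffic-shaping point concrete, here is a toy, tick-based simulation: producers drop a burst into a buffer and return immediately, while a consumer with a fixed capacity drains it at its own pace. The function name `shape` and the per-tick `rate` model are assumptions for illustration, not the behavior of any particular broker.

```python
from collections import deque


def shape(arrivals, rate):
    """Simulate an MQ buffering bursty traffic for a fixed-capacity consumer.

    arrivals[t] is the number of requests produced at tick t; the consumer
    handles at most `rate` requests per tick. Returns two lists:
    requests processed per tick, and backlog left after each tick.
    """
    buffer = deque()
    processed, backlog = [], []
    next_id, t = 0, 0
    while t < len(arrivals) or buffer:
        # Producers enqueue and return immediately instead of calling the consumer.
        for _ in range(arrivals[t] if t < len(arrivals) else 0):
            buffer.append(next_id)
            next_id += 1
        # The consumer drains at its own steady pace; the excess stays queued.
        done = 0
        while buffer and done < rate:
            buffer.popleft()
            done += 1
        processed.append(done)
        backlog.append(len(buffer))
        t += 1
    return processed, backlog
```

For a burst of 10 requests against a consumer that handles 3 per tick, `shape([10, 0, 0, 0], rate=3)` returns `([3, 3, 3, 1], [7, 4, 1, 0])`: the consumer never exceeds its capacity, and the burst is absorbed as a temporary backlog instead of overloading the points service.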

This raises three questions:

How to detect message loss?

Which stages may cause loss?

How to guarantee no loss?

Case Solution

The message lifecycle can be divided into three stages:

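Message Production Stage: A message counts as successfully sent only once the producer receives an ACK from the broker. If the ACK does not arrive (an error or a timeout), the producer must retry, and raise an alert once retries are exhausted.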

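Message Storage Stage: The broker should acknowledge only after the message is durably stored, typically by replicating it to at least two nodes (and/or flushing it to disk) first. In Kafka, for example, this corresponds to acks=all on the producer together with min.insync.replicas=2 on the topic.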

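Message Consumption Stage: The consumer should acknowledge (or commit its offset) only after the business logic succeeds. If processing fails or the consumer crashes first, the message is redelivered instead of lost; the price is possible duplicates, so consumption must be idempotent.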
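The three stages can be sketched end to end with a toy, in-memory broker. Everything here (`Broker`, `min_insync`, `transient_failures`, `send_with_retry`, `Consumer`) is hypothetical and is not the API of Kafka, RabbitMQ or RocketMQ; it only models the ACK rule each stage relies on.

```python
class Broker:
    """Toy broker (hypothetical): ACKs a write only once min_insync replicas hold it."""

    def __init__(self, replicas=3, min_insync=2):
        self.logs = [[] for _ in range(replicas)]  # one append-only log per replica node
        self.alive = [True] * replicas
        self.min_insync = min_insync
        self.transient_failures = 0  # simulate N requests lost in transit

    def append(self, msg):
        if self.transient_failures > 0:
            self.transient_failures -= 1
            return False  # request never arrived: the producer sees no ACK
        live = [log for log, ok in zip(self.logs, self.alive) if ok]
        if len(live) < self.min_insync:
            return False  # refuse rather than ACK an under-replicated write
        for log in live:
            log.append(msg)
        return True  # ACK: the message is stored on at least min_insync nodes

    def read(self, offset):
        leader = next(log for log, ok in zip(self.logs, self.alive) if ok)
        return leader[offset] if offset < len(leader) else None


def send_with_retry(broker, msg, max_attempts=3):
    """Production stage: the message counts as sent only after an ACK; retry otherwise."""
    for attempt in range(1, max_attempts + 1):
        if broker.append(msg):
            return attempt  # number of attempts it took
    raise RuntimeError(f"no ACK for {msg!r} after {max_attempts} attempts")


class Consumer:
    """Consumption stage: commit the offset only after the handler succeeds."""

    def __init__(self, broker):
        self.broker = broker
        self.offset = 0  # next offset to consume; advanced only on success

    def poll_once(self, handler):
        msg = self.broker.read(self.offset)
        if msg is None:
            return False  # nothing new to consume
        try:
            handler(msg)
        except Exception:
            return False  # no commit: the same message is redelivered on the next poll
        self.offset += 1  # ACK/commit only after the business logic succeeded
        return True
```

One caveat the toy hides: a real producer cannot tell a lost request from a lost ACK, so a retry after a write that actually succeeded produces a duplicate. Together with consumer-side redelivery, this is why consumption has to be idempotent.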

Because failures are inevitable, we adopt the Design for Failure principle and add a monitoring mechanism to check for lost messages.

One practical detection method is to have the producer attach a globally unique ID, or a monotonically increasing version number, to every message (for example in a producer interceptor), and to have a consumer-side interceptor check those numbers for continuity or existence: a gap in the sequence points to a lost message.
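A minimal sketch of the continuity check, assuming each producer numbers its own messages 1, 2, 3, ... and that one producer's messages normally arrive in order (for example because they go to the same partition). `ProducerStamp` and `LossDetector` are hypothetical names; in Kafka the same logic could sit in a `ProducerInterceptor` / `ConsumerInterceptor`, but the code below is plain Python, not that API.

```python
from collections import defaultdict


class ProducerStamp:
    """Producer-side interceptor: attaches producer_id and a per-producer sequence number."""

    def __init__(self, producer_id):
        self.producer_id = producer_id
        self.next_seq = 1

    def on_send(self, payload):
        msg = {"producer_id": self.producer_id, "seq": self.next_seq, "payload": payload}
        self.next_seq += 1
        return msg


class LossDetector:
    """Consumer-side interceptor: records gaps in each producer's sequence."""

    def __init__(self):
        self.last_seq = defaultdict(int)  # highest seq seen per producer
        self.missing = []  # (producer_id, seq) pairs not yet received

    def on_consume(self, msg):
        pid, seq = msg["producer_id"], msg["seq"]
        expected = self.last_seq[pid] + 1
        if seq > expected:
            # Every sequence number skipped over is a candidate lost message.
            self.missing.extend((pid, s) for s in range(expected, seq))
        elif (pid, seq) in self.missing:
            # A late (out-of-order) arrival closes a previously reported gap.
            self.missing.remove((pid, seq))
        self.last_seq[pid] = max(self.last_seq[pid], seq)
        return msg
```

Because late arrivals can close a gap, a monitoring job built on this would typically report an entry in `missing` as lost only after a grace period, rather than the moment the gap appears.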

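For idempotent consumption, i.e. making duplicate deliveries harmless, a common approach is to maintain a message log table (or a Redis set) with the fields message_id and status. Before processing, the consumer checks the log; if the ID already exists, the message is skipped. The check and the write must be atomic (for example a unique key on message_id); otherwise two concurrent deliveries of the same message can both pass the check.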
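A minimal sketch of the message log approach, using an in-memory SQLite database as a stand-in for the points service's database. The table names, `open_db` and `consume_deduction` are hypothetical. The primary key on `message_id` makes "check and record" a single atomic `INSERT OR IGNORE`, and recording the ID in the same transaction as the deduction means a failed deduction leaves no log row behind, so a redelivery is retried rather than skipped.

```python
import sqlite3


def open_db():
    """In-memory stand-in for the points service database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message_log (message_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("CREATE TABLE points (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    return conn


def consume_deduction(conn, message_id, user_id, amount):
    """Apply a 'deduct points' message at most once; returns False for a duplicate."""
    with conn:  # one transaction: commit on success, roll back on any exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO message_log (message_id, status) VALUES (?, 'DONE')",
            (message_id,),
        )
        if cur.rowcount == 0:
            return False  # message_id already logged: skip the duplicate delivery
        # If this update fails, the log row is rolled back too, so a redelivery retries.
        conn.execute(
            "UPDATE points SET balance = balance - ? WHERE user_id = ?",
            (amount, user_id),
        )
        return True
```

Since the log row and the deduction commit together here, `status` only ever holds `DONE`; it earns its keep when the log and the business update cannot share a transaction (e.g. a Redis set in front of a separate database), where a `PROCESSING` state marks in-flight work. With Redis, `SADD` gives a similar atomic check-and-record because it reports whether the member was newly added.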

When there are multiple producers or consumers, a globally unique ID (e.g., Snowflake, UUID, or a Redis atomic counter via INCR) is preferable to simple version numbers, which are only continuous within a single producer.
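A minimal Snowflake-style generator, following the commonly used layout of a 41-bit millisecond timestamp, a 10-bit worker ID and a 12-bit per-millisecond sequence. The custom epoch, the class name and the injectable `clock` parameter are assumptions for illustration; real implementations differ in how they split the worker bits and how they handle clock drift.

```python
import threading
import time


class Snowflake:
    """Snowflake-style 64-bit IDs: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence."""

    EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch (2020-09-13 UTC)
    WORKER_BITS, SEQ_BITS = 10, 12
    MAX_SEQ = (1 << SEQ_BITS) - 1

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id  # must be unique per generator instance
        self.clock = clock
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & self.MAX_SEQ
                if self.seq == 0:  # 4096 IDs used this millisecond: wait for the next one
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.seq = 0
            self.last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQ_BITS))
                | (self.worker_id << self.SEQ_BITS)
                | self.seq
            )
```

Because the worker ID is baked into every ID, producers on different machines never collide without coordinating, and IDs from one generator still increase over time, which keeps them usable for the continuity-style checks above.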

To handle message backlog, the interviewee should mention:

Scaling the number of consumer instances.

Increasing the number of topic partitions (e.g., in Kafka), because within a consumer group each partition is consumed by at most one consumer; consumers beyond the partition count sit idle.

Temporarily degrading non‑critical features and using monitoring/log analysis to locate bottlenecks.
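A back-of-the-envelope model of why consumers and partitions have to scale together. `assign` spreads partitions round-robin over a consumer group and `drain_seconds` estimates how long the busiest consumer needs to clear the backlog. Both are hypothetical helpers with a simplified assignor (not Kafka's own assignment strategies), and they assume every consumer processes `rate_per_consumer` messages per second.

```python
import math


def assign(num_partitions, consumers):
    """Round-robin partitions over consumers; each partition goes to exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def drain_seconds(backlog, consumers, rate_per_consumer):
    """Time to clear backlog[p] messages on each partition p, bounded by the busiest consumer."""
    assignment = assign(len(backlog), consumers)
    return max(
        math.ceil(sum(backlog[p] for p in parts) / rate_per_consumer)
        for parts in assignment.values()
    )
```

With 4 partitions of 1000 messages each and 100 messages/s per consumer, 2 consumers need 20 s and 4 need 10 s, but 8 consumers still need 10 s because 4 of them get no partition. Only spreading the same 4000 messages over 8 partitions brings 8 consumers down to 5 s. That last case assumes the backlog is spread over the new partitions; in Kafka, adding partitions only affects messages produced afterwards, so a backlog that already exists stays in the old partitions.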

Summary

Key points to remember for interview questions:

Identify where message loss can occur (production, storage, consumption) and how to monitor it.

Explain reliable delivery mechanisms (broker ACK, replication, consumer ACK after business logic).

Describe idempotent consumption techniques (message log, unique IDs, Redis).

Discuss strategies for message backlog: consumer scaling, partition increase, monitoring, and graceful degradation.
