Redis Data Loss Scenarios and Mitigation Strategies
The article explains how asynchronous replication and split‑brain situations can cause data loss in Redis clusters, and describes configuration parameters and client‑side techniques to minimize such loss while maintaining high availability.
1. Data Loss Scenarios
Asynchronous replication loss
Cluster split‑brain loss
1. Asynchronous Replication Loss
Redis replicates data from master to slaves asynchronously; when a client writes to the master, it receives an OK before the data is propagated. If the master crashes before the replication completes, the data residing only in the master’s memory is lost.
Even with persistence enabled the data is not safe: after the crash, Sentinel elects a new master, and when the old master restarts it is demoted to a slave and performs a full resync from the new master. Its local dataset, including anything it had persisted to disk, is replaced by the new master's dataset, so any writes that were never replicated are lost.
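The failure window can be sketched with a toy model (the class and method names below are illustrative, not Redis APIs): the master acknowledges the write as soon as its own memory is updated and only replicates in the background, so a crash between the ack and replication loses the write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of asynchronous master->slave replication (illustrative only).
class AsyncReplicationDemo {
    static class Node {
        final Map<String, String> data = new HashMap<>();
    }

    final Node master = new Node();
    final Node slave = new Node();
    final List<String[]> replicationBacklog = new ArrayList<>();

    // The client gets "OK" as soon as the master's memory is updated;
    // the entry only goes into a backlog to be shipped to the slave later.
    String write(String key, String value) {
        master.data.put(key, value);
        replicationBacklog.add(new String[]{key, value});
        return "OK"; // acknowledged BEFORE the slave has the data
    }

    // Background replication drains the backlog to the slave.
    void replicate() {
        for (String[] kv : replicationBacklog) slave.data.put(kv[0], kv[1]);
        replicationBacklog.clear();
    }

    public static void main(String[] args) {
        AsyncReplicationDemo demo = new AsyncReplicationDemo();
        demo.write("balance", "100"); // client sees OK immediately
        // Master crashes here, before replicate() runs; the slave is
        // promoted to master but never received the write -> data loss.
        System.out.println(demo.slave.data.containsKey("balance")); // false
    }
}
```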
2. Split‑Brain in a Cluster
A split‑brain occurs when a network partition isolates a master from its slaves and from Sentinel. Sentinel, seeing no heartbeat, assumes the master has failed and promotes a slave to master. If the original master is still alive on the other side of the partition, clients connected to it keep writing to it, while those writes never reach the newly promoted master, so the two datasets diverge.
When the partition heals, the old master is demoted to a slave and synchronizes from the new master, discarding its divergent writes and causing massive data loss.
2. How to Minimize Data Loss
Two Redis configuration parameters can be tuned to reduce the risk of loss:
min-slaves-to-write 1
min-slaves-max-lag 10

min-slaves-to-write (default 0, i.e. disabled) specifies the minimum number of slaves that must be connected for the master to accept writes. min-slaves-max-lag (default 10 seconds) defines the maximum acceptable replication lag. If fewer slaves than the minimum are connected, or all connected slaves exceed the lag threshold, the master stops accepting write commands.
By lowering min-slaves-max-lag, the window of unreplicated writes is bounded: the master refuses new writes before the replication lag becomes critical, so at most a few seconds of data can be lost during a failure instead of an unbounded amount.
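The gating behavior of min-slaves-to-write and min-slaves-max-lag can be sketched as follows (a simplified model for intuition, not Redis internals):

```java
// Simplified model of how min-slaves-to-write / min-slaves-max-lag
// gate writes on the master (illustrative, not Redis source code).
class WriteGate {
    final int minSlavesToWrite;   // min-slaves-to-write
    final int minSlavesMaxLagSec; // min-slaves-max-lag (seconds)

    WriteGate(int minSlavesToWrite, int minSlavesMaxLagSec) {
        this.minSlavesToWrite = minSlavesToWrite;
        this.minSlavesMaxLagSec = minSlavesMaxLagSec;
    }

    // slaveLagsSec: replication lag in seconds of each connected slave.
    // Writes are accepted only while enough slaves are within the lag bound.
    boolean acceptsWrites(int[] slaveLagsSec) {
        int healthy = 0;
        for (int lag : slaveLagsSec) {
            if (lag <= minSlavesMaxLagSec) healthy++;
        }
        return healthy >= minSlavesToWrite;
    }

    public static void main(String[] args) {
        WriteGate gate = new WriteGate(1, 10);
        System.out.println(gate.acceptsWrites(new int[]{3}));  // true: 1 slave, 3s lag
        System.out.println(gate.acceptsWrites(new int[]{15})); // false: only slave lags 15s
        System.out.println(gate.acceptsWrites(new int[]{}));   // false: no slaves connected
    }
}
```

With these settings, a partitioned old master quickly loses its quorum of healthy slaves and stops accepting writes, which also limits the divergence produced by a split-brain.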
Client‑side mitigation strategies include temporarily buffering writes in local cache or disk, or forwarding them to a reliable message queue such as Kafka for later replay to the master.
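One way to implement the buffering idea is a thin client-side wrapper that parks rejected writes in a local queue and replays them once the master accepts writes again. This is a sketch: the names are hypothetical, the "Redis write" is abstracted as a predicate, and a production system would use Kafka or durable disk storage rather than an in-memory deque.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BiPredicate;

// Client-side fallback buffer: if a write to Redis fails (e.g. the master
// is rejecting writes because of the min-slaves settings), keep it locally
// and replay it later. Illustrative sketch, not a production design.
class BufferedWriter {
    static final class Entry {
        final String key, value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final BiPredicate<String, String> redisWrite; // true on success
    private final Deque<Entry> buffer = new ArrayDeque<>();

    BufferedWriter(BiPredicate<String, String> redisWrite) {
        this.redisWrite = redisWrite;
    }

    void write(String key, String value) {
        if (!redisWrite.test(key, value)) {
            buffer.addLast(new Entry(key, value)); // park for later replay
        }
    }

    // Replay buffered writes in order; stop at the first failure so that
    // write ordering is preserved across retries.
    int replay() {
        int replayed = 0;
        while (!buffer.isEmpty()) {
            Entry e = buffer.peekFirst();
            if (!redisWrite.test(e.key, e.value)) break;
            buffer.pollFirst();
            replayed++;
        }
        return replayed;
    }

    int pending() { return buffer.size(); }
}
```

The same pattern applies when the queue is Kafka: the producer publishes failed writes to a topic, and a consumer replays them against the master after recovery.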
These settings should be tested and adjusted according to the specific deployment environment to achieve the best trade‑off between availability and data safety.
Source: blog.csdn.net/qq_37142346/article/details/89435458