
Redis Data Loss Scenarios and Mitigation Strategies

The article explains how asynchronous replication and split‑brain situations can cause data loss in Redis clusters, and describes configuration parameters and client‑side techniques to minimize such loss while maintaining high availability.


1. Data Loss Scenarios

Asynchronous replication loss

Cluster split‑brain loss

1. Asynchronous Replication Loss

Redis replicates data from master to slaves asynchronously; when a client writes to the master, it receives an OK before the data is propagated. If the master crashes before the replication completes, the data residing only in the master’s memory is lost.

Even with persistence enabled, a master crash triggers Sentinel to promote a slave as the new master. The new master lacks any writes that had not yet been replicated, and when the old master restarts it is demoted to a slave and performs a full resync from the new master, discarding its local dataset. The unreplicated writes are therefore lost for good.
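The failure window can be seen in a toy model of asynchronous replication (illustrative Java only, not the Redis implementation or a Redis client API): the master acknowledges the write first and ships it to the replica later, so a crash in between strands the write on the dead master.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of asynchronous master->replica replication. A write is
// acknowledged ("OK") as soon as the master applies it locally; shipping
// it to the replica happens later, from a backlog queue.
public class AsyncReplicationDemo {
    final Map<String, String> master = new HashMap<>();
    final Map<String, String> replica = new HashMap<>();
    final Queue<String> backlog = new ArrayDeque<>();

    String set(String key, String value) {
        master.put(key, value);
        backlog.add(key);      // will be shipped to the replica later
        return "OK";           // client sees OK before replication happens
    }

    void replicateOnce() {     // ship one pending write to the replica
        String key = backlog.poll();
        if (key != null) replica.put(key, master.get(key));
    }

    public static void main(String[] args) {
        AsyncReplicationDemo node = new AsyncReplicationDemo();
        node.set("a", "1");
        node.replicateOnce();              // "a" reaches the replica
        node.set("b", "2");                // master crashes here, before shipping:
        // failover promotes the replica; "b" existed only in the dead master
        System.out.println(node.replica.containsKey("a"));  // true
        System.out.println(node.replica.containsKey("b"));  // false: lost
    }
}
```

If the master dies after `set("b", "2")` returns but before the backlog is drained, the promoted replica simply never saw "b" even though the client was told OK.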

2. Split‑Brain in a Cluster

A split‑brain occurs when a network partition isolates the master from its slaves. Sentinel, seeing no heartbeat, assumes the master has failed and promotes a slave to master. The original master, however, is still alive, and clients that can reach it keep writing to it while other clients write to the new master, so the two datasets diverge.

When the partition heals, the old master is demoted to a slave and synchronizes from the new master, discarding its divergent writes and causing massive data loss.

2. How to Minimize Data Loss

Two Redis configuration parameters can be tuned to reduce the risk of loss:

min-slaves-to-write 1
min-slaves-max-lag 10

min‑slaves‑to‑write (default 0) specifies the minimum number of slaves that must be connected for the master to accept writes. min‑slaves‑max‑lag (default 10 seconds) defines the maximum acceptable replication lag. If fewer than min‑slaves‑to‑write slaves have a replication lag within min‑slaves‑max‑lag, the master stops accepting write commands.

By lowering min‑slaves‑max‑lag, writes are blocked before the replication lag becomes critical, which caps how much unreplicated data a failover can lose.
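The gating rule the two parameters implement can be sketched as follows. This is a model of the rule as described above, not Redis source code; the method name and the per-replica lag list are illustrative.

```java
import java.util.List;

// Sketch of the min-slaves-to-write / min-slaves-max-lag gate: the master
// keeps accepting writes only while at least minSlavesToWrite replicas
// report a lag no greater than maxLagSeconds.
public class ReplicaGate {
    static boolean acceptWrites(List<Long> replicaLagSeconds,
                                int minSlavesToWrite, long maxLagSeconds) {
        long goodReplicas = replicaLagSeconds.stream()
                .filter(lag -> lag <= maxLagSeconds)
                .count();
        return goodReplicas >= minSlavesToWrite;
    }

    public static void main(String[] args) {
        // Matching the config above: min-slaves-to-write 1, min-slaves-max-lag 10
        System.out.println(acceptWrites(List.of(3L), 1, 10));   // true: one healthy slave
        System.out.println(acceptWrites(List.of(15L), 1, 10));  // false: slave too far behind
        System.out.println(acceptWrites(List.of(), 1, 10));     // false: no slaves at all
    }
}
```

Note that this gate bounds loss rather than eliminating it: up to min‑slaves‑max‑lag seconds of writes can still be in flight when a failure hits.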

Client‑side mitigation strategies include temporarily buffering writes in local cache or disk, or forwarding them to a reliable message queue such as Kafka for later replay to the master.
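A minimal sketch of that fallback path, assuming the write sink reports whether the master accepted the write (for example, because the min‑slaves gate tripped). The class and method names are hypothetical; in production the buffer would be a local disk log or a Kafka topic rather than an in-memory deque.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical client-side fallback: writes the master rejects are parked
// in a local buffer and replayed once the master accepts writes again.
public class BufferedWriter {
    private final Deque<Map.Entry<String, String>> pending = new ArrayDeque<>();
    private final BiPredicate<String, String> sink; // true = write accepted

    BufferedWriter(BiPredicate<String, String> sink) { this.sink = sink; }

    void write(String key, String value) {
        if (!sink.test(key, value)) {
            pending.add(new SimpleEntry<>(key, value)); // park it for later replay
        }
    }

    int replay() { // retry buffered writes in order; returns how many succeeded
        int replayed = 0;
        while (!pending.isEmpty()) {
            Map.Entry<String, String> e = pending.peek();
            if (!sink.test(e.getKey(), e.getValue())) break; // still failing, stop
            pending.poll();
            replayed++;
        }
        return replayed;
    }

    int pendingCount() { return pending.size(); }
}
```

Replaying in arrival order preserves write ordering per client; a real implementation would also bound the buffer and decide what to do when it overflows.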

These settings should be tested and adjusted according to the specific deployment environment to achieve the best trade‑off between availability and data safety.

Source: blog.csdn.net/qq_37142346/article/details/89435458

Tags: High Availability, Redis, Configuration, Replication, Cluster, Data Loss
Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
