Redis Data Loss Scenarios and Mitigation Strategies
The article explains how asynchronous replication and split‑brain situations can cause data loss in Redis clusters, and describes configuration parameters and client‑side techniques to minimize such loss while maintaining high availability.
1. Data Loss Scenarios
Asynchronous replication loss
Cluster split‑brain loss
1. Asynchronous Replication Loss
Redis replicates data from master to slaves asynchronously; when a client writes to the master, it receives an OK before the data is propagated. If the master crashes before the replication completes, the data residing only in the master’s memory is lost.
Even with persistence enabled the data is not safe: after the crash, Sentinel elects a new master, and when the old master restarts it is demoted to a slave and performs a full resync from the new master. Its local dataset, including anything it had persisted to disk, is replaced by the new master's dataset, so any writes that were never replicated are lost.
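The failure window can be sketched with a toy model (the class and method names below are illustrative, not Redis APIs): the master acknowledges the write as soon as its own memory is updated and only replicates in the background, so a crash between the ack and replication loses the write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of asynchronous master->slave replication (illustrative only).
class AsyncReplicationDemo {
    static class Node {
        final Map<String, String> data = new HashMap<>();
    }

    final Node master = new Node();
    final Node slave = new Node();
    final List<String[]> replicationBacklog = new ArrayList<>();

    // The client gets "OK" as soon as the master's memory is updated;
    // the entry only goes into a backlog to be shipped to the slave later.
    String write(String key, String value) {
        master.data.put(key, value);
        replicationBacklog.add(new String[]{key, value});
        return "OK"; // acknowledged BEFORE the slave has the data
    }

    // Background replication drains the backlog to the slave.
    void replicate() {
        for (String[] kv : replicationBacklog) slave.data.put(kv[0], kv[1]);
        replicationBacklog.clear();
    }

    public static void main(String[] args) {
        AsyncReplicationDemo demo = new AsyncReplicationDemo();
        demo.write("balance", "100"); // client sees OK immediately
        // Master crashes here, before replicate() runs; the slave is
        // promoted to master but never received the write -> data loss.
        System.out.println(demo.slave.data.containsKey("balance")); // false
    }
}
```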
2. Split‑Brain in a Cluster
A split‑brain occurs when a network partition isolates a master from its slaves and from Sentinel. Sentinel, seeing no heartbeat, assumes the master has failed and promotes a slave to master. If the original master is still alive on the other side of the partition, clients connected to it keep writing to it, while those writes never reach the newly promoted master, so the two datasets diverge.
When the partition heals, the old master is demoted to a slave and synchronizes from the new master, discarding its divergent writes and causing massive data loss.
2. How to Minimize Data Loss
Two Redis configuration parameters can be tuned to reduce the risk of loss:
min-slaves-to-write 1
min-slaves-max-lag 10

min-slaves-to-write (default 0, i.e. disabled) specifies the minimum number of slaves that must be connected for the master to accept writes. min-slaves-max-lag (default 10 seconds) defines the maximum acceptable replication lag. If fewer slaves than the minimum are connected, or all connected slaves exceed the lag threshold, the master stops accepting write commands.
By lowering min-slaves-max-lag, the window of unreplicated writes is bounded: the master refuses new writes before the replication lag becomes critical, so at most a few seconds of data can be lost during a failure instead of an unbounded amount.
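The gating behavior of min-slaves-to-write and min-slaves-max-lag can be sketched as follows (a simplified model for intuition, not Redis internals):

```java
// Simplified model of how min-slaves-to-write / min-slaves-max-lag
// gate writes on the master (illustrative, not Redis source code).
class WriteGate {
    final int minSlavesToWrite;   // min-slaves-to-write
    final int minSlavesMaxLagSec; // min-slaves-max-lag (seconds)

    WriteGate(int minSlavesToWrite, int minSlavesMaxLagSec) {
        this.minSlavesToWrite = minSlavesToWrite;
        this.minSlavesMaxLagSec = minSlavesMaxLagSec;
    }

    // slaveLagsSec: replication lag in seconds of each connected slave.
    // Writes are accepted only while enough slaves are within the lag bound.
    boolean acceptsWrites(int[] slaveLagsSec) {
        int healthy = 0;
        for (int lag : slaveLagsSec) {
            if (lag <= minSlavesMaxLagSec) healthy++;
        }
        return healthy >= minSlavesToWrite;
    }

    public static void main(String[] args) {
        WriteGate gate = new WriteGate(1, 10);
        System.out.println(gate.acceptsWrites(new int[]{3}));  // true: 1 slave, 3s lag
        System.out.println(gate.acceptsWrites(new int[]{15})); // false: only slave lags 15s
        System.out.println(gate.acceptsWrites(new int[]{}));   // false: no slaves connected
    }
}
```

With these settings, a partitioned old master quickly loses its quorum of healthy slaves and stops accepting writes, which also limits the divergence produced by a split-brain.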
Client‑side mitigation strategies include temporarily buffering writes in local cache or disk, or forwarding them to a reliable message queue such as Kafka for later replay to the master.
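One way to implement the buffering idea is a thin client-side wrapper that parks rejected writes in a local queue and replays them once the master accepts writes again. This is a sketch: the names are hypothetical, the "Redis write" is abstracted as a predicate, and a production system would use Kafka or durable disk storage rather than an in-memory deque.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BiPredicate;

// Client-side fallback buffer: if a write to Redis fails (e.g. the master
// is rejecting writes because of the min-slaves settings), keep it locally
// and replay it later. Illustrative sketch, not a production design.
class BufferedWriter {
    static final class Entry {
        final String key, value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final BiPredicate<String, String> redisWrite; // true on success
    private final Deque<Entry> buffer = new ArrayDeque<>();

    BufferedWriter(BiPredicate<String, String> redisWrite) {
        this.redisWrite = redisWrite;
    }

    void write(String key, String value) {
        if (!redisWrite.test(key, value)) {
            buffer.addLast(new Entry(key, value)); // park for later replay
        }
    }

    // Replay buffered writes in order; stop at the first failure so that
    // write ordering is preserved across retries.
    int replay() {
        int replayed = 0;
        while (!buffer.isEmpty()) {
            Entry e = buffer.peekFirst();
            if (!redisWrite.test(e.key, e.value)) break;
            buffer.pollFirst();
            replayed++;
        }
        return replayed;
    }

    int pending() { return buffer.size(); }
}
```

The same pattern applies when the queue is Kafka: the producer publishes failed writes to a topic, and a consumer replays them against the master after recovery.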
These settings should be tested and adjusted according to the specific deployment environment to achieve the best trade‑off between availability and data safety.
Source: blog.csdn.net/qq_37142346/article/details/89435458