
Mastering Redis Sentinel: Build High‑Availability Clusters Step‑by‑Step

This article explains Redis Sentinel’s role in achieving high availability, details its core functions, underlying Raft‑based algorithm, configuration parameters, practical setup steps, fault‑tolerance mechanisms, quorum and majority calculations, and demonstrates failover and recovery scenarios with real command‑line examples.


Preface

Continuing from the previous article: because Redis master‑slave replication alone cannot provide high availability, Redis adds Sentinel on top of the master‑slave architecture to build a highly available Redis cluster.

Redis Sentinel

Sentinel is a crucial component in Redis high‑availability architecture; it removes the need for manual intervention when the master in a master‑slave setup fails.

Key functions of Redis Sentinel:

Cluster monitoring: monitors whether Redis master and slaves are working correctly.

Message notification: sends alerts to administrators when a Redis instance fails.

Failover: when the master node fails, automatically elects a new master, enabling self‑healing.

Configuration center: after a failure, notifies clients and other slaves of the new master address.
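The monitoring and configuration‑center roles can be inspected directly with redis-cli against a Sentinel. The commands below are standard Sentinel commands; they assume a Sentinel on the default port 26379 monitoring a master named mymaster, as configured later in this article:

```shell
# Ask a Sentinel which masters it monitors and their state.
redis-cli -p 26379 SENTINEL masters

# The "configuration center" in action: resolve the current master address.
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

# List the known slaves of that master (SENTINEL slaves on pre-5.0 versions).
redis-cli -p 26379 SENTINEL replicas mymaster
```

Clients use SENTINEL get-master-addr-by-name at connect time, so after a failover they are directed to the new master automatically.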

Principle

Redis Sentinel’s core algorithm is based on Raft, used for distributed system fault tolerance and leader election. The process includes:

Each Sentinel automatically discovers the other Sentinels and the slaves, sending a PING to every known master, slave, and Sentinel once per second.

If an instance does not reply to PING within the down-after-milliseconds threshold, the Sentinel marks it subjectively down (SDOWN).

When a master is marked SDOWN, the other Sentinels monitoring it verify its state at the same once‑per‑second frequency.

If at least quorum Sentinels agree that the master is unreachable, it is marked objectively down (ODOWN).

Sentinels then increase the frequency of INFO commands to the downed master’s slaves from once every 10 seconds to once per second.

The ODOWN status is cleared as soon as the master starts replying to PINGs again and is no longer seen as SDOWN by enough Sentinels.

Detailed steps can be observed in the Sentinel logs.
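The SDOWN/ODOWN decision described above can be sketched as a small simulation (this is illustrative Python, not Redis internals; the Sentinel names and timestamps are invented):

```python
# Sketch of the subjective/objective down decision. Illustrative only.
DOWN_AFTER_MS = 30000  # down-after-milliseconds

def is_sdown(last_pong_ms, now_ms, down_after_ms=DOWN_AFTER_MS):
    """One Sentinel marks an instance SDOWN if it has not answered
    PING within down-after-milliseconds."""
    return now_ms - last_pong_ms > down_after_ms

def is_odown(sdown_votes, quorum):
    """The master becomes ODOWN once at least `quorum` Sentinels
    agree that it is subjectively down."""
    return sdown_votes >= quorum

now = 100000
# Three Sentinels, each with the time it last heard from the master.
last_pong = {"s1": 60000, "s2": 65000, "s3": 95000}
votes = sum(is_sdown(t, now) for t in last_pong.values())
print(votes, is_odown(votes, quorum=2))  # 2 True
```

Here two of three Sentinels have not heard from the master for over 30 seconds, so with quorum 2 the master is flagged ODOWN even though one Sentinel still sees it.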

Sentinel Deployment Practice

Assuming the master‑slave setup is already done, configure Sentinel via sentinel.conf:

# Sentinel instance port, default 26379
port 26379
dir ./
protected-mode no
daemonize yes
logfile ./sentinel.log
# Monitor master "mymaster" at 127.0.0.1:6379; the trailing 2 is the quorum
sentinel monitor mymaster 127.0.0.1 6379 2
# Authentication (if master requires a password)
sentinel auth-pass mymaster 123456
# Down‑after timeout (default 30 s)
sentinel down-after-milliseconds mymaster 30000
# Number of slaves that can sync simultaneously during failover
sentinel parallel-syncs mymaster 1
# Failover timeout (default 180 000 ms)
sentinel failover-timeout mymaster 180000
# Notification scripts
sentinel notification-script mymaster /var/redis/notify.sh
sentinel client-reconfig-script mymaster /var/redis/reconfig.sh

Start order: first start the Redis master, then the slaves, and finally the Sentinel instances.
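The start order can be scripted roughly as follows. The config paths are illustrative; redis-server with the --sentinel flag is equivalent to the redis-sentinel binary:

```shell
# Illustrative start order: master first, then slaves, then Sentinels.
redis-server /etc/redis/redis-master.conf          # the master
redis-server /etc/redis/redis-slave.conf           # each slave
redis-server /etc/redis/sentinel.conf --sentinel   # each Sentinel instance
```

Note that Sentinel rewrites its own configuration file at runtime (to record discovered slaves and Sentinels), so each instance needs its own writable copy of sentinel.conf.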

Verify replication on the master:

127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.22.29.101,port=6379,state=online,offset=4448,lag=1
slave1:ip=172.22.29.100,port=6379,state=online,offset=4448,lag=1

Check Sentinel status:

127.0.0.1:26379> info sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=mymaster,status=ok,address=172.22.29.99:6379,slaves=2,sentinels=3

Simulating Failover

Stop the master Redis service:

systemctl stop redis

Sentinel logs show the failover process, including SDOWN, ODOWN, leader election, slave promotion, and master switch.

Simulating Node Recovery

Restart the previously stopped Redis node:

systemctl start redis

Sentinel logs indicate the node is converted back to a slave and resynchronizes with the new master.

Sentinel Node Calculations

Two important parameters:

quorum: the minimum number of Sentinels that must agree a master is unreachable before it is marked ODOWN.

majority: the minimum number of Sentinels (voters/2 + 1 of all known Sentinels) that must elect a leader before a failover can actually be carried out.

If quorum < majority, a master can be flagged ODOWN by fewer Sentinels than are needed to authorize the failover, but the failover itself still requires a majority vote. If quorum ≥ majority, at least quorum Sentinels must be reachable and in agreement before a failover can start.

int sentinelIsQuorumReachable(sentinelRedisInstance *master, int *usableptr) {
    dictIterator *di;
    dictEntry *de;
    int usable = 1;                               /* Count self. */
    int result = SENTINEL_ISQR_OK;
    int voters = dictSize(master->sentinels) + 1; /* Known Sentinels + self. */

    /* Count the known Sentinels that are not flagged as down. */
    di = dictGetIterator(master->sentinels);
    while ((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *sentinel = dictGetVal(de);
        if (sentinel->flags & (SRI_S_DOWN|SRI_O_DOWN)) continue;
        usable++;
    }
    dictReleaseIterator(di);

    if (usable < (int)master->quorum) result |= SENTINEL_ISQR_NOQUORUM;
    if (usable < voters/2+1) result |= SENTINEL_ISQR_NOAUTH; /* majority = voters/2 + 1 */
    if (usableptr) *usableptr = usable;
    return result;
}

Why at Least Three Sentinels?

With only two Sentinels, the majority is 2; if one fails, the remaining Sentinel cannot meet the majority requirement, preventing failover when the master crashes.
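The arithmetic behind this rule (majority = voters/2 + 1, as in the C snippet above) can be checked directly:

```python
# majority = voters // 2 + 1, matching the Sentinel source above.
def majority(voters):
    return voters // 2 + 1

for n in (2, 3, 5):
    print(n, majority(n))
# With 2 Sentinels, majority is 2: lose one and the single survivor can
# never authorize a failover. With 3 Sentinels, majority is also 2, so
# one Sentinel can fail and the remaining two can still elect a leader.
```

This is why odd deployments of at least three Sentinels, on separate hosts, are the practical minimum.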

Split‑Brain Scenario

A split‑brain occurs when a network partition isolates the master from the slaves and Sentinels: Sentinel promotes a slave to master while the old master keeps running, leaving two masters accepting writes. Writes made to the old master during the partition are lost when it is later demoted to a slave and resynchronizes from the new master.

Two configuration parameters on the master help mitigate split‑brain:

min-replicas-to-write 3 – the master accepts writes only while at least three slaves are connected; tune the value to your topology (with the two‑slave setup above it would be 1 or 2).

min-replicas-max-lag 10 – a slave only counts as connected if its replication lag is no more than 10 seconds.

With these settings, an isolated master stops accepting writes during a split‑brain, reducing potential data loss.
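The combined effect of these two settings can be sketched as a simple predicate (a simulation of the behavior, not Redis code; names are illustrative):

```python
# Sketch: a master accepts writes only while enough sufficiently fresh
# slaves are connected (min-replicas-to-write / min-replicas-max-lag).
# Illustrative only, not Redis internals.
def accepts_writes(replica_lags_s, min_to_write=3, max_lag_s=10):
    good = sum(1 for lag in replica_lags_s if lag <= max_lag_s)
    return good >= min_to_write

print(accepts_writes([1, 2, 3]))   # True: three fresh slaves connected
print(accepts_writes([1, 2, 30]))  # False: one slave lags beyond 10 s
print(accepts_writes([]))          # False: isolated master, no slaves
```

An isolated master sees no fresh slaves, so the predicate fails and writes are rejected until the partition heals.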

Summary

Redis Sentinel provides high‑availability for Redis by monitoring, notifying, and automatically failing over master nodes, while Redis Cluster addresses scalability and throughput.

Tags: high availability, Redis, configuration, Sentinel, failover

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together along the way.
