
Analysis of Redis Sentinel Failover Issue in Redis 7.4.0 and Resolution via Pub/Sub ACL Adjustment

This article investigates a Redis Sentinel failover anomaly in version 7.4.0 where the sentinel repeatedly elects a failed master, explains the underlying s_down/o_down states, examines network, configuration, and ACL settings, and resolves the issue by adjusting Pub/Sub permissions to allow proper failover.

Aikesheng Open Source Community

Background

The test environment is simplified to a one-master, one-slave, one-sentinel topology, with the sentinel monitoring the master under the name ms. The nodes are:

master: 172.20.134.2

slave: 172.20.134.3

sentinel: 172.20.134.4

Problem Description

When running Redis 7.4.0 with Sentinel, the sentinel behaves abnormally after the master instance fails: instead of promoting the slave, it keeps running the failover against the failed master.

21903:X 06 Dec 2024 15:53:04.164 * +slave slave 172.20.134.3:6379 172.20.134.3 6379 @ ms 172.20.134.2 6379
21903:X 06 Dec 2024 15:53:04.168 * Sentinel new configuration saved on disk
# +sdown master ms 172.20.134.2 6379
# +odown master ms 172.20.134.2 6379 #quorum 1/1
# +new-epoch 1
# +try-failover master ms 172.20.134.2 6379
... (subsequent failover attempts) ...

The logs show that after the master process is killed, the sentinel repeatedly executes the same failover steps.
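A repeating failover like the one in the excerpt above can be spotted by counting Sentinel events in the log. The following is a minimal sketch, not part of the original investigation: the helper names are hypothetical, and the regular expression assumes that each event name follows a `#` or `*` marker, as in the lines shown. `+switch-master` is the event Sentinel logs when a failover completes.

```python
import re
from collections import Counter

# Assumption: an event name such as "+sdown" follows a "#" or "*" marker.
EVENT_RE = re.compile(r"[#*]\s+([+-][a-z-]+)\b")

def count_events(log_lines):
    """Count Sentinel events by name across an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def looks_like_failover_loop(log_lines, threshold=2):
    """Heuristic: several +try-failover events and no +switch-master."""
    counts = count_events(log_lines)
    return counts["+try-failover"] >= threshold and counts["+switch-master"] == 0
```

The threshold of 2 is an arbitrary choice for illustration; a real check would probably also look at the time window covered by the log.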

Investigation Process

Network Check

Ping and telnet tests between the sentinel node and Redis instances confirmed normal network connectivity.
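The telnet-style check can also be scripted. The sketch below is only an illustration of a plain TCP reachability test; the function name and the timeout value are assumptions. As the rest of the investigation shows, a successful result here says nothing about ACL permissions.

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the topology above, this would be called from the sentinel node as `can_connect("172.20.134.2", 6379)` and `can_connect("172.20.134.3", 6379)`.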

Sentinel and Redis Configuration Review

The sentinel.conf and redis.conf files were examined. Connection tests using the ACL user markus succeeded.

Sentinel Master State

Running SENTINEL MASTERS on the sentinel revealed that the master flags were s_down, o_down, and disconnected.

Sentinel Slave State

Running SENTINEL SLAVES ms showed that the slave also reported the disconnected flag.

State Definitions

s_down (Subjectively Down): a single sentinel’s view that the instance is unreachable.

o_down (Objectively Down): consensus among a quorum of sentinels, triggering failover.

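disconnected: the sentinel does not have a working link to the instance. For masters and slaves the link consists of a command connection and a Pub/Sub connection, so a failing Pub/Sub connection is enough to keep this flag set even when TCP connectivity is fine.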
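To make the flag definitions above concrete, the sketch below splits the comma-separated flags field returned by SENTINEL MASTERS or SENTINEL SLAVES and applies a simplified promotion rule. The function names are hypothetical, and the rule is an assumption that only covers the three flags discussed here; Sentinel's actual slave selection applies further criteria such as replica priority and replication state.

```python
# Flags that, in this simplified model, rule a slave out as a failover target.
BLOCKING_FLAGS = {"s_down", "o_down", "disconnected"}

def parse_flags(flags):
    """Split a flags string such as 'slave,disconnected' into a set of flags."""
    return {flag.strip() for flag in flags.split(",") if flag.strip()}

def is_promotable(flags):
    """Return True if a slave carries none of the blocking flags."""
    parsed = parse_flags(flags)
    return "slave" in parsed and not (parsed & BLOCKING_FLAGS)
```

With the state observed in this case, `is_promotable("slave,disconnected")` is false, which matches the sentinel never finding a slave to promote.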

Sentinel Configuration Items

down-after-milliseconds: the time without a valid reply after which the sentinel marks the instance as s_down.

quorum: the number of sentinels that must agree on s_down before the master is marked o_down.
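The two settings combine as sketched below. This is a simplified model with hypothetical function names, not Sentinel's implementation: it treats each sentinel's view as a boolean and ignores details such as how reports from other sentinels expire.

```python
def is_subjectively_down(ms_since_last_valid_reply, down_after_ms):
    """One sentinel's view: no valid reply for longer than down-after-milliseconds."""
    return ms_since_last_valid_reply > down_after_ms

def is_objectively_down(sentinel_views, quorum):
    """sentinel_views holds one boolean per sentinel (True means s_down)."""
    return sum(sentinel_views) >= quorum
```

In the single-sentinel test environment the quorum is 1, so the sentinel's own s_down view is enough to reach o_down, which is what the `#quorum 1/1` entry in the log reflects.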

Root Cause Explanation

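The slave's disconnected flag prevented the failover from completing, because the sentinel does not promote a slave it considers disconnected. Redis 6.2 added Pub/Sub channel permissions to the ACL system, and since Redis 7.0 the default Pub/Sub permission (acl-pubsub-default) is resetchannels. In Redis 7.4.0 a user therefore has no channel access unless it is granted explicitly, which blocks the sentinel from subscribing to the internal channel (__sentinel__:hello) it uses for state detection.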

# Official documentation: https://redis.io/docs/latest/operate/rs/7.4/security/access-control/redis-acl-overview/#pubsub-channels
Pub/sub channels
The & prefix allows access to pub/sub channels (only supported for databases with Redis version 6.2 or later).
To limit access to specific channels, include resetchannels before the allowed channels:

Adjusting the Redis configuration to grant full Pub/Sub access (the &* rule) allowed the sentinel to receive the necessary messages, and the failover behavior returned to normal.
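The effect of the channel rules can be illustrated with a small model. The sketch below replays a user's ACL rules in order and checks whether a channel is covered; it is an approximation, with hypothetical function names, that starts from no channel access (the resetchannels default) and uses Python's `fnmatchcase` in place of Redis's own glob matcher, which differs in some edge cases. The rule lists in the usage note are illustrative, since the article does not show the actual rules of the user involved.

```python
from fnmatch import fnmatchcase

def allowed_channel_patterns(rules):
    """Replay ACL rules in order and return the channel patterns in effect."""
    patterns = []  # Assumption: the default is resetchannels (no channels).
    for rule in rules:
        if rule == "resetchannels":
            patterns = []
        elif rule == "allchannels":
            patterns = ["*"]
        elif rule.startswith("&"):
            patterns.append(rule[1:])
    return patterns

def can_subscribe(rules, channel):
    """Return True if any allowed pattern matches the channel name."""
    return any(fnmatchcase(channel, p) for p in allowed_channel_patterns(rules))
```

Under this model, `can_subscribe(["on", "+@all", "~*"], "__sentinel__:hello")` is false, while adding `"&*"` to the rules makes it true.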

Resolution

Modify redis.conf to set acl-pubsub-default allchannels, or add the equivalent &* rule to the ACL user that the sentinel authenticates as, so that the sentinel can subscribe to all channels. After this change, the sentinel correctly detects the master failure and promotes the slave.
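If the change is rolled out to several instances, the redis.conf edit can be scripted. The helper below is a hypothetical sketch that works on the file contents as a string: it replaces an active directive line or appends one, and leaves commented-out lines alone. It assumes `allchannels` as the directive value, which is the keyword form of the &* rule in Redis ACL syntax.

```python
def set_directive(conf_text, name, value):
    """Return conf_text with 'name value' set exactly once."""
    new_line = f"{name} {value}"
    out, replaced = [], False
    for line in conf_text.splitlines():
        parts = line.split()
        if parts and parts[0] == name:
            # Replace the first active occurrence and drop any later ones.
            if not replaced:
                out.append(new_line)
                replaced = True
            continue
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

A call such as `set_directive(text, "acl-pubsub-default", "allchannels")` returns the updated contents; writing the file back and restarting or reconfiguring the instance is left out of the sketch.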

Key Takeaways

Check sentinel and Redis ACL Pub/Sub permissions when failover does not occur.

Understand the progression disconnected → s_down → o_down in sentinel state handling.

Network connectivity alone may not reveal permission‑related issues.
