Redis Sentinel Failing Health Checks: Troubleshooting the CLUSTERDOWN Error
Quick Fix Summary
TL;DR: Check whether a majority of Sentinel nodes are reachable and can communicate with one another, then restart the Sentinel service on enough nodes to form a quorum.
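In a Sentinel deployment, this failure occurs when Redis Sentinel cannot reach quorum, or cannot obtain majority authorization, to perform a failover. The usual causes are network partitions, misconfiguration, or too few healthy Sentinel instances. Strictly speaking, the `CLUSTERDOWN` reply itself is generated by Redis Cluster nodes; Sentinel reports quorum problems with errors such as `NOQUORUM`, so confirm which of the two systems is actually returning the error.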
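Sentinel ships a built-in check for this condition, `SENTINEL CKQUORUM <master-name>`. The function below is a minimal sketch for interpreting its reply in a script. It assumes that a healthy reply begins with `OK` and that a failing reply contains `NOQUORUM`; the function name `classify_ckquorum` and the master name `mymaster` are hypothetical.

```shell
# classify_ckquorum REPLY
# Interpret the text returned by:
#   redis-cli -p 26379 sentinel ckquorum <master-name>
# Assumption: a healthy reply starts with "OK" and a failing reply
# contains "NOQUORUM". Anything else is reported as "unknown".
classify_ckquorum() {
    case "$1" in
        *NOQUORUM*) echo "no-quorum" ;;
        OK*)        echo "ok" ;;
        *)          echo "unknown" ;;
    esac
}

# Example usage (requires a running Sentinel; "mymaster" is a placeholder):
#   classify_ckquorum "$(redis-cli -p 26379 sentinel ckquorum mymaster)"
```

Anything other than `ok` from this check suggests that the recovery steps that follow are worth working through.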
Diagnosis & Causes
Recovery Steps
Step 1: Verify Sentinel Cluster State and Quorum
Check the status of all Sentinel instances to see which are reachable and confirm the current master.
redis-cli -p 26379 sentinel masters
redis-cli -p 26379 sentinel sentinels <master-name>
redis-cli -p 26379 sentinel get-master-addr-by-name <master-name>
Step 2: Check Sentinel and Redis Logs for Errors
Examine logs for connection failures, vote disagreements, or configuration errors.
sudo journalctl -u redis-sentinel --since "1 hour ago"
sudo tail -f /var/log/redis/sentinel.log
sudo grep -E "(failover|vote|quorum|down)" /var/log/redis/sentinel.log
Step 3: Validate Network Connectivity Between Sentinels
Ensure all Sentinel nodes can communicate on their configured ports (default 26379).
for ip in $(redis-cli -p 26379 sentinel sentinels <master-name> | awk '$0 == "ip" {getline; print}'); do nc -zv "$ip" 26379; done
sudo ss -tlnp | grep 26379
Step 4: Confirm Sentinel Configuration and Quorum Settings
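Verify that the `sentinel monitor` directive is consistent across all Sentinel configs. The quorum is not a separate setting; it is the last argument of that directive (`sentinel monitor <master-name> <ip> <port> <quorum>`).
sudo grep -E "^sentinel (monitor|down-after-milliseconds)" /etc/redis/sentinel.conf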
cat /etc/redis/sentinel.conf
Step 5: Force a Sentinel Failover if Quorum is Achievable
If a quorum of Sentinels is reachable but the cluster is stuck, manually trigger a failover.
redis-cli -p 26379 sentinel failover <master-name>
Step 6: Restart Sentinel Services to Clear State
Gracefully restart Sentinel instances, starting with the one that can see the current master.
sudo systemctl restart redis-sentinel
sudo systemctl status redis-sentinel
Step 7: Check Underlying Redis Master/Slave Health
Ensure the Redis instances being monitored are themselves healthy and replicating.
redis-cli -h <master-ip> -p 6379 info replication
redis-cli -h <slave-ip> -p 6379 info replication
Architect's Pro Tip
"A split-brain scenario where two subsets of Sentinels each elect a different master is a common root cause. Always verify the master address (the `ip` and `port` fields) reported by `sentinel masters` on ALL Sentinel nodes to ensure consensus."
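The consensus check in this tip can be scripted. The sketch below assumes you have already collected one line per Sentinel in the form `sentinel-host master-ip master-port`; the function name `check_master_consensus`, the hostnames, and the master name `mymaster` are hypothetical.

```shell
# check_master_consensus
# Reads lines of the form "sentinel-host master-ip master-port" on stdin.
# Prints "consensus <ip>:<port>" if every Sentinel reports the same master,
# "split <n>" if n different master addresses were reported,
# or "no-data" if no lines were read.
check_master_consensus() {
    awk '
        {
            addr = $2 ":" $3
            if (!(addr in seen)) { seen[addr] = 1; n++ }
            last = addr
        }
        END {
            if (n == 1)      print "consensus " last
            else if (n == 0) print "no-data"
            else             print "split " n
        }'
}

# Example usage (hostnames and "mymaster" are placeholders):
#   for s in sentinel-a sentinel-b sentinel-c; do
#       echo "$s $(redis-cli -h "$s" -p 26379 sentinel get-master-addr-by-name mymaster | tr '\n' ' ')"
#   done | check_master_consensus
```

A `split` result means at least two Sentinels disagree about the master address, which is the situation the tip warns about.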
Frequently Asked Questions
How many Sentinel nodes do I need to avoid CLUSTERDOWN?
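A failover needs two things: the configured quorum of Sentinels must agree that the master is down, and a majority of all Sentinels must authorize the failover. The quorum is typically set to a majority: 2 for 3 nodes, 3 for 5 nodes. Always deploy an odd number (3, 5) to avoid ties.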
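The arithmetic behind these numbers is simple enough to sketch. The helper names `majority_for` and `tolerated_failures` are hypothetical, and the functions only illustrate the majority rule; they do not query Sentinel.

```shell
# majority_for N
# Smallest number of Sentinels that forms a majority of N.
majority_for() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N
# How many of N Sentinels can be lost while a majority still remains.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}
```

Under this rule a 4-node deployment needs 3 Sentinels for a majority and tolerates only 1 failure, the same as a 3-node deployment, which is why even node counts add cost without adding resilience.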
Can I temporarily fix this by restarting just one Sentinel?
No. Restarting a single Sentinel often won't resolve a quorum issue. You must restore connectivity or restart enough Sentinels to re-establish a majority quorum.
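To judge how many Sentinels are missing before you restart anything, you can count the usable peers that one Sentinel reports. This is a rough sketch under two assumptions: that `redis-cli`, when piped, prints the `sentinel sentinels` reply as one field name or value per line, and that an unreachable peer carries `s_down` in its `flags` value. The function names are hypothetical.

```shell
# count_usable_sentinels
# Reads the piped output of `redis-cli -p 26379 sentinel sentinels <master-name>`
# on stdin and prints "<usable> <total>". The local Sentinel is not listed
# among its own peers, so it is added to both counts as usable.
count_usable_sentinels() {
    awk '
        prev == "flags" { total++; if ($0 !~ /s_down/) usable++ }
        { prev = $0 }
        END { print usable + 1, total + 1 }'
}

# has_majority USABLE TOTAL
# Succeeds (exit status 0) if USABLE is a majority of TOTAL.
has_majority() {
    [ "$1" -ge $(( $2 / 2 + 1 )) ]
}

# Example usage ("mymaster" is a placeholder):
#   set -- $(redis-cli -p 26379 sentinel sentinels mymaster | count_usable_sentinels)
#   has_majority "$1" "$2" || echo "only $1 of $2 Sentinels usable"
```

The counts reflect only the view of the Sentinel you query, so during a partition it is worth running the same check from more than one node.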