Understanding PFAIL and FAIL States in Redis Cluster Node Failure Detection

This article explains the PFAIL (possible fail) and FAIL (failed) states in Redis clusters, describes the state transition process, demonstrates node failure and automatic failover with command‑line examples, and provides practical insights into cluster health monitoring and recovery.

Cluster · Database · FAIL

5 min read

Xiaokun's Architecture Exploration Notes

Apr 20, 2025 · Fundamentals

Why Unreliable Networks Threaten Distributed Systems—and How to Mitigate Them

The article explains how network failures such as packet loss, reordering, latency, and ambiguous node failures make distributed systems unreliable, compares synchronous and asynchronous networks, and discusses the trade‑off between timeout settings and resource utilization.

Distributed Systems · Latency · Network Reliability

8 min read

Efficient Ops

Sep 1, 2020 · Cloud Native

How to Keep Your Kubernetes Cluster Running When a Node Goes Down

This article explains the architecture and practical techniques for achieving high availability in Kubernetes clusters, covering control‑plane and worker‑node design, network service handling, connection reuse, node eviction, storage considerations, and application‑level strategies to ensure continuous service during node failures.

Cluster · Kubernetes · Node Failure

23 min read

Tencent Cloud Developer

Aug 18, 2020 · Cloud Native

Kubernetes High Availability: Architecture, Network, Storage, and Application Strategies

The article explains how to achieve Kubernetes high availability: design a three‑node control plane with stacked etcd, use pod anti‑affinity, tune node‑monitor timers, handle stale endpoints, configure TCP keep‑alive, manage node taints and eviction, and choose RWX storage or an appropriate StatefulSet strategy to minimize service disruption after node failures.

Cluster · Kubernetes · Node Failure

21 min read
