Tag

Node Failure

1 view collected around this technical thread.

Full-Stack Internet Architecture
May 15, 2025 · Databases

Understanding PFAIL and FAIL States in Redis Cluster Node Failure Detection

This article explains the PFAIL (possible failure) and FAIL (confirmed failure) states in Redis Cluster, describes how a node transitions from one state to the other, demonstrates node failure and automatic failover with command‑line examples, and offers practical guidance on monitoring cluster health and recovering from failures.

Cluster · FAIL · Node Failure
0 likes · 5 min read
Xiaokun's Architecture Exploration Notes
Apr 20, 2025 · Fundamentals

Why Unreliable Networks Threaten Distributed Systems—and How to Mitigate Them

The article explains how network failures such as packet loss, reordering, latency, and ambiguous node failures make distributed systems unreliable, compares synchronous and asynchronous networks, and discusses the trade‑off between timeout settings and resource utilization.

Distributed Systems · Node Failure · Asynchronous Network
0 likes · 8 min read
Efficient Ops
Sep 1, 2020 · Cloud Native

How to Keep Your Kubernetes Cluster Running When a Node Goes Down

This article explains the architecture and practical techniques for achieving high availability in Kubernetes clusters, covering control‑plane and worker‑node design, network service handling, connection reuse, node eviction, storage considerations, and application‑level strategies to ensure continuous service during node failures.

Cluster · High Availability · Kubernetes
0 likes · 23 min read
Tencent Cloud Developer
Aug 18, 2020 · Cloud Native

Kubernetes High Availability: Architecture, Network, Storage, and Application Strategies

The article explains how to achieve Kubernetes high availability: designing a three‑node control plane with stacked etcd, using pod anti‑affinity, tuning node‑monitor timers, handling stale endpoints, configuring TCP keep‑alive, managing node taints and eviction, and choosing RWX storage or an appropriate StatefulSet strategy to minimize service disruption after node failures.

Cluster · High Availability · Kubernetes
0 likes · 21 min read