Tag

system resilience

1 view collected around this technical thread.

FunTester
May 20, 2025 · Operations

Baseline Metrics for Initiating Chaos Engineering

The article outlines essential baseline metrics—including application, SEV, alert, and infrastructure indicators—required before launching chaos engineering experiments, describes a multi‑stage experiment sequence across known and unknown system areas, and presents best‑practice guidelines for safely conducting chaos tests in production environments.

Chaos Engineering · baseline metrics · distributed systems
0 likes · 9 min read
FunTester
May 19, 2025 · Operations

Chaos Engineering Tools, Theory, and Practices

Chaos engineering, a scientific method for improving system resilience, is explored through an overview of leading tools such as Gremlin, ChaosBlade, Chaos Mesh, Chaos Toolkit, and ChaosMeta, alongside core concepts, real-world case studies, common misconceptions, and the practical value of controlled fault injection in distributed systems.

Chaos Engineering · Fault Injection · Reliability
0 likes · 12 min read
JD Tech
Apr 17, 2025 · Operations

Chaos Engineering: Principles, Core Steps, Tool Selection, and AI Integration

This article explains chaos engineering—its definition, core principles, experimental workflow, tool selection, AI‑driven enhancements, and practical case studies—providing a comprehensive guide for building resilient distributed systems across backend, cloud‑native, mobile, and AI‑enabled environments.

AI integration · Chaos Engineering · Fault Injection
0 likes · 26 min read
FunTester
Mar 14, 2025 · Operations

Fault Testing: Enhancing System Resilience through Controlled Failure Simulations

The article explains how fault testing, which deliberately injects failures in a controlled environment, helps identify system weaknesses, validates post‑mortem improvements, and drives architectural optimization, thereby increasing the availability and resilience of modern internet services.

Chaos Engineering · High Availability · fault testing
0 likes · 8 min read
FunTester
Mar 12, 2025 · Operations

Fault Injection Testing: Concepts, Scenarios, Process, and Best Practices

Fault injection testing deliberately introduces failures into a system to assess its resilience, helping identify weak points, improve retry and timeout mechanisms, and ensure robust operation across software, protocol, and infrastructure layers, with practical guidance on processes, tools, and Kubernetes-specific practices.

Chaos Engineering · Fault Injection · Kubernetes
0 likes · 8 min read
FunTester
Sep 20, 2024 · Operations

Chaos Engineering vs Fault Testing: Methods, Challenges, and Future Trends

This article compares chaos engineering and fault testing, outlines fault injection techniques, implementation layers, testing strategies, challenges, and future trends such as automation, AI-driven diagnostics, and cloud‑native integration, providing a comprehensive guide for improving system resilience and reliability.

Chaos Engineering · cloud native · fault testing
0 likes · 17 min read
Full-Stack Internet Architecture
Sep 14, 2021 · Operations

Understanding Rate Limiting, Degradation, and Circuit Breaking in Distributed Systems

This article explains the concepts of rate limiting, service degradation, and circuit breaking, illustrating passive and active throttling strategies, asynchronous processing, and practical examples such as Alibaba Sentinel, token‑based controls, and Hystrix, to help engineers design resilient, high‑availability systems.

Rate Limiting · circuit breaking · distributed systems
0 likes · 11 min read
iQIYI Technical Product Team
Sep 11, 2020 · Cloud Native

Chaos Engineering Framework and Practices in iQIYI FinTech Team

The iQIYI FinTech team implemented a Chaos Engineering framework, using a purpose‑driven Chaos Monkey to inject controlled failures, validate high‑availability, isolation, and self‑healing of payment services, derive architectural improvements, build a fault‑case library, and transition from fault detection to proactive system robustness.

Chaos Engineering · Chaos Monkey · FinTech
0 likes · 9 min read
Youzan Coder
Jun 22, 2018 · Operations

Chaos Engineering: Definition, Principles, and Implementation Steps

Chaos engineering is a disciplined practice that injects controlled faults into distributed systems—often in production—to validate steady-state hypotheses, uncover hidden reliability weaknesses, and continuously improve resilience, as illustrated by the staged implementations and fault-injection techniques used by companies such as JD.com, Youzan, and Netflix.

Chaos Engineering · Fault Injection · Reliability
0 likes · 11 min read