Tag

resilience

1 view collected around this technical thread.

DaTaobao Tech
Apr 28, 2025 · Frontend Development

Front‑End Architecture and Performance Optimization for a Large‑Scale Chinese New Year Interactive Activity

The article details a large‑scale Chinese New Year interactive activity’s front‑end architecture, describing a layered system for business logic, data abstraction, and animation engines, unified data handling, dynamic animation rendering with downgrade paths, high‑concurrency QPS reduction, resilience measures, and extensive performance and workflow optimizations.

Performance · animation · architecture
0 likes · 15 min read
Cognitive Technology Team
Apr 11, 2025 · Backend Development

Hystrix Service Isolation: Thread‑Pool and Semaphore Isolation Patterns

The article explains how Hystrix uses thread‑pool and semaphore isolation to prevent cascading failures in microservice architectures, detailing implementation, configuration defaults, suitable scenarios, and recommendations for building resilient distributed systems.

Hystrix · Microservices · Semaphore
0 likes · 5 min read
FunTester
Mar 31, 2025 · Operations

Performance Testing and Fault Testing: Complementary Pillars for System Stability

The article explains how performance testing measures system efficiency under load while fault testing validates resilience under abnormal conditions, highlighting their shared goals, differences, overlapping toolchains, and how their combined use drives architecture optimization and improves service level agreements in modern complex software systems.

Fault Injection · Operations · Performance Testing
0 likes · 14 min read
FunTester
Mar 25, 2025 · Operations

Integrating Chaos Engineering into Service Dependency Governance for Resilient Cloud‑Native Systems

This article explores how to embed chaos engineering practices into service dependency governance, detailing dynamic validation versus static analysis, fault injection techniques, multi‑point failure simulations, and data‑driven optimizations to build robust, self‑healing microservice architectures in cloud‑native environments.

Chaos Engineering · Microservices · Operations
0 likes · 18 min read
FunTester
Mar 7, 2025 · Operations

Fault Testing: Proactive Resilience Engineering for Distributed Systems

Fault testing acts as a protective shield for distributed and cloud‑native systems: it deliberately injects failures to expose weak points, verify recovery mechanisms, and improve overall reliability, so that business continuity holds even under unexpected disruptions.

Chaos Engineering · Distributed Systems · Operations
0 likes · 11 min read
Architect
Jan 25, 2025 · Backend Development

HTTP Retry Strategies in Offline Store Systems: Simple Loop, Apache HttpClient, and MQ‑Based Asynchronous Retries

This article explores practical HTTP retry solutions for offline store applications, covering a basic loop retry, the built‑in retry mechanism of Apache HttpClient with custom handlers, and an asynchronous retry approach using message queues to achieve higher reliability and eventual consistency.

Apache HttpClient · HTTP · Java
0 likes · 12 min read
Cognitive Technology Team
Nov 14, 2024 · Operations

Designing Self‑Healing Applications for Fault Tolerance in Distributed Systems

To ensure distributed applications can recover automatically from hardware, network, or service failures, this guide outlines three core capabilities—fault detection, graceful handling, and monitoring—plus practical strategies such as asynchronous component separation, retries, circuit breakers, isolation, load shedding, failover, compensation, checkpointing, graceful degradation, rate limiting, leader election, fault injection, chaos engineering, and use of availability zones.

Distributed Systems · Operations · Self-healing
0 likes · 7 min read
Cognitive Technology Team
Nov 1, 2024 · Fundamentals

Design Principles for Solution Architecture: Scalability, Resilience, Performance, and Automation

This article outlines essential solution‑architecture design principles—including workload scalability, resilient construction, performance optimization, replaceable resources, loose coupling, service‑oriented design, appropriate storage selection, data‑driven approaches, constraint mitigation, pervasive security, and comprehensive automation—to help architects build robust, scalable, and maintainable systems.

Automation · Performance · cloud computing
0 likes · 20 min read
政采云技术
Nov 29, 2023 · Frontend Development

API Failure Resilience Using CDN and IndexedDB Caching

The article presents a comprehensive strategy for handling API outages by storing data locally with IndexedDB, synchronizing updates through a CDN, and implementing Axios interceptors and Node‑based scheduled jobs to ensure seamless user experience without white‑screen failures.

API · Axios · CDN
0 likes · 12 min read
Spring Full-Stack Practical Cases
Nov 9, 2023 · Backend Development

Preventing Service Avalanche with Hystrix: Strategies and Code Samples

This article explains how synchronous service calls can cause thread exhaustion and cascading failures known as the avalanche effect, and demonstrates how to use Hystrix's circuit‑breaker, isolation, and fallback features with practical Java code to protect backend systems.

Backend Development · Hystrix · Java
0 likes · 10 min read
Architects Research Society
Oct 3, 2023 · Cloud Native

Chaos Engineering: Concepts, History, Benefits, Challenges, and Getting Started

Chaos engineering is a disciplined approach to testing distributed systems by intentionally injecting failures to verify resilience. This article covers its definition, origins at Netflix, operational workflow, benefits, challenges, and practical steps for organizations adopting it to build resilient cloud‑native applications.

Chaos Engineering · DevOps · Observability
0 likes · 18 min read
Architects Research Society
Aug 12, 2023 · Operations

Ambassador Pattern: Using an External Proxy Service for Client Network Calls

The ambassador pattern introduces an external sidecar proxy that offloads common client‑side networking concerns such as routing, resilience, logging, and security, enabling legacy or hard‑to‑modify applications to gain cloud‑native capabilities without changing their code.

Microservices · Sidecar · ambassador pattern
0 likes · 8 min read
Architects Research Society
Aug 10, 2023 · Cloud Native

Resilience Strategies for Cloud‑Native Distributed Systems

This article explains how cloud‑native distributed systems achieve higher availability through resilience strategies such as load balancing, timeouts with automatic retries, deadlines, and circuit breakers, describing their placement across OSI layers, implementation options via libraries or proxies, and practical algorithm choices.

Load Balancing · Microservices · circuit breaker
0 likes · 25 min read
ByteDance SYS Tech
Feb 28, 2023 · Cloud Native

How ByteDance’s ARES Boosts Cloud‑Native Resilience with Chaos Engineering

This article explains ByteDance’s end‑to‑end chaos engineering practice for cloud‑native environments, covering its background, principles, comparison with traditional testing, the evolution of its internal platforms, and a detailed look at the Application Resilience Enhancement Service (ARES) and its core features.

Chaos Engineering · Fault Injection · Kubernetes
0 likes · 17 min read
Architects Research Society
Oct 10, 2022 · R&D Management

Future‑Ready CIO Leadership: Insights from Three Executives

The article explores how business‑driven CIOs are updating their leadership playbooks for the future of work, emphasizing adaptability, resilience, proactive problem‑solving, and a people‑first culture, based on interviews with CIOs from GEHA Health, Panera Bread, and Novant Health.

CIO · DigitalTransformation · PeopleFirst
0 likes · 10 min read
Architects Research Society
Sep 7, 2022 · Operations

An Introduction to Chaos Engineering: Principles, Practices, and Tools

Chaos engineering deliberately injects failures into distributed systems to measure resilience, using scientific experimentation to uncover hidden weaknesses, guide robust design, and improve reliability across development, testing, and production environments.

Chaos Engineering · Distributed Systems · Fault Injection
0 likes · 18 min read
Architects Research Society
Jul 7, 2022 · Cloud Native

Resilience Strategies for Cloud‑Native Distributed Systems

This article explains how cloud‑native microservice architectures achieve high availability by applying resilience techniques such as load balancing, timeouts with automatic retries, deadlines, and circuit breakers, and discusses implementation options using libraries or side‑car proxies.

Distributed Systems · Load Balancing · Microservices
0 likes · 16 min read
Architects Research Society
Jul 2, 2022 · Operations

Reliability vs Resilience: Understanding the Difference and Its Importance

Reliability and resilience are distinct yet complementary goals for cloud services: reliability is the outcome of consistently meeting performance expectations, while resilience is a system’s ability to continue operating despite failures. This article introduces both concepts and outlines a four‑part series exploring related threats and enhancement techniques.

Cloud Services · Operations · Reliability
0 likes · 6 min read
IT Architects Alliance
Jun 20, 2022 · Cloud Native

Building Resilient Microservices: Fault Tolerance, Graceful Degradation, and Reliability Patterns

This article explains how microservice architectures can achieve high availability by using fault‑tolerant designs such as graceful degradation, health checks, failover caching, circuit breakers, bulkheads, rate limiting, and systematic change‑management practices to mitigate network, hardware, and application errors.

Operations · circuit breaker · cloud-native
0 likes · 13 min read