Tagged articles

11 articles

Aug 30, 2024 · Cloud Native

Middleware Containerization and Cloud‑Native Transformation at OPPO

OPPO transformed its sprawling, manually‑provisioned middleware clusters into a cloud‑native, containerized platform by building custom Kubernetes controllers, IP‑preserving StatefulSets, resource‑isolated containers, automated monitoring and self‑healing workflows, enabling rapid provisioning, efficient utilization, fault‑tolerant scaling and future serverless and service‑mesh integration.

Containerization · Kubernetes · Operator

0 likes · 20 min read

JD Retail Technology

May 10, 2024 · Operations

High Availability and the Dispersal Principle: Concepts, Practices, and Benefits

This article explains the concept of high availability, introduces the dispersal principle, demonstrates its application in microservice architectures and distributed storage, and outlines the benefits such as improved reliability, scalability, fault tolerance, and reduced single‑point failures.

Scalability · distributed-systems · fault-tolerance

0 likes · 10 min read

dbaplus Community

Jan 8, 2024 · Backend Development

How We Built an Automated Payment Channel Management System with Redis and Prometheus

To handle growing payment traffic and unreliable third‑party gateways, the team at Zhuanzhuan designed an automated payment‑channel management platform. It uses a custom Redis‑based time‑series store, Prometheus monitoring, and a sliding‑window failure‑rate algorithm to detect faulty channels, raise alerts, and eventually switch away from them automatically.

Automation · Prometheus · fault-tolerance

0 likes · 10 min read

dbaplus Community

Jul 8, 2023 · Operations

How QQ Music Achieves High Availability: Architecture, Tools, and Observability

This article explains how QQ Music embraces inevitable faults by building a high‑availability architecture that combines redundant infrastructure, automated failover, stability strategies, a robust toolchain for chaos engineering and full‑link load testing, and comprehensive observability to ensure graceful fault handling at scale.

Observability · chaos-engineering · distributed-systems

0 likes · 27 min read

Top Architect

Oct 15, 2022 · Backend Development

Designing Fault‑Tolerant Microservices: Patterns and Practices

The article explains how microservice architectures can achieve high availability by isolating failures, employing graceful degradation, change‑management strategies, health checks, fallback caching, retry logic, rate limiting, circuit breakers, and chaos testing, while acknowledging the added complexity and cost of such reliability engineering.

Operations · Reliability · backend

0 likes · 13 min read

ITFLY8 Architecture Home

Jan 26, 2022 · Operations

Mastering Microservice Monitoring, Fault Tolerance, and Security: A Complete Guide

This article explains how to monitor microservice architectures through logs, tracing, and metrics, and compares open‑source tracing tools. It then outlines fault‑tolerance strategies such as timeouts, rate limiting, degradation, asynchronous buffering, and circuit breaking, details access‑security mechanisms including gateway authentication, service‑side authentication, and OAuth 2.0 token flows, and introduces container technology and its role in microservice deployment.

Containers · Observability · fault-tolerance

0 likes · 43 min read

Architecture Digest

May 30, 2020 · Fundamentals

A Comprehensive Guide to Learning Distributed Systems

This article provides a thorough overview of distributed systems, explaining their definition, when to adopt them, core concepts like partition and replication, common challenges, essential properties, typical architectural components, and practical implementations to help readers build a solid learning roadmap.

Consistency · Partition · Scalability

0 likes · 15 min read

Architecture Digest

Sep 25, 2017 · Backend Development

Dubbo Cluster Fault Tolerance: A Source Code Walkthrough

This article provides a step‑by‑step analysis of Dubbo’s cluster fault‑tolerance mechanism, explaining the roles of Directory, Router, and LoadBalance, illustrating the execution flow with diagrams, and clarifying how invokers are selected and balanced in a distributed Java RPC framework.

Cluster · Dubbo · backend

0 likes · 8 min read

Qunar Tech Salon

Dec 1, 2016 · Backend Development

How to Prevent Service Failures: Suspect Third‑Party, Guard Users, and Perfect Your Own Service

The article shares practical strategies for preventing service failures by doubting third‑party services, protecting against misuse by consumers, and improving one’s own code and architecture, covering fallback plans, timeout settings, retry policies, API design, traffic control, and resource limits.

API-design · Operations · Reliability

0 likes · 16 min read

Architecture Digest

Jul 19, 2016 · Operations

Designing a Multi‑Dimensional High‑Availability Architecture for a Game Access System

The article presents a business‑oriented, three‑layer high‑availability architecture for a large‑scale game access platform, detailing measurable goals, client‑side retry with HTTP‑DNS, functional separation and degradation, multi‑region active‑active deployment, and automated, visual monitoring to achieve rapid fault detection, isolation, and recovery.

Operations · distributed-systems · fault-tolerance

0 likes · 20 min read

High Availability Architecture

May 11, 2016 · Cloud Native

Key Microservice Capabilities Illustrated by the Starbucks Process

The article uses the Starbucks coffee‑making workflow as an analogy to explain how clustering, stateless task handling, service‑oriented design, asynchronous interfaces, and fault‑tolerant mechanisms together enable traditional systems to become highly scalable microservices on the cloud.

cloud-native · distributed-systems · fault-tolerance

0 likes · 17 min read
