Tagged articles
11 articles
OPPO Kernel Craftsman
Aug 30, 2024 · Cloud Native

Middleware Containerization and Cloud‑Native Transformation at OPPO

OPPO replaced its sprawling, manually provisioned middleware clusters with a cloud‑native, containerized platform built on custom Kubernetes controllers, IP‑preserving StatefulSets, resource‑isolated containers, and automated monitoring and self‑healing workflows. The result is rapid provisioning, efficient resource utilization, and fault‑tolerant scaling, with a path toward future serverless and service‑mesh integration.

Containerization · Kubernetes · Operator
0 likes · 20 min read
dbaplus Community
Jan 8, 2024 · Backend Development

How We Built an Automated Payment Channel Management System with Redis and Prometheus

To handle growing payment traffic and unreliable third‑party gateways, the team at Zhuanzhuan designed an automated payment‑channel management platform that uses a custom Redis‑based time‑series store, Prometheus monitoring, and a sliding‑window failure‑rate algorithm to detect faulty channels, raise alerts, and eventually switch away from them automatically.

Automation · Prometheus · fault-tolerance
0 likes · 10 min read
dbaplus Community
Jul 8, 2023 · Operations

How QQ Music Achieves High Availability: Architecture, Tools, and Observability

This article explains how QQ Music embraces inevitable faults by building a high‑availability architecture that combines redundant infrastructure, automated failover, stability strategies, a robust toolchain for chaos engineering and full‑link load testing, and comprehensive observability to ensure graceful fault handling at scale.

Observability · chaos-engineering · distributed-systems
0 likes · 27 min read
Top Architect
Oct 15, 2022 · Backend Development

Designing Fault‑Tolerant Microservices: Patterns and Practices

The article explains how microservice architectures can achieve high availability through failure isolation, graceful degradation, change‑management strategies, health checks, fallback caching, retry logic, rate limiting, circuit breakers, and chaos testing, while acknowledging the added complexity and cost of such reliability engineering.

Operations · Reliability · backend
0 likes · 13 min read
ITFLY8 Architecture Home
Jan 26, 2022 · Operations

Mastering Microservice Monitoring, Fault Tolerance, and Security: A Complete Guide

This article explains how to monitor microservice architectures through logs, tracing, and metrics, and compares open‑source tracing tools. It outlines fault‑tolerance strategies such as timeouts, rate limiting, degradation, asynchronous buffering, and circuit breaking, and details access‑security mechanisms including gateway authentication, service‑side authentication, and OAuth 2.0 token flows. It also introduces container technology and its role in microservice deployment.

Containers · Observability · fault-tolerance
0 likes · 43 min read
Architecture Digest
May 30, 2020 · Fundamentals

A Comprehensive Guide to Learning Distributed Systems

This article provides a thorough overview of distributed systems, explaining their definition, when to adopt them, core concepts like partition and replication, common challenges, essential properties, typical architectural components, and practical implementations to help readers build a solid learning roadmap.

Consistency · Partition · Scalability
0 likes · 15 min read
Architecture Digest
Sep 25, 2017 · Backend Development

Dubbo Cluster Fault Tolerance: A Source Code Walkthrough

This article provides a step‑by‑step analysis of Dubbo’s cluster fault‑tolerance mechanism, explaining the roles of Directory, Router, and LoadBalance, illustrating the execution flow with diagrams, and clarifying how invokers are selected and balanced in a distributed Java RPC framework.

Cluster · Dubbo · backend
0 likes · 8 min read
Qunar Tech Salon
Dec 1, 2016 · Backend Development

How to Prevent Service Failures: Suspect Third‑Party, Guard Users, and Perfect Your Own Service

The article shares practical strategies for preventing service failures by doubting third‑party services, protecting against misuse by consumers, and improving one’s own code and architecture, covering fallback plans, timeout settings, retry policies, API design, traffic control, and resource limits.

API-design · Operations · Reliability
0 likes · 16 min read
Architecture Digest
Jul 19, 2016 · Operations

Designing a Multi‑Dimensional High‑Availability Architecture for a Game Access System

The article presents a business‑oriented, three‑layer high‑availability architecture for a large‑scale game access platform, detailing measurable goals, client‑side retry with HTTP‑DNS, functional separation and degradation, multi‑region active‑active deployment, and automated, visual monitoring to achieve rapid fault detection, isolation, and recovery.

Operations · distributed-systems · fault-tolerance
0 likes · 20 min read
High Availability Architecture
May 11, 2016 · Cloud Native

Key Microservice Capabilities Illustrated by the Starbucks Process

The article uses the Starbucks coffee‑making workflow as an analogy to explain how clustering, stateless task handling, service‑oriented design, asynchronous interfaces, and fault‑tolerant mechanisms together enable traditional systems to become highly scalable microservices on the cloud.

cloud-native · distributed-systems · fault-tolerance
0 likes · 17 min read