Tagged articles

257 articles

Dec 29, 2025 · Databases

Mastering PostgreSQL Backup & Replication: A Complete Enterprise Guide

An in‑depth enterprise guide explains why backup and replication are critical for PostgreSQL, compares physical and logical backup and replication methods, provides step‑by‑step command examples, and outlines high‑availability architectures, automation scripts, disaster‑recovery procedures, monitoring queries, and common pitfalls to ensure robust data protection.

PostgreSQL · disaster recovery · high availability

0 likes · 8 min read

Mike Chen's Internet Architecture

Dec 24, 2025 · Operations

How to Deploy a Two‑Location Three‑Center Disaster‑Recovery Architecture for High Availability

This guide explains the two‑location three‑center disaster‑recovery pattern, describing its purpose, typical deployment across two cities and three data centers, and step‑by‑step recommendations for same‑city dual‑active or primary‑backup setups, remote backup strategies, traffic routing, and essential monitoring.

GSLB · Operations · SLB

0 likes · 5 min read

Raymond Ops

Dec 22, 2025 · Operations

Mastering Production Site Backup: A Multi‑Layer Disaster Recovery Blueprint

After a midnight disk failure that threatened 300,000 users, this article presents a production‑grade, multi‑layer backup architecture with 3‑2‑1 redundancy, RTO ≤30 min and RPO ≤5 min, covering application code, configuration, database (physical and logical), file storage, automated scheduling, monitoring, performance tuning, a real‑world recovery case, and future AI‑driven enhancements.

Operations · automation · backup

0 likes · 15 min read

Ops Community

Nov 23, 2025 · Databases

How to Recover Accidentally Dropped MySQL Data in 48 Hours – A Complete Step‑by‑Step Guide

This guide walks you through a full disaster‑recovery workflow for MySQL, covering emergency read‑only switching, pinpointing the deletion time via binlog, preparing a recovery instance, restoring the latest full backup with Xtrabackup or mysqldump, applying incremental binlog changes, verifying data integrity, and safely switching traffic back to the restored database.

Data Recovery · backup · binlog

0 likes · 42 min read

StarRocks

Oct 21, 2025 · Databases

How StarRocks 3.5 Enables Fast Cluster Snapshots and Disaster Recovery in Kubernetes

StarRocks 3.5 introduces a cluster‑level snapshot mechanism that automates backup to object storage, supports minute‑level recovery, and integrates with Kubernetes via Helm charts to streamline disaster‑recovery workflows for high‑availability workloads.

Kubernetes · S3 · StarRocks

0 likes · 17 min read

MaGe Linux Operations

Oct 11, 2025 · Operations

7 Fatal Traps That Can Ruin Your Cross‑Cloud Backup – How to Avoid Disaster Recovery Failures

This article examines the hidden pitfalls that cause cross‑cloud backup and disaster‑recovery plans to fail, explains why 70% of first‑time DR drills flop, and provides real‑world case studies, detailed scripts, and proven best‑practice solutions to ensure reliable RTO, RPO, and data integrity.

RPO · RTO · backup scripts

0 likes · 40 min read

21CTO

Oct 2, 2025 · Operations

What a South Korean Data‑Center Fire Reveals About Cloud Reliability and Disaster Recovery

A lithium‑ion battery fire at the NIRS data center crippled dozens of Korean e‑government services, exposing the urgent need for better battery safety, robust backup systems, and resilient cloud‑based disaster‑recovery strategies to protect national digital infrastructure.

Cloud Computing · Data Center · battery safety

0 likes · 7 min read

MaGe Linux Operations

Oct 1, 2025 · Operations

How a Single rm -rf Command Almost Wiped My Data—and the Backup Plan That Saved It

A disastrous rm -rf command erased 2.3 TB of production MySQL data, but a meticulously designed multi‑layer backup strategy—including logical, physical, real‑time, and cloud backups—enabled a 99.4% data recovery within 72 hours, highlighting essential lessons and best‑practice guidelines for reliable data protection.

Data Protection · Operations · backup

0 likes · 36 min read

Ops Community

Aug 25, 2025 · Operations

How DRBD Can Save Your Production Data from Disasters

This article explains why most companies suffer long recovery times after data loss, introduces DRBD's real‑time block replication as a solution, and provides detailed architecture designs, deployment steps, monitoring scripts, performance tuning, cost analysis, common pitfalls, and future trends for reliable disaster recovery.

DRBD · Linux · data replication

0 likes · 9 min read

Tech Freedom Circle

Aug 4, 2025 · Operations

How Do Projects Achieve High Availability Without Multi‑Site Active‑Active? – A Meituan Interview Question

The article analyzes high‑availability concepts, from single‑machine risks to multi‑site active‑active architectures, compares cold and hot backup strategies, discusses network latency challenges, and presents Ele.me’s cell‑based, sharding‑driven multi‑region solution with concrete examples, tables, and code snippets.

Sharding · cell-based architecture · data replication

0 likes · 28 min read

MaGe Linux Operations

Jul 24, 2025 · Operations

Mastering Production Backup Architecture: A Proven 3‑2‑1 Disaster Recovery Blueprint

This article presents a production‑validated, multi‑layer website backup architecture—including code, database, and file storage strategies, automation scripts, monitoring dashboards, performance tuning, and AI‑driven optimization—to ensure rapid recovery, cost efficiency, and business continuity.

Monitoring · automation · backup

0 likes · 14 min read

Zhuanzhuan Tech

Jul 17, 2025 · Backend Development

How Virtual Phone Numbers Reinvent Trust and Resilience in E‑Commerce Services

This article explores the concept of privacy (virtual) phone numbers, their features and deployment modes, and details a multi‑stage engineering evolution—from a simple door‑to‑door recycling prototype to a service‑oriented architecture and a high‑availability disaster‑recovery system—demonstrating how they protect user data, improve system reliability, and support rapid business growth.

Service Architecture · disaster recovery · privacy number

0 likes · 10 min read

Qunhe Technology Quality Tech

Jul 10, 2025 · Operations

Ensuring Elasticsearch Stability: Testing, Performance, and Disaster Recovery

This article outlines a comprehensive reliability framework for Elasticsearch, covering pre‑release performance evaluation, data accuracy checks, real‑time sync delay alerts, rapid recovery strategies, performance testing methods, and disaster‑recovery measures such as multi‑cluster backup and index alias switching.

Monitoring · data synchronization · disaster recovery

0 likes · 12 min read

Cloud Native Technology Community

May 15, 2025 · Operations

How to Precisely Recover a Single Kubernetes Resource from an etcd Snapshot in 5 Steps

This guide explains how to extract and restore a specific Kubernetes resource from an etcd snapshot using a lightweight, step‑by‑step process that avoids full‑cluster recovery, minimizes downtime, and works with tools like etcdctl, auger, and kubectl.

CLI · DevOps · Kubernetes

0 likes · 8 min read

Bilibili Tech

Apr 22, 2025 · Operations

Client‑Side DCDN Disaster‑Recovery Drills and Automated Testing at Bilibili

Bilibili performed client-side DCDN disaster-recovery drills using a self-built HTTPDNS to simulate DNS, CDN, and SSL faults; automated scripts across Android, iOS, and Web injected errors, measured rendering latency, validated immediate downgrade to commercial services, refined fallback strategies, and demonstrated near-zero user impact during a real network incident.

Bilibili · DCDN · HTTPDNS

0 likes · 13 min read

dbaplus Community

Mar 31, 2025 · Operations

10 Server Mistakes That Can End Your Career – Real Disaster Cases & Prevention

This article compiles ten real-world server‑operation disasters, explains the technical fallout of each forbidden action, and provides concrete command‑line remedies and best‑practice safeguards to help engineers avoid career‑ending mistakes.

Linux · System Administration · disaster recovery

0 likes · 7 min read

Open Source Linux

Jan 13, 2025 · Operations

Key Lessons from 2024 Major Service Outages and How to Prevent Future Downtime

The article reviews major 2024 service outages—from Alibaba Cloud to OpenAI—highlights their root causes, and offers practical operations strategies such as disaster recovery, regular backups, load balancing, monitoring, performance tuning, and capacity planning to reduce future downtime.

Monitoring · Operations · capacity planning

0 likes · 5 min read

Alibaba Cloud Infrastructure

Jan 10, 2025 · Cloud Native

Service-Level Disaster Recovery with Alibaba Cloud Service Mesh (ASM) across Multi-Cluster and Multi-Region Deployments

This guide explains how to handle service‑level failures in Kubernetes by using Alibaba Cloud Service Mesh (ASM) to automatically detect faults, shift traffic based on geographic priority, and implement various multi‑cluster, multi‑region, and multi‑cloud topologies for high availability.

ASM · Kubernetes · Traffic Shifting

0 likes · 31 min read

IT Architects Alliance

Jan 7, 2025 · Industry Insights

Why Multi-Active Architecture Matters and How to Build It

The article explains why multi‑active (active‑active) architecture is essential for modern enterprises, outlines its evolution from single‑server setups, details core principles like redundancy and data synchronization, compares common deployment patterns, examines industry use cases, and discusses challenges and mitigation strategies.

Cloud Computing · Data Consistency · Distributed Systems

0 likes · 21 min read

MaGe Linux Operations

Jan 6, 2025 · Operations

What 2024 Outages Teach Us About Building Resilient Systems

A review of major 2024 service disruptions—from Alibaba Cloud to OpenAI—highlights key lessons such as early disaster‑recovery planning, regular backups, load balancing, real‑time monitoring, performance tuning, and capacity planning to improve system reliability and reduce future downtime.

disaster recovery · outage analysis · system reliability

0 likes · 5 min read

Efficient Ops

Jan 1, 2025 · Operations

What 2024’s Biggest Outages Teach Us About Building Resilient Systems

Reviewing the major service disruptions—from Alibaba Cloud to OpenAI—this article extracts key SRE lessons such as early disaster‑recovery planning, regular backups, load balancing, real‑time monitoring, performance tuning, and capacity planning, urging enterprises to adopt resilient practices for a more stable future.

Operations · Outage Management · SRE

0 likes · 6 min read

Alibaba Cloud Native

Dec 28, 2024 · Cloud Native

How ACK One Multi‑Cluster Gateway Enables Seamless Cross‑AZ and Multi‑Region Disaster Recovery

This article explains how Alibaba Cloud's ACK One multi‑cluster gateway provides active‑active disaster recovery across same‑city AZs, hybrid‑cloud environments, and distant regions, detailing the architecture, setup steps, advantages over DNS‑based solutions, and practical considerations for enterprise workloads.

ACK One · Cloud Native · cross-AZ

0 likes · 13 min read

Yang Money Pot Technology Team

Dec 26, 2024 · Frontend Development

Design and Implementation of a Multi‑CDN Disaster Recovery Mechanism for Frontend Resource Loading

This article presents a comprehensive multi‑CDN disaster‑recovery solution for frontend static resources, detailing the background, current issues, goals, SDK‑based architecture, monitoring and retry strategies, data‑reporting mechanisms, evaluation results, and future dynamic scheduling improvements.

CDN · Frontend · Monitoring

0 likes · 12 min read

Bilibili Tech

Nov 19, 2024 · Operations

Building a Lightweight Disaster‑Recovery Drill System at Bilibili: Architecture, Practices, and Lessons

Bilibili’s infrastructure team created a lightweight, multi‑layered disaster‑recovery drill platform—combining an atomic fault library, scenario catalogs, chaos‑experiment orchestration, real‑time observation, and a product‑level interface—backed by standardized governance and CI‑integrated automation, cutting drill preparation from weeks to days and boosting weekly resilience testing across the organization.

disaster recovery · high availability · site reliability

0 likes · 39 min read

ByteDance Cloud Native

Nov 8, 2024 · Databases

Designing Reliable Cross-Cloud Database Disaster Recovery with Volcano Engine

This article explains how to design and implement cross-cloud database disaster recovery, covering background goals, common challenges, step-by-step migration stages, the role of Volcano Engine’s Database Transmission Service, cold-hot separation, HTAP analysis, and practical business value with real-world examples.

DTS · Database · cross-cloud

0 likes · 12 min read

Java Architecture Stack

Oct 10, 2024 · Databases

Master MySQL Backup & Recovery: Strategies for Every Business Scenario

This guide walks through five practical MySQL backup and restoration workflows—including scheduled full backups, hourly incremental binlog backups for high‑frequency trading, development‑environment data masking, selective table dumps, and cold‑storage disaster recovery—providing exact commands, configuration tweaks, and step‑by‑step procedures.

Database Administration · Incremental Backup · Recovery

0 likes · 7 min read

Tencent Cloud Middleware

Sep 9, 2024 · Cloud Native

How TDMQ Pulsar Enables Cross‑Region Replication for Global Data Archiving and Disaster Recovery

Since September 2024, TDMQ Pulsar professional clusters offer message‑level and metadata‑level cross‑region replication, providing solutions for worldwide data archiving and core‑business disaster recovery, with detailed deployment steps, configuration guidance, and a financial‑industry best‑practice case study.

TDMQ Pulsar · cross‑region replication · disaster recovery

0 likes · 9 min read

Volcano Engine Developer Services

Sep 2, 2024 · Operations

How ByteDance Scales Disaster Recovery: From Single Data Center to Multi‑Region Active‑Active

This article details ByteDance’s disaster‑recovery evolution, from a single‑data‑center deployment to same‑city multi‑data‑center setups and finally to active‑active multi‑region architectures, and explains the challenges, specific failure scenarios, and strategic practices used to ensure continuous service during outages.

Infrastructure · Operations · disaster recovery

0 likes · 15 min read

JD Tech

Aug 22, 2024 · Backend Development

Designing a Disaster‑Recovery Data Backup System for JD’s LBS C‑End SOA Service

This article explores the design and implementation of a disaster‑recovery data‑backup architecture for JD’s LBS C‑end SOA service, covering backup strategies, cost‑reduction techniques, grid‑based indexing with H3, client‑side caching, diff verification, and deployment considerations to balance reliability, performance, and expense.

LBS · SOA · cost optimization

0 likes · 18 min read

IT Services Circle

Aug 21, 2024 · Operations

Analysis of NetEase Cloud Music Outage on August 19: Infrastructure Failure and Operational Lessons

On August 19, NetEase Cloud Music suffered a severe infrastructure‑related outage that prevented user login, playlist loading, and song search, prompting a two‑hour recovery effort, a brief free‑membership compensation, and highlighting the critical role of proper change management, gray releases, disaster recovery, and cross‑functional coordination in large‑scale services.

Gray Release · Infrastructure · NetEase Cloud Music

0 likes · 6 min read

JD Retail Technology

Aug 21, 2024 · Operations

Designing a Disaster Recovery and Data Backup System for JD Miaosong (秒送) LBS C‑End SOA Services

This article explores the design of a disaster‑recovery framework for JD’s Miaosong (秒送) LBS C‑end SOA services, detailing data‑backup strategies, cost‑reduction techniques, grid‑based caching using H3, diff validation, client‑side caching, and deployment modules to balance reliability, performance, and expense.

LBS · backend services · cost optimization

0 likes · 18 min read

JD Tech Talk

Aug 16, 2024 · Operations

Designing Cost‑Effective Disaster Recovery Data Backup for LBS‑Based SOA Services

This article details a comprehensive disaster‑recovery strategy for LBS‑driven SOA services, covering challenges of massive POI data backup, cost‑reduction via grid indexing (H3), selective caching, compression, diff validation, client‑side fallback, and deployment processes to achieve reliable, low‑cost data availability.

LBS · cost optimization · data backup

0 likes · 19 min read

JD Cloud Developers

Aug 16, 2024 · Backend Development

How to Slash LBS Disaster‑Recovery Costs by 99% with Smart Grid Backups

This article explores a comprehensive disaster‑recovery strategy for LBS‑driven SOA services, detailing challenges of massive POI data backup, cost‑reduction techniques using grid‑based caching, H3 hexagonal indexing, selective data compression, and a hybrid client‑server fallback mechanism to achieve high availability at dramatically lower expense.

LBS · SOA · data backup

0 likes · 21 min read

21CTO

Jul 23, 2024 · Information Security

What the Microsoft Blue‑Screen Crisis Teaches About IT Risk Management

The massive Microsoft blue‑screen outage caused by a faulty CrowdStrike update highlights the dangers of single‑system reliance, poor code quality, insufficient QA, and the need for staged rollouts, robust backup, real‑time monitoring, and proactive incident‑response strategies for modern IT organizations.

IT Operations · Incident Response · Monitoring

0 likes · 10 min read

MaGe Linux Operations

Jul 11, 2024 · Operations

Mastering Velero: Backup and Restore OpenShift Clusters to Alibaba Cloud

This guide explains how to install Velero, configure Alibaba Cloud OSS credentials, create backup storage locations, perform manual and scheduled backups, restore clusters, use hooks, debug operations, expose Prometheus metrics, and handle disaster recovery for OpenShift environments.

Alibaba Cloud · CLI · Kubernetes backup

0 likes · 11 min read

Efficient Ops

Jul 7, 2024 · Operations

Boost Business Continuity and IT System Stability: Practical Strategies

This article explains business continuity concepts, outlines the risks to IT system stability, and provides actionable steps—such as expanding monitoring coverage, improving fault detection, enhancing architecture resilience, and strengthening emergency coordination—to ensure continuous operation despite inevitable failures.

Monitoring · business continuity · disaster recovery

0 likes · 7 min read

dbaplus Community

Jun 20, 2024 · Databases

Meituan’s Scalable Database Disaster Recovery: Architecture, Practices & Future

This article explains Meituan's multi‑stage disaster‑recovery strategy for databases, detailing the evolution from single‑active to N+1 and unit‑based architectures, the challenges of ultra‑large clusters, the DDTP platform's capabilities, and future plans to automate and extend resilience across regions.

Meituan · N+1 architecture · database high availability

0 likes · 19 min read

iQIYI Technical Product Team

May 24, 2024 · Operations

High Availability and Disaster Recovery Practices of iQIYI's Video Relay Service (VRS)

iQIYI’s Video Relay Service ensures uninterrupted video playback by employing a two‑region, three‑center hybrid cloud architecture, multi‑layer storage, cross‑AZ retry mechanisms, protective rate‑limiting and degradation paths, layered monitoring, and rigorous stress‑testing and chaos engineering to achieve high availability and disaster recovery.

Backend Architecture · Cloud Native · Monitoring

0 likes · 18 min read

Open Source Linux

Mar 27, 2024 · Operations

Why RPO & RTO Matter: Cloud Disaster Recovery in China’s Expanding Market

The article examines how recent Chinese cybersecurity regulations have heightened the importance of business continuity, explains key disaster‑recovery metrics RPO and RTO, outlines data‑, application‑, and business‑level backup tiers, analyzes the cloud‑based disaster‑recovery market growth, and highlights sector‑specific demands such as in healthcare.

RPO · RTO · business continuity

0 likes · 7 min read

Architects' Tech Alliance

Mar 13, 2024 · Industry Insights

Why China’s Disaster Recovery Market Is Booming: Trends, Levels, and Cloud Backup Insights

The 2023 overview of China’s disaster recovery and backup industry reveals how new cybersecurity regulations, rising RPO/RTO expectations, a three‑tier protection model, and the shift to cloud‑based solutions are driving rapid market growth across sectors such as healthcare, while competition remains fragmented among many vendors.

China Market · Cloud Backup · RPO

0 likes · 8 min read

Baidu Geek Talk

Mar 4, 2024 · Databases

Bank Core System Transformation and GaiaDB-X Distributed Database Solutions for Financial Scenarios

To meet exploding transaction volumes, rapid innovation cycles, and strict regulatory demands, large banks are replacing mainframe core systems with distributed, horizontally‑scalable architectures, and Baidu’s GaiaDB‑X database—offering strong ACID consistency, zero‑RPO disaster recovery, and automated operations—has successfully powered core banking migrations for institutions such as Bank of China and state‑owned banks.

GaiaDB-X · TSO consistency · bank core system

0 likes · 26 min read

Open Source Linux

Mar 1, 2024 · Operations

How Two‑Site Three‑Center Disaster Recovery Boosts Business Continuity with Oracle Data Guard

The two‑site three‑center disaster recovery model combines a production site, a same‑city backup, and a remote backup to ensure data integrity and rapid recovery, leveraging Oracle Data Guard for synchronous and asynchronous replication, thereby improving RPO and RTO across various disaster scenarios.

Operations · Oracle Data Guard · business continuity

0 likes · 4 min read

Architects' Tech Alliance

Feb 24, 2024 · Operations

How the Two‑Site Three‑Center Disaster Recovery Model Boosts Business Continuity

The article explains the two‑site three‑center disaster‑recovery architecture—comprising a production site, a same‑city backup, and a remote backup—detailing synchronous and asynchronous data replication, failover capabilities, Oracle Data Guard implementation, and why this hybrid approach delivers superior RPO, RTO, and availability for enterprises.

Infrastructure · Oracle Data Guard · RPO

0 likes · 6 min read

MaGe Linux Operations

Feb 21, 2024 · Databases

How to Deploy Fine-Grained Disaster Recovery for GaussDB DWS on Cloud

This guide explains step‑by‑step how to manually set up fine‑grained disaster recovery for GaussDB(DWS) in a cloud‑based dual‑cluster environment, covering preparation, configuration files, backup/restore operations, verification, and removal to achieve reliable, cost‑effective data protection.

GaussDB · SQL · cloud

0 likes · 20 min read

Cloud Native Technology Community

Feb 2, 2024 · Cloud Native

Achieving Sub‑2‑Hour RTO: A Cloud‑Native Disaster Recovery Blueprint for Enterprises

This article examines how a leading global industrial group leveraged a cloud‑native platform to design a disaster‑recovery solution that meets a sub‑2‑hour RTO and a 1‑minute RPO, detailing architecture, data‑layer strategies, middleware replication, application and access‑layer handling, and operational best practices.

ACP · Cloud Native · GitOps

0 likes · 17 min read

Efficient Ops

Feb 1, 2024 · Operations

How Tencent’s Public Gateway Overcomes Extreme Availability Challenges

The article details Tencent's Public Gateway (TGW) architecture and its forwarding and control planes, and presents two real‑world extreme failure cases (a NIC batch bug and a special IPv6 packet that caused core dumps), along with the multi‑level disaster‑recovery design and mitigation strategies employed to ensure high availability.

Availability · Tencent Cloud · disaster recovery

0 likes · 8 min read

dbaplus Community

Dec 10, 2023 · Operations

11 Hard‑Earned Lessons from Two Decades of Google Site Reliability Engineering

Drawing on twenty years of Google SRE experience, this article outlines eleven practical lessons—from scaling mitigation to disaster‑resilience testing—that help teams design, operate, and evolve reliable large‑scale services.

Incident Response · SRE · canary releases

0 likes · 12 min read

Tencent Cloud Developer

Nov 30, 2023 · Cloud Computing

X's Cloud Cost Reduction and the Shift Toward On‑Premises: Implications for Cloud Computing Trends

X (formerly Twitter) cut monthly cloud spending by 60% by shifting workloads and storage to on‑premises infrastructure, igniting a debate over whether de‑clouding is viable for all enterprises, how it signals a potential inflection point in cloud computing, and what strategies—balancing high availability, disaster recovery, and cost efficiency—should guide firms, as highlighted in the upcoming TVP Tech Sleepless Nights series featuring leading industry experts.

Cloud Native · cloud repatriation · cost optimization

0 likes · 7 min read

Top Architecture Tech Stack

Nov 27, 2023 · Operations

Designing Multi-Active Cross‑Region Architecture: Scenarios, Patterns, and Practical Techniques

This article explains the motivations, application scenarios, architectural patterns (same‑city, cross‑city, and cross‑country), and concrete design techniques for building multi‑active cross‑region systems that ensure high availability and graceful degradation during extreme failures.

Distributed Systems · data synchronization · disaster recovery

0 likes · 32 min read

Top Architecture Tech Stack

Nov 22, 2023 · Operations

Designing Multi‑Active (Active‑Active) Architecture Across Regions: Scenarios, Patterns, and Practical Techniques

This article explains the motivations, application scenarios, architectural patterns, and step‑by‑step design techniques for building geographically distributed active‑active systems that can survive extreme failures while balancing cost, complexity, and data consistency requirements.

Active-Active · Distributed Systems · System Design

0 likes · 32 min read

Aikesheng Open Source Community

Oct 31, 2023 · Databases

MySQL Disaster Recovery: Multi‑Region Three‑Center Replication and RTO/RPO Optimization

This article explains the principles of disaster recovery for MySQL, covering RTO/RPO metrics, national backup level standards, common master‑slave topologies, a comparative analysis of high‑availability solutions, and a detailed three‑center multi‑region replication design with code patches to avoid replication loops.

Database · RPO · RTO

0 likes · 17 min read

Architecture and Beyond

Oct 29, 2023 · Operations

Postmortem of the October 23 Yuque Service Outage: Lessons on Complex Systems and the KISS Principle

The October 23 Yuque outage, caused by a buggy upgrade tool and outdated storage hardware, highlighted the importance of thorough testing, robust disaster‑recovery, high‑availability architecture, clear communication, continuous learning, and applying the KISS principle to simplify complex systems and improve operational stability.

Complex Systems · Incident Management · KISS principle

0 likes · 10 min read

Su San Talks Tech

Oct 27, 2023 · Operations

What We Learned from Yuque’s October 23 Outage: A Detailed Incident Review

This article walks through Yuque’s October 23 service disruption, detailing each timeline milestone, analyzing the root causes, highlighting the importance of monitoring and data integrity checks, and offering concrete post‑mortem recommendations to improve future incident handling.

Cloud Services · Incident Response · Monitoring

0 likes · 12 min read

Architect

Oct 19, 2023 · Industry Insights

How Vivo Built a Highly Available Push System: Multi‑Region Architecture, Real‑Time Traffic Scheduling, and Disaster‑Recovery Strategies

This article analyzes the design of Vivo's push notification platform, detailing its high‑concurrency requirements, three‑region long‑connection deployment, traffic‑scheduling bypass layer, and layered storage disaster‑recovery solutions, while explaining the trade‑offs and performance metrics behind each architectural decision.

Cloud Native · Kafka · Redis

0 likes · 14 min read

Tencent Cloud Developer

Sep 20, 2023 · Operations

Storage Governance and Optimization Practices for Meeting Control Systems

The article explains how a meeting control system tackled severe storage pressure from high concurrent traffic by introducing a proxy layer, multi‑active disaster‑recovery, identity‑based data isolation, dynamic‑static key separation, multi‑level caching, overload protection, sharding with dual‑write migration, and extensive monitoring to meet 100k QPS and ensure reliability.

Redis · Storage Optimization · database sharding

0 likes · 49 min read

DataFunSummit

Sep 11, 2023 · Big Data

eBay's Cloud‑Native Kafka Big Data Platform: Disaster Recovery and High‑Availability Practices

This article details eBay's implementation of a cloud‑native Kafka platform on Kubernetes, covering operational challenges, K8s Operator deployment, single‑ and multi‑data‑center high‑availability designs, anti‑affinity strategies, automated failover components, and future work on remote storage for Kafka.

Big Data · Kafka · Kubernetes

0 likes · 14 min read

Tech Architecture Stories

Aug 15, 2023 · Cloud Native

Unlocking Microservice Success: The Interplay of Metrics, Governance, and Validation

This article explains how measurement (SLI/SLO), governance (architecture refactoring, MTTx), and validation (chaos engineering, disaster drills) interrelate in microservice systems, illustrating how observability drives governance actions, governance improves metrics, and validation reinforces both through continuous testing.

Microservices · Observability · SLI

0 likes · 4 min read

Baidu Geek Talk

Jul 21, 2023 · Cloud Native

Baidu Zhidao Cloud Migration Practice: From Legacy OXP to Cloud-Native Architecture

Baidu Zhidao migrated its 18‑year‑old Q&A platform from a legacy OXP architecture to a cloud‑native solution using Pandora containers and the Zhiyun platform, overcoming complex code, high traffic, and zero‑downtime requirements, and achieved full traffic migration, 99.99% SLA, reduced latency, and enhanced elasticity and multi‑region disaster recovery.

Infrastructure Evolution · PaaS · cloud migration

0 likes · 13 min read

vivo Internet Technology

Apr 26, 2023 · Operations

Disaster Recovery Design and Practices for Vivo Push System

Vivo’s push platform achieves high‑availability disaster recovery by deploying multi‑region broker clusters, implementing dual‑active logic nodes across two data centers, adding a Kafka‑backed buffering layer for traffic spikes, and using a hybrid Redis‑plus‑disk KV storage scheme to ensure durable, real‑time message delivery.

Kafka · Push System · Redis

0 likes · 11 min read

ITPUB

Apr 10, 2023 · Backend Development

How Bilibili Scales Its Like Service: Architecture, Storage, and Disaster Recovery

This article details the design of Bilibili's like (thumbs‑up) system, covering business capabilities, multi‑layer storage, traffic handling, disaster‑recovery strategies, and future plans for keeping the like service reliable under heavy traffic across videos, posts, comments, and more.

backend · disaster recovery · storage

0 likes · 15 min read

Programmer DD

Mar 16, 2023 · Operations

Why High Availability Matters: Building Fault‑Tolerant Cloud Systems

The article explains how system failures like bugs, security breaches, and cloud outages can cripple businesses, and outlines the concepts of fault tolerance and disaster recovery as essential components of high‑availability architectures to ensure continuous service and protect revenue.

disaster recovery · fault tolerance · high availability

0 likes · 7 min read

ITPUB

Mar 14, 2023 · Big Data

How to Build Real-Time Active‑Active Disaster Recovery for OLAP MPP Clusters

This article explains why disaster‑recovery and active‑active architectures are essential for OLAP MPP data‑warehouse clusters, outlines the specific RPO/RTO requirements for batch and real‑time workloads, and compares several data‑synchronization techniques and active‑active deployment models with their advantages and drawbacks.

Active-Active · MPP · OLAP

0 likes · 12 min read

DaTaobao Tech

Feb 27, 2023 · Cloud Computing

Design of a Generic Backup and Disaster Recovery Solution

The proposed generic backup and disaster‑recovery framework introduces three business layers—backup data, DR retrieval, and dirty‑data cleanup—supporting both manual and scheduled backups, automatic or manual cleanup, pagination, bucketed delivery, sharding, and customizable filter chains to prevent large‑scale inconsistencies during failures.

Bucket · architecture · backup

0 likes · 7 min read

ITPUB

Jan 12, 2023 · Operations

How to Build a Truly High‑Availability System: 6 Essential Design Layers

This article breaks down the essential design and operational considerations for achieving high availability across six layers—development standards, application services, storage, product strategy, operations deployment, and incident response—providing concrete practices, metrics, and safeguards to reach four‑nines (99.99%) uptime.

Operations · System Design · capacity planning

0 likes · 25 min read

DaTaobao Tech

Jan 11, 2023 · Game Development

Game Quality Assurance and Testing in Taobao Mini Program Ecosystem

The article examines the unique QA and testing challenges of Taobao’s Mini Program game ecosystem—such as incomplete container tools, lacking engine standards, and diverse usage patterns—and proposes solutions including a dedicated testing infrastructure, performance standards, automated integration, proactive monitoring, fault simulation, and future enhancements.

FCanvas rendering · Mobile Development · disaster recovery

0 likes · 20 min read

Architects' Tech Alliance

Jan 6, 2023 · Operations

Fundamentals of Data Replication, Backup, and Disaster Recovery

This article explains the core concepts of data replication, backup strategies, and disaster recovery, covering RTO/RPO metrics, backup types, copy data management, and the differences between data‑level, application‑level, and business‑level disaster recovery solutions.

Cloud Backup · RPO · RTO

0 likes · 14 min read

Top Architect

Dec 29, 2022 · Backend Development

High‑Availability Strategies for Stateful Backend Services: Cold Backup, Dual‑Machine Active/Standby, Same‑City and Cross‑City Active‑Active, and Multi‑Active Architectures

The article explains various high‑availability solutions for stateful backend services, comparing cold backup, dual‑machine active/standby, same‑city active‑active, cross‑city active‑active, and cross‑city multi‑active approaches, and discusses their trade‑offs, implementation details, and real‑world examples from large internet companies.

Backend Architecture · active standby · cold backup

0 likes · 16 min read

MaGe Linux Operations

Dec 21, 2022 · Operations

Essential Guide to IT Disaster Recovery: 12 Critical Elements Every Business Needs

This article explains what constitutes an IT disaster, outlines the three main disaster types, defines disaster recovery, and details the twelve essential components of a comprehensive disaster recovery plan to help organizations maintain continuity and protect critical assets.

IT Operations · IT infrastructure · business continuity

0 likes · 10 min read

Architects' Tech Alliance

Nov 5, 2022 · Databases

Data Replication: Fundamentals, Technologies, and Future Trends

This article explains the concept of data replication, its three-stage process, key principles of compliance, timeliness, and diversity, various replication methods, layered technologies across storage, operating system, and database levels, emerging cloud and big‑data solutions, and heterogeneous use‑case scenarios.

Big Data · data replication · databases

0 likes · 15 min read

dbaplus Community

Oct 15, 2022 · Backend Development

Why Unitized Architecture Is the Key to Scalable Financial IT Systems

The article explains why large financial institutions need a unitized (Set‑based) architecture to improve resource utilization, achieve city‑level disaster recovery, and support massive user traffic, then defines the concept and outlines a four‑step process for building such a system.

data sharding · disaster recovery · financial IT

0 likes · 8 min read

Programmer DD

Oct 11, 2022 · Operations

How to Achieve High Availability for Stateful Backend Services?

This article explores various high‑availability strategies for stateful backend services, comparing cold backup, active/standby, same‑city active‑active, and multi‑site active‑active solutions, discussing their benefits, limitations, and practical implementation examples from large‑scale internet companies.

Active-Active · Backend Architecture · disaster recovery

0 likes · 17 min read

Open Source Linux

Oct 8, 2022 · Operations

Mastering High Availability: From Cold Backups to Multi‑Region Active‑Active Architectures

This article examines high‑availability strategies for stateful backend services, covering cold backups, active‑standby, same‑city and cross‑city active‑active, and multi‑active designs, while discussing their trade‑offs, implementation details, and real‑world enterprise examples.

Active-Active · active standby · cold backup

0 likes · 15 min read

Efficient Ops

Sep 29, 2022 · Operations

Mastering High Availability: From Cold Backups to Multi‑Active Disaster Recovery

This article explores the evolution of high‑availability strategies for stateful backend services, comparing cold backups, active/standby, same‑city and cross‑city active‑active setups, and discusses the trade‑offs, design considerations, and real‑world implementations of multi‑active architectures.

Active-Active · Backend Architecture · active standby

0 likes · 16 min read

DeWu Technology

Sep 26, 2022 · Cloud Native

DeWu's High‑Availability Architecture Evolution

DeWu’s tech team describes how their e‑commerce platform grew from a simple PHP monolith to a containerized active‑active, multi‑region system with hot‑standby failover, comprehensive governance, full‑link stress testing, and detailed big‑sale preparation, illustrating a systematic, evolving high‑availability architecture that balances scalability, disaster recovery, and business continuity.

Microservices · System Architecture · disaster recovery

0 likes · 21 min read

Architects' Tech Alliance

Sep 19, 2022 · Operations

Fundamentals of Data Replication, Backup, and Disaster Recovery

This article explains the core concepts of disaster recovery and data backup—including RTO, RPO, recovery levels, cloud disaster recovery, backup types, copy data management, deduplication, compression, and block/file/database backup—while also noting related commercial offerings.

Copy Data Management · RPO · RTO

0 likes · 13 min read

Alibaba Cloud Developer

Sep 14, 2022 · Operations

Mastering System Stability: From Fault Prevention to Emergency Response

This article outlines a comprehensive safety‑production framework that covers pre‑incident fault prevention, incident response, and post‑mortem improvement, detailing design‑for‑failure principles such as redundancy, isolation, idempotence, monitoring, automation, disaster recovery, scaling, rate‑limiting, and continuous testing to ensure reliable, resilient services.

Incident Management · Monitoring · Reliability

0 likes · 16 min read

Open Source Linux

Sep 7, 2022 · Operations

Understanding Hot, Cold, and Active‑Active Data Center Strategies: Benefits and Challenges

This article explains the three main data‑center redundancy models—hot standby, cold standby, and active‑active—detailing how each works, their advantages and drawbacks, and the key requirements for implementing a truly resilient dual‑site infrastructure.

Active-Active · Data Center · cold standby

0 likes · 13 min read

Architects' Tech Alliance

Aug 28, 2022 · Databases

Data Replication: Fundamentals, Technologies, and Industry Trends

The article explains data replication concepts, processes, and technologies across storage hardware, operating system, and database layers, outlines synchronous, asynchronous, and hybrid methods, discusses industry applications, trends such as hardware‑software decoupling, cloud replication, and big‑data real‑time copying, and highlights challenges and future directions.

Big Data · cloud · data replication

0 likes · 14 min read

Baidu Geek Talk

Aug 17, 2022 · Industry Insights

How Baidu Cloud Storage Solves the Four Big Challenges of the ABC Era

This article examines the massive data, cost, stability, and diversity challenges of the AI‑driven, big‑data, cloud‑first "ABC" era and explains how Baidu's Canghai storage portfolio—including BOS, CDS, CFS, PFS, RapidFS, CloudFlow, and storage gateways—addresses each issue through scalable architecture, tiered lifecycle policies, multi‑AZ disaster recovery, and integrated hybrid‑cloud solutions.

Baidu · Data Migration · Industry Insights

0 likes · 16 min read

Baidu Intelligent Cloud Tech Hub

Aug 15, 2022 · Cloud Computing

How Baidu’s Canghai Storage Tackles Massive Data Challenges in the Cloud

This article outlines the four major storage challenges of the ABC era—massive scale, cost efficiency, stability, and diversity—and explains how Baidu’s Canghai storage suite, including BOS, CDS, CFS, PFS, RapidFS, CloudFlow, and storage gateways, addresses each through multi‑cloud migration, tiered lifecycle management, and robust disaster‑recovery solutions.

AI · Big Data · Data Migration

0 likes · 15 min read

DaTaobao Tech

Aug 15, 2022 · Cloud Native

Reflections on CAP Theory, ACID, BASE, and Cloud‑Native Fault Tolerance

In these reading notes, the author reviews the CAP theorem's trade‑offs among consistency, availability, and partition tolerance, extends the discussion to ACID and BASE, proposes recasting CAP's three properties as consistency, fault tolerance, and disaster tolerance, and examines how cloud‑native architectures, micro‑services, and SLA‑driven designs reshape fault tolerance and point toward self‑healing systems.

ACID · BASE · CAP theorem

0 likes · 21 min read

DataFunSummit

Aug 12, 2022 · Big Data

JD's Big Data Cross‑Domain and Hierarchical Storage Practices

JD’s article details its big‑data platform’s cross‑domain and hierarchical storage solutions, describing the challenges of multi‑datacenter data synchronization, the architecture of its storage layer, the implemented asynchronous and synchronous data flows, topology management, metadata tagging, and performance‑enhancing techniques for efficient, disaster‑resilient data handling.

Data Platform · Hierarchical Storage · Metadata Management

0 likes · 11 min read

Architects' Tech Alliance

Aug 2, 2022 · Databases

China Database Market 2022: Rankings, Growth Trends and Future Outlook

The July 2022 China Database Popularity Ranking lists 232 databases, with DM moving up to second place, while IDC reports that China's relational database software market reached $1.58 billion in H2 2021, up 34.9% year‑over‑year, and predicts it will reach $9.55 billion by 2026; the article also links to a collection of monthly analysis PDFs and extensive resources on active‑active disaster‑recovery solutions.

China · Database Market · Database Rankings

0 likes · 6 min read

Architects' Tech Alliance

Jul 30, 2022 · Operations

Analysis of Arbitration and Two‑Site‑Three‑Center (3DC) Solutions in Dual‑Active Data Center Disaster Recovery

This article examines key arbitration mechanisms and the two‑site‑three‑center (3DC) extension model for dual‑active data‑center disaster‑recovery, comparing implementations from Huawei, EMC, IBM, HDS and NetApp, and discusses design considerations, split‑brain risks, and best‑practice deployment options.

3DC · Dual-Active · arbitration

0 likes · 32 min read

ByteDance SE Lab

Jun 28, 2022 · Backend Development

Douyin's Video Red Packet System: Architecture, Scaling Challenges & Solutions

During the Chinese New Year campaign, Douyin integrated video creation with red packet gifting, supporting both B2C and C2C flows; this article details the system’s core operations, modular design, high‑traffic subsidy handling, concurrency strategies, fault tolerance, security measures, and comprehensive performance testing.

Backend Architecture · Scalability · System Design

0 likes · 24 min read

Efficient Ops

Jun 22, 2022 · Operations

How Major Banks Design Disaster‑Recovery Architecture for Uninterrupted Service

This article examines banking regulatory requirements and typical disaster‑recovery architectures, explains system tiering and recovery‑time objectives, and shares the Industrial and Commercial Bank of China's evolution from a two‑site, two‑center model to a cloud‑native, multi‑center disaster‑recovery framework, offering practical design insights.

architecture · business continuity · disaster recovery

0 likes · 14 min read

Baidu Geek Talk

Jun 13, 2022 · Backend Development

Baidu Comment Middle Platform: Architecture Design and Implementation

Baidu's Comment Middle Platform evolved from a single service into a robust middleware that delivers stable, high‑performance comment functionality across more than twenty products, handling hundreds of millions of daily requests with 99.995% SLA through graph‑based scheduling, tiered caching, and scalable sorting mechanisms.

Baidu · Comment System · Graph Scheduling

0 likes · 17 min read

Aikesheng Open Source Community

Apr 20, 2022 · Databases

Building and Using MySQL InnoDB Cluster Set (MICS) for Disaster Recovery

This article explains the components of MySQL InnoDB Cluster, introduces the InnoDB Cluster Set (MICS) for disaster‑recovery, outlines its limitations, and provides a step‑by‑step demonstration with code on how to create, monitor, and fail over a MICS deployment.

ClusterSet · InnoDB Cluster · disaster recovery

0 likes · 10 min read

Architects' Tech Alliance

Apr 2, 2022 · Industry Insights

How Financial Institutions Secure Database Continuity: Disaster Recovery Strategies & Market Trends

This article examines the critical role of databases in finance, defines disaster recovery and backup concepts, outlines industry requirements and regulations, analyzes market growth, and compares distributed database disaster‑recovery architectures such as single‑center, city‑level mutual backup, active‑active, and two‑site three‑center solutions.

Database · Distributed Systems · Financial Services

0 likes · 15 min read

MaGe Linux Operations

Mar 24, 2022 · Operations

Understanding Disaster Tolerance vs. Backup: Key Differences and Planning Strategies

This article explains the concepts of disaster tolerance, fault tolerance, and disaster recovery, compares them with backup purposes, discusses RTO/RPO metrics, investment considerations, and outlines common disaster‑recovery architectures for enterprise IT operations.

IT Operations · RPO · RTO

0 likes · 8 min read

Sanyou's Java Diary

Mar 20, 2022 · Operations

Unlocking Ultra‑High Availability: The Secrets of Geo‑Active Multi‑Active Architecture

This article explains what geo‑active multi‑active (异地多活) architecture is, why it is needed for ultra‑high availability, and walks through the step‑by‑step evolution from a single‑node system to sophisticated multi‑data‑center designs that use redundancy, disaster‑recovery, data synchronization, routing, and conflict‑resolution techniques.

data replication · disaster recovery · multi-active

0 likes · 31 min read

Ops Development Stories

Mar 10, 2022 · Operations

Mastering Distributed High Availability: From Single‑Node to Multi‑Active Architecture

This comprehensive guide explains why modern software systems need geo‑distributed multi‑active architectures, walks through the evolution from basic single‑node setups to master‑slave replication, same‑city disaster recovery, dual‑active, two‑city three‑center, and true multi‑active designs, and highlights the key principles, risks, and implementation strategies for achieving ultra‑high availability.

Distributed Systems · System Design · disaster recovery

0 likes · 32 min read

Alibaba Cloud Native

Feb 10, 2022 · Cloud Native

How Multi-Active Architecture Can Eliminate Downtime: Inside Alibaba Cloud’s AppActive

Despite widespread cloud adoption, large‑scale outages still occur, prompting Alibaba Cloud’s high‑availability team to share the evolution, principles, and open‑source implementation of multi‑active disaster recovery (AppActive) that aims to achieve minute‑level failover and near‑zero downtime.

Alibaba Cloud · AppActive · disaster recovery

0 likes · 11 min read

Architects' Tech Alliance

Jan 30, 2022 · Cloud Computing

Why Hybrid Cloud Is the Future: Key Scenarios and Challenges Explained

The article explains what hybrid cloud is, outlines five typical use cases such as load scaling, disaster recovery, data backup, application deployment, and dev‑test‑prod workflows, and discusses three major challenges including ecosystem innovation, unified multi‑cloud management, and cloud‑network collaboration.

Infrastructure Management · Scalability · disaster recovery

0 likes · 7 min read

21CTO

Jan 21, 2022 · Frontend Development

How Meituan’s Phoenix SDK Enables Client‑Side CDN Disaster Recovery

This article explains Meituan's Phoenix solution that moves CDN disaster recovery to the client side, detailing its goals, architecture, dynamic calculation service, monitoring platform, implementation for web and native apps, and the measurable improvements in availability and operational efficiency.

CDN · Meituan · Phoenix SDK

0 likes · 18 min read

ITPUB

Jan 19, 2022 · Frontend Development

How Meituan’s Phoenix SDK Enables Automatic Client‑Side CDN Failover

Meituan’s Phoenix solution equips web and native clients with an automatic CDN failover SDK, dynamic domain selection, and fine‑grained monitoring, dramatically improving resource loading success rates, reducing SRE workload, and ensuring high availability across millions of daily users.

CDN · Frontend · Meituan

0 likes · 20 min read

DeWu Technology

Jan 19, 2022 · Operations

Common High‑Availability Architecture Patterns and Multi‑Active Deployment Strategies

Covering essential high‑availability techniques, the article examines disaster‑recovery architectures from same‑city dual‑center to cross‑country active‑passive deployments, compares five patterns, details three multi‑active models, outlines required traffic‑scheduling, replication, and database layers, and provides design methodology, practical safeguards, and key HA metrics.

Distributed Systems · data replication · disaster recovery

0 likes · 23 min read

Meituan Technology Team

Jan 13, 2022 · Operations

Phoenix: Client‑Side CDN Disaster Recovery Solution at Meituan

Phoenix is Meituan’s client‑side CDN disaster‑recovery system that uses a Webpack‑based SDK, dynamic calculation service, and monitoring platform to automatically detect load failures, switch domains, isolate problems, and continuously hot‑standby resources, boosting resource success rates from 99.7% to 99.9% across hundreds of projects.

CDN · Meituan · Performance

0 likes · 16 min read

MaGe Linux Operations

Jan 6, 2022 · Cloud Native

How to Build Minute‑Level Hybrid Cloud Disaster Recovery with MSHA Multi‑Active Architecture

This article presents a step‑by‑step guide for constructing a hybrid cloud disaster‑recovery solution using MSHA's multi‑active architecture, covering business background, design challenges, dual‑active deployment, traffic routing, data synchronization, one‑click failover, and validation of sub‑minute RPO/RTO for e‑commerce platforms.

Alibaba Cloud · MSHA · cloud-native

0 likes · 14 min read

Open Source Linux

Jan 6, 2022 · Operations

Disaster Recovery Explained: Definitions, Strategies, and Implementation

This article provides a comprehensive guide to disaster recovery, covering its definition, the distinction between backup and DR, various protection strategies, measurement metrics such as RPO and RTO, and practical implementation methods across storage, cloud, and network layers.

Data ProtectionRPORTO

0 likes · 16 min read
