Qunar Database Backup and Recovery Platform: Design, Implementation, and Lessons Learned

This article details Qunar's comprehensive MySQL backup and recovery platform, covering high‑availability concepts, backup types, tool selection, multi‑channel architecture, storage solutions, automation, and operational lessons for scaling DBA operations across large‑scale deployments.

Qunar Tech Salon

Speaker Xu Ziwen, a former senior database engineer, describes the critical role of backup and recovery in a DBA's daily operations and outlines how Qunar's backup and recovery platform evolved, from both a technical and a business perspective.

The discussion starts with high‑availability (HA) fundamentals, explaining HA levels such as "five nines" (99.999%) availability and why backups are essential to achieving true HA.
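The HA levels can be made concrete with a little arithmetic: each extra "nine" cuts the permitted downtime by a factor of ten, so five nines allows only about 5.26 minutes of downtime per year. The sketch below assumes a 365‑day year and ignores leap years; the function names are illustrative.

```python
# Allowed downtime per year for a given number of "nines".
# Assumes a 365-day year; leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability(nines: int) -> float:
    """Availability as a fraction, e.g. 5 nines -> 0.99999."""
    return 1 - 10 ** (-nines)


def max_downtime_minutes(nines: int) -> float:
    """Minutes of downtime per year permitted at the given number of nines."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


for n in range(2, 6):
    print(f"{availability(n):.5%} -> {max_downtime_minutes(n):.2f} min/year")
```

Seen this way, a restore that takes hours can consume many years' worth of a five‑nines budget, which is why recovery time matters as much as having a backup at all.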

Several MySQL HA solutions are mentioned, namely master‑slave replication, Percona XtraDB Cluster (PXC), and QMHA, alongside a taxonomy of backup methods: cold vs. hot, logical vs. physical, local vs. remote, full vs. incremental, and LVM snapshot backups.

Backup types are compared: cold backups require shutting the database down and therefore incur downtime, while hot (online) backups run against a live server with minimal impact. Hot backups are the focus of the platform.

Logical backup tools (mysqldump, mydumper, mysqlpump) and physical backup tools (ibbackup, Percona XtraBackup) are evaluated, highlighting the shift from slow logical backups (mysqldump, for example, is single‑threaded) to faster, multi‑threaded physical backups.

Incremental backups are discussed, but Qunar ultimately adopts a full‑backup + binlog strategy to simplify management and enable remote backup via dedicated channels.
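With a full‑backup + binlog strategy, a restore boils down to two choices: the newest full backup taken at or before the target time, and the binlog files to replay on top of it. The sketch below is an illustration under stated assumptions, not Qunar's code: `FullBackup`, `BinlogFile`, and `plan_restore` are hypothetical names, and it assumes each backup records the binlog file and position current when it was taken (XtraBackup, for instance, writes these coordinates to `xtrabackup_binlog_info`).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FullBackup:  # hypothetical catalog record
    taken_at: datetime
    binlog_file: str  # binlog file current when the backup was taken
    binlog_pos: int   # replay starts at this position in that file


@dataclass
class BinlogFile:  # hypothetical catalog record
    name: str
    first_event_at: datetime  # timestamp of the first event in the file


def plan_restore(backups, binlogs, target):
    """Return (base backup, binlog file names to replay) for `target`."""
    candidates = [b for b in backups if b.taken_at <= target]
    if not candidates:
        raise ValueError("no full backup precedes the target time")
    base = max(candidates, key=lambda b: b.taken_at)
    # MySQL binlog names end in a zero-padded sequence number, so a plain
    # string sort matches creation order.
    ordered = sorted(binlogs, key=lambda f: f.name)
    start = [f.name for f in ordered].index(base.binlog_file)
    # Replay from the backup's binlog file through the last file that
    # begins at or before the target time.
    to_apply = [f.name for f in ordered[start:] if f.first_event_at <= target]
    return base, to_apply
```

Keeping only full backups means every restore has the same shape (one base plus a contiguous run of binlogs), which is the management simplification the strategy aims for.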

The architecture employs multiple backup channels per data center and stores backups on MFS (MooseFS), a fault‑tolerant distributed file system that provides highly available, scalable storage. Deletion is delayed, so a removed backup can still be recovered for a period of time.
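Delayed deletion can be pictured as a two‑phase sweep: an expired backup is first only marked, and it is physically removed after a grace period has passed. This is a generic, application‑level sketch; the retention and grace values and the `sweep` function are hypothetical, and the source does not say whether Qunar implemented the delay in its own tooling or relied on the file system.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)     # hypothetical: how long backups are kept
GRACE_PERIOD = timedelta(days=2)  # hypothetical: delay before physical delete


def sweep(catalog, now):
    """Two-phase cleanup over `catalog`, which maps backup id -> dict with
    'created_at' and, once marked, 'marked_at'. Returns ids deleted now."""
    purged = []
    for backup_id, meta in list(catalog.items()):
        marked = meta.get("marked_at")
        if marked is None:
            # Phase 1: an expired backup is only marked for deletion.
            if now - meta["created_at"] > RETENTION:
                meta["marked_at"] = now
        elif now - marked >= GRACE_PERIOD:
            # Phase 2: remove it only after the grace period has elapsed.
            del catalog[backup_id]
            purged.append(backup_id)
    return purged
```

Until phase 2 runs, clearing `marked_at` is enough to rescue a backup that was expired by mistake.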

Performance optimizations include parallel backup queues, dedicated backup NICs, and automatic read‑node selection to ensure backups run on non‑primary nodes even after failover.
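Automatic read‑node selection and parallel backup queues can be sketched as follows: node roles are read at scheduling time, so a replica promoted by a failover is excluded automatically, and a bounded thread pool stands in for a per‑channel queue. `Node`, `pick_backup_source`, `run_backups`, and the preference for the lowest replication lag are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Node:  # hypothetical view of one MySQL instance in a cluster
    host: str
    is_primary: bool
    healthy: bool
    lag_seconds: int


def pick_backup_source(nodes):
    """Return the healthy non-primary node with the least replication lag.
    Roles are evaluated now, so a node promoted by failover is skipped."""
    replicas = [n for n in nodes if not n.is_primary and n.healthy]
    if not replicas:
        raise RuntimeError("no healthy replica available for backup")
    return min(replicas, key=lambda n: n.lag_seconds)


def run_backups(clusters, backup_fn, max_parallel=4):
    """Back up each cluster from its chosen replica, running at most
    `max_parallel` backups at once (a stand-in for a backup queue)."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {name: pool.submit(backup_fn, pick_backup_source(nodes))
                   for name, nodes in clusters.items()}
        return {name: f.result() for name, f in futures.items()}
```

Capping concurrency per channel keeps the dedicated backup NIC and the storage cluster from being saturated when many instances are due at once.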

Backups are verified by applying the logs (the prepare step) as part of the backup process, so a backup is marked successful only after it has been shown to be recoverable.
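With Percona XtraBackup, the two steps map onto `xtrabackup --backup` and `xtrabackup --prepare` (older `innobackupex` releases called the second step `--apply-log`). The sketch below marks a backup as good only if both commands exit with status 0. It assumes connection credentials come from the MySQL option file, and `backup_and_verify` is a hypothetical name; the `run` parameter exists only so the logic can be exercised without XtraBackup installed.

```python
import subprocess


def backup_and_verify(target_dir, run=subprocess.run):
    """Take a physical backup, then apply the logs (prepare) in place.
    Returns True only if both steps exit with status 0."""
    steps = [
        # Copy data files plus the redo log generated during the copy.
        ["xtrabackup", "--backup", f"--target-dir={target_dir}"],
        # Apply the redo log so the copy is consistent and restorable.
        ["xtrabackup", "--prepare", f"--target-dir={target_dir}"],
    ]
    for cmd in steps:
        if run(cmd).returncode != 0:
            return False
    return True
```

A prepare failure surfaces at backup time rather than during an incident, which is the point of verifying before recording success.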

All backup and recovery functions are integrated into DUBAI, an internally developed platform (Python/Shell backend, Tornado frontend) that provides a unified, visual management console.

Supported recovery scenarios include instance recreation, slave creation, point‑in‑time recovery, and flashback (via Inception); most of them rely on a full backup plus binlog replay.
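For point‑in‑time recovery, the binlog replay on top of a restored full backup is typically done with `mysqlbinlog`: `--start-position` applies to the first file named, and `--stop-datetime` stops at the first event whose timestamp is at or after the given time. The sketch below only builds the shell pipeline; `pitr_command` is a hypothetical helper, and it assumes the `mysql` client picks up its connection settings from an option file.

```python
import shlex
from datetime import datetime


def pitr_command(binlog_files, start_pos, stop_at):
    """Build a pipeline that replays events from `start_pos` in the first
    binlog file up to, but not including, the first event at or after
    `stop_at`, and feeds them into the mysql client."""
    args = [
        "mysqlbinlog",
        f"--start-position={start_pos}",
        f"--stop-datetime={stop_at:%Y-%m-%d %H:%M:%S}",
        *binlog_files,
    ]
    return shlex.join(args) + " | mysql"
```

`shlex.join` quotes the datetime argument, which contains a space, so the command can be passed safely to a shell.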

Remaining challenges, such as long backup and recovery times, are addressed through instance and table sharding, tiered archiving of service data, and backup strategies differentiated by service importance.
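Differentiated backup strategies can be expressed as a small policy table keyed by service tier. The tier names, intervals, and retention periods below are hypothetical placeholders chosen for illustration, not values from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupPolicy:
    full_backup_every_days: int  # interval between full backups
    retention_days: int          # how long full backups are kept
    keep_binlogs: bool           # whether binlogs are archived for PITR


# Hypothetical tiers and numbers, for illustration only.
POLICIES = {
    "core": BackupPolicy(full_backup_every_days=1, retention_days=30, keep_binlogs=True),
    "standard": BackupPolicy(full_backup_every_days=3, retention_days=14, keep_binlogs=True),
    "minor": BackupPolicy(full_backup_every_days=7, retention_days=7, keep_binlogs=False),
}


def is_backup_due(tier, days_since_last_full):
    """True if an instance in `tier` needs a new full backup."""
    return days_since_last_full >= POLICIES[tier].full_backup_every_days
```

Keeping the policy as data lets a scheduler treat every instance the same way while spending backup capacity where the business impact is highest.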

A data‑archiving tool moves cold data to secondary services (e.g., HBase), completing the tiered‑service model.
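An archiving tool of this kind usually moves cold rows in small batches so that each transaction stays short and the online service is not blocked. The sketch below uses SQLite for both the hot table and the archive purely so it runs self‑contained; in the setup described, the destination would be a secondary service such as HBase. The table names `orders` and `orders_archive`, the `created_at` column, and `archive_cold_rows` are hypothetical.

```python
import sqlite3


def archive_cold_rows(conn, cutoff, batch_size=1000):
    """Move rows with created_at < cutoff from `orders` to `orders_archive`
    in batches, committing after each batch. Returns the number moved."""
    moved = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM orders WHERE created_at < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size))]
        if not ids:
            return moved
        marks = ",".join("?" * len(ids))  # e.g. "?,?,?"
        with conn:  # one short transaction per batch: copy, then delete
            conn.execute(
                f"INSERT INTO orders_archive SELECT * FROM orders WHERE id IN ({marks})", ids)
            conn.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
        moved += len(ids)
```

Because the copy and the delete commit together, an interrupted run leaves every row in exactly one of the two tables and can simply be restarted.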

The article concludes with a demonstration of the fully automated backup‑recovery platform, showcasing its scalability and operational efficiency.
