How SF Express Transformed Its Database Operations: From Legacy to Open‑Source, Distributed, and Intelligent Ops
This talk details SF Express’s journey from heterogeneous legacy databases to standardized open‑source, distributed architectures and intelligent operations, covering standardization, migration to open‑source, scaling with Mycat, automated resource pooling, and the ThinkDB platform that drives proactive, automated DBA workflows.
Preface
Today’s theme is “Carrying Heavy Loads Forward – The Path of Change in SF Express’s Database Operations”. The company’s rapid growth and diversification (express, cold‑chain, warehousing, finance) have multiplied data instances, demanding a series of technical transformations.
1. From Non‑Standard to Standard
1.1 The chaotic early years
Running many different database types fragmented DBA work across products and left little attention for what users actually needed.
1.2 Moving toward standards
We standardized in four steps, moving from ad hoc firefighting to a closed-loop prevention process that freed DBAs for higher-value tasks.
First, we cut DB types and selected the strongest commercial DB as the standard.
Second, we unified HA and disaster‑recovery to avoid custom solutions causing operational errors.
Third, we monitored basic resources (storage, network, hosts) and addressed issues promptly.
Fourth, we established a proactive prevention loop, improving stability and allowing DBAs to focus on meaningful work.
2. From Traditional to Open‑Source
2.1 De‑commercializing
The decision to adopt open source was made top-down. We piloted it on the core warehouse system, relying mainly on internal talent rather than external MySQL experts.
We concentrated on two database fundamentals: transaction control and indexing.
We kept business logic out of the database, split data sources vertically, and used SAN storage for HA while developing an automatic failover solution that does not depend on SAN.
2.2 Operations‑oriented development
We built a dedicated ops-dev team that blends experienced engineers with fresh graduates and pairs the rigor of operations with the creativity of development. Within it, separate but collaborating teams cover configuration, monitoring, capacity, hardware-software labs, interaction, disaster recovery, and restoration.
3. From Centralized to Distributed
3.1 Scaling limits of single instances
In our production environment, a single MySQL instance tops out at around 5,000 TPS. Vertical splitting eventually reached its limits, which led us to partition data by region and increased operational complexity.
3.2 Mycat as a proxy
After extensive testing, we adopted Mycat as the middleware and extended it with an SQL firewall, large-data aggregation, and performance tweaks. The order-processing system now handles more than 200,000 TPS during peaks.
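The routing idea behind a sharding proxy can be sketched in a few lines. This is only an illustration under stated assumptions, not Mycat's actual implementation or configuration: the shard names, the CRC32-modulo rule, and the functions `route` and `group_by_shard` are all hypothetical.

```python
import zlib

# Hypothetical backend names; a real proxy reads these from its configuration.
SHARDS = ["order_db_0", "order_db_1", "order_db_2", "order_db_3"]


def route(sharding_key, shards=SHARDS):
    """Map a sharding key (for example an order ID) to one backend database."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    bucket = zlib.crc32(sharding_key.encode("utf-8")) % len(shards)
    return shards[bucket]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by target shard so each backend receives one batched request."""
    batches = {}
    for key in keys:
        batches.setdefault(route(key, shards), []).append(key)
    return batches
```

Because the same key always maps to the same backend, the application can keep talking to a single logical database while the proxy fans requests out to many MySQL instances.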
3.3 Large‑data aggregation
We moved large sorts from heap memory to external storage, which makes it possible to aggregate billions of rows without running out of memory (OOM).
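The general technique here is an external merge sort: sort bounded chunks in memory, spill each sorted run to disk, then stream a k-way merge over the runs. The sketch below is a minimal illustration of that technique, not the code added to Mycat; the function names and the `chunk_size` parameter are assumptions, and it handles integers only.

```python
import heapq
import os
import tempfile


def _spill(sorted_chunk):
    """Write one sorted chunk to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{value}\n" for value in sorted_chunk)
    return path


def _read_run(path):
    """Stream integers back from one spilled run, one line at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values, chunk_size=100_000):
    """Yield integers in ascending order while buffering at most chunk_size values."""
    runs, chunk = [], []
    for value in values:
        chunk.append(value)
        if len(chunk) >= chunk_size:
            runs.append(_spill(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_spill(sorted(chunk)))
    try:
        # heapq.merge keeps only one pending value per run in memory.
        yield from heapq.merge(*(_read_run(path) for path in runs))
    finally:
        for path in runs:
            os.remove(path)
```

Memory use is bounded by `chunk_size` during the first phase and by the number of runs during the merge, so the input size is limited by disk rather than heap.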
3.4 SQL firewall
The embedded firewall enforces coding standards, blocks queries that do not use an index, and captures problematic SQL in development environments.
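A rule-based check of this kind might look like the sketch below. The three rules, the `INDEXED_COLUMNS` metadata, and the `check_sql` function are hypothetical illustrations rather than the firewall's real rule set, and the regular expressions handle only simple single-table statements; a production firewall would work on a parsed statement.

```python
import re

# Hypothetical schema metadata: table name -> set of indexed columns.
INDEXED_COLUMNS = {"orders": {"order_id", "customer_id"}}


def check_sql(sql, indexed=INDEXED_COLUMNS):
    """Return a list of rule violations; an empty list means the statement passes."""
    violations = []
    lowered = sql.strip().rstrip(";").lower()

    if re.match(r"select\s+\*", lowered):
        violations.append("SELECT * is not allowed")

    where = re.search(r"\bwhere\b(.*)", lowered, flags=re.DOTALL)
    if re.match(r"(update|delete)\b", lowered) and not where:
        violations.append("UPDATE/DELETE without WHERE")

    table = re.search(r"\b(?:from|update)\s+([a-z_][a-z0-9_]*)", lowered)
    if where and table and table.group(1) in indexed:
        # Collect column names that appear on the left side of a comparison.
        columns = set(
            re.findall(r"([a-z_][a-z0-9_]*)\s*(?:=|<|>|\bin\b|\blike\b)", where.group(1))
        )
        if not columns & indexed[table.group(1)]:
            violations.append("WHERE clause uses no indexed column")
    return violations
```

For example, `check_sql("DELETE FROM orders")` reports the missing WHERE clause, while a lookup by `order_id` passes with no violations.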
4. Intelligent Operations
4.1 Collaborative platform
ThinkDB provides configuration discovery, real‑time monitoring, capacity forecasting, hardware‑software labs, self‑service portals, automated disaster‑recovery, and point‑in‑time restoration.
4.2 Automation of HA and resource pools
Using MGR, semi‑sync replication, and dual‑heartbeat checks, we achieve automatic failover; resource pools balance instances based on usage thresholds and peak forecasts.
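One plausible reading of a dual-heartbeat check is that the primary is declared dead only when two independent probe channels both fail for several consecutive rounds, which guards against failing over because of a single flaky network path. The sketch below illustrates that decision logic only; the class name, the `required_failures` threshold, and the two-channel interpretation are assumptions, and it does not touch MGR or semi-sync replication.

```python
from dataclasses import dataclass


@dataclass
class FailoverDetector:
    """Declare a primary dead only after both heartbeat channels fail repeatedly."""

    required_failures: int = 3  # hypothetical threshold
    consecutive_failures: int = 0

    def observe(self, channel_a_ok, channel_b_ok):
        """Record one probe round; return True when failover should be triggered."""
        if channel_a_ok or channel_b_ok:
            # At least one path still reaches the primary, so treat it as alive.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.required_failures
```

A monitoring loop would call `observe` once per probe interval and start the promotion procedure the first time it returns `True`.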
The resource-pool logic enforces Pctfree/Pctused thresholds: when usage exceeds them, instances are relocated automatically, and capacity predictions are factored in so that the pool is ready for peak demand.
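Threshold-driven relocation can be sketched as a small planning function. This is an illustrative greedy heuristic under stated assumptions, not ThinkDB's algorithm: the `high_water` ratio, the uniform host `capacity`, and the "move the largest instance to the least-loaded host" rule are all hypothetical, and peak forecasts are left out (they could be added to each instance's usage before planning).

```python
def plan_relocations(hosts, capacity, high_water=0.8):
    """Return (instance, source_host, target_host) moves that relieve overloaded hosts.

    hosts maps host name -> {instance name: usage}, in the same unit as capacity.
    """
    limit = high_water * capacity
    load = {host: sum(instances.values()) for host, instances in hosts.items()}
    placement = {host: dict(instances) for host, instances in hosts.items()}
    moves = []
    # Visit the most loaded hosts first.
    for source in sorted(load, key=load.get, reverse=True):
        while load[source] > limit and placement[source]:
            # Try the largest instance first to relieve pressure with few moves.
            name, size = max(placement[source].items(), key=lambda item: item[1])
            candidates = [
                host for host in load
                if host != source and load[host] + size <= limit
            ]
            if not candidates:
                break  # nothing fits; leave the decision to an operator
            target = min(candidates, key=load.get)
            del placement[source][name]
            placement[target][name] = size
            load[source] -= size
            load[target] += size
            moves.append((name, source, target))
    return moves
```

The function only produces a plan; executing each move (provisioning the replica, switching traffic, retiring the old instance) is a separate step.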
4.3 SQL quality control
Mycat's firewall blocks bad SQL in development, release pipelines include code review, and SQL quality scores collected in production drive continuous optimization across business systems.
4.4 Operational insights
By treating upstream issues as our own, we reduced annual incidents from 20 to zero over three years, showing that solid foundational ops enable higher‑level innovation.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.