
Scaling Massive Data at Alibaba: From Oracle to MySQL Sharding and Distributed Solutions

The article details how Alibaba tackled explosive data growth by migrating from a single Oracle instance to a multi‑source architecture using horizontal and vertical sharding, MySQL clusters, KV stores, and proprietary tools such as Erosa, Otter, and Cobar to achieve high availability, consistency, and performance across multiple IDC sites.

Qunar Tech Salon

Challenges of Massive Data Growth

Alibaba operates several sites (Chinese, International, Japanese, and others) whose core tables hold billions of rows, which creates capacity, performance, and distributed‑system challenges. The original Oracle deployment began to hit bottlenecks as data surged around 2007‑2008, prompting the team to split its databases.

Mitigation Strategies: Distribution and Storage

The team first spread load by horizontally sharding the largest tables out of Oracle into MySQL clusters, and later used vertical splits as transitional steps. Each kind of business data is stored with the technology that suits it: Oracle for core data, MySQL for high‑traffic tables, and KV stores for attributes with little relational structure.
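The difference between the horizontal and vertical splits mentioned above can be sketched in a few lines. This is a minimal illustration, not Alibaba's implementation: the hash‑modulo placement, the function names, and the column groups are all assumptions made for the example.

```python
import zlib
from typing import Dict, List


def shard_for(key: str, shard_count: int) -> int:
    """Return a stable shard index for a sharding key (assumed hash-modulo rule)."""
    # crc32 is deterministic across processes, unlike the built-in hash() on str.
    return zlib.crc32(key.encode("utf-8")) % shard_count


def horizontal_split(rows: List[dict], key_field: str, shard_count: int) -> List[List[dict]]:
    """Distribute whole rows across shards by hashing one field."""
    shards: List[List[dict]] = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_for(str(row[key_field]), shard_count)].append(row)
    return shards


def vertical_split(row: dict, pk: str, column_groups: Dict[str, List[str]]) -> Dict[str, dict]:
    """Split one wide row into narrower rows that all keep the primary key."""
    return {
        name: {pk: row[pk], **{col: row[col] for col in columns}}
        for name, columns in column_groups.items()
    }


if __name__ == "__main__":
    # Hypothetical data: 10 member rows spread over 4 shards.
    rows = [{"member_id": i, "name": f"member-{i}"} for i in range(10)]
    for index, shard in enumerate(horizontal_split(rows, "member_id", 4)):
        print(index, [r["member_id"] for r in shard])
```

A horizontal split keeps every row intact and only decides where it lives, while a vertical split keeps the row's location decisions separate per column group, which is why it works as a transitional step before a table is sharded.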

Data Center Processing

Three internal products, Erosa, Eromanga, and Otter, handle real‑time binlog parsing, change data capture, and cross‑IDC synchronization. Erosa parses MySQL binlogs (as well as changes from KV stores) in real time and pushes the changes to Eromanga, where consumers subscribe to them. Otter provides bidirectional data synchronization between IDC sites, similar to SharePlex.
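The parse‑then‑subscribe flow described above follows a common change data capture pattern, which can be sketched as follows. The article does not describe the real interfaces of Erosa or Eromanga, so `ChangeEvent`, `ChangeBus`, and `apply_event` are hypothetical names, and the events are constructed by hand rather than parsed from a real binlog.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    """One row change, as a log parser might emit it (hypothetical shape)."""
    table: str
    op: str      # "INSERT", "UPDATE" or "DELETE"
    key: int     # primary key of the changed row
    row: dict    # row image after the change (empty for DELETE)


class ChangeBus:
    """Hypothetical stand-in for the subscription layer."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[ChangeEvent], None]]] = defaultdict(list)

    def subscribe(self, table: str, handler: Callable[[ChangeEvent], None]) -> None:
        self._handlers[table].append(handler)

    def publish(self, event: ChangeEvent) -> None:
        # Deliver the event to every subscriber of that table, in order.
        for handler in self._handlers[event.table]:
            handler(event)


def apply_event(replica: Dict[int, dict], event: ChangeEvent) -> None:
    """Apply one change to an in-memory replica keyed by primary key."""
    if event.op == "DELETE":
        replica.pop(event.key, None)
    else:
        replica[event.key] = event.row


if __name__ == "__main__":
    bus = ChangeBus()
    replica: Dict[int, dict] = {}
    bus.subscribe("offer", lambda e: apply_event(replica, e))
    bus.publish(ChangeEvent("offer", "INSERT", 1, {"title": "steel pipe"}))
    bus.publish(ChangeEvent("offer", "UPDATE", 1, {"title": "steel tube"}))
    print(replica)
```

Decoupling the parser from its consumers this way lets a search index, a cache invalidator, and a warehouse loader each subscribe to the same change stream without touching the source database.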

System Cache Architecture Design

Alibaba employs a three‑level cache hierarchy: a front‑end image and page cache, a local back‑end object cache, and a distributed back‑end object cache. The local cache loads immutable data at server start‑up, while remote distributed caches (e.g., Memcached, Berkeley DB) serve dynamic data. Cache granularity and TTL are tuned per business need to maximize hit rates.
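The back‑end part of this hierarchy amounts to a read path that checks the local cache first, then the remote cache, and only then the database. The sketch below assumes that order and uses an in‑process `TTLCache` class as a stand‑in for a remote cache such as Memcached; the class names and the `load_from_db` callback are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

_MISS = object()  # sentinel so that a cached None is still a hit


class TTLCache:
    """In-process stand-in for a remote cache whose entries expire."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return _MISS
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # drop the expired entry
            return _MISS
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self._clock() + self._ttl)


class TieredReader:
    """Read path: local immutable data, then remote cache, then the database."""

    def __init__(self, local: Dict[str, Any], remote: TTLCache,
                 load_from_db: Callable[[str], Any]) -> None:
        self._local = local              # loaded once at start-up, never changes
        self._remote = remote
        self._load_from_db = load_from_db

    def get(self, key: str) -> Any:
        if key in self._local:
            return self._local[key]
        value = self._remote.get(key)
        if value is _MISS:
            value = self._load_from_db(key)
            self._remote.set(key, value)  # repopulate for later readers
        return value
```

The TTL passed to `TTLCache` is the knob the text refers to: a longer TTL raises the hit rate at the cost of staler data, so it would be chosen per kind of business data.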

Technical Department = Demolition Squad?

Within Alibaba’s tech division, a team nicknamed the "Demolition Squad" focuses on breaking large Oracle tables into MySQL shards, splitting at the field, table, or schema level.

Data Splitting Strategies

Splits are performed through Cobar, a custom middleware that acts as a MySQL proxy and routes queries according to defined sharding rules (by field, table, or schema). The migration follows a strict five‑step process that moves data from Oracle to MySQL gradually to minimize service impact.
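Rule‑based routing of the kind described above can be sketched as a lookup from table name and sharding field to a backend node. This is not Cobar's actual rule format or API: `TableRule`, `Router`, the node names, and the interpretation of the three rule levels (unlisted tables stay on a default node, whole tables can be pinned to one node, and sharded tables pick a node from a field value) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TableRule:
    """Hypothetical routing rule for one table."""
    shard_field: Optional[str]  # None: the whole table lives on nodes[0]
    nodes: List[str]            # backend node names


class Router:
    def __init__(self, default_node: str, rules: Dict[str, TableRule]) -> None:
        self._default_node = default_node
        self._rules = rules

    def route(self, table: str, values: Dict[str, int]) -> List[str]:
        """Return the backend nodes a query on `table` must be sent to."""
        rule = self._rules.get(table)
        if rule is None:
            # No rule for this table: it stays with the default schema.
            return [self._default_node]
        if rule.shard_field is None:
            # Table-level rule: the table was moved as a whole.
            return [rule.nodes[0]]
        if rule.shard_field not in values:
            # The query does not constrain the sharding field: ask every shard.
            return list(rule.nodes)
        # Field-level rule: modulo on the sharding field picks one shard.
        return [rule.nodes[values[rule.shard_field] % len(rule.nodes)]]


if __name__ == "__main__":
    router = Router("oracle_core", {
        "offer": TableRule("member_id", ["mysql_0", "mysql_1"]),
        "category": TableRule(None, ["mysql_meta"]),
    })
    print(router.route("offer", {"member_id": 7}))
    print(router.route("offer", {}))
```

Keeping the rules in the proxy rather than in application code is what allows tables to be moved shard by shard without every client changing its connection logic.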

Cobar Product Details

Cobar consists of Cobar Server (deployed as stand‑alone clusters) and Cobar Client (a Java library). It includes a hand‑written SQL parser and routing engine. There are plans to open‑source the client and parser, which are intended to improve performance severalfold.

Backend Data Warehouse and Real‑time Requirements

Alibaba’s data warehouse (DW) team uses Oracle RAC, Greenplum, and Cobar for offline and near‑real‑time analytics. Since Erosa was deployed, data freshness has improved from daily to under five minutes. Future work aims to build a unified platform that serves both OLTP (website) and OLAP (DW) workloads.

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
