Why Alibaba Chose Apache Flink: Architecture, Scale, and Future Directions
This article explains how Alibaba adopted Apache Flink as a unified, low‑latency, high‑throughput big‑data engine, detailing its stream‑first design, state management, checkpointing, massive production deployment, community contributions, and upcoming plans for a unified API, SQL layer, broader language support, and AI integration.
In the era of exploding data volumes, Alibaba needed a single engine that could handle both batch and streaming workloads without requiring developers to write separate code for each.
After evaluating many options, Alibaba chose Apache Flink because it is a stream‑first engine that can also simulate batch processing, offering low latency, high throughput, exactly‑once semantics, and strong state management.
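The stream‑first idea can be illustrated with a minimal sketch in plain Python (no Flink dependency). It assumes a hypothetical `running_count` operator written once for streams; `batch_count` is an illustrative helper that treats a batch as a bounded stream and keeps only the final result per key. Neither name comes from Flink's API.

```python
from typing import Dict, Iterable, Iterator, Tuple


def running_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Streaming operator: emit an updated count after every incoming event."""
    counts: Dict[str, int] = {}
    for key in events:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]


def batch_count(events: Iterable[str]) -> Dict[str, int]:
    """Batch mode: run the same operator over a bounded input, keep final values only."""
    final: Dict[str, int] = {}
    for key, count in running_count(events):
        final[key] = count  # later updates overwrite earlier ones
    return final


if __name__ == "__main__":
    clicks = ["item_a", "item_b", "item_a"]
    print(list(running_count(clicks)))  # incremental, per-event results
    print(batch_count(clicks))          # one final result per key
```

The same operator logic serves both modes: the streaming caller consumes every intermediate update, while the batch caller waits for the bounded input to end and reads only the last value per key.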
Since 2016, a Flink‑based real‑time platform has been running on Alibaba’s Hadoop/YARN clusters. It has scaled from a few hundred to over ten thousand servers, manages petabytes of state, and processes trillions of events per day, with peak loads of more than 4.7 × 10⁸ (roughly 470 million) accesses per second during events such as the Double 11 shopping festival.
Alibaba contributed back to the open‑source community by redesigning Flink’s distributed architecture to decouple job scheduling from resource management (enabling native execution on YARN and Kubernetes), introducing an incremental checkpoint mechanism to keep checkpoint sizes bounded, and adding features such as credit‑based flow control and Streaming SQL.
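To make the incremental checkpoint idea concrete, here is a conceptual sketch in plain Python. It is not Flink's implementation (which, as far as we know, works at the level of state backend files rather than individual keys); the class `IncrementalStateBackend` and the function `restore` are hypothetical names used only for illustration. The point it shows is that each checkpoint persists only what changed since the previous one, so its size tracks the change rate rather than the total state size.

```python
from typing import Dict, List, Optional, Set


class IncrementalStateBackend:
    """Toy keyed state that tracks which keys changed since the last checkpoint."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self._dirty: Set[str] = set()

    def put(self, key: str, value: int) -> None:
        self.state[key] = value
        self._dirty.add(key)

    def delete(self, key: str) -> None:
        self.state.pop(key, None)
        self._dirty.add(key)

    def checkpoint(self) -> Dict[str, Optional[int]]:
        """Return only the changed entries; None is a tombstone for a deleted key."""
        delta = {key: self.state.get(key) for key in self._dirty}
        self._dirty.clear()
        return delta


def restore(deltas: List[Dict[str, Optional[int]]]) -> Dict[str, int]:
    """Rebuild the full state by applying the deltas in checkpoint order."""
    state: Dict[str, int] = {}
    for delta in deltas:
        for key, value in delta.items():
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state
```

A short usage example: after `put("a", 1)` and `put("b", 2)`, the first checkpoint contains both keys; if only `"a"` is then updated, the second checkpoint contains just that one entry, and `restore` applied to both deltas yields the current state.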
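Flink’s checkpointing builds on a variant of the classic Chandy‑Lamport distributed snapshot algorithm. Barrier markers are injected into the data flow at the sources and travel downstream with the records; each operator snapshots its state when the barriers reach it. Together these per‑operator snapshots form a consistent snapshot of the entire topology, which gives fault‑tolerant recovery with exactly‑once state semantics.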
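The barrier mechanism can be sketched for a single operator with several input channels. The following plain‑Python simulation is an illustration under assumptions, not Flink code: `Barrier` and `SummingOperator` are hypothetical names, and the sketch shows only the aligned variant, in which a channel that has delivered its barrier is held back until the same barrier has arrived on every other channel.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Set, Tuple, Union


@dataclass(frozen=True)
class Barrier:
    checkpoint_id: int


Event = Union[int, Barrier]


class SummingOperator:
    """Sums integer records from several channels and snapshots on aligned barriers."""

    def __init__(self, num_channels: int) -> None:
        self.num_channels = num_channels
        self.total = 0                        # operator state
        self.snapshots: Dict[int, int] = {}   # checkpoint_id -> state at the barrier
        self._blocked: Set[int] = set()       # channels whose barrier already arrived
        self._buffered: Deque[Tuple[int, Event]] = deque()

    def receive(self, channel: int, event: Event) -> None:
        if channel in self._blocked:
            # Post-barrier input must not leak into the current snapshot.
            self._buffered.append((channel, event))
            return
        if isinstance(event, Barrier):
            self._blocked.add(channel)
            if len(self._blocked) == self.num_channels:
                self._complete(event.checkpoint_id)
        else:
            self.total += event

    def _complete(self, checkpoint_id: int) -> None:
        # All barriers aligned: the state reflects exactly the pre-barrier records.
        self.snapshots[checkpoint_id] = self.total
        self._blocked.clear()
        pending, self._buffered = self._buffered, deque()
        for channel, event in pending:
            self.receive(channel, event)  # replay held-back input in arrival order
```

For example, if channel 0 delivers `1`, `Barrier(1)`, `10` and channel 1 then delivers `2`, `Barrier(1)`, the snapshot for checkpoint 1 records a total of 3 (the two pre‑barrier records), while the record `10` is buffered and applied only after the snapshot is taken.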
Looking forward, Alibaba aims to evolve Flink into a truly unified batch‑stream engine with a DAG‑based API, a single SQL stack for both modes, broader language support (including Python and Go), and deeper integration with machine‑learning frameworks such as TensorFlow.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.