Walle: An End-to-End, General-Purpose, Large-Scale Device-Cloud Collaborative Machine Learning System
Walle is Alibaba's first end-to-end, general-purpose, large-scale device-cloud collaborative machine-learning platform. It manages billions of mobile devices, provides a full-stack data and compute pipeline, and already powers over a trillion daily ML invocations across dozens of Alibaba apps. In production it cuts cloud load by 87% and reduces latency to ~100 ms.
The system, named after WALL-E, is described in a paper selected for the USENIX OSDI 2022 conference.
It aims to overcome the latency, cost, load, and privacy problems of traditional cloud-only ML frameworks by using billions of mobile devices for both data and computation. Unlike prior work, which focuses mainly on algorithms, Walle provides a full stack in which any stage of an ML task (pre-processing, training, inference, post-processing) can exchange data between device and cloud.
The system consists of three core modules: (1) a deployment platform that manages millions of tasks and pushes updates to billions of devices; (2) a data pipeline that performs stateful stream processing on resource-constrained devices and transfers data to the cloud with low latency; (3) a compute container built on the MNN deep-learning framework. The container features geometry-based operator decomposition, which reduces operators to a small set of atomic ones so that fewer kernels need hand optimization per backend; a semi-automatic search that picks the backend and operator implementations at runtime; and a thread-level Python VM that removes the global interpreter lock (GIL) so multiple tasks can run in parallel.
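To make the semi-automatic backend search more concrete, here is a minimal Python sketch of the general idea: for each available backend, pick the cheapest implementation of every operator, sum the costs over the model, and choose the backend with the lowest total. This is not MNN's API. The names `Impl` and `pick_backend`, the backend labels, and every cost figure are hypothetical and exist only for illustration; a real engine would estimate or measure these costs itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impl:
    """One candidate implementation of an operator (hypothetical)."""
    name: str
    cost_ms: float  # assumed cost estimate, made up for this sketch


def pick_backend(ops, cost_table):
    """Return (backend, total_cost_ms, plan) with the lowest summed cost.

    ops        -- operator names in execution order (repeats allowed)
    cost_table -- {backend: {operator: [Impl, ...]}}
    Backends that cannot run every operator are skipped.
    """
    best = None
    for backend, per_op in cost_table.items():
        if any(not per_op.get(op) for op in ops):
            continue  # this backend lacks an implementation for some operator
        # Cheapest implementation per distinct operator on this backend.
        plan = {op: min(per_op[op], key=lambda impl: impl.cost_ms)
                for op in set(ops)}
        total = sum(plan[op].cost_ms for op in ops)
        if best is None or total < best[1]:
            best = (backend, total, plan)
    if best is None:
        raise ValueError("no backend supports all operators")
    return best


# Hypothetical cost table: the numbers are illustrative, not measurements.
COSTS = {
    "cpu": {
        "conv": [Impl("winograd", 4.0), Impl("im2col_gemm", 6.0)],
        "matmul": [Impl("blocked", 3.0)],
        "raster": [Impl("memcpy", 1.0)],
    },
    "gpu": {
        "conv": [Impl("winograd", 2.5), Impl("direct", 2.0)],
        "matmul": [Impl("tiled", 1.5)],
        "raster": [Impl("buffer_copy", 2.0)],
    },
    "npu": {
        "conv": [Impl("fixed_function", 1.0)],  # no matmul or raster support
    },
}

backend, total_ms, plan = pick_backend(["conv", "conv", "matmul", "raster"], COSTS)
print(backend, total_ms, plan["conv"].name)
```

In this toy table the "npu" backend is skipped because it cannot run every operator, which mirrors why a search over whole backends, rather than a fixed preference order, is useful on heterogeneous phones.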
Benchmarks on Android, iOS and Linux servers show MNN outperforming TensorFlow Lite and PyTorch Mobile on seven representative models. In Taobao live-streaming smart-spot detection, Walle reduces cloud load by 87%, increases the number of streamers covered by 123%, and cuts per-spot latency to ~100 ms on modern phones.
In the e‑commerce recommendation pipeline, Walle’s on‑device data pipeline generates item‑page‑view features in 44 ms (vs. 34 s in the cloud), cutting data volume by >90 % and maintaining >99.3 % accuracy.
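The on-device feature generation described above can be pictured as a small stateful stream processor: raw behavior events are folded into per-page state, and one compact feature record is emitted when the user leaves the item page. The following is a minimal sketch under stated assumptions, not Walle's actual pipeline API. The class `IpvAggregator`, the event types, and the field names are all hypothetical.

```python
class IpvAggregator:
    """Folds raw UI events into one item-page-view (IPV) feature per visit.

    Hypothetical sketch: event types and field names are assumptions.
    """

    def __init__(self):
        self._open = {}  # page_id -> running state for pages still open

    def on_event(self, event):
        """Consume one event; return a feature dict on page exit, else None."""
        kind, page = event["type"], event["page_id"]
        if kind == "page_enter":
            self._open[page] = {
                "item_id": event["item_id"],
                "enter_ts": event["ts"],
                "clicks": 0,
                "scrolls": 0,
            }
            return None
        state = self._open.get(page)
        if state is None:
            return None  # event for a page whose entry was never seen
        if kind == "click":
            state["clicks"] += 1
        elif kind == "scroll":
            state["scrolls"] += 1
        elif kind == "page_exit":
            del self._open[page]
            return {
                "item_id": state["item_id"],
                "dwell_ms": event["ts"] - state["enter_ts"],
                "clicks": state["clicks"],
                "scrolls": state["scrolls"],
            }
        return None


# Five raw events collapse into a single feature record.
events = [
    {"type": "page_enter", "page_id": "p1", "item_id": "item42", "ts": 1000},
    {"type": "scroll", "page_id": "p1", "ts": 1200},
    {"type": "click", "page_id": "p1", "ts": 1500},
    {"type": "scroll", "page_id": "p1", "ts": 1800},
    {"type": "page_exit", "page_id": "p1", "ts": 4000},
]

agg = IpvAggregator()
features = []
for ev in events:
    out = agg.on_event(ev)
    if out is not None:
        features.append(out)
print(features)
```

Because only the aggregated record leaves the device, the upload is much smaller than the raw event log, which is the kind of effect behind the data-volume reduction reported above.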
The platform already serves over a trillion ML invocations per day across more than 30 Alibaba apps and 300+ tasks. The underlying MNN engine is open source on GitHub (6.8k stars, 1.4k forks) and has been adopted by dozens of companies.
The paper “Walle: An End‑to‑End, General‑Purpose, and Large‑Scale Production System for Device‑Cloud Collaborative Machine Learning” was presented at OSDI 2022 (pages 249‑265). Authors include Chengfei Lv, Chaoyue Niu, Renjie Gu, et al.
DaTaobao Tech
Official account of DaTaobao Technology