Walle: An End-to-End, General-Purpose, Large-Scale Device-Cloud Collaborative Machine Learning System

Walle is Alibaba’s first end‑to‑end, general‑purpose, large‑scale device‑cloud collaborative machine‑learning platform. It manages billions of mobile devices and provides a full‑stack data and compute pipeline. In Taobao live streaming it cuts cloud load by 87 % and brings per‑spot latency down to ~100 ms, and it already serves over a trillion ML invocations per day across more than 30 Alibaba apps.

DaTaobao Tech

Jul 18, 2022

The Walle system (named after WALL‑E) is Alibaba’s first end‑to‑end, general‑purpose, large‑scale device‑cloud collaborative machine‑learning platform. The paper describing it was accepted at USENIX OSDI 2022.

It aims to overcome the latency, cost, load and privacy problems of traditional cloud‑only ML frameworks by using billions of mobile devices for both data and computation. Unlike prior work, which focuses on algorithms, Walle provides a full stack in which any stage of an ML task (pre‑processing, training, inference, post‑processing) can exchange data between device and cloud.

The system consists of three core modules: (1) a deployment platform that manages millions of tasks and pushes updates to billions of devices; (2) a data pipeline that performs stateful stream processing on resource‑constrained devices and transfers data to the cloud with low latency; and (3) a compute container built on the MNN deep‑learning framework, which features geometry‑based operator decomposition, semi‑automatic backend search, and a thread‑level Python VM that removes the GIL so that multiple tasks can run in parallel.
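
To make the geometry‑based operator decomposition more concrete, here is a minimal sketch in plain Python. It assumes the idea is that layout‑changing operators such as transpose or slice can all be written as strided copies of regions, so that one generic copy kernel can serve many operators. The `Region` class and the `raster`, `transpose_region`, and `slice_cols_region` functions are hypothetical names chosen for illustration; they are not MNN's API.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    """One strided copy, described purely by linear address arithmetic (hypothetical)."""
    size: Tuple[int, ...]         # loop extents
    src_offset: int               # first source address
    src_strides: Tuple[int, ...]  # source address step per loop dimension
    dst_offset: int               # first destination address
    dst_strides: Tuple[int, ...]  # destination address step per loop dimension


def raster(src: Sequence[float], dst_len: int, regions: List[Region]) -> List[float]:
    """Generic kernel: copy every region from the flat source into a flat destination."""
    dst = [0.0] * dst_len
    for r in regions:
        for idx in product(*(range(n) for n in r.size)):
            s = r.src_offset + sum(i * st for i, st in zip(idx, r.src_strides))
            d = r.dst_offset + sum(i * st for i, st in zip(idx, r.dst_strides))
            dst[d] = src[s]
    return dst


def transpose_region(rows: int, cols: int) -> Region:
    """Transpose of a row-major rows x cols matrix: out[c][r] = in[r][c]."""
    return Region((rows, cols), 0, (cols, 1), 0, (1, rows))


def slice_cols_region(rows: int, cols: int, start: int, stop: int) -> Region:
    """Keep columns start..stop-1 of a row-major rows x cols matrix."""
    width = stop - start
    return Region((rows, width), start, (cols, 1), 0, (width, 1))


# A 2 x 3 matrix stored row-major: [[1, 2, 3], [4, 5, 6]]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
transposed = raster(x, 6, [transpose_region(2, 3)])      # 3 x 2 result
sliced = raster(x, 4, [slice_cols_region(2, 3, 1, 3)])   # 2 x 2 result
```

Under this reading, each additional transform operator only contributes a region description, while the single `raster` loop is the part that would need tuning for each hardware backend.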

Benchmarks on Android, iOS and Linux servers show that MNN outperforms TensorFlow Lite and PyTorch Mobile on seven representative models. In Taobao live‑streaming smart‑spot detection, Walle reduces cloud load by 87 %, improves coverage by 123 %, and cuts per‑spot latency to ~100 ms on modern phones.
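
This summary does not spell out how work is divided between device and cloud in the live‑streaming task. One common pattern, assumed here purely for illustration, is confidence‑based routing: a small on‑device model handles every input, and only low‑confidence inputs are forwarded to a larger cloud model. The function name, the toy models, and the threshold below are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])


def route(
    frames: Sequence[float],
    device_model: Callable[[float], Prediction],
    cloud_model: Callable[[float], Prediction],
    threshold: float = 0.8,  # hypothetical cut-off, not a value from the article
) -> Tuple[List[str], int]:
    """Run the device model first and fall back to the cloud only when it is unsure."""
    labels: List[str] = []
    cloud_calls = 0
    for frame in frames:
        label, confidence = device_model(frame)
        if confidence < threshold:
            label, confidence = cloud_model(frame)
            cloud_calls += 1
        labels.append(label)
    return labels, cloud_calls


# Toy stand-ins: each "frame" is just the confidence the device model would report.
def toy_device_model(frame: float) -> Prediction:
    return ("device_spot", frame)


def toy_cloud_model(frame: float) -> Prediction:
    return ("cloud_spot", 1.0)


labels, cloud_calls = route([0.9, 0.5, 0.95, 0.85], toy_device_model, toy_cloud_model)
cloud_share = cloud_calls / 4  # fraction of inputs that still reach the cloud
```

In a scheme like this, the reduction in cloud load is simply the fraction of inputs the device model resolves on its own, so the threshold trades cloud cost against accuracy.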

In the e‑commerce recommendation pipeline, Walle’s on‑device data pipeline generates item‑page‑view features in 44 ms (vs. 34 s in the cloud), cutting data volume by >90 % and maintaining >99.3 % accuracy.
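
The kind of on‑device stateful stream processing described above can be sketched as follows. This assumes a hypothetical event schema (`page_enter`, `page_exit`, and user‑action events) and a hypothetical `PageViewAggregator` class; the article does not describe Walle's actual event format or API. The point of the sketch is that per‑page state stays on the device and only one compact feature record per page visit would need to be uploaded.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Event:
    kind: str     # "page_enter", "page_exit", or a user action such as "click"
    item_id: str
    ts_ms: int    # event timestamp in milliseconds


class PageViewAggregator:
    """Keeps per-item state between events and emits one feature per page visit."""

    def __init__(self) -> None:
        # item_id -> (enter timestamp, counts of actions seen so far)
        self._open: Dict[str, Tuple[int, Counter]] = {}

    def feed(self, ev: Event) -> Optional[dict]:
        if ev.kind == "page_enter":
            self._open[ev.item_id] = (ev.ts_ms, Counter())
            return None
        state = self._open.get(ev.item_id)
        if state is None:
            return None  # action or exit without a matching enter: ignore it
        enter_ts, actions = state
        if ev.kind == "page_exit":
            del self._open[ev.item_id]
            return {
                "item_id": ev.item_id,
                "dwell_ms": ev.ts_ms - enter_ts,
                "actions": dict(actions),
            }
        actions[ev.kind] += 1
        return None


events = [
    Event("page_enter", "A", 0),
    Event("click", "A", 100),
    Event("add_to_cart", "A", 200),
    Event("click", "A", 300),
    Event("page_exit", "A", 1000),
]
agg = PageViewAggregator()
features = [f for f in map(agg.feed, events) if f is not None]
```

Here five raw events collapse into a single feature record, which is the mechanism by which aggregating on the device can shrink the data sent to the cloud.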

The platform already serves over a trillion ML invocations per day across more than 30 Alibaba apps and 300+ tasks. The underlying MNN engine is open source (6.8k stars and 1.4k forks on GitHub) and has been adopted by dozens of companies.

The paper “Walle: An End‑to‑End, General‑Purpose, and Large‑Scale Production System for Device‑Cloud Collaborative Machine Learning” was presented at OSDI 2022 (pages 249‑265). Its authors include Chengfei Lv, Chaoyue Niu, and Renjie Gu, among others.
