How Alibaba Cuts Costs by 30% with Co‑Location Scheduling (Mix‑Deploy)
This article explains Alibaba's co‑location (混部, "mixed deployment") technology, which runs online services and batch compute on the same physical servers. It covers the background, key characteristics, scheduling architecture, resource isolation mechanisms, and cost‑saving model, and closes with the future roadmap, showing how co‑location boosts utilization and reduces expenses.
Background Overview
Alibaba faces massive cost pressure during peak events like Double‑11, requiring huge compute capacity that sits idle most of the time. Global server CPU utilization is only 6‑12%, and even with virtualization it rarely exceeds 17%. Alibaba's online services average about 10% CPU usage, while batch compute workloads often run at 50‑70%.
What Is Co‑Location (混部)?
Co‑location, or mixed‑deployment, means placing different types of tasks—online services and batch compute—on the same physical resources. By using sophisticated scheduling, resource isolation, and priority controls, the system maintains Service Level Objectives (SLOs) while dramatically improving resource utilization and cutting costs.
Key Characteristics
Priority Separation : Low‑priority batch jobs behave like sand or water, yielding to high‑priority online services when needed.
Resource Complementarity : Online services peak during the day, while batch jobs peak at night, allowing time‑based sharing.
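The complementarity point can be made concrete with a small simulation. The traffic profiles below are illustrative numbers, not Alibaba's actual curves: online load peaks in daytime slots, batch load peaks at night, and stacking them lifts the host's average utilization well above either workload alone.

```python
def combined_utilization(online, batch):
    """Per-slot CPU utilization of a host running both workload types,
    capped at 100%."""
    return [min(o + b, 100) for o, b in zip(online, batch)]

# Hypothetical 2-hour slots over one day (values in % CPU).
# Online services peak during the day; batch jobs fill the night-time trough.
online_hourly = [10, 10, 10, 15, 30, 60, 70, 60, 40, 20, 10, 10]
batch_hourly  = [70, 70, 60, 40, 20, 10, 10, 10, 30, 50, 60, 70]

mixed = combined_utilization(online_hourly, batch_hourly)
avg_online = sum(online_hourly) / len(online_hourly)
avg_mixed = sum(mixed) / len(mixed)
print(f"online-only average: {avg_online:.0f}%")
print(f"co-located average:  {avg_mixed:.0f}%")
```

With these sample curves the online-only average sits near the ~10–30% range the article describes, while the co-located average is more than double it.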
Cost‑Saving Model
Assuming a data center with N servers, improving average utilization from R1 to R2 saves X = N × (R2 − R1) / R2 servers. For 100,000 servers, raising utilization from 28% to 40% saves 100,000 × (0.40 − 0.28) / 0.40 = 30,000 machines, equating to roughly ¥6 billion in hardware cost.
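The formula above falls out directly from the article's numbers and can be checked in a few lines:

```python
def servers_saved(n, r1, r2):
    """Servers freed when average utilization rises from r1 to r2:
    X = N * (R2 - R1) / R2 -- the same total work is delivered by
    fewer, busier machines."""
    return n * (r2 - r1) / r2

# The article's example: 100,000 servers, utilization 28% -> 40%.
saved = servers_saved(100_000, 0.28, 0.40)
print(f"servers saved: {saved:.0f}")
```

Intuition for the formula: total delivered compute N × R1 stays constant, so the new fleet size is N × R1 / R2, and the savings are the difference.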
Historical Milestones
2014: Technical feasibility studies and design.
2015: Early testing uncovered scheduling, resource contention, and memory issues.
2016: Small‑scale production with ~200 nodes, focusing on fault tolerance.
2017: Full‑scale deployment; about 20% of Double‑11 traffic ran on mixed clusters.
Scheduling Architecture
Two independent schedulers run side‑by‑side: Sigma manages online service containers (Kubernetes‑compatible API, OCI‑compatible Pouch containers) and Fuxi handles massive data‑processing jobs (MapReduce‑like pipelines, supporting >100k parallel tasks). A “zero‑layer” coordination component mediates resource allocation between them.
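The article does not spell out the zero-layer protocol, but its core job is arbitration: let Fuxi's batch jobs soak up whatever Sigma's online containers are not using, and squeeze batch first when online demand grows. A minimal sketch of that idea (class names, the `burst_buffer` parameter, and the policy itself are assumptions for illustration, not Alibaba's actual design):

```python
from dataclasses import dataclass

@dataclass
class NodeResources:
    total_cores: int
    burst_buffer: int  # cores kept free so online services can burst instantly

class ZeroLayer:
    """Minimal sketch of a node-level arbiter between the two schedulers.
    Batch (Fuxi) fills whatever online (Sigma) isn't using, and is
    squeezed first when online demand grows."""

    def __init__(self, node: NodeResources):
        self.node = node

    def batch_quota(self, online_in_use: int) -> int:
        # Batch gets the idle cores minus a safety buffer reserved for
        # online bursts; the quota shrinks to zero under online pressure.
        free = self.node.total_cores - online_in_use - self.node.burst_buffer
        return max(free, 0)

node = NodeResources(total_cores=96, burst_buffer=8)
arbiter = ZeroLayer(node)
print(arbiter.batch_quota(online_in_use=20))  # quiet daytime trough -> 68
print(arbiter.batch_quota(online_in_use=80))  # online spike squeezes batch -> 8
```

Keeping the arbiter per-node rather than centralized matches the "sand or water" behavior described earlier: batch yields locally and immediately, without a round trip to a global scheduler.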
Resource Isolation Techniques
CPU Scheduling : CGroup‑based priority shares, CPU time‑slice preemption, and hyper‑thread noise isolation (keeping noisy batch tasks off the sibling hyper‑threads of latency‑sensitive cores).
L3 Cache Isolation : Intel CAT to limit low‑priority cache usage.
Memory Bandwidth & Reclamation : Real‑time monitoring, CFS bandwidth control, and priority‑aware memory reclamation to protect online services.
IO & Network Controls : blkio limits for IOPS/BPS, metadata throttling, and hierarchical bandwidth sharing (gold/silver/bronze tiers).
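The gold/silver/bronze tiers mentioned above can be modeled as strict-priority bandwidth sharing: higher tiers are satisfied first and lower tiers split what remains. The tier names come from the article; the allocation policy below is an illustrative guess, not Alibaba's actual algorithm.

```python
def allocate_bandwidth(capacity_mbps, demands):
    """Strict-priority allocation.

    demands: list of (tier_name, requested_mbps), ordered highest
    priority first (gold -> silver -> bronze). Each tier is granted as
    much of its request as the remaining capacity allows.
    """
    grants = {}
    remaining = capacity_mbps
    for tier, want in demands:
        grant = min(want, remaining)
        grants[tier] = grant
        remaining -= grant
    return grants

# A hypothetical 10 Gbps NIC, oversubscribed by 3 Gbps in total.
grants = allocate_bandwidth(
    10_000,
    [("gold", 6_000), ("silver", 3_000), ("bronze", 4_000)],
)
print(grants)  # bronze is squeezed to the leftover 1000 Mbps
```

This mirrors the co-location philosophy across the stack: contention is always resolved in favor of the higher tier, and only the lowest tier absorbs the shortfall.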
Future Directions
Alibaba plans to extend mixed‑deployment to real‑time compute, GPUs, and FPGAs, scale clusters to millions of cores, enhance resource‑profile prediction with deep learning, and evolve scheduling toward a unified priority model that treats all workloads uniformly.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.