Key Takeaways from Google’s Borg Paper: Resource Management and Scheduling Insights
The article reviews Google’s Borg paper, highlighting how Borg distinguishes production and non‑production tasks, manages jobs and containers, improves utilization through mixed‑workload scheduling, enforces isolation via overload and over‑commitment controls, and compares Borg’s approach to other cluster managers.
The post summarizes the recently published Google Borg paper, noting its importance for anyone interested in resource management and scheduling, as Google openly shares extensive data and experience about its internal production‑grade cluster manager.
Borg treats everything that runs in production as a "task" and decides on which machine each task executes, making it the central authority for managing all production resources and achieving significant cost savings.
The paper introduces a clear distinction between Prod (production) and Non‑Prod tasks, explaining how Borg separates online services like Gmail from offline batch jobs, and how this classification drives scheduling decisions.
Each job can specify constraints such as OS version or CPU architecture, and jobs are assigned to priority bands (in decreasing order: monitoring, production, batch, best‑effort). Each task maps to a container on a single machine and carries a detailed resource specification, down to the TCP ports it needs.
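As a rough illustration of how constraints, priority bands, and per-task resource requests might fit together, here is a minimal Python sketch. It is not Borg's configuration language or scheduler: the names `Band`, `Machine`, `TaskSpec`, `feasible`, and `candidates`, the numeric band values, and the attribute keys are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Band(IntEnum):
    # Larger value = higher priority. The numbers are illustrative only.
    BEST_EFFORT = 0
    BATCH = 1
    PRODUCTION = 2
    MONITORING = 3


@dataclass
class Machine:
    name: str
    attrs: dict            # hypothetical attributes, e.g. {"arch": "x86_64"}
    free_cpu: float        # cores
    free_ram_gib: float
    free_ports: frozenset  # TCP ports still available on this machine


@dataclass
class TaskSpec:
    band: Band
    cpu: float
    ram_gib: float
    ports: frozenset = frozenset()
    constraints: dict = field(default_factory=dict)


def feasible(task, machine):
    """True if the machine meets every constraint and has room for the task."""
    meets_constraints = all(
        machine.attrs.get(key) == value
        for key, value in task.constraints.items()
    )
    return (
        meets_constraints
        and task.cpu <= machine.free_cpu
        and task.ram_gib <= machine.free_ram_gib
        and task.ports <= machine.free_ports  # subset test on port sets
    )


def candidates(task, machines):
    """Machines on which the task could be placed, in input order."""
    return [m for m in machines if feasible(task, m)]
```

A task that requires `{"arch": "x86_64"}` and port 8080 would pass `feasible` only on machines that advertise that architecture and still have the port free. This sketch treats every constraint as hard; it does not model soft preferences or scoring among feasible machines.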
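A major focus is utilization: Borg mixes offline and online workloads on the same cluster instead of running them on separate clusters. The paper estimates that segregating prod and non‑prod work would require roughly 20-30% more machines in the median cell. Mixing works because prod tasks reserve resources for rare load peaks but leave much of that reservation idle most of the time, and Borg lets non‑prod tasks run in the idle portion.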
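The idea of lending out idle prod resources can be sketched with a little arithmetic. The paper describes a per-task reservation (estimated usage plus a safety margin) and treats the gap between a task's limit and its reservation as reclaimable. The Python below is a simplified sketch of that idea; the function names and the 25% margin are assumptions for illustration, not values from the paper.

```python
def reservation(limit, usage, safety_margin=0.25):
    """Estimated need: current usage plus a margin, never above the limit."""
    return min(limit, usage * (1.0 + safety_margin))


def reclaimable(limit, usage, safety_margin=0.25):
    """Resources a prod task requested but is not expected to need right now."""
    return limit - reservation(limit, usage, safety_margin)


def reclaimable_on_machine(prod_tasks, safety_margin=0.25):
    """Sum of reclaimable resources over (limit, usage) pairs on one machine."""
    return sum(reclaimable(lim, use, safety_margin) for lim, use in prod_tasks)
```

For example, a prod task with a limit of 8 cores that currently uses 2 cores has a reservation of 2.5 cores under this margin, leaving 5.5 cores that non-prod tasks could borrow. If the prod task's usage rises toward its limit, the reclaimable amount shrinks to zero and the borrowed resources have to be given back.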
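For isolation, Borg controls overload by classifying tasks as latency‑sensitive (LS) or batch, and manages over‑commitment by distinguishing compressible resources (CPU cycles, disk I/O bandwidth) from non‑compressible ones (memory, disk space). When a machine runs out of non‑compressible resources, the Borglet immediately kills tasks, starting with the lowest priority, until the remaining reservations can be met. When compressible resources run short, it throttles usage in favor of LS tasks so that short spikes can be absorbed without killing anything; only if the shortage persists does the Borgmaster remove tasks from the machine.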
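The two responses to overload described above can be sketched as follows. This is a toy model under stated assumptions, not the Borglet's actual logic: the `Task` fields, both function names, and the proportional scaling rule for batch CPU are hypothetical, and LS tasks are simply granted their full demand.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int            # larger = more important (assumed convention)
    latency_sensitive: bool
    mem_gib: float
    cpu: float               # cores demanded


def relieve_memory(tasks, capacity_gib):
    """Non-compressible shortage: kill lowest-priority tasks until memory fits.

    Returns (survivors, killed).
    """
    survivors = sorted(tasks, key=lambda t: t.priority)  # lowest priority first
    killed = []
    while survivors and sum(t.mem_gib for t in survivors) > capacity_gib:
        killed.append(survivors.pop(0))
    return survivors, killed


def throttle_cpu(tasks, capacity_cores):
    """Compressible shortage: keep LS demand whole, scale batch tasks down.

    Returns {task name: granted cores}. Nothing is killed here.
    """
    ls_demand = sum(t.cpu for t in tasks if t.latency_sensitive)
    batch_demand = sum(t.cpu for t in tasks if not t.latency_sensitive)
    left_for_batch = max(0.0, capacity_cores - ls_demand)
    if batch_demand <= left_for_batch:
        scale = 1.0
    else:
        scale = left_for_batch / batch_demand
    return {
        t.name: t.cpu if t.latency_sensitive else t.cpu * scale
        for t in tasks
    }
```

The asymmetry is the point: memory cannot be taken back from a running task without killing it, so the only remedy is eviction by priority, whereas CPU can be rationed and batch work merely slows down.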
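The Borglet also runs a user‑space control loop that enforces per‑task memory limits: it handles out‑of‑memory (OOM) events from the kernel and kills tasks that exceed their limits or, when an over‑committed machine actually runs out of memory, kills tasks in priority order.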
CPU scheduling is customized: LS tasks may reserve whole physical cores, while batch tasks can run on any core but at much lower scheduling priority. Borg uses a modified kernel scheduler that tracks per‑container load and lets LS tasks preempt batch tasks; the paper describes thread placement that is aware of NUMA and hyperthreading as continuing work.
The article concludes that Borg’s extensive resource‑allocation and isolation mechanisms enable effective mixed‑workload operation, a capability not commonly found in other systems such as Mesos, YARN, or Alibaba’s Fuxi, which tend to manage specific resource pools rather than the entire data‑center.
It also mentions comparable efforts at Tencent and Baidu (Matrix), noting that while Baidu’s implementation is lighter weight, Borg’s comprehensive approach remains a benchmark for large‑scale cluster management.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.