
Baidu Cloud‑Native Mixed Workload (Offline Co‑location) Technology Overview

Baidu’s mixed‑workload approach co‑locates offline batch jobs with latency‑sensitive online services on shared nodes, using a dynamic resource view, priority‑based scheduling, cpuset/NUMA isolation, eBPF policies, and predictive profiling, boosting CPU utilization above 40 % and saving billions of RMB in total cost of ownership.

Baidu Geek Talk

Baidu faces low server resource utilization and rising total cost of ownership (TCO). To address this, it adopts mixed‑workload ("混部") technology, which combines online and offline services on the same physical machines.

Offline mixed‑workload (offline co‑location) separates applications into online (latency‑sensitive, long‑running) and offline (batch, non‑latency‑sensitive) categories. Online services such as search have clear diurnal load patterns, while offline jobs (big‑data computation, machine learning) can run at any time without affecting users.

Online clusters typically have CPU utilization around 20 % because resources are over‑provisioned for peak demand ("tidal" effect) and redundant replicas are kept for disaster recovery. Offline clusters are often separate, leading to an imbalance where online resources are under‑utilized and offline resources are over‑provisioned.

Baidu’s solution places offline jobs into the online resource pool, dramatically improving overall utilization. The Baidu Container Engine (CCE) now supports offline mixed‑workload, achieving >40 % CPU utilization and saving billions of RMB.

In Kubernetes (K8s), static resource allocation leads to a large gap between requested and actual usage. Baidu introduces a dynamic resource view for offline workloads: offline tasks see the total node capacity minus the resources already consumed by online pods, allowing them to reuse idle resources.
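The difference between the static and dynamic views can be sketched with a small calculation (a minimal illustration; the node and pod figures are made up, and this is not Baidu's actual API):

```python
# Sketch of the dynamic resource view for offline workloads:
# offline-allocatable = node capacity - actual usage of online pods,
# rather than capacity - sum of online requests (the static view).

def offline_allocatable_cpu(node_capacity_m, online_usage_m):
    """CPU (millicores) an offline pod may reuse on this node."""
    return max(node_capacity_m - online_usage_m, 0)

# A 32-core node whose online pods request 28 cores but only use 6:
capacity = 32_000          # millicores
online_requests = 28_000   # static view: only 4 cores appear free
online_usage = 6_000       # actual consumption

static_free = capacity - online_requests                        # 4000m
dynamic_free = offline_allocatable_cpu(capacity, online_usage)  # 26000m
print(static_free, dynamic_free)
```

Under the static view, offline pods could claim only 4 cores here; the dynamic view exposes the 26 idle cores for reuse.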

The scheduler classifies workloads into high and medium priorities (online) and low priority (offline). Offline tasks run as BestEffort pods, which carry no resource requests and can therefore be scheduled onto nodes that appear full from the online scheduler's perspective.
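One consequence of the three-tier priority model is eviction order under node pressure: low-priority offline pods go first. A minimal sketch (names and structure are illustrative, not Baidu's implementation):

```python
# Sketch of three-tier priority handling: under node pressure,
# low-priority (offline) pods are evicted before online ones.

PRIORITY = {"high": 2, "medium": 1, "low": 0}  # online: high/medium; offline: low

def eviction_order(pods):
    """Return pods sorted so offline (low-priority) pods are evicted first."""
    return sorted(pods, key=lambda p: PRIORITY[p["prio"]])

pods = [
    {"name": "search-fe", "prio": "high"},
    {"name": "spark-exec", "prio": "low"},
    {"name": "rec-svc", "prio": "medium"},
]
print([p["name"] for p in eviction_order(pods)])
# spark-exec is reclaimed first, then rec-svc, and search-fe last
```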

CPU isolation is achieved via cpuset binding and NUMA‑aware placement, keeping latency‑critical online pods on the same NUMA node and reducing cross‑node latency. An offline‑specific scheduler runs after the online scheduler, ensuring online pods always have precedence.
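The NUMA-aware placement logic can be sketched as "prefer a single NUMA node with enough free CPUs" (a simplified model; the topology values and function are illustrative only):

```python
# Sketch of NUMA-aware cpuset assignment: keep a latency-critical pod's
# CPUs on a single NUMA node when possible, to avoid cross-node memory
# access. Topology values are illustrative.

def pick_cpuset(numa_free, ncpus):
    """Prefer one NUMA node with enough free CPUs; return (node_id, cpus)."""
    for node_id, cpus in numa_free.items():
        if len(cpus) >= ncpus:
            chosen = sorted(cpus)[:ncpus]
            for c in chosen:
                cpus.remove(c)   # mark the CPUs as allocated
            return node_id, chosen
    return None, []              # would need to spill across NUMA nodes

numa_free = {0: {0, 1, 2, 3}, 1: {8, 9, 10, 11}}
node, cpus = pick_cpuset(numa_free, 3)
print(node, cpus)  # 0 [0, 1, 2]
```

A real CPU manager would also handle spilling, hyperthread siblings, and releasing CPUs on pod deletion; the point here is only the single-node preference.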

Memory isolation includes a background page‑cache reclamation mechanism that preferentially recycles offline cache, preventing offline cache pressure from evicting online cache.
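The preferential-reclamation idea reduces to an ordering rule: drain offline cache before touching online cache. A toy simulation (cgroup names and sizes are made up; the real mechanism operates in the kernel):

```python
# Sketch: preferential page-cache reclamation. Under cache pressure,
# reclaim from offline cgroups before online ones.

def reclaim(cgroups, bytes_needed):
    """Reclaim cache, draining offline cgroups first; return bytes freed."""
    freed = 0
    # False sorts before True, so offline cgroups come first
    for cg in sorted(cgroups, key=lambda c: c["online"]):
        if freed >= bytes_needed:
            break
        take = min(cg["cache"], bytes_needed - freed)
        cg["cache"] -= take
        freed += take
    return freed

cgroups = [
    {"name": "online-svc", "online": True, "cache": 4 << 30},  # 4 GiB
    {"name": "batch-job", "online": False, "cache": 3 << 30},  # 3 GiB
]
reclaim(cgroups, 2 << 30)          # need 2 GiB back
print(cgroups[1]["cache"])         # offline cache shrank; online untouched
```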

For fine‑grained isolation, Baidu leverages eBPF to inject custom policies at runtime without kernel restarts, enabling per‑service resource guarantees beyond the generic K8s QoS classes.

The high‑performance offline scheduler sustains up to 5,000 scheduling operations per second, while pod binding is throttled to 1,500 operations per second to protect etcd from write pressure.
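Throttling the bind path independently of the scheduling path is commonly done with a token bucket. The rates below come from the article; the implementation itself is an illustrative sketch, not Baidu's code:

```python
# Sketch: rate-limit bind operations with a token bucket so the
# scheduler can evaluate ~5k pods/s while binding at most ~1.5k/s.

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1500, burst=100)  # ~1.5k binds/s
allowed = 0
for i in range(5000):                       # 5k bind attempts in one second
    if bucket.allow(now=i / 5000):
        allowed += 1
print(allowed)  # roughly burst + rate: ~1600 binds admitted, rest deferred
```

Rejected binds would be requeued rather than dropped, so etcd sees a bounded write rate while scheduling decisions proceed at full speed.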

Resource profiling predicts future online usage (e.g., one‑hour window) and schedules offline jobs only when sufficient capacity is forecasted, avoiding interference and improving both online availability and offline throughput.
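The admission decision reduces to "forecast online usage, then check headroom." The sketch below uses a deliberately naive forecast (recent peak plus a safety margin); Baidu's actual profiling model is not described in enough detail to reproduce:

```python
# Sketch of profiling-based admission: forecast online usage over the
# next window and admit an offline job only if forecast headroom covers
# its demand. The forecasting method here is illustrative.

def forecast_online_usage(samples, margin=1.1):
    """Pessimistic forecast: recent peak scaled by a safety margin."""
    return max(samples) * margin

def admit_offline(capacity_m, samples, job_demand_m):
    headroom = capacity_m - forecast_online_usage(samples)
    return job_demand_m <= headroom

recent_cpu = [10_000, 12_000, 11_500]             # millicores over last hour
print(admit_offline(32_000, recent_cpu, 8_000))   # True: fits under forecast
print(admit_offline(32_000, recent_cpu, 20_000))  # False: would risk interference
```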

Future directions include expanding mixed‑workload scale, tighter resource profiling, kernel‑programmable techniques (eBPF), support for heterogeneous resources such as GPUs, container‑VM fusion, and multi‑cloud elastic mixing.

Tags: cloud-native, Kubernetes, eBPF, resource scheduling, mixed workload, offline computing