
Mixed Workload Co-location Practices in Bilibili's Kubernetes Cloud Platform

Bilibili’s Kubernetes cloud platform boosts server utilization by co‑locating latency‑sensitive online services with batch‑oriented offline jobs on the same nodes, using custom schedulers, extended resources, dynamic CPU/memory isolation, and a management console, achieving average CPU usage of around 35% and significant cost savings.


Author: Xu Long, Senior Development Engineer at Bilibili, responsible for Kubernetes cloud platform development since 2020.

Background: Large‑scale internet companies operate tens of thousands of servers. Under cost‑reduction pressure, improving machine resource utilization while preserving service SLOs is critical. Two main problems cause low utilization on a Kubernetes cloud platform: (1) services over‑provision their resource requests; (2) services exhibit peak‑and‑valley load patterns.

To address (1), service profiling recommends reasonable resource quotas, combined with elastic scaling. To address (2), idle resources during off‑peak periods (e.g., overnight) are used to run offline tasks.

Bilibili’s private cloud has reached a large scale, and the team applies two strategies: (a) Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) for elastic scaling; (b) large‑scale mixed‑workload (co‑location) to reuse idle compute. This article focuses on the co‑location practice.

Concept of Co-location: Online services (latency‑sensitive, high‑availability) and offline services (batch, latency‑insensitive) are scheduled onto the same physical machines, with isolation mechanisms to guarantee SLOs while increasing utilization.

Co-location Scenarios at Bilibili:

Offline‑online co‑location: Video transcoding tasks (offline, compute‑intensive) are scheduled onto online clusters during night‑time, filling idle capacity.

Offline‑offline co‑location: Idle resources in offline clusters (e.g., training nodes) are used for other offline jobs, including big‑data tasks that require Yarn integration.

Idle‑machine co‑location: Reserved backup machines are automatically enrolled in Kubernetes for mixed tasks and withdrawn when needed.

Overall Architecture (Figure omitted) includes:

Task submission modules: caster for online services, crm for offline batch jobs.

Kubernetes scheduling modules: the native kube‑scheduler for online pods, a custom job‑scheduler for offline co‑location tasks, and a webhook that converts pod resource requests to custom extended resources (e.g., caster.io/colocation‑cpu).

Colocation‑agent on each node: reports available co‑location resources, enforces isolation, and reports metrics.

Colocation config manager: centrally manages policies, switches, and per‑node configurations.

Observability: agents export metrics (available co‑location resources, actual usage) to Prometheus for dashboards.
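As a rough illustration of the agent's capacity report, a dynamic strategy can be sketched as follows. This is a minimal Go sketch, not the agent's actual code: the function name, parameters, and the 0.7 safety watermark are all illustrative assumptions.

```go
package main

import "fmt"

// AvailableColocationCPU estimates how many whole CPUs a node can lend to
// offline pods under a dynamic strategy: the capacity left after online
// usage, scaled down by a safety watermark to keep headroom for bursts.
// All names and values here are illustrative, not Bilibili's actual API.
func AvailableColocationCPU(totalCPUs, onlineUsedCPUs, safetyWatermark float64) int64 {
	idle := totalCPUs - onlineUsedCPUs
	if idle < 0 {
		idle = 0
	}
	// Only lend the portion below the watermark, e.g. 0.7 keeps 30% headroom.
	lendable := idle * safetyWatermark
	return int64(lendable) // device plugins report integer extended resources
}

func main() {
	// A 32-core node with online services currently using 10 cores.
	fmt.Println(AvailableColocationCPU(32, 10, 0.7)) // prints 15
}
```

A static strategy would simply return a fixed count, and a time‑based one would switch between profiles (e.g., daytime vs. overnight) on a schedule.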

Co-location Task Scheduling:

The Kubernetes native scheduler allocates resources based on static requests, which leads to two issues: (1) co‑location pods consume native resource quotas, starving online pods; (2) static scheduling ignores real‑time load, so genuinely idle nodes are missed.

The solution uses extended resources. The colocation‑agent calculates available co‑location capacity (via dynamic, static, or time‑based strategies) and reports it through a device plugin. Offline pods are labeled caster.io/resource‑type: colocation; a webhook rewrites their requests to the extended resource. The custom job‑scheduler then matches pod requests against node‑reported extended resources. For homogeneous offline jobs (e.g., transcoding), the scheduler optimizes by hashing pod fields, caching pre‑selection results, and reusing cached nodes for similar pods.
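The webhook's rewrite step can be sketched in Go as below. The real webhook would mutate a corev1.Pod via an AdmissionReview; the Pod struct here is deliberately simplified, and the caster.io/colocation-memory name is my assumption (the article only names the CPU resource).

```go
package main

import "fmt"

// Pod models only the fields the mutation cares about; a real webhook
// would operate on corev1.Pod through an AdmissionReview request.
type Pod struct {
	Labels   map[string]string
	Requests map[string]int64 // resource name -> quantity (millicores for cpu)
}

// MutateColocationPod rewrites native cpu/memory requests into extended
// colocation resources, so offline pods never consume native node quota
// and therefore cannot starve online pods at scheduling time.
func MutateColocationPod(p *Pod) {
	if p.Labels["caster.io/resource-type"] != "colocation" {
		return // only offline co-location pods are rewritten
	}
	if cpu, ok := p.Requests["cpu"]; ok {
		delete(p.Requests, "cpu")
		p.Requests["caster.io/colocation-cpu"] = cpu
	}
	if mem, ok := p.Requests["memory"]; ok {
		delete(p.Requests, "memory")
		p.Requests["caster.io/colocation-memory"] = mem // assumed name
	}
}

func main() {
	pod := &Pod{
		Labels:   map[string]string{"caster.io/resource-type": "colocation"},
		Requests: map[string]int64{"cpu": 4000, "memory": 8 << 30},
	}
	MutateColocationPod(pod)
	fmt.Println(pod.Requests["caster.io/colocation-cpu"]) // prints 4000
}
```

Because the rewritten pod requests zero native cpu/memory, the kubelet classifies it as BestEffort, which is exactly what the "big frame" mechanism below relies on.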

Online QoS Assurance:

Task Scheduling: A global view of co‑location capacity ensures tasks are placed on nodes with sufficient idle resources.

Resource Isolation: The colocation‑agent adjusts cgroup limits (cpu.shares, CPU quota, cpuset) in real time based on online load.

Task Eviction: Offline tasks are retryable; the agent can evict them when resource pressure exceeds thresholds, with a cool‑down period to avoid frequent rescheduling.

Co-location “Big Frame”: After webhook conversion, co‑location pods run as BestEffort and are placed in the /sys/fs/cgroup/cpu/kubepods/besteffort hierarchy, forming a “big frame” that can be throttled collectively.

Dynamic CPU Isolation: The agent sets a minimal cpu.shares for the big frame, adjusts the CPU quota according to online load, and binds CPUs (accounting for hyper‑threading siblings) to minimize interference.

Dynamic Memory Isolation: Similarly, the memory quota is adjusted dynamically, and oom_score_adj is set high so that co‑location pods are killed first under memory pressure.

Network Bandwidth Limiting: A CNI adaptor assigns bridge‑mode IPs to offline pods and uses Linux tc to cap their bandwidth.
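One plausible shape for the tc step is a token‑bucket filter on the pod's host‑side interface. The sketch below only builds the command string (a real adaptor would execute it or use netlink directly); the interface name and rates are hypothetical, and tbf is my assumed qdisc choice since the article does not name one.

```go
package main

import "fmt"

// TCLimitCommands builds the `tc` invocation that caps an offline pod's
// bandwidth with a token-bucket filter (tbf) on its host-side interface.
// A real CNI adaptor would run this (or the netlink equivalent) right
// after wiring up the pod's bridge-mode IP.
func TCLimitCommands(iface string, rateMbit, burstKbit int) []string {
	return []string{
		fmt.Sprintf("tc qdisc add dev %s root tbf rate %dmbit burst %dkbit latency 50ms",
			iface, rateMbit, burstKbit),
	}
}

func main() {
	// Cap a hypothetical offline pod's veth at 100 Mbit/s.
	for _, cmd := range TCLimitCommands("veth-colo-1", 100, 256) {
		fmt.Println(cmd)
	}
}
```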

Co-location Management Platform provides:

Policy management: view and set node‑level co‑location policies (safety watermarks, hard limits) and group‑based controls.

Enable/disable co‑location per node with one‑click eviction.

Monitoring: per‑node and per‑group dashboards for resource reports, task counts, and usage trends.

Results: Most of Bilibili's machines now participate in co‑location, achieving an average CPU utilization of ~35% (peak ~55%). This supports large‑scale video transcoding, AI moderation, and big‑data MapReduce tasks, saving thousands of servers.

Conclusion: The article presents Bilibili's non‑intrusive co‑location framework on Kubernetes, covering offline‑online, offline‑offline, and idle‑machine scenarios, with scheduling, isolation, observability, and management components. Future work includes kernel‑level isolation, unified scheduling, and further cost‑reduction optimizations.

Tags: Cloud Native, Observability, Kubernetes, Resource Scheduling, Mixed Workloads, Co-location
Written by Bilibili Tech

Provides introductions and tutorials on Bilibili-related technologies.
