Bilibili's Vertical Pod Autoscaler (VPA) Practice and Cluster Resource Governance

Bilibili extended Kubernetes with a custom in‑place Vertical Pod Autoscaler (VPA) framework. It comprises generator, recommender, updater, and webhook components, plus a management platform for strategy tuning, avoidance, effect analysis, and anomaly detection. The framework reduced over‑provisioned resources across Bilibili's ten‑thousand‑node private cloud, freeing up to 60 % of CPU and 30 % of memory in test clusters.

Bilibili Tech

Feb 14, 2023

Background

Bilibili's private cloud platform, built on Kubernetes, has reached a scale of ten thousand nodes. It hosts most online services as well as offline workloads such as machine learning, big data, and transcoding. During cost‑reduction efforts, the team observed that many workloads request far more container resources than their actual load requires. As a result, node CPU allocation rates are high enough to block the scheduling of new containers, while actual peak CPU utilization remains low.

Industry Status

A Vertical Pod Autoscaler (VPA) typically provides three capabilities: (1) recommending container resource specifications based on real workload; (2) adjusting newly created Pods via a Kubernetes admission webhook; (3) dynamically updating the resources of existing Pods. The open‑source VPA covers the first two, but it can only change existing Pods by evicting and recreating them. At the time of writing, adjusting a running Pod in place required custom modifications to the Kubernetes source code. Bilibili chose to modify the Kubernetes code to enable in‑place VPA and built a custom VPA control framework on top of it.

VPA Principles

Bilibili's VPA framework consists of two custom resources (CRDs) and several core components. The custom resources are:

VPA Generator object – a template that matches Pods by label selector and creates per‑application VPA objects.

VPA object – stores matching information, recommendation algorithm parameters, current recommendation, and update mode.

Core components include:

VPA Generator controller – creates and cleans up VPA objects based on service level.

VPA Recommender controller – fetches historical load from Prometheus, runs recommendation algorithms, and writes results to the VPA object's status.

VPA Updater controller – applies recommended resources to existing Pods according to the selected update mode.

VPA Webhook service – injects recommendations into newly created Pods.

VPA‑API service – mediates between the management platform and the Kubernetes API server.

Supporting services such as kube‑apiserver, cAdvisor, and Prometheus provide metrics and API access.

VPA Generator Controller

The controller watches VPA Generator objects, matches Pods by label selector, groups the matched Pods by application, and creates one VPA object per application. When an application's Pods are terminated, its VPA object is retained for a grace period before being deleted automatically. Operators therefore maintain a small number of templates, one per service level, instead of creating and cleaning up a VPA object by hand for every application.
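The reconcile behavior described above can be sketched in a few lines. This is a minimal in-memory illustration, not Bilibili's controller: the class names, the `app` label key, and the `reconcile` signature are all hypothetical, and a real controller would watch the Kubernetes API rather than receive plain lists.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pod:
    name: str
    labels: dict


@dataclass
class VpaGenerator:
    name: str
    selector: dict          # every key/value pair must match the Pod's labels
    update_mode: str        # copied into each VPA object created from this template
    app_label: str = "app"  # hypothetical label key identifying the application


@dataclass
class Vpa:
    app: str
    update_mode: str
    pods: list = field(default_factory=list)
    orphaned_since: Optional[float] = None  # set when the last matching Pod is gone


def matches(selector: dict, labels: dict) -> bool:
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile(gen, pods, existing, now, grace_seconds):
    """Return the VPA objects (keyed by application) that should exist after one pass."""
    grouped = defaultdict(list)
    for pod in pods:
        if matches(gen.selector, pod.labels) and gen.app_label in pod.labels:
            grouped[pod.labels[gen.app_label]].append(pod.name)

    result = {}
    # Create or refresh one VPA object per application that still has Pods.
    for app, names in grouped.items():
        vpa = existing.get(app) or Vpa(app=app, update_mode=gen.update_mode)
        vpa.pods = sorted(names)
        vpa.orphaned_since = None
        result[app] = vpa

    # Keep orphaned VPA objects until the grace period has elapsed.
    for app, vpa in existing.items():
        if app in result:
            continue
        if vpa.orphaned_since is None:
            vpa.orphaned_since = now
        vpa.pods = []
        if now - vpa.orphaned_since < grace_seconds:
            result[app] = vpa
    return result
```

Calling `reconcile` repeatedly with the previous result as `existing` mimics the controller loop: an application whose Pods vanish keeps its VPA object until `grace_seconds` have passed since it was first seen without Pods.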

VPA Updater Controller

The Updater supports four update modes:

Pre‑run mode – only calculates recommendations without applying them.

Initialization mode – the webhook updates resources of newly created Pods (used in public clouds without in‑place VPA).

Direct update mode – the Updater modifies resources of existing Pods (available in private clouds with in‑place VPA).

Automatic mode – combines initialization and direct update, handling both new and existing Pods.
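One way to read the four modes is as a pair of switches: whether the webhook acts on new Pods and whether the Updater acts on running Pods. The sketch below encodes that reading. The enum values and function names are illustrative assumptions, and the `in_place_supported` flag is an assumed stand-in for "the cluster runs the modified Kubernetes with in-place VPA".

```python
from enum import Enum


class UpdateMode(Enum):
    PRE_RUN = "pre-run"                # recommend only
    INITIALIZATION = "initialization"  # webhook patches new Pods
    DIRECT_UPDATE = "direct-update"    # Updater patches running Pods
    AUTOMATIC = "automatic"            # both of the above


def webhook_applies(mode: UpdateMode) -> bool:
    """Should the webhook inject the recommendation into a newly created Pod?"""
    return mode in (UpdateMode.INITIALIZATION, UpdateMode.AUTOMATIC)


def updater_applies(mode: UpdateMode, in_place_supported: bool) -> bool:
    """Should the Updater resize an existing Pod? Requires in-place support."""
    return in_place_supported and mode in (
        UpdateMode.DIRECT_UPDATE,
        UpdateMode.AUTOMATIC,
    )
```

Under this reading, pre-run mode leaves both switches off, which is what makes it safe for evaluating a strategy before rollout.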

VPA Recommender Controller

The Recommender queries Prometheus for historical metrics, calculates recommended resources, and writes them to the VPA object's status. Each recommendation is bounded by a configurable minimum (e.g., 0.1 CPU cores or 0.1 Gi of memory) and a maximum (the container's limit minus reserved resources).
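The bounding step amounts to a clamp. The following sketch assumes a single resource expressed as a float (cores or Gi); the function name and parameters are hypothetical.

```python
def clamp_recommendation(raw: float, minimum: float, limit: float, reserved: float) -> float:
    """Bound a raw recommendation to [minimum, limit - reserved]."""
    maximum = limit - reserved
    if maximum < minimum:
        raise ValueError("limit minus reserved resources is below the configured minimum")
    return max(minimum, min(raw, maximum))
```

For example, with a 4-core limit and 0.5 cores reserved, a raw recommendation of 5 cores is capped at 3.5, while a raw recommendation of 0.03 cores is raised to the 0.1-core floor.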

Two main recommendation scenarios are covered:

Automated adjustment of resource requests to improve cluster utilization. The formula uses a target saturation factor (e.g., 40 % for high‑priority services, 70 % for lower‑priority services) applied to the 95th percentile of the last seven days' usage.

Automatic increase of memory limits when OOM events are detected, using recent memory usage, an OOM factor, and OOM count to compute a safe new limit.
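The source does not spell out either formula, so the sketch below is only one plausible reading. For requests, it assumes the request is chosen so that 95th-percentile usage equals the target saturation of the request, that is, request = P95 / saturation. For OOM handling, it assumes the limit grows geometrically with the number of observed OOM events and is capped at a maximum. All function names, the nearest-rank percentile, and the geometric growth rule are assumptions for illustration.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend_request(usage_samples, target_saturation):
    """Assumed rule: request such that P95 usage sits at the target saturation."""
    if not 0 < target_saturation <= 1:
        raise ValueError("target_saturation must be in (0, 1]")
    return percentile(usage_samples, 95) / target_saturation


def recommend_memory_limit(recent_usage, oom_factor, oom_count, max_limit):
    """Assumed rule: grow recent usage by oom_factor once per observed OOM, then cap."""
    return min(max_limit, recent_usage * oom_factor ** oom_count)
```

With a P95 usage of 2 cores, a 40 % target yields a request of about 5 cores and a 70 % target yields about 2.9 cores, which matches the intuition that high-priority services keep more headroom.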

VPA Management Platform

The platform provides four major functions:

VPA strategy tuning – managing resource metrics and templates, and comparing pre‑run results against production.

VPA strategy avoidance – temporarily disabling VPA for specific applications during stress tests or events.

VPA effect analysis – tracking coverage, resource savings, and adjustment records.

VPA anomaly detection – monitoring controller logs, event frequencies, and key metrics to alert on failures or mis‑configurations.

Strategy Management

Key aspects include defining resource metrics (e.g., 7‑day 95th‑percentile CPU usage), creating VPA Generator templates, and using pre‑run mode for safe evaluation before production rollout.
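A resource metric such as "7-day 95th-percentile CPU usage" can be expressed as a PromQL subquery over the cAdvisor counter `container_cpu_usage_seconds_total`. The helper below only builds the query string. The function name, the choice of `namespace`, `pod`, and `container` as label filters, and the 5-minute rate step are assumptions; the source does not show Bilibili's actual queries.

```python
def cpu_p95_query(namespace, pod_regex, container, window="7d", step="5m"):
    """Build a PromQL query for the P95 of per-container CPU usage over a window."""
    selector = f'namespace="{namespace}",pod=~"{pod_regex}",container="{container}"'
    # rate() turns the cumulative CPU-seconds counter into cores in use.
    rate = f"rate(container_cpu_usage_seconds_total{{{selector}}}[{step}])"
    # The subquery [window:step] evaluates that rate at each step across the window.
    return f"quantile_over_time(0.95, {rate}[{window}:{step}])"
```

Changing `window` to `1d` and the quantile to 0.99 would give the shorter, more peak-sensitive metric, which makes it straightforward to compare candidate metrics in pre-run mode.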

Strategy Avoidance

During high‑traffic events, VPA can be temporarily disabled for selected services to prevent over‑provisioning based on historical peaks.
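Avoidance can be modeled as a set of time windows that the controllers consult before acting. This is a hedged sketch with hypothetical names (`AvoidanceWindow`, `is_suppressed`); how the platform actually stores and enforces these windows is not described in the source.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AvoidanceWindow:
    app: str
    start: datetime
    end: datetime
    reason: str  # e.g. "stress test" or "live event"


def is_suppressed(app: str, now: datetime, windows) -> bool:
    """True if any window for this application covers the current time."""
    return any(w.app == app and w.start <= now < w.end for w in windows)
```

An Updater loop would call `is_suppressed` before resizing a Pod and skip the application while the window is open.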

Effect Analysis

Metrics such as number of covered applications, total CPU/Memory saved, and saturation trends demonstrate that VPA reduces idle resources while maintaining performance.
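As a rough illustration of how such metrics could be derived from adjustment records, the sketch below computes coverage, saved cores, and saturation. The record fields and function names are hypothetical, and the sample numbers in the usage note are made up for the example rather than taken from Bilibili's data.

```python
from dataclasses import dataclass


@dataclass
class Adjustment:
    app: str
    cpu_before: float  # requested cores before VPA
    cpu_after: float   # requested cores after VPA


def summarize(adjustments, total_apps):
    """Coverage and CPU savings across a list of adjustment records."""
    before = sum(a.cpu_before for a in adjustments)
    saved = sum(a.cpu_before - a.cpu_after for a in adjustments)
    return {
        "coverage": len({a.app for a in adjustments}) / total_apps,
        "cpu_saved_cores": saved,
        "cpu_saved_ratio": saved / before if before else 0.0,
    }


def saturation(usage: float, request: float) -> float:
    """Fraction of the requested resource that is actually used."""
    return usage / request
```

For two applications shrinking from 8 to 4 cores and from 4 to 2 cores out of four applications in total, `summarize` reports 50 % coverage and 6 cores saved, and a container using 2 cores moves from 25 % to 50 % saturation when its request drops from 8 to 4 cores.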

Anomaly Detection

Automated log analysis, event rate monitoring, and custom inspection rules help identify issues like failed adjustments or mismatched QoS levels.

Cluster Resource Governance

Continuous optimization includes handling cases where VPA cannot act (e.g., Pods in the Guaranteed QoS class), refining the historical load window (switching from the 1‑day 99th percentile to the 7‑day 95th percentile), and setting different expected saturation rates for high‑availability versus lower‑priority services.
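To find the Pods that fall into the Guaranteed class, a governance or inspection job can classify them with the standard Kubernetes QoS rules: Guaranteed when every container has CPU and memory limits with equal requests, BestEffort when no container sets any request or limit, and Burstable otherwise. The sketch below is a simplified version that takes plain dicts. It assumes quantities are already normalized to comparable strings and ignores init containers, and the function name is hypothetical.

```python
def qos_class(containers):
    """Classify a Pod from its containers' 'requests' and 'limits' dicts."""
    has_any = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            has_any = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Kubernetes defaults a missing request to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not has_any:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Because a Guaranteed Pod needs requests equal to limits, lowering only the request would change its class, which is one likely reason such Pods need separate handling.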

Benefits

Applying VPA freed approximately 60 % of CPU and 30 % of memory in test clusters, and about 30 % of CPU in production, saving the cost of thousands of machines.

Conclusion and Outlook

The article summarizes Bilibili's VPA practice, covering core controllers, management platform features, and operational results. Future work will integrate application profiling and compute‑standardization to further refine VPA strategies for cost reduction and efficiency.
