
Design and Implementation of a CDN Edge‑Node Scheduling System for Bilibili Live Streaming

This article presents Bilibili’s multi‑layer CDN edge‑node scheduling system. The system groups heterogeneous nodes by quality and price, then applies cost‑aware and resource‑aware heuristics, including maximum‑flow regional borrowing and contextual‑bandit utilization prediction, to allocate bandwidth per business. In the live‑streaming deployment, the bandwidth reuse rate rose by 43.5%, same‑region coverage improved by 32.61%, and stall rates fell markedly.

Bilibili Tech

Background: With the rapid growth of Bilibili live‑streaming users, the bandwidth demand on CDN edge nodes has increased dramatically. The edge nodes are highly heterogeneous in capacity, price, and billing methods, making it challenging to achieve a dynamic balance between cost and quality while ensuring stability.

Scheduling System Design: The overall architecture follows a divide‑and‑conquer approach, decomposing the problem into multiple layers. Each layer addresses a core issue:

Cost Scheduling Layer – classifies CDN edge nodes into resource pools based on quality and bandwidth price, then calculates the available bandwidth of SLO‑strong and SLO‑weak resources using heuristic planning and business‑quality‑driven adjustments.

Resource Scheduling Layer – allocates the bandwidth computed by the cost layer to different business streams (e.g., live‑stream name, VOD ID).

Business Scheduling – fine‑grained policies that decide which edge node a user should access for a given stream or content.

Intelligent Scheduling Gateway – outputs the final node‑coverage list for each business.

The article focuses on the Cost Scheduling Layer and the Resource Scheduling Layer.

Cost Scheduling Layer – Resource Billing Methods: Bilibili CDN edge nodes use several billing models, including "Month‑95" (monthly 95th‑percentile peak), "Daily‑95 Monthly Average", and "Package Port" (fixed price within a bandwidth cap). The core idea is to group resources by quality and price, applying different planning strategies for SLO‑strong and SLO‑weak pools.
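To make the difference between the two percentile-based billing models concrete, here is a minimal Python sketch. It assumes the common nearest-rank definition of the 95th percentile (sort the samples, ignore the top 5%, bill the highest remaining one); the article does not state which percentile method or sampling interval Bilibili's vendors use, so the function names and the sample data are illustrative only.

```python
def percentile_95(samples):
    """Billable peak under the nearest-rank method (an assumption)."""
    ordered = sorted(samples)
    # ceil(0.95 * n) computed in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def month_95(daily_samples):
    """Month-95: one 95th percentile over every sample in the month."""
    all_samples = [s for day in daily_samples for s in day]
    return percentile_95(all_samples)


def daily_95_monthly_average(daily_samples):
    """Daily-95 Monthly Average: mean of each day's own 95th percentile."""
    peaks = [percentile_95(day) for day in daily_samples]
    return sum(peaks) / len(peaks)


def free_burst_samples(sample_count):
    """Number of top samples that do not affect a 95th-percentile bill."""
    rank = (95 * sample_count + 99) // 100
    return sample_count - rank


# Hypothetical month: 29 quiet days at 10 Gbps and one busy day at 50 Gbps,
# with 20 bandwidth samples per day.
quiet_day = [10] * 20
busy_day = [50] * 20
month = [quiet_day] * 29 + [busy_day]

print(month_95(month))                  # 10: the busy day falls inside the free 5%
print(daily_95_monthly_average(month))  # about 11.33: the busy day is billed
```

The same traffic bills differently under the two models: a burst concentrated in one day disappears under Month-95 but raises the Daily-95 average. Assuming 5-minute sampling over a 30-day month (8,640 samples), `free_burst_samples(8640)` gives 432 samples, or 36 hours, that a Month-95 node can run at peak without raising its bill.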

Strategy Model (Cost Layer):

Input:

supply – maximum capacity, billing method, region, and ISP of all CDN edge nodes.

demand – nationwide user bandwidth demand per region and ISP (business‑agnostic).

Output:

Maximum usable bandwidth for each edge node (including cost‑line and peak‑capacity resources).

Region‑ISP coverage of each edge node.

List of nodes available for peak‑capacity borrowing on a given day.

Processing Logic – Regional Borrowing: To address supply‑demand imbalance across regions, the problem is modeled as a maximum‑flow problem, borrowing surplus nodes from resource‑rich regions to deficit regions.
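The regional borrowing step can be sketched as a small maximum-flow problem. The following is one possible formulation, not the article's actual model: each region first serves its own demand, surplus regions are connected to a source, deficit regions to a sink, and a lender-to-borrower edge exists only where borrowing is allowed. The region names, the numbers, and the `can_lend` list are hypothetical, and the solver is a plain Edmonds-Karp implementation.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: push flow along shortest augmenting paths found by BFS."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, edges in capacity.items():
        for v, cap in edges.items():
            residual[u][v] += cap
            residual[v][u] += 0  # make sure the reverse edge exists
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        # Walk back from the sink to collect the augmenting path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        total += push


def plan_borrowing(supply, demand, can_lend):
    """Return (total borrowed bandwidth, {(lender, borrower): bandwidth})."""
    capacity = defaultdict(dict)
    for region, available in supply.items():
        gap = available - demand.get(region, 0)
        if gap > 0:
            capacity["source"][("lend", region)] = gap   # surplus region
        elif gap < 0:
            capacity[("borrow", region)]["sink"] = -gap  # deficit region
    for lender, borrower in can_lend:
        u, v = ("lend", lender), ("borrow", borrower)
        if u in capacity["source"] and v in capacity:
            capacity[u][v] = capacity["source"][u]  # cannot lend more than surplus
    total, residual = max_flow(capacity, "source", "sink")
    borrowed = {}
    for lender, borrower in can_lend:
        u, v = ("lend", lender), ("borrow", borrower)
        if v in capacity.get(u, {}):
            used = capacity[u][v] - residual[u][v]
            if used > 0:
                borrowed[(lender, borrower)] = used
    return total, borrowed


# Hypothetical bandwidth figures in Gbps.
supply = {"east": 100, "north": 60, "west": 20, "south": 30}
demand = {"east": 60, "north": 50, "west": 50, "south": 45}
can_lend = [("east", "south"), ("east", "west"), ("north", "west")]
total, borrowed = plan_borrowing(supply, demand, can_lend)
print(total, borrowed)
```

Here the west and south deficits (30 and 15 Gbps) are both fully covered by the east and north surpluses. A production model would likely also weight edges by distance or quality, which this sketch leaves out.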

Cost Planning (SLO‑strong resources): The objective is to minimize total bandwidth cost while meeting a minimum utilization threshold. Nodes with higher unit prices are used preferentially after lower‑price nodes reach the utilization floor, and 95‑percentile nodes are fully utilized during their peak‑capacity windows.

Heuristic Resource Planning (SLO‑weak resources): The goal is to improve resource utilization while respecting user SLOs. It consists of two modules:

Integrated – collects multi‑round SLO and utilization feedback, applies a contextual bandit algorithm to predict global utilization targets for each resource pool.

Dispatching – adjusts node bandwidth up or down to align actual utilization with the predicted target, lowering nodes with poor SLOs and raising those with good SLOs.
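The two modules above can be illustrated with a small sketch. The article does not say which contextual bandit algorithm is used, so this example swaps in a simple tabular epsilon-greedy bandit with optimistic initial values over discrete contexts. The candidate targets, the reward function (utilization if the SLO holds, zero otherwise), the context names, and the fixed adjustment step are all assumptions made for illustration.

```python
import random


class UtilizationBandit:
    """Tabular epsilon-greedy contextual bandit over candidate utilization targets."""

    def __init__(self, targets, epsilon=0.1, optimistic=1.0, seed=0):
        self.targets = list(targets)
        self.epsilon = epsilon
        self.optimistic = optimistic  # value assumed for untried arms
        self.rng = random.Random(seed)
        self.counts = {}  # (context, target) -> number of observations
        self.values = {}  # (context, target) -> mean observed reward

    def best(self, context):
        return max(self.targets,
                   key=lambda t: self.values.get((context, t), self.optimistic))

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.targets)  # explore
        return self.best(context)                 # exploit

    def update(self, context, target, reward):
        key = (context, target)
        n = self.counts.get(key, 0) + 1
        old = self.values.get(key, 0.0)
        self.counts[key] = n
        self.values[key] = old + (reward - old) / n  # incremental mean


def observed_reward(context, target):
    # Hypothetical feedback: the SLO breaks above a context-specific utilization.
    slo_limit = {"evening": 0.8, "night": 0.9}[context]
    return target if target <= slo_limit else 0.0


def dispatch(nodes, target_utilization, step_gbps=5):
    """Move each node's bandwidth limit one step toward the pool target."""
    limits = {}
    for name, node in nodes.items():
        utilization = node["used"] / node["capacity"]
        limit = node["limit"]
        if not node["slo_ok"] or utilization > target_utilization:
            limit -= step_gbps  # poor SLO or over target: lower
        elif utilization < target_utilization:
            limit += step_gbps  # good SLO and under target: raise
        limits[name] = max(0, min(limit, node["capacity"]))
    return limits


bandit = UtilizationBandit(targets=[0.6, 0.7, 0.8, 0.9])
for _ in range(50):  # multi-round feedback
    for context in ("evening", "night"):
        target = bandit.choose(context)
        bandit.update(context, target, observed_reward(context, target))

nodes = {
    "a": {"capacity": 100, "limit": 60, "used": 55, "slo_ok": True},
    "b": {"capacity": 100, "limit": 80, "used": 70, "slo_ok": False},
}
print(bandit.best("evening"), dispatch(nodes, bandit.best("evening")))
```

With this simulated feedback the bandit settles on the highest target that still keeps the SLO in each context, and the dispatcher raises the healthy node `a` while lowering node `b`, whose SLO is poor.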

Resource Scheduling Layer – Building on the cost layer, it quantitatively separates peak and non‑peak resources for each business. Objectives include:

Mutual exclusivity – most of a node’s resources are assigned to a single business.

Coverage – each business should have as many nodes as possible within each large region.

Strategy Model (Resource Layer):

Input:

demand – bandwidth demand of different businesses, regions, and ISPs during peak and off‑peak periods.

supply – output data from the cost scheduling layer.

Output:

Maximum bandwidth each CDN edge node can provide to each business (e.g., live, VOD, dynamic acceleration).

Processing Steps:

Split demand by business × region × ISP (currently only live streaming).

Sort supply nodes by available resources (currently only live‑stream pool).

Perform bin‑packing to allocate nodes to demand.
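The three steps above can be sketched as a greedy, largest-first packing heuristic. This is an illustrative reading rather than the article's exact algorithm: nodes are grouped by region and ISP, sorted by available bandwidth, and handed out whole to one business at a time, which also reflects the mutual-exclusivity objective. The node records, the demand figures, and the `vod` entry are hypothetical (the article notes that only live streaming is currently scheduled).

```python
from collections import defaultdict


def allocate(nodes, demands):
    """Assign whole nodes to (business, region, ISP) demands, largest first.

    Returns (allocation, unmet), where allocation maps (node name, business)
    to the maximum bandwidth that node may serve for that business.
    """
    # Step 2: group supply by region and ISP, largest nodes first.
    pools = defaultdict(list)
    for node in nodes:
        pools[(node["region"], node["isp"])].append(node)
    for pool in pools.values():
        pool.sort(key=lambda n: n["bandwidth"], reverse=True)

    allocation, unmet = {}, {}
    # Step 1 is the demand key itself: business x region x ISP.
    # Step 3: serve the largest demands first, one node per business.
    ordered = sorted(demands.items(), key=lambda kv: kv[1], reverse=True)
    for (business, region, isp), need in ordered:
        pool = pools[(region, isp)]
        while need > 0 and pool:
            node = pool.pop(0)  # the node is now exclusive to this business
            grant = min(node["bandwidth"], need)
            allocation[(node["name"], business)] = grant
            need -= grant
        if need > 0:
            unmet[(business, region, isp)] = need
    return allocation, unmet


# Hypothetical supply (from the cost layer) and demand, in Gbps.
nodes = [
    {"name": "n1", "region": "east", "isp": "telecom", "bandwidth": 40},
    {"name": "n2", "region": "east", "isp": "telecom", "bandwidth": 25},
    {"name": "n3", "region": "east", "isp": "telecom", "bandwidth": 10},
    {"name": "n4", "region": "north", "isp": "unicom", "bandwidth": 30},
]
demands = {
    ("live", "east", "telecom"): 60,
    ("live", "north", "unicom"): 50,
    ("vod", "east", "telecom"): 10,
}
allocation, unmet = allocate(nodes, demands)
print(allocation)
print(unmet)
```

In this example the north/unicom pool cannot cover its 50 Gbps demand, so 20 Gbps is reported as unmet. Leftover capacity on a partially used node (such as `n2`) is deliberately not shared with another business, trading some utilization for exclusivity.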

Live‑Streaming Deployment Results:

Bandwidth reuse rate increased by 43.5%.

Same‑region coverage improved by 32.61%.

After the new strategy was deployed, coverage in each region rose significantly (see comparative charts).

Heuristic planning reduced stall rates dramatically while maintaining SLO metrics.

Overall, the scheduling system achieves a more balanced nationwide supply‑demand state, with each region’s supply meeting or exceeding its demand.

Tags: live streaming, CDN, cost optimization, Resource Scheduling, Bilibili, heuristic planning, max flow