
Algorithmic Empowerment of Bilibili Streaming: VOD Transcoding Decision, Resource Estimation, and Live Comment Semantic Analysis

The article details how Bilibili leverages AI algorithms—including XGBoost, statistical rules, XDeepFM, and fine‑tuned SBERT—to optimize VOD transcoding decisions, estimate compute resources and processing time, and analyze live comments, thereby boosting streaming efficiency, utilization, and user experience.

Bilibili Tech

Starting from Bilibili's VOD streaming business, this article shows how algorithms are applied to four concrete problems in the service.

Background Overview : Since the 20th century, AI research has evolved from simple algorithms to more sophisticated methods such as decision trees, SVMs, neural networks, deep learning, and reinforcement learning. Big data and hardware advances have made AI widely usable in image, speech, and natural language processing.

In the VOD streaming domain, data are high‑dimensional, nonlinear, multimodal, and temporal. Business problems like predicting popular videos or recognizing patterns in live comments cannot be solved reliably with handcrafted rules, prompting the adoption of algorithmic solutions.

Business Example 1 – VOD Optimized Transcoding Decision : H.264 is the baseline codec. Newer codecs such as H.265 and AV1, whose use is referred to here as "optimized transcoding", offer significantly higher compression efficiency and therefore save storage and bandwidth. Because compute resources are limited, the system must select a subset of "head videos" (the top videos that together account for 90% of total playback) to be transcoded with the optimized codecs.
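
To make the head-video definition concrete, the sketch below labels head videos in hindsight from known play counts. The function name `select_head_videos` and the sample play counts are hypothetical. In production the set has to be predicted before the plays happen, so this only illustrates the target, not the decision method.

```python
def select_head_videos(play_counts, coverage=0.90):
    """Return video IDs, in descending play order, whose combined plays
    first reach `coverage` of total playback."""
    total = sum(play_counts.values())
    if total == 0:
        return []
    ranked = sorted(play_counts.items(), key=lambda kv: kv[1], reverse=True)
    head, running = [], 0
    for video_id, plays in ranked:
        if running >= coverage * total:
            break  # coverage target already met by the videos picked so far
        head.append(video_id)
        running += plays
    return head


# Made-up play counts for five videos (total = 1000 plays).
plays = {"v1": 700, "v2": 250, "v3": 30, "v4": 15, "v5": 5}
head = select_head_videos(plays)
print(head)
```

With these numbers, two of the five videos already carry 95% of the plays, which is the kind of skew that makes selective transcoding worthwhile.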

The decision is made by a set of three sub‑models, one for each stage of a video's life: Model1 (pre‑release, using uploader and video metadata), Model2 (real‑time decisions within the first 24 hours, based on statistical trends), and Model3 (after 24 hours, incorporating the first day's playback data). Model1 and Model3 are XGBoost tree models, while Model2 is a statistical, rule‑based approach.
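
One way to read the three-stage split is as a dispatcher keyed on video age. The sketch below assumes the stage boundaries are exactly the release time and the 24-hour mark. The function name `pick_model` and the returned labels are hypothetical, and the models themselves are not implemented here.

```python
from typing import Optional


def pick_model(hours_since_release: Optional[float]) -> str:
    """Choose which sub-model should score a video, based on its age.

    `None` (or a negative value) means the video has not been released yet.
    """
    if hours_since_release is None or hours_since_release < 0:
        return "model1"  # pre-release: uploader and video metadata (XGBoost)
    if hours_since_release < 24:
        return "model2"  # first 24 hours: statistical rules on playback trends
    return "model3"      # after 24 hours: one-day playback data (XGBoost)


for age in (None, 3.5, 24.0, 72.0):
    print(age, pick_model(age))
```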

Since its launch in 2021, the model pipeline has been processing about 10% of daily new videos, covering roughly 90% of total playback and achieving a head‑video recall of about 75%.

Business Example 2 – Transcoding Resource Estimation : To make the best use of limited compute capacity, a resource‑quantization model is introduced. It consists of (1) a per‑minute prediction of available cores and (2) a task‑level resource estimate.

The available‑resource prediction is inspired by stock‑trading concepts and is expressed as:

valid_core = core * c * (1 - vol + vol_rate) if usage < threshold else 0

where core is the number of remaining CPU cores, c is a normalized coefficient, vol captures recent utilization variance, and vol_rate reflects the short‑term trend of core availability. When usage is at or above threshold, no cores are counted as available.
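
The formula can be transcribed directly into a short function. The sketch below assumes that `vol` and `vol_rate` are already computed and passed in as fractions, since the article does not spell out how they are derived. The sample numbers are made up.

```python
def valid_core(core, c, vol, vol_rate, usage, threshold):
    """Cores that may be handed to new transcoding tasks this minute.

    Implements: core * c * (1 - vol + vol_rate) if usage < threshold else 0
    """
    if usage >= threshold:
        return 0.0  # cluster is too busy: offer nothing
    # Discount for recent volatility, adjust for the short-term trend.
    return core * c * (1 - vol + vol_rate)


# Hypothetical inputs: 1000 idle cores, 60% usage against an 85% cutoff.
available = valid_core(core=1000, c=0.8, vol=0.10, vol_rate=0.05,
                       usage=0.60, threshold=0.85)
saturated = valid_core(core=1000, c=0.8, vol=0.10, vol_rate=0.05,
                       usage=0.90, threshold=0.85)
print(round(available), saturated)
```

Higher volatility shrinks the offer and a rising availability trend enlarges it, which matches the stock-trading analogy of discounting a noisy signal.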

Task‑level estimation leverages the strong linear relationship between resources required for different codecs (e.g., H.265 vs. AV1) to convert resource needs from one codec to another.
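
As an illustration of the codec-to-codec conversion, the sketch below fits a straight line by ordinary least squares and uses it to translate a cost measured for one codec into an estimate for another. The cost unit, the sample values, and the helper `fit_line` are all hypothetical. The article states only that the relationship is strongly linear, not how the fit is done.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up resource costs for the same four tasks under two codecs.
h265_cost = [10.0, 20.0, 30.0, 40.0]
av1_cost = [26.0, 51.0, 76.0, 101.0]

slope, intercept = fit_line(h265_cost, av1_cost)

# Convert a new task's known H.265 cost into an estimated AV1 cost.
av1_estimate = slope * 50.0 + intercept
print(slope, intercept, av1_estimate)
```

Once such a mapping is fitted, only one codec's cost needs to be estimated per task and the others follow from the conversion.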

Deploying this model smooths the resource utilization curve, raising average overnight utilization by 13% and increasing the number of dispatched tasks by 25%.

Business Example 3 – Transcoding Time Estimation : An XDeepFM model (combining linear, compressed interaction, and DNN modules) is trained on 54 features extracted from video and transcoding metadata, using millions of samples. The model achieves a mean absolute error of less than 5 minutes, reducing error by 70% compared with rule‑based estimates.
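
For readers unfamiliar with the compressed interaction part of xDeepFM, here is a minimal pure-Python sketch of a single Compressed Interaction Network (CIN) layer followed by sum pooling. It follows the standard published formulation, in which each output feature map is a weighted sum of element-wise products between the previous layer's maps and the input field embeddings. The embeddings and weights are toy values, and nothing here reflects Bilibili's actual feature set or implementation.

```python
def cin_layer(x_prev, x0, weights):
    """One CIN layer.

    x_prev:  H_prev feature maps, each a list of D floats
    x0:      m field embeddings, each a list of D floats
    weights: H_next matrices, each of shape H_prev x m
    Returns H_next feature maps, each of length D.
    """
    dim = len(x0[0])
    out = []
    for w in weights:
        feature_map = [0.0] * dim
        for i, row_prev in enumerate(x_prev):
            for j, row0 in enumerate(x0):
                for d in range(dim):
                    # Element-wise (per-dimension) interaction, then weighting.
                    feature_map[d] += w[i][j] * row_prev[d] * row0[d]
        out.append(feature_map)
    return out


def sum_pool(feature_maps):
    """Collapse each feature map to one scalar by summing over dimensions."""
    return [sum(fm) for fm in feature_maps]


# Toy input: m = 2 fields, embedding dimension D = 2.
x0 = [[1.0, 2.0], [3.0, 4.0]]
# Two output maps; the first layer interacts x0 with itself.
weights = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
x1 = cin_layer(x0, x0, weights)
pooled = sum_pool(x1)
print(x1, pooled)
```

In the full model, the pooled CIN outputs are combined with the linear and DNN outputs before the final prediction.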

Business Example 4 – Live Comment Semantic Analysis : To detect real‑time user feedback about stream quality issues, an SBERT (Sentence‑BERT) model is fine‑tuned on hundreds of thousands of labeled comment pairs. After fine‑tuning, the model's accuracy rises from ~60% to over 95%. The service processes live comments, computes sentence embeddings, and triggers an alert when a comment is highly similar to one of the predefined "issue" sentences.
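
The alerting step can be sketched as a similarity check over precomputed embeddings. The example below assumes cosine similarity and a fixed threshold of 0.8, neither of which is specified in the article. The three-dimensional vectors stand in for real SBERT embeddings, and `should_alert` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def should_alert(comment_vec, issue_vecs, threshold=0.8):
    """True if the comment is close enough to any predefined issue sentence."""
    return any(cosine_similarity(comment_vec, v) >= threshold
               for v in issue_vecs)


# Toy stand-ins for embeddings of two predefined "issue" sentences.
issue_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

near_issue = should_alert([0.9, 0.1, 0.0], issue_vecs)
unrelated = should_alert([0.0, 0.0, 1.0], issue_vecs)
print(near_issue, unrelated)
```

In a real deployment the threshold would be tuned on labeled comments, and alerts would likely be aggregated over a time window rather than fired per comment.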

Summary and Outlook : The four case studies—VOD transcoding decision, resource estimation, transcoding time prediction, and live‑comment analysis—showcase how algorithmic techniques improve streaming efficiency and user experience. Ongoing work includes building a unified model training and testing framework to accelerate iteration and reuse across diverse business problems.
