Two‑Pass Deep Learning Bitrate Factor Prediction for Constant‑Quality Segment Encoding in Bilibili Narrow‑Band HD Transcoding
Bilibili's two-pass deep-learning bitrate-factor predictor, accepted at IEEE VCIP 2022, reaches 98.8% prediction accuracy with an average of only 1.55 encoding passes. It enables constant-quality segment encoding that cuts bitrate consumption by more than 15% without reducing visual quality in Bilibili's narrow-band HD transcoding pipeline.
The key algorithm of Bilibili's narrow-band HD transcoding system, a two-pass deep-learning bitrate-factor prediction method for constant-quality segment encoding, has been accepted at the IEEE International Conference on Visual Communications and Image Processing (VCIP 2022).
The method controls the quality of encoded video with 98.8% accuracy and saves Bilibili more than 15% in bitrate cost.
Bilibili receives hundreds of thousands of video submissions every day. Popular videos attract most of the user attention and consume the majority of bandwidth. For these hot videos, Bilibili runs a narrow-band HD transcoding system that spends more encoding complexity to keep visual quality unchanged at a lower bitrate, removing data redundancy and avoiding wasted bandwidth. To improve transcoding performance, the system adopts a constant-quality segment-encoding strategy.
Compared with average-bitrate encoding, constant-quality encoding keeps visual quality stable across segments and avoids wasting bitrate: an average-bitrate encoder gives every segment a similar budget, so simple scenes receive more bits than they need while complex scenes may receive too few.
The main difficulty of constant-quality encoding is predicting the encoding parameters accurately. The best existing academic solution, proposed by Xing et al., uses a convolutional neural network and reaches only about 77.6% prediction accuracy, far below what a practical, stable transcoding system requires.
Inspired by the two-pass rate-control strategy commonly used in video encoders, we propose a two-pass parameter-prediction method that reaches 98.8% prediction accuracy with an average of only 1.55 encoding passes.
In the first pass, traditional image features and ultra-fast pre-encoding features are extracted to describe the video's characteristics. A lightweight neural network predicts the encoding parameters from these features, and the segment is then encoded and its quality checked. Because the feature set is lightweight, the first-pass prediction is fast, and about 45.3% of video segments already meet the quality target after this pass.
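The article does not list the exact features. As one plausible example of "traditional image features", the sketch below computes spatial information (SI) and temporal information (TI) in the style of ITU-T P.910. It then appends a bits-per-pixel figure from an ultra-fast pre-encode that the caller is assumed to have already run. The function names and the three-element feature vector are hypothetical.

```python
import numpy as np


def sobel_magnitude(frame: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude of a 2-D luma frame (valid region only, no padding)."""
    f = frame.astype(np.float64)
    # Horizontal gradient: right column minus left column, weighted 1-2-1 vertically.
    gx = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]) - (f[:-2, :-2] + 2 * f[1:-1, :-2] + f[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1 horizontally.
    gy = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]) - (f[:-2, :-2] + 2 * f[:-2, 1:-1] + f[:-2, 2:])
    return np.hypot(gx, gy)


def spatial_information(frames: np.ndarray) -> float:
    """SI: max over frames of the spatial std-dev of the Sobel-filtered frame."""
    return float(max(sobel_magnitude(fr).std() for fr in frames))


def temporal_information(frames: np.ndarray) -> float:
    """TI: max over frames of the spatial std-dev of the difference to the previous frame."""
    if len(frames) < 2:
        return 0.0
    diffs = np.diff(frames.astype(np.float64), axis=0)
    return float(max(d.std() for d in diffs))


def first_pass_features(frames: np.ndarray, prefast_bits: int) -> np.ndarray:
    """Hypothetical feature vector: [SI, TI, bits per pixel of an ultra-fast pre-encode].

    frames: uint8 luma array shaped (num_frames, height, width).
    prefast_bits: size in bits of the segment after an ultra-fast pre-encode (assumed given).
    """
    n, h, w = frames.shape
    bpp = prefast_bits / (n * h * w)
    return np.array([spatial_information(frames), temporal_information(frames), bpp])
```

For a flat, static clip both SI and TI are 0, so only the pre-encode term carries information. Noisy or fast-moving content raises SI and TI, which a predictor can map to a lower quality per bit.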
For segments that miss the quality target, a second pass is run. It reuses the first-pass video features and feeds the first-pass predicted parameters and the measured quality scores into the neural network as additional feedback features. This feedback acts as a stable anchor and raises the second-pass prediction accuracy to 98.8%. Because the first pass already succeeds for many segments, the average number of encoding passes is only about 1.55.
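The two-pass control flow can be sketched as a toy under stated assumptions. The linear quality model, the CRF-like rate parameter, the ±1 tolerance, and all names here are hypothetical. `predict_second_pass` uses a hand-written correction rule in place of the neural network that, per the article, receives the first-pass parameter and measured quality as extra features. The point is only to show why a measured anchor helps.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    true_k: float  # hidden content complexity (toy: quality drop per unit of rate parameter)
    est_k: float   # the cheap-feature model's estimate of true_k (hypothetical)


def toy_encode_quality(param: float, seg: Segment) -> float:
    """Stand-in for 'encode, then measure quality' on a 0-100 scale (toy linear model)."""
    return 100.0 - seg.true_k * param


def predict_first_pass(seg: Segment, target: float) -> float:
    """Pass 1: invert the toy quality model using only the feature-based estimate."""
    return (100.0 - target) / seg.est_k


def predict_second_pass(seg: Segment, target: float, param1: float, q1: float) -> float:
    """Pass 2: start from the measured anchor (param1, q1) and step toward the target.

    Hand-written rule standing in for the learned network. In this toy the remaining
    error is (q1 - target) * (1 - true_k / est_k), so it shrinks whenever est_k > true_k / 2.
    """
    return param1 + (q1 - target) / seg.est_k


def encode_segment(seg: Segment, target: float, tol: float = 1.0) -> tuple:
    """Return (param, quality, passes_used), running at most two passes."""
    param = predict_first_pass(seg, target)
    quality = toy_encode_quality(param, seg)
    if abs(quality - target) <= tol:  # first pass already meets the target
        return param, quality, 1
    param = predict_second_pass(seg, target, param, quality)
    return param, toy_encode_quality(param, seg), 2


def expected_passes(first_pass_hit_rate: float) -> float:
    """With at most two passes per segment: E[passes] = 1 + P(first pass misses)."""
    return 1.0 + (1.0 - first_pass_hit_rate)


if __name__ == "__main__":
    # est_k is 25% too high, so pass 1 lands at quality 92 and a second pass
    # brings it to about 90.4, inside the tolerance band.
    print(encode_segment(Segment(true_k=2.0, est_k=2.5), target=90.0))
```

Assuming no segment needs more than two passes, the reported figures are self-consistent: 1 + (1 − 0.453) = 1.547 ≈ 1.55 average passes.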
The deep neural network used for parameter prediction is constructed as shown in the diagram above and is trained on 500,000 video samples.
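The actual architecture is the one in the diagram. As a hypothetical stand-in, the sketch below shows a generic lightweight regressor with one hidden ReLU layer, trained by full-batch gradient descent on mean squared error. It also shows one way second-pass inputs could be formed: appending the first-pass parameter and measured quality as two extra columns. Layer sizes, learning rate, and names are all assumptions.

```python
import numpy as np


class TinyMLP:
    """Hypothetical one-hidden-layer ReLU regressor trained by gradient descent on MSE."""

    def __init__(self, n_in: int, n_hidden: int = 16, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def _forward(self, x: np.ndarray):
        h_pre = x @ self.w1 + self.b1          # (n, hidden)
        h = np.maximum(h_pre, 0.0)             # ReLU
        out = (h @ self.w2 + self.b2).ravel()  # (n,)
        return h_pre, h, out

    def predict(self, x: np.ndarray) -> np.ndarray:
        return self._forward(x)[2]

    def fit(self, x: np.ndarray, y: np.ndarray, lr: float = 0.01, epochs: int = 500) -> list:
        """Full-batch gradient descent; returns the MSE recorded before each update."""
        n = len(x)
        losses = []
        for _ in range(epochs):
            h_pre, h, out = self._forward(x)
            err = out - y
            losses.append(float(np.mean(err ** 2)))
            # Compute every gradient before touching the weights.
            g_out = (2.0 / n) * err[:, None]         # dMSE/d(out), shape (n, 1)
            g_w2 = h.T @ g_out                       # (hidden, 1)
            g_b2 = g_out.sum(axis=0)                 # (1,)
            g_h = (g_out @ self.w2.T) * (h_pre > 0)  # back through ReLU, (n, hidden)
            g_w1 = x.T @ g_h                         # (n_in, hidden)
            g_b1 = g_h.sum(axis=0)                   # (hidden,)
            self.w2 -= lr * g_w2
            self.b2 -= lr * g_b2
            self.w1 -= lr * g_w1
            self.b1 -= lr * g_b1
        return losses


def second_pass_inputs(features: np.ndarray, param1: np.ndarray, quality1: np.ndarray) -> np.ndarray:
    """Append first-pass predicted parameter and measured quality as feedback columns."""
    return np.column_stack([features, param1, quality1])
```

On real data, the 500,000 training samples would replace synthetic arrays, and inputs would typically be standardized before training. A first-pass model would take only the feature columns, while a second-pass model would take the output of `second_pass_inputs`.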
After training, the model was evaluated on 100,000 test samples. The results show very high prediction accuracy with virtually no cases of quality loss. Note that these experiments use the open-source x265 encoder with a fixed quality target for illustration; Bilibili's production system uses a proprietary encoder, and its quality standards vary by business scenario and video format. Deployed in the narrow-band HD transcoding system, the method saves Bilibili more than 15% of bitrate cost.
The preprint is available on arXiv at https://arxiv.org/abs/2208.10739.
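The article does not spell out how accuracy or quality loss is scored. The helpers below use hypothetical definitions. Accuracy is taken as the share of segments whose final quality lands within `tol` of the target. Quality loss is taken as the share that ends up more than `tol` below it. Bitrate saving is taken as the relative reduction in total bits against a baseline. The tolerance value and the function names are assumptions.

```python
import numpy as np


def quality_accuracy(final_quality, target: float, tol: float = 1.0) -> float:
    """Hypothetical: share of segments whose final quality lies within target +/- tol."""
    q = np.asarray(final_quality, dtype=float)
    return float(np.mean(np.abs(q - target) <= tol))


def quality_loss_rate(final_quality, target: float, tol: float = 1.0) -> float:
    """Hypothetical: share of segments that end up more than tol below the target."""
    q = np.asarray(final_quality, dtype=float)
    return float(np.mean(q < target - tol))


def bitrate_saving(baseline_bits, new_bits) -> float:
    """Relative saving of total bits versus a baseline; 0.15 means 15% fewer bits."""
    return 1.0 - float(np.sum(new_bits)) / float(np.sum(baseline_bits))
```

For example, final qualities of 90.0, 90.5, 88.0, and 92.0 against a target of 90 give an accuracy of 0.5 and a quality-loss rate of 0.25 under these definitions.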
Below is the presentation video.
Bilibili Tech