Design and Architecture of a Self‑Developed Video Transcoding Core
The team built a custom video transcoding core on top of the FFmpeg libraries, replacing the command-line tool with modular controllers, pipelines, and parallel tasks. The core dynamically adapts resolution, frame rate, and SEI handling for both low-latency live streams and high-throughput VOD, with the aim of improving scalability and maintainability.
Video transcoding converts an uploaded video file into multiple resolution variants by demuxing, decoding, filtering, encoding and remuxing. Bilibili processes massive daily uploads, producing lower‑bitrate streams that improve playback smoothness, reduce bandwidth consumption and standardize codec specifications.
The most widely used server‑side transcoding framework is FFmpeg, which provides low‑level libraries for demux/mux, codec, filters and a command‑line tool. However, the native ffmpeg command line shows several limitations for large‑scale, real‑time scenarios:
Before FFmpeg 7.0, the command-line tool ran its pipeline serially and could not fully exploit multi-core CPUs, causing bottlenecks in multi-resolution VOD and live streams.
Live streaming requires dynamic parameter updates, which the static command line cannot handle.
The control logic is scattered in a few .c files, making maintenance and module separation difficult.
Upgrading FFmpeg versions often leads to painful code migrations.
To overcome these issues, a self‑developed transcoding core was built on top of the FFmpeg libraries, replacing the command‑line tool.
Core Architecture
The core abstracts the FFmpeg primitives into modular components. A Controller module orchestrates frame scheduling for both VOD and live streams. For VOD, the controller simply maps input streams to output streams. For live streaming, it handles timed frame pulling, message interaction, and dynamic changes of inputs, outputs or filters without restarting containers.
Each transcoding pipeline (Pipeline) corresponds to one output variant. Inside a pipeline, a Flow processes a single audio-video stream, and each processing step (filter, encoder, sampler, muxer) is represented as a Task. Tasks inherit from a common PipelineWorker base class and can run either serially or in parallel; parallel workers spawn dedicated threads for frame handling.
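The Flow, Task, and PipelineWorker relationship can be illustrated with a minimal Python sketch. This is not the team's implementation: only the name PipelineWorker comes from the text above, while the method names, the dict used as a frame, and the queue-based hand-off are assumptions made for illustration.

```python
import queue
import threading

class PipelineWorker:
    """One processing step; runs inline (serial) or on its own thread (parallel)."""

    def __init__(self, name, fn, parallel=False):
        self.name = name
        self.fn = fn                  # hypothetical per-frame callback
        self.parallel = parallel
        self.next = None              # downstream worker, set by Flow
        self._inbox = queue.Queue()
        self._thread = None

    def start(self):
        if self.parallel:
            self._thread = threading.Thread(target=self._loop, daemon=True)
            self._thread.start()

    def push(self, frame):
        # Parallel workers queue the frame; serial workers process it in the caller.
        if self.parallel:
            self._inbox.put(frame)
        else:
            self._handle(frame)

    def _handle(self, frame):
        out = self.fn(frame)
        if out is not None and self.next is not None:
            self.next.push(out)

    def _loop(self):
        while True:
            frame = self._inbox.get()
            if frame is None:         # sentinel: no more frames
                break
            self._handle(frame)

    def close(self):
        if self._thread is not None:
            self._inbox.put(None)
            self._thread.join()

class Flow:
    """Chains workers for one stream; closing upstream-first drains every queue."""

    def __init__(self, tasks):
        self.tasks = tasks
        for up, down in zip(tasks, tasks[1:]):
            up.next = down

    def start(self):
        for task in self.tasks:
            task.start()

    def push(self, frame):
        self.tasks[0].push(frame)

    def close(self):
        for task in self.tasks:
            task.close()

def run_demo(parallel):
    """Push five dummy frames through filter -> encoder -> muxer."""
    muxed = []
    flow = Flow([
        PipelineWorker("filter", lambda f: {**f, "scaled": True}, parallel),
        PipelineWorker("encoder", lambda f: {**f, "encoded": True}, parallel),
        PipelineWorker("muxer", lambda f: muxed.append(f["pts"]), parallel),
    ])
    flow.start()
    for pts in range(5):
        flow.push({"pts": pts})
    flow.close()
    return muxed
```

Calling run_demo(False) processes every frame in the caller's thread, while run_demo(True) gives each step its own thread; in both cases frames reach the muxer in input order because each worker consumes a single FIFO queue.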
Live vs. VOD Transcoding
Live transcoding demands low latency, and serial pipelines block one another when several of them compete for CPU resources. In parallel mode, each task runs on its own thread and drops frames when its internal queue exceeds a threshold, which keeps the stream stable at the cost of occasional frame loss.
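A rough sketch of the queue-threshold behaviour is shown below. The text does not say which frame is discarded when the threshold is exceeded; dropping the oldest queued frame is an assumption here, and the class name DroppingQueue is hypothetical.

```python
import threading
from collections import deque

class DroppingQueue:
    """Bounded frame queue that discards the oldest frame once it overflows."""

    def __init__(self, max_frames):
        self.max_frames = max_frames
        self.dropped = 0              # counter for monitoring
        self._frames = deque()
        self._lock = threading.Lock()

    def put(self, frame):
        with self._lock:
            self._frames.append(frame)
            if len(self._frames) > self.max_frames:
                # Assumption: favour fresh frames to keep latency bounded.
                self._frames.popleft()
                self.dropped += 1

    def get(self):
        """Return the oldest queued frame, or None if the queue is empty."""
        with self._lock:
            return self._frames.popleft() if self._frames else None

def demo():
    q = DroppingQueue(max_frames=3)
    for pts in range(5):              # producer outruns the consumer
        q.put(pts)
    return q.dropped, q.get()
```

Dropping at this point is only safe for frames that have no dependants, for example decoded frames ahead of the encoder; discarding already-encoded packets would need keyframe awareness that this sketch leaves out.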
VOD transcoding focuses on throughput. Depending on a task’s internal parallelism, serial or parallel execution is chosen to maximize CPU utilization while avoiding unnecessary thread‑switch overhead.
Dynamic Adaptive Transcoding
Live streams using FLV/RTMP may change resolution or frame rate on the fly. The core supports dynamic adaptation:
Resolution adaptation: When a resolution change is detected, the scale filter parameters are recomputed using FFmpeg expression syntax, preserving aspect ratio via a zoom-style scaling.
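The recomputation can be sketched in Python as follows. The core itself is described as using FFmpeg expressions inside the scale filter; this sketch instead computes the target size up front and emits a plain scale filter string. The function names, the no-upscaling rule, and the rounding down to even dimensions are assumptions.

```python
def fit_within(src_w, src_h, max_w, max_h):
    """Largest even-sized frame with the source aspect ratio inside max_w x max_h."""
    # Integer cross-multiplication avoids floating-point rounding surprises.
    if src_w * max_h >= src_h * max_w:
        # Source is at least as wide as the box: width is the limiting side.
        w = min(src_w, max_w)         # assumption: never upscale
        h = src_h * w // src_w
    else:
        h = min(src_h, max_h)
        w = src_w * h // src_h
    # Most encoders with 4:2:0 chroma need even dimensions.
    return w - w % 2, h - h % 2

def scale_filter(src_w, src_h, max_w, max_h):
    """Build an FFmpeg scale filter description for the new source resolution."""
    w, h = fit_within(src_w, src_h, max_w, max_h)
    return f"scale=w={w}:h={h}"
```

For a 720p output variant, a landscape 1920x1080 source yields scale=w=1280:h=720, and if the streamer switches to portrait 1080x1920 the same call yields scale=w=404:h=720 rather than a stretched picture.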
Frame-rate adaptation: Instead of constant-frame-rate (CFR) sampling, the core employs variable-frame-rate (VFR) output and a VFR-HALF mode that halves the output frame rate when the source exceeds the target, ensuring uniform sampling and reducing jitter.
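One way to picture VFR-HALF is the sketch below, which keeps every second frame whenever the measured source rate is above the target and otherwise passes frames through with their original timestamps. The function name and the exact trigger condition are assumptions; how the core estimates the source frame rate is not described in the text.

```python
def select_frames(pts_ms, source_fps, target_fps):
    """Return the timestamps (in ms) of the frames kept for output."""
    if source_fps > target_fps:
        # VFR-HALF: keep every second frame so the kept frames stay evenly spaced.
        return list(pts_ms[::2])
    # Plain VFR: pass frames through with their original timestamps.
    return list(pts_ms)
```

With a 60 fps source and a 30 fps target, timestamps [0, 16, 33, 50, 66, 83] reduce to [0, 33, 66], so the spacing between output frames stays uniform. CFR resampling of a source whose rate is not an exact multiple of the target would instead alternate between keeping and skipping unevenly, which is the jitter the text refers to.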
SEI (Supplemental Enhancement Information) Management
SEI carries auxiliary data such as subtitles, game scores or HDR color information. Older FFmpeg versions discarded SEI during decoding; newer versions store it in the frame structure but rely on the encoder to write it. The self-developed core inserts a bitstream filter (BSF) after encoding to write SEI uniformly for AVC, HEVC and AV1 streams, and can optionally drop or merge SEI from discarded frames based on configuration.
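The drop-or-merge choice for SEI on discarded frames might look like the following sketch. The frame layout (a dict with pts, sei and dropped keys), the policy names, and the decision to attach merged SEI to the next kept frame are all assumptions, and the real core works on encoded bitstreams rather than Python dicts.

```python
def carry_sei(frames, policy="merge"):
    """Apply the SEI policy for discarded frames and return the kept frames.

    policy="drop":  SEI attached to discarded frames is lost.
    policy="merge": SEI from discarded frames is prepended to the next kept frame.
    """
    if policy not in ("drop", "merge"):
        raise ValueError(f"unknown SEI policy: {policy}")
    kept = []
    pending = []                      # SEI collected from discarded frames
    for frame in frames:
        if frame["dropped"]:
            if policy == "merge":
                pending.extend(frame["sei"])
            continue
        kept.append({"pts": frame["pts"], "sei": pending + frame["sei"]})
        pending = []
    return kept

def demo(policy):
    frames = [
        {"pts": 0, "sei": ["score:1-0"], "dropped": False},
        {"pts": 1, "sei": ["subtitle:hi"], "dropped": True},
        {"pts": 2, "sei": [], "dropped": False},
    ]
    return [f["sei"] for f in carry_sei(frames, policy)]
```

Merging suits payloads such as subtitles that must not be lost when frames are dropped, while dropping suits per-frame data that is meaningless once its frame is gone.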
In live streaming, SEI is also used to trace the full lifecycle of a stream, enabling precise latency and quality analysis on the client side.
Summary and Outlook
The custom transcoding core was initially deployed for live director‑board workflows in 2020 and has since expanded to live and VOD streaming. Future work includes deeper AI integration (e.g., live subtitles, game scoreboard overlays) and finer‑grained pipeline parallelism to push resource utilization to its limits.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.