Design and Implementation of a WASM Demuxer for WebCodecs Video Frame Extraction
The project extracts FFmpeg’s demuxing logic into a lightweight WebAssembly module that feeds container‑agnostic video packets to WebCodecs, enabling fast, low‑cost frame extraction across many formats. In production it cut 90th‑percentile cover‑generation latency by ~40% and reduced container‑related failures by ~72%.
Background: Bilibili’s web upload page requires video frame extraction for cover, category, and tag recommendation. Historically this was done with WebAssembly + FFmpeg. WebCodecs was introduced last year to improve performance, but it only decodes and has no demuxing capability, which limits support for formats such as FLV and AVI.
Problem: Existing JavaScript demuxers (mp4box.js, custom mkv‑demuxer) cover only MP4 and MKV. Adding support for each additional format incurs high development cost and low ROI, while high‑quality JS demux libraries are scarce.
Goal: Provide a low‑cost, generic demuxing solution for WebCodecs that supports as many video formats as possible.
Proposed Approach: Reuse FFmpeg’s extensive demuxing support via WebAssembly and combine it with the native decoding performance of WebCodecs. The short‑running demux step is handled by the WASM FFmpeg component, while the long‑running decode step is delegated to WebCodecs.
Core Idea: Extract the demux part from WebAssembly + FFmpeg into an independent WASM demuxer. The implementation steps are:
1. Add C functions to obtain the data required by WebCodecs decoders.
2. Write JS glue code (using Emscripten’s cwrap) for bidirectional communication between JS and C, passing demuxed data.
3. Adapt the frame‑extraction SDK to consume the raw data and feed it to WebCodecs.
Key Data Structures: Two trimmed FFmpeg structures are defined – WebAVStream (contains codec parameters, start time, duration, etc.) and WebAVPacket (contains key‑frame flag, timestamp, size, and data). Functions get_av_stream and get_av_packet locate the appropriate video stream and packet, convert them to the new structures, and return them to JavaScript.
Codec String Generation: WebCodecs’ VideoDecoder.configure requires a valid codec string (the codec field of its config object). The solution extracts codec configuration from AVStream/AVPacket, re‑uses FFmpeg’s internal logic (e.g., ff_isom_write_vpcc for VP9) to build the string, and verifies it against Chromium’s video_codec_string_parsers.
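To make the codec string format concrete, here is a minimal TypeScript sketch of the output for two common codecs. It is an illustration only, not the project's C implementation: the function names avcCodecString and vp9CodecString are hypothetical, the H.264 helper assumes the extradata is an MP4-style avcC record, and the VP9 helper assumes profile, level, and bit depth have already been read from the stream.

```typescript
// Two-digit helpers (avoid padStart so the sketch compiles with older lib targets).
function hex2(b: number): string {
  return (b < 16 ? "0" : "") + b.toString(16);
}
function dec2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// H.264: "avc1.PPCCLL", where PP, CC, LL are bytes 1..3 of the avcC record
// (AVCProfileIndication, profile_compatibility, AVCLevelIndication) in hex.
// ASSUMPTION: extradata is an AVCDecoderConfigurationRecord, not Annex B.
function avcCodecString(extradata: Uint8Array): string {
  if (extradata.length < 4 || extradata[0] !== 1) {
    throw new Error("not an avcC record (configurationVersion must be 1)");
  }
  return "avc1." + hex2(extradata[1]) + hex2(extradata[2]) + hex2(extradata[3]);
}

// VP9: "vp09.PP.LL.DD" with profile, level, and bit depth as two-digit decimals.
// ASSUMPTION: the three values were already extracted from the stream.
function vp9CodecString(profile: number, level: number, bitDepth: number): string {
  return "vp09." + dec2(profile) + "." + dec2(level) + "." + dec2(bitDepth);
}
```

For example, an avcC record starting with the bytes 01 64 00 1f (High profile, level 3.1) maps to avc1.64001f.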
JS‑C Communication: C functions are wrapped with Module.cwrap to be callable from JS. After execution, the returned pointer is read via Module.getValue, assembled into a JavaScript object, and sent back through postMessage. The reverse direction (C invoking JS) follows the same pattern.
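The JS-to-C direction described above can be sketched roughly as follows. Module.cwrap and Module.getValue are real Emscripten runtime calls, but everything else is an assumption made for illustration: the argument list of get_av_packet, the byte offsets of the WebAVPacket fields, and the helper names readWebAVPacket and getPacket are hypothetical and would have to match the actual C struct.

```typescript
// Minimal slice of the Emscripten runtime that this sketch relies on.
interface EmscriptenModuleLike {
  cwrap(ident: string, returnType: string | null, argTypes: string[]): (...args: unknown[]) => number;
  getValue(ptr: number, type: string): number;
  HEAPU8: Uint8Array;
}

interface WebAVPacket {
  keyframe: boolean;
  timestamp: number;
  size: number;
  data: Uint8Array;
}

// HYPOTHETICAL struct layout on wasm32 (the real offsets depend on the C definition):
//   +0  int32   keyframe
//   +8  double  timestamp
//   +16 int32   size
//   +20 uint8_t *data
function readWebAVPacket(mod: EmscriptenModuleLike, ptr: number): WebAVPacket {
  const size = mod.getValue(ptr + 16, "i32");
  const dataPtr = mod.getValue(ptr + 20, "*");
  return {
    keyframe: mod.getValue(ptr, "i32") === 1,
    timestamp: mod.getValue(ptr + 8, "double"),
    size: size,
    // slice() copies the bytes out of wasm memory, so the result stays valid
    // after the C side frees the packet or the heap grows.
    data: mod.HEAPU8.slice(dataPtr, dataPtr + size),
  };
}

// HYPOTHETICAL signature: get_av_packet(path, timeInSeconds) -> WebAVPacket*.
function getPacket(mod: EmscriptenModuleLike, path: string, time: number): WebAVPacket {
  const getAvPacket = mod.cwrap("get_av_packet", "number", ["string", "number"]);
  const ptr = getAvPacket(path, time);
  if (ptr === 0) {
    throw new Error("get_av_packet returned NULL");
  }
  return readWebAVPacket(mod, ptr);
}
```

Inside the worker, the resulting object could then be handed to postMessage, optionally listing packet.data.buffer as a transferable to avoid a second copy.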
Integration into the Frame‑Extraction SDK: The WASM demuxer runs inside a Web Worker. Its postMessage interface is promisified, and the output is adapted to WebCodecs’ VideoDecoderConfig and EncodedVideoChunk formats.
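A minimal sketch of this integration layer, under stated assumptions: the message protocol ({ id, type, payload } out, { id, result, error } back), the field names on WebAVStream and WebAVPacket, and the use of seconds for timestamps are all hypothetical. The output shapes follow the dictionaries accepted by VideoDecoder.configure() and the EncodedVideoChunk constructor, whose timestamps and durations are expressed in microseconds.

```typescript
// HYPOTHETICAL field names; timestamps and durations assumed to be in seconds.
interface WebAVStream { codecString: string; width: number; height: number; extradata: Uint8Array; }
interface WebAVPacket { keyframe: boolean; timestamp: number; duration: number; data: Uint8Array; }

// Local mirrors of the WebCodecs init dictionaries, so the sketch needs no DOM typings.
interface DecoderConfigInit { codec: string; codedWidth: number; codedHeight: number; description?: Uint8Array; }
interface ChunkInit { type: "key" | "delta"; timestamp: number; duration: number; data: Uint8Array; }

function toDecoderConfig(s: WebAVStream): DecoderConfigInit {
  const config: DecoderConfigInit = { codec: s.codecString, codedWidth: s.width, codedHeight: s.height };
  // Codecs such as H.264 in avcC form carry their parameter sets in `description`.
  if (s.extradata.length > 0) config.description = s.extradata;
  return config;
}

function toChunkInit(p: WebAVPacket): ChunkInit {
  return {
    type: p.keyframe ? "key" : "delta",
    timestamp: Math.round(p.timestamp * 1e6), // seconds -> microseconds
    duration: Math.round(p.duration * 1e6),
    data: p.data,
  };
}

// Smallest worker surface the wrapper needs; a real Worker satisfies it.
interface WorkerLike {
  postMessage(msg: unknown): void;
  onmessage: ((ev: { data: any }) => void) | null;
}

// Turns request/response messages into promises by tagging each request with an id.
function promisifyWorker(worker: WorkerLike): (type: string, payload: unknown) => Promise<unknown> {
  let nextId = 0;
  const pending = new Map<number, { resolve: (v: unknown) => void; reject: (e: Error) => void }>();
  worker.onmessage = (ev) => {
    const entry = pending.get(ev.data.id);
    if (!entry) return; // not a reply to one of our requests
    pending.delete(ev.data.id);
    if (ev.data.error) {
      entry.reject(new Error(String(ev.data.error)));
    } else {
      entry.resolve(ev.data.result);
    }
  };
  return (type, payload) =>
    new Promise<unknown>((resolve, reject) => {
      const id = nextId++;
      pending.set(id, { resolve, reject });
      worker.postMessage({ id, type, payload });
    });
}
```

In a browser, the adapted objects would be passed on as decoder.configure(toDecoderConfig(stream)) and decoder.decode(new EncodedVideoChunk(toChunkInit(packet))).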
Results: Deploying the WASM demuxer together with WebCodecs reduced the 90th‑percentile cover‑generation latency by ~40% and decreased the failure rate caused by unsupported containers by ~72%.
Additional Offering: An npm package named web-demuxer extracts the demuxer portion of WebAssembly + FFmpeg, resulting in a minimal gzip size of 115 KB (supporting MP4 and MKV). It enables video frame extraction with just a few lines of code and also provides a ReadableStream interface for more complex scenarios such as playback.
Conclusion: By modularizing FFmpeg’s demuxing capabilities and coupling them with native WebCodecs decoding, developers can achieve high‑performance, format‑agnostic video processing on the web. Future work includes advocating for native container support in WebCodecs and further expanding the format coverage.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.