
Design and Implementation of a High‑Performance Matroska Demuxer for Web Uploads

The new mkv-demuxer SDK replaces the slow FFmpeg-Wasm solution on Bilibili’s upload page. It reads Matroska files in slice-sized ArrayBuffers, parses the EBML Header and the SeekHead index, and exposes getMeta, getData, and seekFrame APIs, cutting memory use by 98% and parsing time by 97% while speeding up cover generation and recommendation processing.

Bilibili Tech

Matroska is an open, flexible multimedia container format that can hold multiple video, audio, and subtitle streams. On Bilibili's web upload page, Matroska videos account for over 2% of uploads, making efficient parsing essential.

The original solution used FFmpeg compiled to WebAssembly via Emscripten, which supports many formats but suffers from slow parsing speed, high memory consumption, and lack of hardware acceleration.

For MP4, the upload page has already switched to mp4box for demuxing and WebCodecs for decoding, achieving a 70% efficiency gain. To similarly improve Matroska handling, a new demuxing approach based on WebCodecs is required.

Matroska is built on EBML (Extensible Binary Meta Language). A file consists of an EBML Header followed by a Segment, which holds up to eight kinds of top‑level elements, including SeekHead, Info, Tracks, Cues, and Cluster. The SeekHead records the positions of the other top‑level elements, which allows fast random access.
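To make the EBML layout concrete, here is a minimal sketch of decoding one element header. Every EBML element starts with an ID and a data size, both stored as variable-length integers (VINTs). The helper names `readVint` and `readElementHeader` are illustrative assumptions and are not taken from mkv-demuxer.

```javascript
// Read an EBML variable-length integer (VINT) starting at `offset`.
// keepMarker = true is used for element IDs (the length marker bit is part of the ID),
// keepMarker = false is used for data sizes (the marker bit is stripped).
function readVint(bytes, offset, keepMarker) {
  const first = bytes[offset];
  if (first === undefined) throw new Error('offset is outside the buffer');
  if (first === 0) throw new Error('VINTs longer than 8 bytes are not supported');

  // The position of the first set bit in the first byte gives the total length.
  let length = 1;
  let mask = 0x80;
  while ((first & mask) === 0) {
    mask >>= 1;
    length += 1;
  }
  if (offset + length > bytes.length) throw new Error('VINT is truncated');

  // A size whose data bits are all 1 means "unknown size" (used for live streams).
  let unknown = (first & (mask - 1)) === mask - 1;

  // Multiply instead of shifting so values beyond 32 bits stay correct.
  // Note: values above Number.MAX_SAFE_INTEGER (2^53 - 1) lose precision.
  let value = keepMarker ? first : first & (mask - 1);
  for (let i = 1; i < length; i++) {
    const byte = bytes[offset + i];
    if (byte !== 0xff) unknown = false;
    value = value * 256 + byte;
  }
  return { value, length, unknown: !keepMarker && unknown };
}

// Decode the ID and data size of the element that starts at `offset`.
function readElementHeader(bytes, offset) {
  const id = readVint(bytes, offset, true);
  const size = readVint(bytes, offset + id.length, false);
  return {
    id: id.value,
    dataSize: size.unknown ? null : size.value,
    dataOffset: offset + id.length + size.length,
  };
}

// First five bytes of a typical Matroska file: EBML Header ID plus a 1-byte size.
const sample = new Uint8Array([0x1a, 0x45, 0xdf, 0xa3, 0x9f]);
const header = readElementHeader(sample, 0);
console.log(header.id.toString(16), header.dataSize, header.dataOffset);
```

Because the header says where the element's data begins and how long it is, a parser can skip an element it does not need without reading its contents.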

Existing open‑source projects include jswebm, which can parse WebM (a Matroska‑based format). However, it loads the entire file into memory, which leads to high memory usage, and it offers no APIs for extracting only the metadata or selected frames.

Therefore a new SDK, mkv-demuxer, was created. Its design focuses on:

Reading files by reference and fetching only needed ArrayBuffer slices, avoiding full file loading.

Parsing the EBML Header, then the Segment, and reading the SeekHead first to record the positions of the other top‑level elements.

Providing APIs to obtain video metadata (getMeta), all packet data (getData), and seek to a specific frame (seekFrame).
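The first two design points can be sketched as follows. This is a minimal illustration built on the standard `Blob.slice()` and `Blob.arrayBuffer()` methods, not mkv-demuxer's actual code. The helper names `readSlice` and `toAbsoluteOffsets` and the `{ id, position }` entry shape are assumptions made for this example.

```javascript
// Read bytes [start, end) of a File or Blob. slice() only creates a reference,
// so nothing outside the requested range is loaded into memory.
async function readSlice(file, start, end) {
  const clampedEnd = Math.min(end, file.size);
  const buffer = await file.slice(start, clampedEnd).arrayBuffer();
  return new Uint8Array(buffer);
}

// Hypothetical helper: SeekHead positions are relative to the first byte of the
// Segment's data, so add that offset to get absolute positions in the file.
// `seekEntries` is assumed to be a list of already decoded { id, position } pairs.
function toAbsoluteOffsets(segmentDataOffset, seekEntries) {
  const offsets = new Map();
  for (const { id, position } of seekEntries) {
    // Keep the first entry per ID (for example, the first Cluster).
    if (!offsets.has(id)) offsets.set(id, segmentDataOffset + position);
  }
  return offsets;
}

// Usage sketch: jump straight to the Tracks element without reading what precedes it.
async function readTracksPiece(file, segmentDataOffset, seekEntries, pieceSize) {
  const TRACKS_ID = 0x1654ae6b; // Matroska element ID for Tracks
  const offsets = toAbsoluteOffsets(segmentDataOffset, seekEntries);
  const start = offsets.get(TRACKS_ID);
  if (start === undefined) return null; // no SeekHead entry: fall back to a linear scan
  return readSlice(file, start, start + pieceSize);
}
```

With the positions recorded up front, memory use is bounded by the piece size rather than by the file size.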

Example usage:

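import MkvDemuxer from 'mkv-demuxer'

// `file` is the File chosen by the user, e.g. from an <input type="file"> element
const demuxer = new MkvDemuxer()
const filePieceSize = 1 * 1024 * 1024 // read the file in 1 MiB slices
await demuxer.initFile(file, filePieceSize)

const meta = await demuxer.getMeta() // container info, video codec, resolution, ...
const data = await demuxer.getData() // video and audio packets with timestamps
const frame = await demuxer.seekFrame(10) // nearest keyframe packet for the given timestamp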

The getMeta API returns an object containing container info, video track codec, resolution, etc.; getData returns arrays of video and audio packets with timestamps; seekFrame returns the nearest keyframe packet for a given timestamp.
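To illustrate what "nearest keyframe" means, here is a minimal sketch of such a lookup over a list of packets. It is not mkv-demuxer's implementation. The packet shape `{ timestamp, isKeyframe }` and the function name `findNearestKeyframe` are assumptions for this example, and a real demuxer would more likely consult the Cues index than scan every packet.

```javascript
// Return the keyframe packet whose timestamp is closest to `targetTimestamp`.
// On a tie, the earlier packet wins because only a strictly smaller distance replaces it.
// Returns null when the list contains no keyframe.
function findNearestKeyframe(packets, targetTimestamp) {
  let best = null;
  let bestDistance = Infinity;
  for (const packet of packets) {
    if (!packet.isKeyframe) continue;
    const distance = Math.abs(packet.timestamp - targetTimestamp);
    if (distance < bestDistance) {
      best = packet;
      bestDistance = distance;
    }
  }
  return best;
}

// Illustrative packets only; timestamps use the same unit as the target.
const packets = [
  { timestamp: 0, isKeyframe: true },
  { timestamp: 4, isKeyframe: false },
  { timestamp: 5, isKeyframe: true },
  { timestamp: 9, isKeyframe: false },
  { timestamp: 12, isKeyframe: true },
];
console.log(findNearestKeyframe(packets, 10).timestamp);
```

Returning a keyframe matters because a decoder such as WebCodecs can only start decoding from a keyframe. The non-keyframe packet at timestamp 9 is closer to the target but cannot be decoded on its own.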

Performance tests on a 1.61 GB 4K VP9 video show that the new SDK reduces memory usage by 98.34% and parsing time by 97.21% compared with the FFmpeg-Wasm solution.

In the web upload workflow, faster metadata extraction and frame sampling improve AI‑driven cover and category recommendations, shortening total processing time by up to 21% for high‑resolution videos.

Future work includes extending mkv-demuxer to parse Matroska tags, attachments, and EBML streams, and integrating the SDK into the edge‑transcoding pipeline to provide early bitrate and size calculations.

References: EBML RFC 8794, Matroska specifications, WebM project, and related npm packages (mp4box, jswebm, mkv-demuxer).
