Browser‑Based Video Synthesis Using FFmpeg and WebAssembly
This article describes the design and implementation of a web‑frontend video synthesis capability built on FFmpeg compiled to WebAssembly. It explains the background, technical choices, compilation process, runtime architecture, API design, animation support, memory‑management strategies, and future directions.
Background: Video synthesis involves decoding and encoding audio/video streams, handling multiple formats, and performing operations such as filtering, clipping, and concatenation. Traditional desktop editors are now being replicated on the web using modern browser APIs.
Technical selection: FFmpeg is chosen for its comprehensive media-processing features. To run it in the browser, the project uses WebAssembly (Wasm) together with Emscripten, which compiles C/C++ code to Wasm and provides a virtual file system (MEMFS) for file I/O.
Compiling FFmpeg to WebAssembly: The large FFmpeg codebase is split into the libav* libraries, third-party libraries, and the ffmpeg command-line tools (fftools). The team uses the community-maintained ffmpeg.wasm project and follows its guide for custom builds.
Running FFmpeg in the browser: Commands are passed to the Emscripten module after pre-loading input files into MEMFS. The module executes the command and writes the output back to MEMFS, from which the JavaScript layer retrieves the result.
ffmpeg -i input.mp4 -vf scale=w='0.5*iw':h='0.5*ih' -t 5 output.mp4
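Because the Wasm module receives an argument array rather than a shell string, the command above has to be split before it is handed over. A minimal sketch of that mapping (the helper name is hypothetical, not part of the article's code):

```typescript
// Hypothetical helper: maps the command line above onto the argument
// array a ffmpeg.wasm-style run call expects. The input and output
// paths refer to files pre-loaded into the Emscripten MEMFS.
function buildScaleArgs(
  input: string,
  output: string,
  factor: number,
  seconds: number,
): string[] {
  return [
    '-i', input,                                      // input file in MEMFS
    '-vf', `scale=w='${factor}*iw':h='${factor}*ih'`, // scale filter
    '-t', String(seconds),                            // limit output duration
    output,                                           // output path in MEMFS
  ];
}
```

With `buildScaleArgs('input.mp4', 'output.mp4', 0.5, 5)` this reproduces the command line shown above, one token per array element.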
API design: Business code describes a video project with a JSON object. The JSON contains static properties (position, size, opacity, etc.) and an animation section for key-frame-based animations. The system builds a three-layer architecture:
Business layer – JSON description supplied by the client.
State layer – a fully populated object tree (including default values) used for translation.
Execution layer – translates the state tree into FFmpeg command‑line arguments and runs them.
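The state layer's job of "fully populating" the tree can be sketched as merging defaults into the client's partial JSON, so the execution layer never has to null-check. The types and default values below are illustrative assumptions, not the article's actual schema:

```typescript
// Hypothetical shape of a clip's static properties; field names follow
// the article's JSON examples, default values are assumptions.
interface ClipStatic {
  x: number;
  y: number;
  opacity: number;
}

const CLIP_DEFAULTS: ClipStatic = { x: 0, y: 0, opacity: 1 };

// State layer: fill in every omitted field so the execution layer can
// translate the tree into FFmpeg arguments without missing-value checks.
function toStateClip(partial: Partial<ClipStatic>): ClipStatic {
  return { ...CLIP_DEFAULTS, ...partial };
}
```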
Execution flow (simplified):
Validate and complete the JSON using a schema.
Pre‑load all media resources and extract metadata.
Generate a project object tree.
Translate the tree into FFmpeg arguments (input files, filtergraph, output path).
Run the task in a queue, emit progress events, and return logs, binary output, and an event emitter.
export interface RunTaskResult {
  log: LogNode;
  output: Uint8Array;
}
function runProject(json: ProjectJson) {
  const evt = new EventEmitter();
  const steps = async () => {
    await Promise.resolve();
    // 1. Validate and complete the JSON using a schema.
    const parsedJson = ProjectSchema.parse(json);
    // 2. Pre-load all media resources and extract metadata.
    evt.emit('preload_all_start');
    const preloadedClips = [
      ...await preloadAllResourceClips(parsedJson, evt),
      ...await preloadAllTextClips(parsedJson),
    ];
    const subtitleInfo = await preloadSubtitle(parsedJson, evt);
    evt.emit('preload_all_end');
    // 3. Generate a project object tree.
    const projectObj = initProject(parsedJson, preloadedClips);
    // 4. Translate the tree into FFmpeg arguments.
    const { fsOutputPath, fsInputs, args } =
      parseProject(projectObj, parsedJson, preloadedClips, subtitleInfo);
    if (subtitleInfo.hasSubtitle) {
      fsInputs.push(subtitleInfo.srtInfo!, subtitleInfo.fontInfo!);
    }
    const task: FFmpegTask = { fsOutputPath, fsInputs, args };
    // 5. Derive progress from FFmpeg's log output and re-emit it.
    task.logHandler = (log) => {
      const p = getProgressFromLog(log, projectObj.timeline.end);
      if (p !== undefined) {
        evt.emit('progress', p);
      }
    };
    evt.emit('start');
    const res = runInQueue(task);
    await res;
    evt.emit('end');
    return res;
  };
  return { evt, result: steps() };
}
Animation support: The system maps key-frame animations to FFmpeg filter expressions. Simple linear interpolation uses lerp, while non-linear easing uses custom expressions built with if and pow. A helper, nestedIfElse, recursively composes multiple branches.
// Recursively compose branches into a single nested FFmpeg if() expression,
// e.g. nestedIfElse(['a', 'b', 'c'], ['p1', 'p2'])
//      → "if(p1,a,if(p2,b,c))"
function nestedIfElse(branches: string[], predicates: string[]): string {
  if (branches.length === 1) {
    return branches[0];
  } else if (branches.length === 2) {
    const predicate = predicates[0];
    const [ifBranch, elseBranch] = branches;
    return `if(${predicate},${ifBranch},${elseBranch})`;
  } else {
    // Note: shift() consumes the input arrays as the recursion proceeds.
    const predicate = predicates.shift();
    const ifBranch = branches.shift();
    const elseBranch = nestedIfElse(branches, predicates);
    return `if(${predicate},${ifBranch},${elseBranch})`;
  }
}
For easing:
function easeInOut(t1: number, v1: number, t2: number, v2: number) {
  // Normalized time in [0, 1] between the two key frames.
  // (The original snippet was missing the opening parenthesis here.)
  const t = `((t-${t1})/${t2 - t1})`;
  // Cubic ease-in-out, expressed with FFmpeg's if/lt/pow functions.
  const tp = `if(lt(${t},0.5),4*pow(${t},3),1-pow(-2*${t}+2,3)/2)`;
  return `lerp(${v1},${v2},${tp})`;
}
Example JSON for an animated clip:
{
  "type": "video",
  "url": "/bg.mp4",
  "static": { "x": 100, "y": 100 },
  "animation": {
    "properties": { "delay": 1, "duration": 5 },
    "keyframes": {
      "0":   { "opacity": 0 },
      "50":  { "opacity": 1 },
      "100": { "opacity": 0 }
    }
  }
}
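To show how such percentage key frames could become a time-based filter expression, here is a sketch restricted to linear segments; the helper name and the exact translation are assumptions, not the article's implementation:

```typescript
// Hypothetical translation of percentage key frames into a piecewise
// linear FFmpeg expression for one property (e.g. opacity).
// delay/duration come from the animation's properties block.
function keyframesToExpr(
  keyframes: Record<string, number>, // percent → value, e.g. { "0": 0, "50": 1, "100": 0 }
  delay: number,
  duration: number,
): string {
  const points = Object.entries(keyframes)
    .map(([pct, v]) => ({ t: delay + (Number(pct) / 100) * duration, v }))
    .sort((a, b) => a.t - b.t);
  // Value held after the last key frame.
  let expr = String(points[points.length - 1].v);
  // Wrap nested if(lt(t,...)) branches from the last segment to the first.
  for (let i = points.length - 1; i > 0; i--) {
    const a = points[i - 1];
    const b = points[i];
    const seg = `lerp(${a.v},${b.v},(t-${a.t})/${b.t - a.t})`;
    expr = `if(lt(t,${b.t}),${seg},${expr})`;
  }
  return expr;
}
```

For the JSON above (delay 1 s, duration 5 s), the opacity key frames land at t = 1, 3.5, and 6 seconds, and the result is a nested if/lerp expression of the same shape that nestedIfElse produces.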
Memory-limit handling: Running FFmpeg.wasm can trigger OOM errors. The authors mitigate this by:
Serializing tasks through a single‑threaded queue and restarting the runtime between tasks.
Splitting long timelines into smaller segments so each FFmpeg invocation handles only 2‑3 inputs.
Using stream copy (-vcodec copy) for sections that require no re-encoding.
Example OOM error:
exception thrown: RuntimeError: abort(OOM). Build with -s ASSERTIONS=1 for more info.
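The segmentation strategy above can be sketched as chunking the clip list so that no single FFmpeg invocation sees more than a few inputs, then joining the intermediate outputs with stream copy. The helper names are hypothetical:

```typescript
// Hypothetical sketch: chunk clips so each FFmpeg run handles at most
// `maxInputs` inputs; each chunk is rendered in its own invocation and
// the runtime can be restarted between chunks to release Wasm memory.
function segmentClips<T>(clips: T[], maxInputs = 3): T[][] {
  const segments: T[][] = [];
  for (let i = 0; i < clips.length; i += maxInputs) {
    segments.push(clips.slice(i, i + maxInputs));
  }
  return segments;
}

// Concat-demuxer list for joining segment outputs without re-encoding
// (used with: ffmpeg -f concat -i list.txt -c copy out.mp4).
function concatList(files: string[]): string {
  return files.map((f) => `file '${f}'`).join('\n');
}
```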
Text rendering: Two approaches are discussed: converting text to images via Canvas/SVG, and using FFmpeg's native drawtext or subtitles filters. The latter reduces memory usage for large subtitle blocks.
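For the filter-based approach, a drawtext invocation is just another generated string. A simplified sketch of building one (the helper is hypothetical, and the escaping shown is deliberately naive; real drawtext values also need ':' and '\' handling):

```typescript
// Hypothetical helper for FFmpeg's drawtext filter. The fontfile path
// refers to a font pre-loaded into MEMFS alongside the media inputs.
function drawtextFilter(opts: {
  text: string;
  fontfile: string;
  x: number;
  y: number;
  fontsize: number;
}): string {
  const escaped = opts.text.replace(/'/g, "\\'"); // naive quote escaping only
  return [
    `drawtext=text='${escaped}'`,
    `fontfile=${opts.fontfile}`,
    `x=${opts.x}`,
    `y=${opts.y}`,
    `fontsize=${opts.fontsize}`,
  ].join(':');
}
```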
Future work: Anticipated improvements include using the Origin Private File System (OPFS) for faster I/O, leveraging WebAssembly SIMD for parallel processing, and employing WebGL for GPU-accelerated filters.
Bilibili Tech