Building a Music Recognition Chrome Extension with Manifest V3 and WebAssembly
The article explains how NetEase Cloud Music built a Chrome extension that captures tab audio, processes it with an AudioWorkletNode, extracts fingerprints via WebAssembly in a sandboxed iframe, and matches songs locally, all while navigating Manifest V3’s service‑worker, CSP, and deprecation constraints.
This article details the technical implementation of a music recognition Chrome extension developed by NetEase Cloud Music. The extension solves the problem of identifying background music while browsing video websites, eliminating the need to use a separate phone app for song recognition.
Background and Motivation
Most existing music-recognition extensions in the Chrome Web Store are developed outside China and support Chinese music poorly. In addition, most of them still use the older Manifest V2 protocol, which is being deprecated in 2023. The authors therefore set out to implement the extension on Manifest V3 (MV3), which offers better security, performance, and privacy, and to move audio-fingerprint extraction from the server to the client side to reduce server load.
Manifest V3 Protocol Changes
MV3 introduces several significant changes that affect extension development: Background Pages are replaced with Service Workers (limiting Web API access), remote code execution is no longer supported (all code must be bundled), and Content Security Policy restrictions prevent direct execution of unsafe code including WebAssembly initialization.
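Concretely, these changes show up directly in the manifest. A minimal MV3 manifest might look like the sketch below (names, version, and file paths are illustrative, not the extension's actual configuration; `tabCapture` is the permission the audio-capture step relies on):

```json
{
  "manifest_version": 3,
  "name": "Music Recognizer",
  "version": "1.0.0",
  "background": {
    "service_worker": "background.js"
  },
  "permissions": ["tabCapture"]
}
```

Under MV2 the `background` key pointed at a persistent page with full DOM and Web API access; under MV3 the `service_worker` script is event-driven, can be terminated at any time, and has no DOM, which is what forces the architectural changes discussed below.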
Audio Extraction in Browser Extensions
The extension uses the chrome.tabCapture API to capture audio from web pages. For audio processing, the article compares three approaches: createScriptProcessor (deprecated), MediaRecorder (lacks fine-grained control), and AudioWorkletNode (chosen because it processes audio in small sample chunks on a dedicated rendering thread without blocking the main thread).
Implementation involves three steps:
1. Module registration: `const audio_ctx = new window.AudioContext({ sampleRate: 8000 }); await audio_ctx.audioWorklet.addModule("PitchProcessor.js");`
2. Creating an `AudioWorkletNode` whose message port (`port.onmessage`) receives data from the Web Audio rendering thread
3. Processing audio in `AudioWorkletProcessor.process`: collecting samples from the first channel until reaching the defined length (e.g., 48000), then notifying the main thread through the port
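The accumulation step in `process` can be sketched as pure logic. This is a minimal illustration, not the extension's actual worklet code; `TARGET_LENGTH`, `createState`, `accumulate`, and `PitchProcessor` are names invented here:

```javascript
// Number of mono samples to collect before notifying the main
// thread: 48,000 samples = 6 seconds at the 8 kHz sample rate
// chosen when creating the AudioContext.
const TARGET_LENGTH = 48000;

function createState() {
  return { buffer: new Float32Array(TARGET_LENGTH), filled: 0 };
}

// Append one chunk of samples from the first channel; returns
// true once the buffer holds TARGET_LENGTH samples.
function accumulate(state, channelData) {
  const room = TARGET_LENGTH - state.filled;
  const n = Math.min(room, channelData.length);
  state.buffer.set(channelData.subarray(0, n), state.filled);
  state.filled += n;
  return state.filled === TARGET_LENGTH;
}

// Inside PitchProcessor.js this logic would live in process(),
// which the rendering thread calls with ~128-sample chunks:
//
// class PitchProcessor extends AudioWorkletProcessor {
//   constructor() { super(); this.state = createState(); }
//   process(inputs) {
//     const channel = inputs[0][0]; // first channel, first input
//     if (channel && accumulate(this.state, channel)) {
//       this.port.postMessage(this.state.buffer);
//       this.state = createState();
//     }
//     return true; // keep the processor alive
//   }
// }
// registerProcessor("PitchProcessor", PitchProcessor);
```

Keeping the buffering logic separate from the `AudioWorkletProcessor` subclass also makes it trivial to unit-test outside the browser.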
Audio Fingerprint Extraction
After the audio signal is extracted, fingerprint extraction converts the time-domain samples to frequency-domain information using a Fourier transform. Common methods include energy-based, landmark-based, and neural-network-based fingerprinting. The article references the paper "A Highly Robust Audio Fingerprinting System" for algorithm details.
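As a rough illustration of the energy-based idea from that paper (Haitsma and Kalker derive one 32-bit sub-fingerprint per FFT frame from the signs of energy differences across 33 frequency bands), here is a toy sketch that assumes per-band energies have already been computed; it is not the extension's actual WASM implementation:

```javascript
// Bit m of frame n is 1 iff
//   (E[n][m] - E[n][m+1]) - (E[n-1][m] - E[n-1][m+1]) > 0,
// i.e. the band-to-band energy difference increased since the
// previous frame. `energies` is an array of frames, each an
// array of 33 band energies, yielding 32 bits per frame.
function subFingerprints(energies) {
  const prints = [];
  for (let n = 1; n < energies.length; n++) {
    let bits = 0;
    for (let m = 0; m < 32; m++) {
      const diff =
        (energies[n][m] - energies[n][m + 1]) -
        (energies[n - 1][m] - energies[n - 1][m + 1]);
      if (diff > 0) bits |= 1 << m;
    }
    prints.push(bits >>> 0); // keep as unsigned 32-bit
  }
  return prints;
}
```

Because only the signs of differences are kept, the fingerprint is robust to volume changes and mild distortion, which is exactly what background music on a video site needs.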
WebAssembly is used for better CPU performance, but MV3's strict CSP prevents initializing WASM directly in extension pages. The solution is a sandbox page, which is allowed to run "unsafe" methods such as `eval`, `new Function`, and `WebAssembly.instantiate`. The sandbox page is embedded in the extension page as an iframe, and the two communicate via `postMessage`.
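The sandbox page is declared in the manifest; a minimal sketch (the file name `sandbox.html` is illustrative):

```json
{
  "sandbox": {
    "pages": ["sandbox.html"]
  }
}
```

Pages listed under `sandbox` are served with a separate, looser content security policy than regular extension pages, so `WebAssembly.instantiate` can run there. The extension page embeds `sandbox.html` in an `<iframe>`, posts raw sample buffers to it with `window.postMessage`, and receives extracted fingerprints back the same way.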
Feature Matching
The extracted audio fingerprint is matched against a fingerprint database (implemented as a hash table) to retrieve the matching song ID and timestamp. Different companies have varying algorithms affecting matching efficiency and accuracy.
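A drastically simplified version of hash-table matching is sketched below. Production systems tolerate bit errors in sub-fingerprints and vote over much longer sequences; all names here are invented for illustration:

```javascript
// Build an inverted index: sub-fingerprint -> [{ songId, offset }],
// where offset is the fingerprint's frame position in the song.
function buildIndex(songs) {
  // songs: { songId: [fp0, fp1, ...] }
  const index = new Map();
  for (const [songId, prints] of Object.entries(songs)) {
    prints.forEach((fp, offset) => {
      if (!index.has(fp)) index.set(fp, []);
      index.get(fp).push({ songId, offset });
    });
  }
  return index;
}

// Look up each query fingerprint and vote on (songId, time delta)
// pairs; a genuine match produces many hits with the same delta,
// which also yields the timestamp within the matched song.
function match(index, queryPrints) {
  const votes = new Map(); // "songId@delta" -> count
  queryPrints.forEach((fp, qOffset) => {
    for (const { songId, offset } of index.get(fp) ?? []) {
      const key = `${songId}@${offset - qOffset}`;
      votes.set(key, (votes.get(key) ?? 0) + 1);
    }
  });
  let best = null, bestCount = 0;
  for (const [key, count] of votes) {
    if (count > bestCount) { best = key; bestCount = count; }
  }
  if (!best) return null;
  const [songId, delta] = best.split("@");
  return { songId, offset: Number(delta), votes: bestCount };
}
```

Voting on the offset difference rather than on the song alone is what lets the matcher return both the song ID and the timestamp mentioned in the text.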
Conclusion
While extensions offer flexibility, Google's migration to MV3 brings security and privacy improvements but also restricts many features. After 2023, many extensions using MV2 will cease to work.
NetEase Cloud Music Tech Team