Building a Music Recognition Chrome Extension with Manifest V3 and WebAssembly

The article explains how NetEase Cloud Music built a Chrome extension that captures tab audio, processes it with an AudioWorkletNode, extracts fingerprints via WebAssembly in a sandboxed iframe, and matches songs locally, all while navigating Manifest V3’s service‑worker, CSP, and deprecation constraints.

NetEase Cloud Music Tech Team

This article details the technical implementation of a music recognition Chrome extension developed by NetEase Cloud Music. The extension solves the problem of identifying background music while browsing video websites, eliminating the need to use a separate phone app for song recognition.

Background and Motivation

Most existing music recognition extensions in the Chrome Web Store were developed outside China and have poor coverage of Chinese music. Most of them also still use the older Manifest V2 protocol, which, at the time of writing, Google had scheduled for deprecation in 2023. The authors chose to build on Manifest V3 (MV3), which offers better security, performance, and privacy, and to move audio fingerprint extraction from the server to the client to reduce server load.

Manifest V3 Protocol Changes

MV3 introduces several changes that affect extension development. Background pages are replaced by service workers, which have no DOM and only limited Web API access. Remotely hosted code is no longer allowed, so all code must be bundled with the extension. The stricter Content Security Policy also blocks direct execution of code it treats as unsafe, which, according to the article, includes WebAssembly initialization.
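To make the first two changes concrete, the sketch below checks a manifest object for Manifest V2 leftovers. The helper name findMv3Problems, the sample manifest, and its file names are hypothetical and not taken from the NetEase extension; only the manifest keys themselves (manifest_version, background.service_worker, sandbox.pages, content_security_policy) are real Chrome manifest fields, and the checks are Chrome-oriented rather than exhaustive.

```javascript
// Hypothetical sample manifest; names and file paths are placeholders.
const sampleManifest = {
  manifest_version: 3,
  name: "Music Recognition (sketch)",
  version: "0.1.0",
  permissions: ["tabCapture"],
  background: { service_worker: "background.js" }, // replaces MV2 background pages
  sandbox: { pages: ["sandbox.html"] },            // page allowed to run eval-like code
};

// Returns a list of human-readable problems; an empty list means none were found.
function findMv3Problems(manifest) {
  const problems = [];
  const background = manifest.background || {};
  if (manifest.manifest_version !== 3) {
    problems.push("manifest_version must be 3");
  }
  if ("scripts" in background || "page" in background) {
    problems.push("background.scripts/page are MV2 keys; use background.service_worker");
  }
  if ("persistent" in background) {
    problems.push("background.persistent has no meaning for a service worker");
  }
  if (typeof manifest.content_security_policy === "string") {
    problems.push("content_security_policy must be an object in MV3");
  }
  return problems;
}
```

Calling findMv3Problems(sampleManifest) returns an empty list, while a manifest that still declares background.scripts is flagged.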

Audio Extraction in Browser Extensions

The extension uses the chrome.tabCapture API to capture audio from the current tab. For processing the captured audio, the article compares three approaches: ScriptProcessorNode, created with createScriptProcessor (deprecated); MediaRecorder (lacks fine-grained control over the samples); and AudioWorkletNode, which was chosen because it processes audio in small blocks of samples on a separate audio rendering thread, without blocking the main thread.

Implementation involves three steps:

1. Register the processor module: const audio_ctx = new window.AudioContext({sampleRate: 8000}); await audio_ctx.audioWorklet.addModule("PitchProcessor.js");

2. Create an AudioWorkletNode on the main thread and receive data from the audio rendering thread through its port.onmessage handler.

3. In AudioWorkletProcessor.process, collect samples from the first channel until the buffer reaches the defined length (e.g., 48000 samples, which is 6 seconds at the 8000 Hz sample rate above), then notify the main thread.
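The collection step can be sketched as plain JavaScript, independent of the Web Audio API. The class name SampleAccumulator and the callback shape are assumptions made for illustration, not the article's code. In a real worklet, push would be called from AudioWorkletProcessor.process with inputs[0][0] (the first channel, normally 128 samples per call), and the callback would forward the full buffer with this.port.postMessage.

```javascript
// Collects fixed-size render blocks until targetLength samples are available,
// then hands a full buffer to onFull and starts over.
class SampleAccumulator {
  constructor(targetLength, onFull) {
    this.targetLength = targetLength;
    this.onFull = onFull;
    this.buffer = new Float32Array(targetLength);
    this.filled = 0;
  }

  // channel: Float32Array of samples from one channel of one render block.
  push(channel) {
    let offset = 0;
    while (offset < channel.length) {
      const room = this.targetLength - this.filled;
      const count = Math.min(room, channel.length - offset);
      this.buffer.set(channel.subarray(offset, offset + count), this.filled);
      this.filled += count;
      offset += count;
      if (this.filled === this.targetLength) {
        // Copy so the receiver keeps its data when the buffer is reused.
        this.onFull(this.buffer.slice());
        this.filled = 0;
      }
    }
  }
}
```

Samples left over after a full buffer is emitted are kept at the start of the next buffer, so no audio is dropped when the target length is not a multiple of the block size.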

Audio Fingerprint Extraction

Once the audio samples have been captured, fingerprint extraction starts by converting the time-domain signal into frequency-domain information with a Fourier transform. Common fingerprinting methods are energy-based, landmark-based, and neural-network-based. The article refers to the paper "A Highly Robust Audio Fingerprinting System" for algorithm details.
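The time-to-frequency step can be illustrated with a naive discrete Fourier transform. This is only a sketch of the transform itself, not the fingerprinting algorithm from the referenced paper or the one NetEase ships; the function names magnitudeSpectrum and peakBin are made up for this example, and real implementations would use an FFT rather than this O(N²) loop.

```javascript
// Naive discrete Fourier transform: magnitude of each bin from 0 to N/2.
// O(N^2), fine for a demo; production code would use an FFT.
function magnitudeSpectrum(frame) {
  const n = frame.length;
  const half = Math.floor(n / 2);
  const magnitudes = new Float64Array(half + 1);
  for (let k = 0; k <= half; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im -= frame[t] * Math.sin(angle);
    }
    magnitudes[k] = Math.hypot(re, im);
  }
  return magnitudes;
}

// Index of the strongest bin; bin k corresponds to k * sampleRate / N Hz.
function peakBin(magnitudes) {
  let best = 0;
  for (let k = 1; k < magnitudes.length; k++) {
    if (magnitudes[k] > magnitudes[best]) best = k;
  }
  return best;
}
```

For example, a 64-sample frame of a 1000 Hz sine sampled at 8000 Hz peaks at bin 8, since 8 × 8000 / 64 = 1000 Hz. Fingerprinting schemes then derive compact hashes from such spectra, for instance from band energies or from the positions of spectral peaks.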

WebAssembly is used for better CPU performance, but MV3's strict CSP prevents initializing WASM directly in an extension page. The solution is a sandbox page, which is allowed to run "unsafe" methods such as eval, new Function, and WebAssembly.instantiate. The sandbox page is embedded in the main page as an iframe, and the two communicate by passing messages with postMessage.
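A minimal sketch of the sandbox side of this arrangement follows. Everything specific in it is an assumption: the tiny hand-assembled module that exports add stands in for the real fingerprint module (which the article does not show), and the { id, a, b } request shape and the names loadExports and makeSandboxHandler are hypothetical. Only the WebAssembly and postMessage calls are standard APIs.

```javascript
// Hand-assembled WebAssembly module exporting add(i32, i32) -> i32.
// It stands in for the real fingerprint module, which the article does not show.
const DEMO_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic, version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function of that type
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body
]);

// Synchronous compile keeps the sketch short; for a real, larger module the
// sandbox page would use WebAssembly.instantiate and await the result.
function loadExports(bytes) {
  const module = new WebAssembly.Module(bytes);
  return new WebAssembly.Instance(module, {}).exports;
}

// Builds the "message" listener for the sandbox page. Each request carries a
// hypothetical { id, a, b } payload and is answered with { id, result }.
function makeSandboxHandler(wasmExports) {
  return function onMessage(event) {
    const { id, a, b } = event.data;
    const result = wasmExports.add(a, b);
    // Reply to the window that sent the request, restricted to its origin.
    event.source.postMessage({ id, result }, event.origin);
  };
}

// In the sandbox page this would be wired up as:
// window.addEventListener("message", makeSandboxHandler(loadExports(DEMO_WASM)));
```

On the other side, the main page would call postMessage on the iframe's contentWindow and match replies to requests by id. Because a sandboxed page has an opaque origin, the main page typically has to pass "*" as the target origin when sending to it.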

Feature Matching

The extracted audio fingerprint is matched against a fingerprint database (implemented as a hash table) to retrieve the matching song ID and timestamp. Matching algorithms differ between companies, and those differences affect both efficiency and accuracy.
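A hash-table lookup of this kind can be sketched as follows. The article does not describe NetEase's matching algorithm, so this example swaps in a generic offset-voting scheme common to landmark-style systems; the function names, the { hash, time } record shape, and the integer hashes are all assumptions made for illustration.

```javascript
// Builds hash -> [{ songId, time }] from reference fingerprints.
// songs: [{ songId, prints: [{ hash, time }] }]
function buildIndex(songs) {
  const index = new Map();
  for (const { songId, prints } of songs) {
    for (const { hash, time } of prints) {
      if (!index.has(hash)) index.set(hash, []);
      index.get(hash).push({ songId, time });
    }
  }
  return index;
}

// Votes for (songId, offset) pairs, where offset = reference time - query time.
// A true match produces many hits that agree on the same offset.
function matchQuery(index, queryPrints) {
  const votes = new Map();
  let best = null;
  for (const { hash, time } of queryPrints) {
    for (const hit of index.get(hash) || []) {
      const offset = hit.time - time;
      const key = `${hit.songId}@${offset}`;
      const count = (votes.get(key) || 0) + 1;
      votes.set(key, count);
      if (best === null || count > best.votes) {
        best = { songId: hit.songId, offset, votes: count };
      }
    }
  }
  return best; // null when no hash matched
}
```

The returned offset is the position in the reference track where the query clip begins, which is how a lookup can yield a timestamp as well as a song ID. A production matcher would also require a minimum vote count before reporting a match.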

Conclusion

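Browser extensions are flexible, and Google's migration to MV3 improves their security and privacy, but it also restricts many features that extensions previously relied on. Under the deprecation schedule in effect at the time of writing, many extensions still on MV2 would stop working after 2023.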
