Running AI Inference Directly in the Browser with WebNN
WebNN brings hardware‑accelerated AI inference to web pages, letting developers run millisecond‑level face detection, real‑time filters, and semantic segmentation locally without cloud calls, while improving latency, privacy, and cost through a unified JavaScript API that maps to CPUs, GPUs or NPUs.
Introduction
The Web Neural Network API (WebNN) lets a web page run millisecond‑level inference, such as face detection, stylized filters, or semantic segmentation, without calling a cloud API. It provides a hardware‑agnostic abstraction: a graph described in JavaScript is mapped to the most suitable native backend (DirectML, Core ML, NNAPI, etc.).
Translator‑Scheduler Design
Developers describe a computation with a unified JS API; the browser translates and schedules the graph to the device‑specific backend, combining native‑app performance with web deployment ease.
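From the page's side, the main scheduling choice is which kind of device to request. The sketch below is one possible way to express a preference order with fallback; it assumes the browser accepts the deviceType option used in the face‑recognition excerpt (browsers may treat it only as a hint), and createContextWithFallback is a hypothetical helper name, not part of WebNN.

```javascript
// Try each preferred device type in order and return the first context that
// can be created. `ml` is normally navigator.ml; it is passed in as a
// parameter so the function can also be exercised outside a browser.
async function createContextWithFallback(ml, preferred = ['npu', 'gpu', 'cpu']) {
  for (const deviceType of preferred) {
    try {
      const context = await ml.createContext({ deviceType });
      return { context, deviceType };
    } catch (err) {
      // This device type is unavailable; fall through to the next one.
    }
  }
  throw new Error('No WebNN context could be created');
}
```

In a page this would be called as createContextWithFallback(navigator.ml); the returned deviceType tells the application which request succeeded.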
Key Benefits
Latency reduction – GPU‑accelerated inference runs in‑browser, avoiding data upload and round‑trip delays, enabling real‑time processing of camera or microphone streams.
Privacy protection – All computation stays on the client device, so sensitive media never leaves the user’s hardware.
Usability and cost – Once a model is cached the app works offline; compute load shifts from servers to millions of client devices, reducing server‑side cost.
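The offline benefit depends on the model file being stored locally after the first download. A minimal sketch of that idea using the browser Cache API follows; loadModelBytes and the cache name 'webnn-models-v1' are hypothetical, and the cache storage and fetch function are passed in as parameters so the logic is not tied to a particular page.

```javascript
// Return the model bytes, downloading them only if they are not cached yet.
// In a page, pass `caches` as cacheStorage and `fetch` as fetchFn.
async function loadModelBytes(url, cacheStorage, fetchFn, cacheName = 'webnn-models-v1') {
  const cache = await cacheStorage.open(cacheName);
  let response = await cache.match(url);
  if (!response) {
    response = await fetchFn(url);
    if (!response.ok) {
      throw new Error(`Failed to fetch ${url}: ${response.status}`);
    }
    // Store a copy; the original response body is consumed below.
    await cache.put(url, response.clone());
  }
  return response.arrayBuffer();
}
```

On a later visit without network access, cache.match() returns the stored response and no request is made.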
Comparison with other approaches
Native app: execution on user device, highest performance (direct hardware), highest privacy, poor cross‑platform, complex deployment.
Cloud AI: remote execution, unlimited compute, privacy risk, good cross‑platform, easy deployment.
WebAssembly (Wasm): in‑browser execution, limited performance, high privacy, good cross‑platform, easy deployment.
WebNN: in‑browser execution using hardware adapters, near‑native performance, highest privacy, good cross‑platform, extremely easy deployment (web‑level updates).
Getting Started (Experimental)
WebNN is available in Chromium‑based browsers behind a flag. Open edge://flags or chrome://flags, search for "WebNN", enable the WebNN API flag, and restart the browser. To verify, evaluate 'ml' in navigator in the DevTools console; it should return true.
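The same check can be done from application code before any model is loaded. The helper below is a small sketch; detectWebNN is a hypothetical name, and it takes the navigator object as a parameter so it can be called safely in environments that have none.

```javascript
// Return true only if the given navigator-like object exposes WebNN.
function detectWebNN(nav) {
  return (
    typeof nav === 'object' &&
    nav !== null &&
    'ml' in nav &&
    typeof nav.ml?.createContext === 'function'
  );
}
```

A page would call detectWebNN(globalThis.navigator) and fall back to another execution path, such as WebAssembly, when it returns false.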
Core API Workflow
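Connect hardware: navigator.ml.createContext() – obtains a compute context.
Build graph: new MLGraphBuilder(context) – creates a graph builder.
Define inputs and operations: builder.input(), builder.constant(), and operator methods such as builder.add() – declare tensor shapes, weights, and the computation. MLGraphBuilder has no model importer; ONNX models are typically run through ONNX Runtime Web with its WebNN execution provider.
Compile: builder.build() – compiles the graph for the selected device.
Execute: context.dispatch() followed by context.readTensor() – runs inference on MLTensor inputs and reads the results back. Earlier drafts exposed context.compute(), which has since been removed from the specification.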
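Six‑step example
async function runWebNNModel() {
  // 1️⃣ Connect hardware
  const ctx = await navigator.ml.createContext(); // obtain context
  // 2️⃣ Create a graph builder
  const builder = new MLGraphBuilder(ctx);
  // 3️⃣ Define the graph: output = input * 2 (a stand-in for a real model;
  //    ONNX files are loaded through ONNX Runtime Web, not MLGraphBuilder)
  const desc = {dataType: 'float32', shape: [4]};
  const input = builder.input('input', desc);
  const two = builder.constant(desc, new Float32Array([2, 2, 2, 2]));
  const output = builder.mul(input, two);
  // 4️⃣ Compile for the selected device
  const graph = await builder.build({output});
  // 5️⃣ Prepare input and output tensors
  const inputTensor = await ctx.createTensor({...desc, writable: true});
  const outputTensor = await ctx.createTensor({...desc, readable: true});
  ctx.writeTensor(inputTensor, new Float32Array([1, 2, 3, 4]));
  // 6️⃣ Execute inference and read back the result
  ctx.dispatch(graph, {input: inputTensor}, {output: outputTensor});
  return new Float32Array(await ctx.readTensor(outputTensor));
}
Face‑recognition case study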
The official sample combines an SSD‑MobileNet V2 face detector with a FaceNet embedding model. It loads ONNX weights, builds the graph, dispatches on GPU, and reads back a Float32Array of 512‑dimensional embeddings.
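Once two embeddings have been read back, deciding whether they belong to the same person is a plain vector comparison. The sketch below uses cosine similarity as an illustration; this is an assumption, not necessarily the metric the sample uses, and any decision threshold would have to be tuned for the model.

```javascript
// Cosine similarity between two embeddings of equal length.
// Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Embedding lengths differ');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat an all-zero embedding as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

For the 512‑dimensional vectors described above, both arguments would be the Float32Array results of two separate inference runs.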
Four‑layer architecture
WebNN sits between the web‑application layer (JS ML frameworks, ONNX models) and native ML APIs (DirectML, NNAPI, CoreML, OpenVINO). Below are the hardware primitives (CPU, GPU, NPU). The API translates a high‑level graph into calls to these native layers, achieving near‑native performance.
Hardware and browser requirements
Chromium‑based browser (latest Microsoft Edge Beta for GPU, Edge Canary for NPU).
Windows 11 version 21H2 or later.
ONNX Runtime Web ≥ 1.18.
Latest device drivers (e.g., Intel driver 32.0.100.2381 for NPU).
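The ONNX Runtime Web requirement reflects how ONNX models usually reach WebNN: the runtime loads the model and delegates execution to its WebNN execution provider. The following sketch assumes the onnxruntime-web module has already been loaded and is passed in as ort; createWebNNSession is a hypothetical helper, and the 'wasm' entry is an optional fallback rather than a requirement.

```javascript
// Create an ONNX Runtime Web session that prefers the WebNN execution
// provider and falls back to WebAssembly if WebNN cannot be used.
// `ort` is the onnxruntime-web module, passed in by the caller.
async function createWebNNSession(ort, modelUrl, deviceType = 'gpu') {
  return ort.InferenceSession.create(modelUrl, {
    executionProviders: [{ name: 'webnn', deviceType }, 'wasm'],
  });
}
```

Inference then goes through the session's run() method with ort.Tensor inputs keyed by the model's input names.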
WebNN API details
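Connect hardware: navigator.ml.createContext(options) – obtains a compute context; the sample passes {deviceType: 'gpu'}.
Build graph: new MLGraphBuilder(context) – creates a graph builder.
Define inputs: builder.input(name, descriptor) – declares a named input with a data type and shape; layers are then added with the builder's operator methods.
Compile: builder.build(outputs) – compiles the graph for the selected device.
Execute: context.dispatch(graph, inputs, outputs) – runs the graph on MLTensor inputs; context.readTensor() reads the results back.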
Face‑recognition implementation (excerpt)
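// Create a context targeting the GPU
this.context_ = await navigator.ml.createContext({deviceType: 'gpu'});
// Create the graph builder
this.builder_ = new MLGraphBuilder(this.context_);
// Describe the input tensor; `dimensions` is the older name for `shape`
// and is kept for compatibility with earlier browser builds
const inputDesc = {
  dataType: 'float32',
  dimensions: this.inputOptions.inputShape,
  shape: this.inputOptions.inputShape,
};
const input = this.builder_.input('input', inputDesc);
// Allocate device tensors for input and output (the names inputTensor_,
// outputShape_, and inputBuffer are illustrative)
this.inputTensor_ = await this.context_.createTensor({...inputDesc, writable: true});
this.outputTensor_ = await this.context_.createTensor({
  dataType: 'float32',
  shape: this.outputShape_,
  readable: true,
});
// Build a convolution layer (buildConv_ is a helper in the sample;
// strides is defined elsewhere in it)
const conv0 = this.buildConv_(input, 'Conv2d_1a_3x3', {strides});
// ... remaining layers elided; outputOperand is the final operand ...
// Compile the graph
this.graph_ = await this.builder_.build({'output': outputOperand});
// Write the input, dispatch inference, and read back the embeddings
this.context_.writeTensor(this.inputTensor_, inputBuffer);
this.context_.dispatch(this.graph_, {'input': this.inputTensor_}, {'output': this.outputTensor_});
const results = await this.context_.readTensor(this.outputTensor_);
return new Float32Array(results);
Typical application scenarios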
Person detection (SSD, YOLO) – detect presence in video conferences.
Semantic segmentation (DeepLabv3+, Mask R‑CNN, SegAny) – background replacement or blur.
Pose detection (PoseNet) – hand‑gesture control.
Face recognition (SSD + FaceNet) – identity verification, access control.
Facial landmark detection (FAN) – virtual try‑on, makeup.
Image captioning (im2txt) – accessibility.
Speech‑to‑text (Whisper) – live subtitles.
Text generation (GPT‑2, LLaMA) – chat, summarization.
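Most of the vision scenarios above start from a camera frame, which the canvas API exposes as interleaved RGBA bytes, while many models expect planar float32 input. The sketch below shows one common conversion to NCHW layout; the mean and scale defaults of 127.5 are assumptions for illustration, and the right normalization depends on the model.

```javascript
// Convert interleaved RGBA bytes (as in ImageData.data) into a planar
// float32 buffer laid out as [R plane, G plane, B plane]; alpha is dropped.
function rgbaToNchw(rgba, width, height, mean = 127.5, scale = 127.5) {
  const plane = width * height;
  if (rgba.length !== 4 * plane) {
    throw new Error('Pixel buffer size does not match width * height * 4');
  }
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    out[i] = (rgba[4 * i] - mean) / scale;                 // red
    out[plane + i] = (rgba[4 * i + 1] - mean) / scale;     // green
    out[2 * plane + i] = (rgba[4 * i + 2] - mean) / scale; // blue
  }
  return out;
}
```

The result can be written to an input tensor whose shape is [1, 3, height, width].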
Future outlook
Enables front‑end developers to embed high‑performance AI without native toolchains.
Shifts compute cost to the client, allowing offline AI experiences.
Provides a privacy‑first execution model.
Accelerates edge‑device innovation, opening new real‑time, interactive web applications.
References
WebNN requirements: https://learn.microsoft.com/zh-cn/windows/ai/directml/webnn-overview#webnn-requirements
Build your first graph with WebNN API: https://webmachinelearning.github.io/get-started/2021/03/15/build-your-first-graph-with-webnn-api.html
Face‑recognition sample: https://github.com/webmachinelearning/webnn-samples/tree/master/face_recognition
Web Neural Network API (W3C Candidate Recommendation): https://www.w3.org/TR/webnn/
WebNN tutorial (Microsoft Learn): https://learn.microsoft.com/en-us/windows/ai/directml/webnn-tutorial
WebNN intro: https://webmachinelearning.github.io/webnn-intro/
WebNN GitHub repository: https://github.com/webmachinelearning/webnn
WebNN developer preview: https://microsoft.github.io/webnn-developer-preview/
大转转FE
Regularly sharing the team's thoughts and insights on frontend development
