Running AI Inference Directly in the Browser with WebNN

WebNN brings hardware‑accelerated AI inference to web pages, letting developers run millisecond‑level face detection, real‑time filters, and semantic segmentation locally without cloud calls, while improving latency, privacy, and cost through a unified JavaScript API that maps to CPUs, GPUs or NPUs.

大转转FE

May 7, 2026

Introduction

The Web Neural Network API (WebNN) enables millisecond‑level inference such as face detection, stylized filters, or semantic segmentation directly in a web page without a cloud API. It provides a hardware‑agnostic abstraction: the browser maps a graph described in JavaScript to a native ML backend (DirectML, ML Compute, NNAPI, etc.), which runs it on the most suitable accelerator.

Translator‑Scheduler Design

Developers describe a computation with a unified JS API; the browser translates and schedules the graph to the device‑specific backend, combining native‑app performance with web deployment ease.

Key Benefits

Latency reduction – GPU‑accelerated inference runs in‑browser, avoiding data upload and round‑trip delays, enabling real‑time processing of camera or microphone streams.

Privacy protection – All computation stays on the client device, so sensitive media never leaves the user’s hardware.

Usability and cost – Once a model is cached the app works offline; compute load shifts from servers to millions of client devices, reducing server‑side cost.
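
The offline point can be illustrated with the browser's Cache API. The following is a minimal sketch, assuming the model is served as a static file. The helper name fetchModelCached, the cache name webnn-models-v1, and the injectable cacheStorage and fetchFn parameters are hypothetical and not part of WebNN.

```javascript
// Hypothetical helper: returns the model bytes, serving them from the Cache API when possible.
// cacheStorage and fetchFn default to the browser globals and can be injected for testing.
async function fetchModelCached(url, {
  cacheName = 'webnn-models-v1', // illustrative cache name
  cacheStorage = globalThis.caches,
  fetchFn = globalThis.fetch,
} = {}) {
  // Without the Cache API (for example in an insecure context), fall back to a plain fetch.
  if (!cacheStorage) {
    const res = await fetchFn(url);
    if (!res.ok) throw new Error(`Model download failed: ${res.status}`);
    return res.arrayBuffer();
  }
  const cache = await cacheStorage.open(cacheName);
  let res = await cache.match(url);
  if (!res) {
    res = await fetchFn(url);
    if (!res.ok) throw new Error(`Model download failed: ${res.status}`);
    // Store a clone, because a Response body can only be read once.
    await cache.put(url, res.clone());
  }
  return res.arrayBuffer();
}
```

After the first successful download, later calls with the same URL are answered from the cache, so the model bytes stay available without a network connection.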

Comparison with other approaches

Native app: execution on user device, highest performance (direct hardware), highest privacy, poor cross‑platform, complex deployment.

Cloud AI: remote execution, unlimited compute, privacy risk, good cross‑platform, easy deployment.

WebAssembly (Wasm): in‑browser execution, limited performance, high privacy, good cross‑platform, easy deployment.

WebNN: in‑browser execution using hardware adapters, near‑native performance, highest privacy, good cross‑platform, extremely easy deployment (web‑level updates).

Getting Started (Experimental)

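WebNN is available in Chromium‑based browsers behind a flag. Open edge://flags or chrome://flags, search for “WebNN”, enable the WebNN flag (labeled “Enables WebNN API” in recent builds), and restart the browser. Then verify in the DevTools console that 'ml' in navigator returns true.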
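
In application code, the same check can gate the WebNN path. This is a small sketch; the function names hasWebNN and tryCreateContext are hypothetical, and the navigator object is passed as a parameter only so the logic can be exercised outside a browser.

```javascript
// Returns true when the WebNN entry point is exposed on the given navigator-like object.
function hasWebNN(nav = globalThis.navigator) {
  return typeof nav?.ml?.createContext === 'function';
}

// Tries to create a context; resolves to null instead of throwing when WebNN is unusable,
// so the caller can switch to another backend.
async function tryCreateContext(nav = globalThis.navigator) {
  if (!hasWebNN(nav)) return null;
  try {
    return await nav.ml.createContext();
  } catch (err) {
    console.warn('WebNN is exposed but no context could be created:', err);
    return null;
  }
}
```

A null result is the signal to fall back, for example to a WebAssembly‑based runtime.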

Core API Workflow

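Connect hardware: navigator.ml.createContext() – obtains a compute context.

Build graph: new MLGraphBuilder(context) – creates a graph builder.

Define inputs and operations: builder.input(), builder.constant(), and operator methods such as builder.conv2d() – declare tensor shapes, weights, and the computation. The WebNN API itself has no model importer; ONNX models are typically run through ONNX Runtime Web with its WebNN execution provider.

Compile: builder.build() – compiles the graph for the selected device.

Execute: context.dispatch() – runs the graph on MLTensor inputs and outputs; context.readTensor() reads the results back. Earlier drafts of the API exposed context.compute() for this step.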

Six‑step example

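// WebNN has no ONNX importer, so this sketch builds a tiny graph by hand.
// The element-wise add graph is illustrative only.
async function runWebNNModel() {
  // 1️⃣ Connect hardware
  const ctx = await navigator.ml.createContext(); // obtain context

  // 2️⃣ Create the graph builder
  const builder = new MLGraphBuilder(ctx);

  // 3️⃣ Define the graph: c = a + b on float32 vectors of length 4
  const desc = {dataType: 'float32', shape: [4]};
  const a = builder.input('a', desc);
  const b = builder.input('b', desc);
  const c = builder.add(a, b);

  // 4️⃣ Compile for the selected device
  const graph = await builder.build({c});

  // 5️⃣ Allocate tensors and upload the input data
  const ta = await ctx.createTensor({...desc, writable: true});
  const tb = await ctx.createTensor({...desc, writable: true});
  const tc = await ctx.createTensor({...desc, readable: true});
  ctx.writeTensor(ta, new Float32Array([1, 2, 3, 4]));
  ctx.writeTensor(tb, new Float32Array([10, 20, 30, 40]));

  // 6️⃣ Execute inference and read the result back
  ctx.dispatch(graph, {a: ta, b: tb}, {c: tc});
  return new Float32Array(await ctx.readTensor(tc)); // [11, 22, 33, 44]
}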
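
To run a prebuilt ONNX file, one option is ONNX Runtime Web (listed in the requirements further down) with its WebNN execution provider. The sketch below assumes a single float32 input and a single output. The function name runOnnxWithWebNN is hypothetical, and the onnxruntime-web module is passed in as ort so that no particular bundler setup is assumed.

```javascript
// ort is the onnxruntime-web module, supplied by the caller.
async function runOnnxWithWebNN(ort, modelUrl, inputData, inputShape, deviceType = 'gpu') {
  const session = await ort.InferenceSession.create(modelUrl, {
    // Providers are tried in order; 'wasm' acts as the CPU fallback.
    executionProviders: [{name: 'webnn', deviceType}, 'wasm'],
  });
  // Assumption: the model has exactly one input and one output.
  const inputName = session.inputNames[0];
  const outputName = session.outputNames[0];
  const feeds = {[inputName]: new ort.Tensor('float32', inputData, inputShape)};
  const results = await session.run(feeds);
  return results[outputName].data;
}
```

In a page this might be called as runOnnxWithWebNN(ort, 'models/my_model.onnx', data, [1, 4]), where the shape has to match what the model expects.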

Face‑recognition case study

The official sample combines an SSD MobileNet V2 face detector with a FaceNet embedding model. It loads pretrained weights, builds each graph with MLGraphBuilder, dispatches it on the GPU, and reads the result back as a Float32Array holding a 512‑dimensional embedding per face.
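
Once two embeddings are available, identity verification reduces to a distance check. The sketch below shows one common approach for FaceNet‑style embeddings: L2‑normalize both vectors and compare their Euclidean distance against a threshold. The function names and the default threshold of 1.0 are assumptions for illustration, not values taken from the sample; a real threshold has to be tuned on labeled data.

```javascript
// L2-normalizes a vector so that distances are comparable across images.
function l2Normalize(vec) {
  let sumSq = 0;
  for (const v of vec) sumSq += v * v;
  const norm = Math.sqrt(sumSq) || 1; // avoid dividing by zero for an all-zero vector
  return Float32Array.from(vec, (v) => v / norm);
}

// Euclidean distance between two vectors of equal length.
function euclideanDistance(a, b) {
  if (a.length !== b.length) throw new Error('Embedding lengths differ');
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Assumed decision rule: the same person if the normalized distance is below the threshold.
function isSameFace(embA, embB, threshold = 1.0) {
  return euclideanDistance(l2Normalize(embA), l2Normalize(embB)) < threshold;
}
```

For normalized vectors the distance ranges from 0 (same direction) to 2 (opposite direction), which is why a threshold around the middle of that range is a plausible starting point.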

Four‑layer architecture

WebNN sits between the web‑application layer (JS ML frameworks, ONNX models) and the native ML APIs (DirectML, NNAPI, Core ML, OpenVINO); the hardware layer (CPU, GPU, NPU) lies beneath those. The browser translates a high‑level graph into calls to the native APIs, which is how WebNN approaches native performance.
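
Because the hardware layer differs per machine, an application may want to try several device types in order. The following sketch assumes the deviceType context option used in the face‑recognition excerpt of this article is accepted by the browser; support for that option may vary between API drafts and builds. The helper name createContextWithFallback is hypothetical, and the ml object (navigator.ml in a page) is passed in so the logic can be tested without a browser.

```javascript
// Tries device types in preference order and returns the first context that can be created.
// Note: deviceType in the result is the type that was requested, which a browser
// may treat as a hint rather than a guarantee.
async function createContextWithFallback(ml, deviceTypes = ['npu', 'gpu', 'cpu']) {
  const errors = [];
  for (const deviceType of deviceTypes) {
    try {
      const context = await ml.createContext({deviceType});
      return {context, deviceType};
    } catch (err) {
      errors.push(`${deviceType}: ${err.message}`);
    }
  }
  throw new Error(`No WebNN context available (${errors.join('; ')})`);
}
```

In a page, the call would be createContextWithFallback(navigator.ml).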

Hardware and browser requirements

Chromium‑based browser (latest Microsoft Edge Beta for GPU, Edge Canary for NPU).

Windows 11 version 21H2 or later.

ONNX Runtime Web ≥ 1.18.

Latest device drivers (e.g., Intel driver 32.0.100.2381 for NPU).

Face‑recognition implementation (excerpt)

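// Excerpt from several methods of the sample's model class.
// inputTensor_, outputShape_ and inputBuffer are illustrative names for this excerpt.

// Create a context targeting the GPU
this.context_ = await navigator.ml.createContext({deviceType: 'gpu'});

// Create the graph builder
this.builder_ = new MLGraphBuilder(this.context_);

// Define the input operand ('dimensions' is the older name for 'shape'; both are set)
const inputDesc = {
  dataType: 'float32',
  dimensions: this.inputOptions.inputShape,
  shape: this.inputOptions.inputShape,
};
const input = this.builder_.input('input', inputDesc);

// Allocate device tensors for the input and the output
this.inputTensor_ = await this.context_.createTensor({...inputDesc, writable: true});
this.outputTensor_ = await this.context_.createTensor(
    {dataType: 'float32', shape: this.outputShape_, readable: true});

// Build a convolution layer (buildConv_ and strides are defined elsewhere in the sample)
const conv0 = this.buildConv_(input, 'Conv2d_1a_3x3', {strides});

// ... remaining layers omitted; they produce outputOperand ...

// Compile graph
this.graph_ = await this.builder_.build({'output': outputOperand});

// Dispatch inference: upload the input, run the graph, read the result back
this.context_.writeTensor(this.inputTensor_, inputBuffer);
this.context_.dispatch(
    this.graph_, {'input': this.inputTensor_}, {'output': this.outputTensor_});
const results = await this.context_.readTensor(this.outputTensor_);
return new Float32Array(results);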
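
Before dispatch, camera or canvas pixels have to be converted into the float32 layout the graph expects. The sketch below converts RGBA bytes (the layout of ImageData.data) into a normalized NCHW tensor. The function name rgbaToNchw and the default mean and std of 127.5 are assumptions for illustration; the actual normalization constants and layout depend on the model.

```javascript
// Converts RGBA pixels to a normalized float32 tensor in NCHW layout, dropping alpha.
// The result has length 3 * width * height and logical shape [1, 3, height, width].
function rgbaToNchw(rgba, width, height, {
  mean = [127.5, 127.5, 127.5], // assumed per-channel mean
  std = [127.5, 127.5, 127.5],  // assumed per-channel scale
} = {}) {
  const plane = width * height;
  if (rgba.length < plane * 4) throw new Error('Pixel buffer is too small');
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      // Channel c of pixel i goes to plane c, position i.
      out[c * plane + i] = (rgba[i * 4 + c] - mean[c]) / std[c];
    }
  }
  return out;
}
```

With these defaults, byte values 0 and 255 map to -1 and 1. In a page, the pixel buffer would typically come from a canvas 2D context via getImageData(0, 0, width, height).data.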

Typical application scenarios

Person detection (SSD, YOLO) – detect presence in video conferences.

Semantic segmentation (DeepLabv3+, Mask R‑CNN, Segment Anything) – background replacement or blur.

Pose detection (PoseNet) – body‑gesture control.

Face recognition (SSD + FaceNet) – identity verification, access control.

Facial landmark detection (FAN) – virtual try‑on, makeup.

Image captioning (im2txt) – accessibility.

Speech‑to‑text (Whisper) – live subtitles.

Text generation (GPT‑2, LLaMA) – chat, summarization.
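
Several of these scenarios need post‑processing in JavaScript after the graph has run. For detectors such as SSD or YOLO, overlapping candidate boxes are usually pruned with non‑maximum suppression. The following is a minimal sketch of that step; the detection format ({box: [x1, y1, x2, y2], score}) and the 0.5 IoU threshold are assumptions for illustration.

```javascript
// Intersection over union of two boxes given as [x1, y1, x2, y2].
function iou(a, b) {
  const x1 = Math.max(a[0], b[0]);
  const y1 = Math.max(a[1], b[1]);
  const x2 = Math.min(a[2], b[2]);
  const y2 = Math.min(a[3], b[3]);
  const inter = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy non-maximum suppression: keep the highest-scoring box, then drop any
// remaining box that overlaps an already kept box by more than iouThreshold.
function nonMaxSuppression(detections, iouThreshold = 0.5) {
  const sorted = [...detections].sort((p, q) => q.score - p.score);
  const kept = [];
  for (const det of sorted) {
    if (kept.every((k) => iou(k.box, det.box) <= iouThreshold)) kept.push(det);
  }
  return kept;
}
```

The input array is not modified, and the result is ordered by descending score.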

Future outlook

Enables front‑end developers to embed high‑performance AI without native toolchains.

Shifts compute cost to the client, allowing offline AI experiences.

Provides a privacy‑first execution model.

Accelerates edge‑device innovation, opening new real‑time, interactive web applications.

References

WebNN requirements: https://learn.microsoft.com/zh-cn/windows/ai/directml/webnn-overview#webnn-requirements

Build your first graph with WebNN API: https://webmachinelearning.github.io/get-started/2021/03/15/build-your-first-graph-with-webnn-api.html

Face‑recognition sample: https://github.com/webmachinelearning/webnn-samples/tree/master/face_recognition

Web Neural Network API (W3C Candidate Recommendation): https://www.w3.org/TR/webnn/

WebNN tutorial (Microsoft Learn): https://learn.microsoft.com/en-us/windows/ai/directml/webnn-tutorial

WebNN intro: https://webmachinelearning.github.io/webnn-intro/

WebNN GitHub repository: https://github.com/webmachinelearning/webnn

WebNN developer preview: https://microsoft.github.io/webnn-developer-preview/
