Announcement

WebGPU and WebNN: The APIs Making Browser AI Possible

How two web standards are enabling on-device machine learning — and why that matters for privacy.

6 min read

Our transcription tool runs AI models entirely in your browser. No servers, no uploads, no "trust us with your data." But how is that even possible?

Two browser APIs make it work: WebGPU and WebNN. Here's what they are, where they're supported, and why they matter.

The Problem: JavaScript Is Slow

Traditional JavaScript is single-threaded and runs on the CPU. That's fine for most web apps, but machine learning models need to perform billions of calculations. Matrix multiplications, tensor operations, neural network inference — all computationally intensive.

Running a speech recognition model in plain JavaScript would take minutes for a few seconds of audio. Unusable.

Solution 1: WebGPU

WebGPU is a modern graphics API that gives JavaScript access to your GPU. GPUs are designed for parallel computation — exactly what ML models need.

January 2026 marked a milestone: WebGPU is now supported by all major browsers, ending the 15-year WebGL era for high-performance graphics and compute.

What it does

  • Direct access to GPU compute capabilities
  • Runs thousands of operations in parallel
  • 10-100x faster than CPU for ML workloads

How we use it

When you load a transcription model on ddevtools, we check for WebGPU support:

```typescript
let device: "webgpu" | "wasm";

const adapter = await navigator.gpu?.requestAdapter();
if (adapter) {
  // Use GPU acceleration
  device = "webgpu";
} else {
  // Fall back to WASM
  device = "wasm";
}
```

If your browser supports WebGPU, model inference runs on your graphics card. A 30-second audio clip that would take 60+ seconds on CPU takes ~5-10 seconds with WebGPU.
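From there, the detected backend gets handed to the model loader. Here's a hedged sketch assuming the Transformers.js `pipeline` API — the package name and model id below are illustrative assumptions, not necessarily our exact production setup:

```typescript
// Hedged sketch: pass the detected backend to a Transformers.js
// pipeline. Package name and model id are illustrative assumptions.
async function loadTranscriber(device: "webgpu" | "wasm") {
  // @ts-ignore -- dependency may not be installed in this sketch
  const { pipeline } = await import("@huggingface/transformers");
  return pipeline("automatic-speech-recognition", "onnx-community/whisper-tiny.en", {
    device, // "webgpu" or "wasm" from the check above
  });
}
```

The key point is that the same model loads either way; only the `device` option changes which backend does the math.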

Browser support (January 2026)

| Browser | WebGPU Support |
| --- | --- |
| Chrome | ✅ Full support (v113+) |
| Edge | ✅ Full support (v113+) |
| Firefox | ✅ Enabled by default (v147+, Windows/macOS ARM) |
| Safari | ✅ Full support (v26+, macOS/iOS/iPadOS/visionOS) |
| Chrome Android | ✅ Supported (v144+, Android 12+) |
| iOS Safari | ✅ Supported (v26+) |
| Linux | ⚠️ Requires flag in Chrome/Edge |

Global coverage: ~77% according to Can I Use. The remaining ~23% still get the WASM fallback.

Bottom line: WebGPU is now production-ready across all major browsers. This is a big deal.

Solution 2: WebNN (Web Neural Network API)

WebNN is a newer API specifically designed for neural network inference. While WebGPU is general-purpose GPU compute, WebNN is optimized for ML and can tap into dedicated AI hardware.

The W3C specification reached Candidate Recommendation status in January 2026, with ongoing work to support generative AI use cases.

What it does

  • Hardware-accelerated neural network operations
  • Can use GPU, NPU (Neural Processing Unit), or CPU
  • Optimized primitives for common ML operations (convolution, pooling, etc.)
  • Direct access to platform ML accelerators (DirectML on Windows, Core ML on Apple, etc.)
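As a concrete sketch, WebNN's programming model is graph-based: you declare inputs, chain optimized primitives, and compile the graph for whatever accelerator the context targets. The example below follows the W3C draft API shapes at the time of writing (which may still change) — it's an illustration, not code we ship:

```typescript
// Hedged sketch of the WebNN programming model: build a tiny graph
// (relu over a 1x4 tensor) and compile it. API shapes follow the
// W3C draft spec and may change; browser-only (navigator.ml).
async function buildReluGraph() {
  const ml = (globalThis as any).navigator?.ml;
  if (!ml) throw new Error("WebNN not available in this environment");

  const context = await ml.createContext();  // platform picks GPU/NPU/CPU
  const builder = new (globalThis as any).MLGraphBuilder(context);

  const input = builder.input("x", { dataType: "float32", shape: [1, 4] });
  const output = builder.relu(input);        // one of WebNN's optimized primitives
  return builder.build({ y: output });       // compiled graph, ready to run
}
```

Because the graph is declared up front, the browser can hand the whole thing to the platform's ML runtime (DirectML, Core ML, etc.) instead of executing operations one at a time.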

Why it matters

Modern laptops and phones increasingly have dedicated ML hardware:

  • Apple's Neural Engine
  • Intel's NPU
  • Qualcomm's AI Engine

WebNN can tap into this specialized hardware. A model that runs at 10x speed on WebGPU might run at 50x on a dedicated NPU via WebNN.

Browser support (January 2026)

| Browser | WebNN Support |
| --- | --- |
| Chrome/Edge (Windows) | ⚠️ Behind flag, CPU backend works |
| Chrome/Edge (other) | ⚠️ Limited, requires flags |
| Firefox | ❌ Not supported |
| Safari | ❌ Not supported |

According to Microsoft's WebNN documentation, GPU and NPU support remain in preview. Full support requires Windows 11 24H2+ with specific flags enabled.

Bottom line: WebNN is not ready for production. The spec is maturing and Chrome/Edge have experimental support, but cross-browser deployment isn't viable yet. We're watching it closely.

The Fallback: WebAssembly (WASM)

When neither WebGPU nor WebNN is available, we fall back to WebAssembly. WASM lets you run compiled code (C++, Rust) in the browser at near-native speed.

It's slower than GPU acceleration (3-5x), but it works everywhere. Every modern browser supports WASM.

Performance hierarchy:
WebNN (NPU) > WebGPU (GPU) > WASM (CPU) > Plain JavaScript
    50x           10x           3x             1x
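In code, that hierarchy is just a preference order. A minimal sketch — the capability flags would come from feature detection (`navigator.ml`, `navigator.gpu`, `WebAssembly`) in a real app:

```typescript
// Sketch: pick the fastest available backend, mirroring the
// hierarchy above. Flags come from feature detection in a browser.
type Backend = "webnn" | "webgpu" | "wasm" | "js";

function pickBackend(caps: { webnn: boolean; webgpu: boolean; wasm: boolean }): Backend {
  if (caps.webnn) return "webnn";    // dedicated ML hardware, if exposed
  if (caps.webgpu) return "webgpu";  // general-purpose GPU compute
  if (caps.wasm) return "wasm";      // near-native CPU fallback
  return "js";                       // last resort: plain JavaScript
}
```

Since WebNN isn't production-ready yet, our tool today effectively uses only the bottom three rungs.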

What This Means for Privacy

These APIs enable a fundamental shift: compute can happen on your device instead of someone else's server.

Before WebGPU/WebNN:

  • ML required server-side processing
  • Your data had to leave your device
  • "Privacy" meant trusting the service provider

After WebGPU/WebNN:

  • ML can run in your browser
  • Data never leaves your device
  • Privacy is architectural, not policy-based

This is why we built our transcription tool the way we did. The technology finally exists to do speech recognition without uploading your audio anywhere.

Current Limitations

It's not all perfect:

Model size

Browser-friendly models are typically "tiny" or "small" variants. Our transcription uses Whisper Tiny (75MB) and Moonshine Tiny (50MB). The full Whisper Large model is 3GB — not practical for browser download.

Memory constraints

Browsers limit how much memory a tab can use. Mobile browsers are especially restrictive (~256-512MB), which is why our transcription tool doesn't work on phones.

Cold start

First-time users need to download the model (50-75MB). We cache it for subsequent visits, but that initial download is unavoidable.
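A minimal sketch of how that caching can work, using the browser's Cache Storage API — the cache name and flow here are illustrative, not our exact implementation:

```typescript
// Hedged sketch: serve the model from the Cache Storage API on
// repeat visits. Cache name is illustrative; browser-only (caches).
async function fetchModelCached(url: string): Promise<Response> {
  const caches = (globalThis as any).caches;
  const cache = await caches.open("model-cache-v1");

  const hit = await cache.match(url);
  if (hit) return hit;                // repeat visit: no download

  const res = await fetch(url);       // first visit: download once...
  await cache.put(url, res.clone());  // ...and store for next time
  return res;
}
```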

Battery and heat

Running ML inference is computationally intensive. On laptops, expect fan spin-up and battery drain during processing.

What's Next

WebGPU reaching full browser support in early 2026 was a major milestone. Here's what we're watching:

WebNN maturation — The spec is solidifying and Chrome/Edge have working implementations. Once Firefox and Safari adopt it, we'll see even faster inference on devices with NPUs.

Smaller models — Researchers are making smaller models that maintain accuracy. A 10MB transcription model would be game-changing for mobile browsers.

Quantization improvements — Running models in 4-bit or 8-bit precision reduces memory and speeds up inference with minimal accuracy loss. Libraries like Transformers.js already support this.

More model types — Translation, summarization, image recognition, code completion — all becoming browser-viable as WebGPU coverage expands.

Try It

Our Private Transcription tool uses WebGPU with WASM fallback. You can see which one your browser is using — we show "WebGPU acceleration enabled" or "Using WASM" based on your device.

With ~77% of browsers now supporting WebGPU, most users get GPU-accelerated transcription automatically. The model runs in a Web Worker thread, so it won't freeze your browser. Your audio never leaves your device.
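Offloading to a worker looks roughly like this — the worker file name and message shape are hypothetical, shown only to illustrate the pattern:

```typescript
// Hedged sketch: keep inference off the main thread with a Web Worker.
// "transcribe-worker.js" and the { audio } / { text } message shapes
// are hypothetical. Browser-only (Worker global).
function startTranscription(audio: Float32Array): Promise<string> {
  const worker = new (globalThis as any).Worker("transcribe-worker.js");
  return new Promise((resolve, reject) => {
    worker.onmessage = (e: any) => resolve(e.data.text); // worker posts { text }
    worker.onerror = reject;
    worker.postMessage({ audio }, [audio.buffer]);       // transfer, don't copy
  });
}
```

Transferring the audio buffer (rather than copying it) keeps the handoff cheap even for long recordings.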

That's the present we're building in: powerful tools that respect your privacy by design.

Try Private Transcription

Want to learn more? The WebGPU spec and WebNN spec are readable for W3C documents. For compatibility details, check Can I Use WebGPU.