Our transcription tool runs AI models entirely in your browser. No servers, no uploads, no "trust us with your data." But how is that even possible?
Two browser APIs make it work: WebGPU and WebNN. Here's what they are, where they're supported, and why they matter.
The Problem: JavaScript Is Slow
Traditional JavaScript is single-threaded and runs on the CPU. That's fine for most web apps, but machine learning models need to perform billions of calculations. Matrix multiplications, tensor operations, neural network inference — all computationally intensive.
Running a speech recognition model in plain JavaScript would take minutes for a few seconds of audio. Unusable.
Solution 1: WebGPU
WebGPU is a modern graphics API that gives JavaScript access to your GPU. GPUs are designed for parallel computation — exactly what ML models need.
January 2026 marked a milestone: WebGPU is now supported by all major browsers, ending the 15-year WebGL era for high-performance graphics and compute.
What it does
- Direct access to GPU compute capabilities
- Runs thousands of operations in parallel
- 10-100x faster than CPU for ML workloads
How we use it
When you load a transcription model on ddevtools, we check for WebGPU support:
```javascript
let device;
const adapter = await navigator.gpu?.requestAdapter();
if (adapter) {
  // Use GPU acceleration
  device = "webgpu";
} else {
  // Fall back to WASM
  device = "wasm";
}
```
If your browser supports WebGPU, model inference runs on your graphics card. A 30-second audio clip that would take 60+ seconds on CPU takes ~5-10 seconds with WebGPU.
Browser support (January 2026)
| Browser | WebGPU Support |
|---|---|
| Chrome | ✅ Full support (v113+) |
| Edge | ✅ Full support (v113+) |
| Firefox | ✅ Enabled by default (v147+, Windows/macOS ARM) |
| Safari | ✅ Full support (v26+, macOS/iOS/iPadOS/visionOS) |
| Chrome Android | ✅ Supported (v144+, Android 12+) |
| iOS Safari | ✅ Supported (v26+) |
| Linux | ⚠️ Requires flag in Chrome/Edge |
Global coverage: ~77% according to Can I Use. The remaining ~23% still get the WASM fallback.
Bottom line: WebGPU is now production-ready across all major browsers. This is a big deal.
Solution 2: WebNN (Web Neural Network API)
WebNN is a newer API specifically designed for neural network inference. While WebGPU is general-purpose GPU compute, WebNN is optimized for ML and can tap into dedicated AI hardware.
The W3C specification reached Candidate Recommendation status in January 2026, with ongoing work to support generative AI use cases.
What it does
- Hardware-accelerated neural network operations
- Can use GPU, NPU (Neural Processing Unit), or CPU
- Optimized primitives for common ML operations (convolution, pooling, etc.)
- Direct access to platform ML accelerators (DirectML on Windows, Core ML on Apple, etc.)
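To make the API concrete, here is a minimal sketch based on the W3C draft spec: you describe a small graph of operations, then compile it for whatever accelerator the platform offers. Names like `MLGraphBuilder` come from the spec, but details may shift before it stabilizes, and this only runs in a browser with WebNN enabled.

```javascript
// Minimal WebNN sketch based on the W3C draft spec: build a tiny graph,
// compile it, and (in a full example) execute it against input tensors.
// Requires a browser with WebNN enabled; names and options may change
// before the spec is final.
async function buildAddGraph() {
  // Ask the browser for an ML context; it picks CPU, GPU, or NPU.
  const context = await navigator.ml.createContext();
  const builder = new MLGraphBuilder(context);

  // Two 2x2 float32 input tensors.
  const desc = { dataType: "float32", shape: [2, 2] };
  const a = builder.input("a", desc);
  const b = builder.input("b", desc);

  // Compile a graph whose single output "c" is a + b.
  return builder.build({ c: builder.add(a, b) });
}
```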
Why it matters
Modern laptops and phones increasingly have dedicated ML hardware:
- Apple's Neural Engine
- Intel's NPU
- Qualcomm's AI Engine
WebNN can tap into this specialized hardware. A model that runs ~10x faster than CPU on WebGPU might reach ~50x on a dedicated NPU via WebNN.
Browser support (January 2026)
| Browser | WebNN Support |
|---|---|
| Chrome/Edge (Windows) | ⚠️ Behind flag, CPU backend works |
| Chrome/Edge (other) | ⚠️ Limited, requires flags |
| Firefox | ❌ Not supported |
| Safari | ❌ Not supported |
According to Microsoft's WebNN documentation, GPU and NPU support remain in preview. Full support requires Windows 11 24H2+ with specific flags enabled.
Bottom line: WebNN is not ready for production. The spec is maturing and Chrome/Edge have experimental support, but cross-browser deployment isn't viable yet. We're watching it closely.
The Fallback: WebAssembly (WASM)
When neither WebGPU nor WebNN is available, we fall back to WebAssembly. WASM lets you run compiled code (C++, Rust) in the browser at near-native speed.
It's roughly 3-5x slower than GPU acceleration, but it works everywhere. Every modern browser supports WASM.
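The full fallback chain can be sketched as a single detection cascade. This is illustrative rather than our exact code: the WebNN probe is speculative since that API is still behind flags, and the `globalThis` guard just lets the sketch run outside a browser too.

```javascript
// Sketch of the detection cascade: try the fastest backend first and
// fall back gracefully. The WebNN probe is speculative (the API is
// still behind flags); the WebGPU and WASM checks work today.
const nav = globalThis.navigator ?? {};

async function pickBackend() {
  // WebNN: fastest where an NPU is exposed, but rarely available yet.
  if ("ml" in nav) {
    try {
      await nav.ml.createContext();
      return "webnn";
    } catch {
      // Advertised but unusable: fall through to WebGPU.
    }
  }
  // WebGPU: broadly available as of early 2026.
  if (nav.gpu) {
    const adapter = await nav.gpu.requestAdapter();
    if (adapter) return "webgpu";
  }
  // WASM: the universal fallback.
  if (typeof WebAssembly === "object") return "wasm";
  return "js";
}
```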
Performance hierarchy:
WebNN (NPU, ~50x) > WebGPU (GPU, ~10x) > WASM (CPU, ~3x) > Plain JavaScript (1x)
What This Means for Privacy
These APIs enable a fundamental shift: compute can happen on your device instead of someone else's server.
Before WebGPU/WebNN:
- ML required server-side processing
- Your data had to leave your device
- "Privacy" meant trusting the service provider
After WebGPU/WebNN:
- ML can run in your browser
- Data never leaves your device
- Privacy is architectural, not policy-based
This is why we built our transcription tool the way we did. The technology finally exists to do speech recognition without uploading your audio anywhere.
Current Limitations
It's not all perfect:
Model size
Browser-friendly models are typically "tiny" or "small" variants. Our transcription uses Whisper Tiny (75MB) and Moonshine Tiny (50MB). The full Whisper Large model is 3GB — not practical for browser download.
Memory constraints
Browsers limit how much memory a tab can use. Mobile browsers are especially restrictive (~256-512MB), which is why our transcription tool doesn't work on phones.
Cold start
First-time users need to download the model (50-75MB). We cache it for subsequent visits, but that initial download is unavoidable.
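One way to make that caching concrete is the browser's Cache Storage API. This is a hedged sketch, not our actual loader: the cache name and URL handling are illustrative, and libraries like Transformers.js manage model caching themselves.

```javascript
// Sketch: cache model weights with the Cache Storage API so repeat
// visits skip the download. Cache name and URL handling are
// illustrative; a real model loader handles versioning and errors too.
async function fetchModel(url) {
  const cache = await caches.open("model-cache-v1");
  let response = await cache.match(url);
  if (!response) {
    response = await fetch(url);
    // Store a copy, since a Response body can only be read once.
    await cache.put(url, response.clone());
  }
  return response.arrayBuffer();
}
```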
Battery and heat
Running ML inference is computationally intensive. On laptops, expect fan spin-up and battery drain during processing.
What's Next
WebGPU reaching full browser support in early 2026 was a major milestone. Here's what we're watching:
WebNN maturation — The spec is solidifying and Chrome/Edge have working implementations. Once Firefox and Safari adopt it, we'll see even faster inference on devices with NPUs.
Smaller models — Researchers are making smaller models that maintain accuracy. A 10MB transcription model would be game-changing for mobile browsers.
Quantization improvements — Running models in 4-bit or 8-bit precision reduces memory and speeds up inference with minimal accuracy loss. Libraries like Transformers.js already support this.
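As a rough sketch of what quantized loading looks like with Transformers.js, assuming its v3-style `device` and `dtype` options (the model id here is illustrative; check the library docs for the values your version supports):

```javascript
// Sketch of loading a quantized speech model with Transformers.js.
// The package name follows its v3 release; model id and dtype value
// are illustrative, not a recommendation.
async function loadTranscriber() {
  const { pipeline } = await import("@huggingface/transformers");
  return pipeline(
    "automatic-speech-recognition",
    "onnx-community/whisper-tiny.en",
    {
      device: "webgpu", // or "wasm" on browsers without WebGPU
      dtype: "q4",      // 4-bit weights: smaller download, less memory
    }
  );
}
```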
More model types — Translation, summarization, image recognition, code completion — all becoming browser-viable as WebGPU coverage expands.
Try It
Our Private Transcription tool uses WebGPU with WASM fallback. You can see which one your browser is using — we show "WebGPU acceleration enabled" or "Using WASM" based on your device.
With ~77% of browsers now supporting WebGPU, most users get GPU-accelerated transcription automatically. The model runs in a Web Worker thread, so it won't freeze your browser. Your audio never leaves your device.
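A minimal sketch of that worker setup, with `worker.js` as a hypothetical module that loads the model and posts transcripts back:

```javascript
// Sketch: run inference in a Web Worker so the main thread (and your
// UI) stays responsive. "worker.js" is a hypothetical module that
// loads the model and posts back { status: "done", text: ... }.
function createTranscriberWorker(onText) {
  const worker = new Worker("worker.js", { type: "module" });
  worker.onmessage = (event) => {
    if (event.data.status === "done") onText(event.data.text);
  };
  return worker;
}

// Usage in the page:
// const worker = createTranscriberWorker((text) => display(text));
// worker.postMessage({ audio: float32Samples });
```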
That's the present we're building in: powerful tools that respect your privacy by design.
Try Private Transcription

Want to learn more? The WebGPU spec and WebNN spec are readable for W3C documents. For compatibility details, check Can I Use WebGPU.