
Four Prompts to Serverless AI

How a casual question became a fully browser-based AI application

January 2026
Experimental
The Question

What if you could run an LLM entirely in the browser?

No API keys. No cloud costs. No server to maintain.
Just open a webpage and start chatting.

Prompt 1
The Initial Vision

I want you to have your agents go do some investigation on the latest on webgpu and running mid-sized llms that might be useful for Amplifier-powered experiences, like something that has a bunch of text context that we've collected and being able to chat about the contents of it, but run in the browser.

Four research agents dispatched in parallel

The Research

Agents returned with answers

WebLLM: best framework for browser inference
Qwen3 4B: optimal model for 4 GB of VRAM
~70 tokens/sec on a modern GPU
IndexedDB: for persistent model caching
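Before any of the above can run, the page has to confirm WebGPU is actually available. A minimal sketch of that compatibility check (the function name and return shape are illustrative; `navigator.gpu` and `requestAdapter()` are the standard WebGPU entry points):

```javascript
// Hedged sketch: detect WebGPU support before loading WebLLM.
// navigator.gpu is undefined in browsers and runtimes without WebGPU.
async function checkWebGPU() {
  if (typeof navigator === "undefined" || !navigator.gpu) {
    return { supported: false, reason: "navigator.gpu not available" };
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no suitable GPU adapter" };
  }
  return { supported: true };
}
```

A real page would branch on the result: show the chat UI when supported, or a fallback message when not.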
Prompt 2
The Expansion

What about running amplifier in the same browser session too, so that we can have a web app that we can go to and not run any local services - can you research running the python in that web app too?

Scope expanded: not just LLM, but the entire agent framework

The Architecture

A fully serverless AI stack

Your Web Browser
Pyodide (Python in WebAssembly)
Amplifier Agent Framework
WebLLM + WebGPU
Local Model (Qwen3 4B)
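One way the Pyodide and WebLLM layers might wire together is a small bridge: Python code running in WebAssembly calls out to the browser-local model through a JavaScript callback. This is a hedged sketch, not the actual Amplifier integration; `makeLlmBridge` and `llm_complete` are illustrative names, while `chat.completions.create` is WebLLM's OpenAI-style completion method:

```javascript
// Hedged sketch: expose a browser-local WebLLM engine to Python in Pyodide.
// `engine` is any object with WebLLM's OpenAI-style chat.completions.create.
function makeLlmBridge(engine) {
  return async function llmComplete(prompt) {
    const reply = await engine.chat.completions.create({
      messages: [{ role: "user", content: prompt }],
    });
    return reply.choices[0].message.content;
  };
}

// In the real page this would be registered on the Pyodide globals, e.g.:
//   pyodide.globals.set("llm_complete", makeLlmBridge(engine));
// so Python code can await llm_complete(prompt) like any async function.
```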
Prompt 3
The Green Light

Yeah, I like this, why don't you go ahead and PoC this..., #2

13 words that launched the implementation

The Build

From research to running code

// The result: a single HTML file that runs AI
const engine = await CreateMLCEngine("Qwen3-4B");
const pyodide = await loadPyodide();
await pyodide.runPythonAsync(`
from amplifier_browser import create_session
session = create_session(provider="webgpu")
response = await session.execute("Hello!")
`);
Prompt 4
Going Further

Ok, that worked, awesome! Ok, next steps, let's add amplifier under the hood of this now, running in that browser env.

From proof-of-concept to production integration

The Meta Story

Amplifier agents researched, designed, and built an app that runs Amplifier.

The framework built its own browser-native version of itself.

Sources & Methodology

Research & Attribution

Data as of
February 20, 2026
Feature Status
Experimental: WebGPU browser bundles are cutting-edge, not production-ready
Repositories
microsoft/amplifier-bundle-webruntime, microsoft/amplifier-bundle-webllm
Research
Bundle analysis, Pyodide integration verification, WebGPU compatibility testing
Technology Stack
Pyodide (Python in WebAssembly), WebLLM (@mlc-ai/web-llm v0.2.79), WebGPU API
Models Supported
Phi 3.5 Mini (2.2GB), Llama 3.2 3B (1.8GB), Qwen 2.5 1.5B (1.0GB), Llama 3.2 1B (0.7GB)
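Since the supported models span 0.7 GB to 2.2 GB, a page like this would plausibly pick the largest model that fits the user's budget of GPU memory or download size. A minimal sketch using the sizes from the list above (the selection logic and function name are illustrative, not the app's actual code):

```javascript
// Hedged sketch: choose the largest supported model within a size budget.
// Sizes (in GB) are taken from the "Models Supported" list above.
const MODELS = [
  { id: "Phi 3.5 Mini", gb: 2.2 },
  { id: "Llama 3.2 3B", gb: 1.8 },
  { id: "Qwen 2.5 1.5B", gb: 1.0 },
  { id: "Llama 3.2 1B", gb: 0.7 },
];

function pickModel(budgetGb) {
  const fits = MODELS.filter((m) => m.gb <= budgetGb);
  fits.sort((a, b) => b.gb - a.gb); // largest model first
  return fits.length ? fits[0].id : null; // null: nothing fits
}
```

For example, a 2 GB budget selects Llama 3.2 3B, while anything under 0.7 GB gets no model at all.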
Primary Contributor
Diego Colombo
Gaps
Performance benchmarks not independently measured; model quality depends on hardware GPU
The Result

Four prompts.
One session.
Serverless AI.

From "I wonder if..." to working code in a single conversation.

Built with Amplifier


Powered by WebGPU • Runs entirely in your browser
