
Four Prompts to Serverless AI

How a casual question became a fully browser-based AI application

January 2026
Experimental
The Question

What if you could run an LLM entirely in the browser?

No API keys. No cloud costs. No server to maintain.
Just open a webpage and start chatting.

Prompt 1
The Initial Vision

I want you to have your agents go do some investigation on the latest on webgpu and running mid-sized llms that might be useful for Amplifier-powered experiences, like something that has a bunch of text context that we've collected and being able to chat about the contents of it, but run in the browser.

Four research agents dispatched in parallel

The Research

Agents returned with answers

WebLLM: best framework for browser inference
Qwen3 4B: optimal model for 4 GB of VRAM
~70 tokens/sec on a modern GPU
IndexedDB: for persistent model caching
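Before any of the above can run, the page has to confirm WebGPU is actually available. A minimal sketch of that compatibility check (the function name and return shape are illustrative; `navigator.gpu` and `requestAdapter()` are the standard WebGPU entry points):

```javascript
// Hedged sketch: detect WebGPU support before loading WebLLM.
// navigator.gpu is undefined in browsers and runtimes without WebGPU.
async function checkWebGPU() {
  if (typeof navigator === "undefined" || !navigator.gpu) {
    return { supported: false, reason: "navigator.gpu not available" };
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no suitable GPU adapter" };
  }
  return { supported: true };
}
```

A real page would branch on the result: show the chat UI when supported, or a fallback message when not.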
Prompt 2
The Expansion

What about running amplifier in the same browser session too, so that we can have a web app that we can go to and not run any local services - can you research running the python in that web app too?

Scope expanded: not just LLM, but the entire agent framework

The Architecture

A fully serverless AI stack

Your Web Browser
Pyodide (Python in WebAssembly)
Amplifier Agent Framework
WebLLM + WebGPU
Local Model (Qwen3 4B)
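One way the Pyodide and WebLLM layers might wire together is a small bridge: Python code running in WebAssembly calls out to the browser-local model through a JavaScript callback. This is a hedged sketch, not the actual Amplifier integration; `makeLlmBridge` and `llm_complete` are illustrative names, while `chat.completions.create` is WebLLM's OpenAI-style completion method:

```javascript
// Hedged sketch: expose a browser-local WebLLM engine to Python in Pyodide.
// `engine` is any object with WebLLM's OpenAI-style chat.completions.create.
function makeLlmBridge(engine) {
  return async function llmComplete(prompt) {
    const reply = await engine.chat.completions.create({
      messages: [{ role: "user", content: prompt }],
    });
    return reply.choices[0].message.content;
  };
}

// In the real page this would be registered on the Pyodide globals, e.g.:
//   pyodide.globals.set("llm_complete", makeLlmBridge(engine));
// so Python code can await llm_complete(prompt) like any async function.
```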
Prompt 3
The Green Light

Yeah, I like this, why don't you go ahead and PoC this..., #2

13 words that launched the implementation

The Build

From research to running code

// The result: a single HTML file that runs AI
const engine = await CreateMLCEngine("Qwen3-4B");
const pyodide = await loadPyodide();
await pyodide.runPythonAsync(`
from amplifier_browser import create_session
session = create_session(provider="webgpu")
response = await session.execute("Hello!")
`);
Prompt 4
Going Further

Ok, that worked, awesome! Ok, next steps, let's add amplifier under the hood of this now, running in that browser env.

From proof-of-concept to production integration

The Meta Story

Amplifier agents researched, designed, and built an app that runs Amplifier.

The framework built its own browser-native version of itself.

Sources & Methodology

Research & Attribution

Data as of
February 20, 2026
Feature Status
Experimental: WebGPU browser bundles are cutting-edge, not production-ready
Repositories
microsoft/amplifier-bundle-webruntime, microsoft/amplifier-bundle-webllm
Research
Bundle analysis, Pyodide integration verification, WebGPU compatibility testing
Technology Stack
Pyodide (Python in WebAssembly), WebLLM (@mlc-ai/web-llm v0.2.79), WebGPU API
Models Supported
Phi 3.5 Mini (2.2GB), Llama 3.2 3B (1.8GB), Qwen 2.5 1.5B (1.0GB), Llama 3.2 1B (0.7GB)
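Since the supported models span 0.7 GB to 2.2 GB, a page like this would plausibly pick the largest model that fits the user's budget of GPU memory or download size. A minimal sketch using the sizes from the list above (the selection logic and function name are illustrative, not the app's actual code):

```javascript
// Hedged sketch: choose the largest supported model within a size budget.
// Sizes (in GB) are taken from the "Models Supported" list above.
const MODELS = [
  { id: "Phi 3.5 Mini", gb: 2.2 },
  { id: "Llama 3.2 3B", gb: 1.8 },
  { id: "Qwen 2.5 1.5B", gb: 1.0 },
  { id: "Llama 3.2 1B", gb: 0.7 },
];

function pickModel(budgetGb) {
  const fits = MODELS.filter((m) => m.gb <= budgetGb);
  fits.sort((a, b) => b.gb - a.gb); // largest model first
  return fits.length ? fits[0].id : null; // null: nothing fits
}
```

For example, a 2 GB budget selects Llama 3.2 3B, while anything under 0.7 GB gets no model at all.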
Primary Contributor
Diego Colombo
Gaps
Performance benchmarks not independently measured; model quality depends on hardware GPU
The Result

Four prompts.
One session.
Serverless AI.

From "I wonder if..." to working code in a single conversation.

Built with Amplifier


Powered by WebGPU • Runs entirely in your browser
