More Amplifier Stories

From Zero to Real-Time AI
in Two Weeks

A WebRTC voice assistant built on Amplifier.
Three architectures. 30 commits. One developer.

Brian Krabach · January 30 – February 18, 2026
Active

“Can Amplifier
do voice?”

Voice is the interface everyone asks about first.
Not chat. Not IDE. Voice.

The challenge: voice AI isn't a feature you bolt on. It needs real-time streaming, sub-second latency, and an architecture that lets an AI think while it talks.

Yes. And it's novel.

Not a wrapper around a speech API. A voice-first AI that delegates to a team of specialists.

🎤
You speak
🌐
WebRTC
🧠
GPT Realtime
Amplifier
🤖
Agent Team
Explore code
"Look at the auth module and tell me what you find"
Write code
"Add a caching layer to the API endpoint"
Debug
"Why is the test for session forking failing?"
Research
"Find the latest docs on WebRTC data channels"
Architect
"Design a retry strategy for flaky connections"
Ship
"Commit this and create a PR"

Three Architectures in Four Days

Each pivot was a learning moment. Each one made the system fundamentally better.

Day 1
Pivot 1 Many Tools → Orchestration Only
Started with flight tracking, weather, and multiple other tools exposed directly to the voice model. Realized: a voice model with limited context shouldn't run tools directly. Stripped everything, leaving only task.
Day 3
Pivot 2 Task Tool → Delegate Tool
task was fire-and-forget. Switched to delegate for context control, session resumption, and provider selection. Multi-turn agent conversations became possible.
Day 3
Pivot 3 Auto-Respond → Manual Response Control
Default behavior: model speaks immediately after detecting silence. New behavior: the model decides WHEN to speak, not just what to say. First attempt reverted. Second attempt succeeded.
1

The Voice Model Is an Orchestrator,
Not an Executor

Before
// Voice model had direct access to everything
tools: [
  "flight_tracker", "weather", "calendar",
  "todo", "task", "web_search", ...
]

Too many tools. Limited context window. Model confused about when to use what.

After
// Voice model has ONE job: orchestrate
REALTIME_TOOLS = {"task"}

// All real work delegated to specialists
// explorer, architect, builder, bug-hunter...

One tool. Clear purpose. Delegates everything to the right specialist agent.

Key insight: A real-time voice model has a tiny context window. Don't make it think about tools. Make it think about who to ask.
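The "who to ask" idea can be sketched in a few lines. This is an illustrative routing heuristic, not the project's real dispatcher: the agent names come from the story, but `pickAgent` and its keyword rules are assumptions for the sake of the example.

```typescript
// Hypothetical sketch: the realtime model exposes a single tool, and the
// bridge routes each request to a specialist agent.
const REALTIME_TOOLS = new Set(["task"]); // the voice model's entire toolbox

type Agent =
  | "foundation:explorer"
  | "foundation:bug-hunter"
  | "foundation:zen-architect"
  | "foundation:git-ops";

// Illustrative keyword routing — the real system lets the model choose.
function pickAgent(request: string): Agent {
  const r = request.toLowerCase();
  if (/why|fail|bug/.test(r)) return "foundation:bug-hunter";
  if (/design|strategy/.test(r)) return "foundation:zen-architect";
  if (/commit|pr|ship/.test(r)) return "foundation:git-ops";
  return "foundation:explorer"; // default: go look at the code
}
```

The point of the shape, not the regexes: the voice model's job collapses to one decision per request — which specialist gets it.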
2

From Fire-and-Forget
to Conversation

task created one-shot agents. delegate enables multi-turn dialogues with persistent state.

Context Control
// Agent gets exactly the right context
delegate({
  agent: "foundation:explorer",
  context_depth: "recent",  // none | recent | all
  context_turns: 5
})
Session Resumption
// Resume a prior agent conversation
delegate({
  agent: "foundation:modular-builder",
  session_id: "abc-123..."  // pick up where we left off
})
Provider Selection
// Voice = GPT Realtime, Agents = Claude
provider_preferences: [
  { provider: "anthropic", model: "claude-sonnet-*" }
]
Specialist Routing
// Right agent for the right job
"foundation:explorer"       // scan code
"foundation:bug-hunter"     // debug
"foundation:zen-architect"  // design
"foundation:git-ops"        // ship
3

The Model Decides
When to Speak

Default voice AI: detect silence → auto-respond.
Amplifier Voice: detect silence → model chooses its moment.

// Semantic VAD + manual response control
session.update({
  audio: {
    input: {
      turn_detection: {
        type: "semantic_vad",
        eagerness: "low",
        create_response: false,  // KEY
        interrupt_response: true
      }
    }
  }
})

// After transcription, client triggers:
dataChannel.send(
  JSON.stringify({ type: "response.create" })
)
First attempt: reverted
"True silence mode" was too aggressive. Broke natural conversation flow. Commit fa859f0 → Revert 78441f5
Second attempt: shipped
Separate "detecting end of speech" from "auto-generating response." Model uses instructions to decide engagement level. Commit a2f6042
Why it matters
The model can listen to a multi-sentence request without interrupting after the first pause. Natural, human-like conversation.

Results Arrive While
the Model Is Speaking

You ask three things. The model starts answering the first. Meanwhile, agents finish the other two. What happens?

// Track response state
const responseInProgress = useRef(false);
const pendingAnnouncements = useRef([]);

// Tool result arrives while speaking?
if (responseInProgress.current) {
  // Queue it. Don't interrupt.
  pendingAnnouncements.current.push({ toolName, callId });
} else {
  // Not speaking? Report immediately.
  triggerResponse();
}

// When model finishes speaking:
case "response.done":
  responseInProgress.current = false;
  setTimeout(() => {
    if (pendingAnnouncements.current.length > 0) {
      flushPendingAnnouncements();
    }
  }, 100);
The Flush Message
"The explorer and architect tasks completed while you were speaking. Please report those results now briefly."
No Interruptions
Model finishes its current thought before reporting late-arriving results.
No Lost Results
Every tool result is queued and announced. Nothing falls through.
Natural Flow
Feels like a coworker saying "Oh, and I also found..." after finishing a thought.

Concurrent Tool Calls
That Don't Collide

"Explore the auth module AND check the test coverage" — two agents, running simultaneously, each with a unique tracking ID.

// Each tool call gets a unique ID from OpenAI
const statusMessage = {
  sender: "system",
  text: `Delegating to ${getFriendlyToolName(toolCall.name)}...`,
  toolCallId: toolCall.id,  // <-- unique per concurrent call
  toolStatus: "executing"
};

// Update ONLY this specific call's status on completion
setMessages(prev => prev.map(msg =>
  msg.toolCallId === toolCall.id && msg.toolStatus === "executing"
    ? { ...msg, text: `Completed ${name}`, toolStatus: "completed" }
    : msg
));
Parallel Execution
Multiple agents run at the same time. No serialization bottleneck.
Isolated Tracking
toolCallId maps each result to its originating request. No cross-talk.
Live UI Updates
Each task shows status independently: delegating → executing → completed.

22 Event Types, Streamed
Live to Your Browser

Every Amplifier event — from LLM requests to agent forks — appears in real-time via Server-Sent Events.

🔼 provider:request
🔽 provider:response
🔧 tool:pre
🔧 tool:post
🔧 tool:error
🔀 session:fork
🔀 session:join
🧠 thinking:delta
🧠 thinking:final
content_block:start
content_block:delta
🔔 user:notification
context:compaction
✅ approval:request
Server-Side Hook
# Captures ALL Amplifier events
EVENTS_TO_CAPTURE = [
    "content_block:start", "content_block:delta",
    "thinking:delta",
    "tool:pre", "tool:post",
    "session:fork", "session:join",
    "provider:request", "llm:request:raw",
    ...  # 22 event types total
]
Why This Matters
Voice AI is a black box. You say something, something happens, you get audio back. SSE streaming makes the invisible visible.
See Claude thinking. See agents spawning. See tool calls executing. See token costs accumulating. All in real-time, in a browser console with color-coded icons.
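Consuming that stream in the browser takes very little code. A minimal sketch, assuming a hypothetical `/events` SSE endpoint and an illustrative icon map (the real client's endpoint and mapping may differ):

```typescript
// Illustrative icon mapping for color-coded console output.
const EVENT_ICONS: Record<string, string> = {
  "provider:request": "🔼",
  "provider:response": "🔽",
  "tool:pre": "🔧",
  "tool:post": "🔧",
  "session:fork": "🔀",
  "thinking:delta": "🧠",
};

// Turn one streamed event into a console line.
function formatEvent(type: string, payload: unknown): string {
  const icon = EVENT_ICONS[type] ?? "•"; // fallback for unmapped types
  return `${icon} ${type}: ${JSON.stringify(payload)}`;
}

// In the browser (hypothetical endpoint):
// const source = new EventSource("/events");
// source.onmessage = (e) => {
//   const { type, payload } = JSON.parse(e.data);
//   console.log(formatEvent(type, payload));
// };
```

That's the whole trick: SSE is just a long-lived HTTP response, so the black box becomes a scrolling log.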

Connection Health Monitoring
and Smart Reconnection

WebRTC sessions have a 60-minute hard limit. Connections drop. Networks flake. The system handles all of it.

Health States
Healthy
Warning
Critical
Disconnected
Disconnect Reasons Tracked
idle_timeout
session_limit
connection_failed
data_channel_closed
stale_connection
network_error
Thresholds
idleWarning:    2 min
sessionWarning: 55 min  // 5 min before limit
sessionLimit:   60 min  // OpenAI hard cap
staleThreshold: 30 sec
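Deriving a health state from those thresholds is a pure function. A sketch under stated assumptions — `healthState` and its parameters are hypothetical names, not the project's hook API:

```typescript
type Health = "healthy" | "warning" | "critical" | "disconnected";

// Thresholds from the story; field names are illustrative.
const THRESHOLDS = {
  sessionWarningMin: 55, // 5 min before the limit
  sessionLimitMin: 60,   // OpenAI hard cap
  staleThresholdSec: 30, // no events in this window => stale
};

function healthState(
  connected: boolean,
  sessionAgeMin: number,
  lastEventAgeSec: number
): Health {
  if (!connected) return "disconnected";
  if (sessionAgeMin >= THRESHOLDS.sessionLimitMin) return "critical";
  if (sessionAgeMin >= THRESHOLDS.sessionWarningMin ||
      lastEventAgeSec >= THRESHOLDS.staleThresholdSec) return "warning";
  return "healthy";
}
```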
4 Reconnection Strategies
manual — User clicks to reconnect
auto_immediate — Instant retry
auto_delayed — Backoff then retry
proactive — Reconnect before expiry
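One plausible way to wire reasons to strategies — the mapping below is an assumption for illustration, not the project's actual policy:

```typescript
type DisconnectReason =
  | "idle_timeout" | "session_limit" | "connection_failed"
  | "data_channel_closed" | "stale_connection" | "network_error";

type Strategy = "manual" | "auto_immediate" | "auto_delayed" | "proactive";

// Hypothetical policy: reconnect proactively near the 60-min cap,
// back off on flaky networks, wait for the user after idle timeouts.
function chooseStrategy(reason: DisconnectReason, sessionAgeMin: number): Strategy {
  if (sessionAgeMin >= 55) return "proactive"; // beat the hard limit
  switch (reason) {
    case "idle_timeout":        return "manual";         // user walked away
    case "session_limit":
    case "stale_connection":
    case "data_channel_closed": return "auto_immediate"; // clean break, retry now
    case "connection_failed":
    case "network_error":       return "auto_delayed";   // backoff then retry
    default:                    return "manual";
  }
}
```

The useful property: every disconnect reason maps to exactly one strategy, so reconnection behavior stays predictable under test.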

Two Processes. Zero Wrappers.
Direct API Calls.

voice-server · Python
FastAPI + Uvicorn — HTTP/SSE server
httpx — Async OpenAI Realtime API calls
amplifier-foundation — Agent framework
sse-starlette — Event streaming
Amplifier bridge executes tools via direct Python calls. Zero subprocess overhead.
voice-client · TypeScript
React 18 + Vite — UI framework
Fluent UI + Copilot Components — Microsoft design system
Zustand — Lightweight state management
WebRTC — Real-time audio streaming
5 custom hooks: useWebRTC, useVoiceChat, useChatMessages, useAmplifierEvents, useConnectionHealth
Multi-Model Architecture
Voice layer: OpenAI gpt-realtime (GA) — speech-to-speech, real-time audio
Agent layer: Anthropic Claude Sonnet — deep reasoning, code generation, tool use
Each model does what it's best at. Voice handles conversation. Claude handles thinking.

By the Numbers

30
commits (Jan 30 – Feb 18)
4
days of development
1
developer
3
architecture pivots
100+
research docs
5
custom React hooks
22
event types streamed
Why this was possible: Amplifier's modular architecture meant each pivot was a configuration change, not a rewrite. The voice model didn't need to change — only its relationship to the agent framework did. Three architectures. Same voice client. Same agent roster.

Not a Demo. A Pattern.

Voice as Orchestration Layer
Most voice assistants are single-model, single-tool systems. This is a voice interface to a team of AI specialists. The voice model doesn't write code — it coordinates the agents that do.
Intentional Speech
Manual response control means the model is never forced to speak. It can listen to complex, multi-sentence requests. It can think before responding. It speaks when it has something to say.
Async-First Architecture
Tool results arrive whenever they're ready — before, during, or after the model speaks. The system handles all three cases gracefully. No blocking. No dropped results.
Multi-Model by Design
GPT Realtime for voice. Claude for reasoning. Each model does what it's best at. Not a compromise — an architecture.
Full Observability
22 event types streamed live. See every LLM call, every tool execution, every agent fork. Voice AI doesn't have to be a black box.
Rapid Architecture Evolution
Three fundamental pivots in four days. Amplifier's modularity made each change surgical, not seismic. The lesson: good infrastructure enables fearless iteration.

Research Methodology

Data as of: February 20, 2026

Feature status: Active

Research performed:

  • Git log analysis: git log --oneline amplifier-voice (30 commits found)
  • Contributor analysis: git log --format="%an"
  • Date range: extracted from git log timestamps

Gaps: Lines of code and file count not extracted; research doc count (100+) is an estimate from repo structure

Primary contributors: Brian Krabach (29 commits, ~97%), Sam Schillace (1 commit, ~3%)

Talk to your code.

Amplifier Voice proves that voice isn't just a UI layer — it's a fundamentally different way to interact with AI agent teams.

Try It
Clone amplifier-voice. Follow QUICKSTART.md. You'll be talking to agents in under 5 minutes.
Extend It
Add your own agents. Home Assistant integration is already in progress.
Learn From It
100+ research docs in ai-context/. Architecture decisions documented in every commit.
Built by Brian Krabach · Powered by Amplifier · January – February 2026