When an AI agent built a feature but couldn't see what it built, it created its own eyes.
February 12-13, 2026
Act I
The Vision
The Canvas team set out to build an Artifact Viewer - a full-featured file browser for AI-generated code. The spec was meticulous.
1,021 Lines of Spec
25 Screenshots Studied
17 HTML Mockups
21 Spec Sections
CANVAS-ARTIFACT-VIEWER-SPEC.md
25 Kepler desktop screenshots analyzed. 17 interactive HTML mockups prototyped.
21 sections covering every detail from file tree styling to VS Code-inspired badges.
A 10-Act acceptance test scripting the exact verification sequence.
Act II
The Autonomous Build
A 528-line implementation prompt was handed to an Amplifier session. What followed was 4 hours of fully autonomous development.
Component             Lines
ArtifactViewer.tsx      617
useArtifactStore.ts     462
CodeView.tsx            272
PreviewView.tsx         255
FileList.tsx            227
+ 5 more files          273
2,106 Total Lines Written
10 React Components
~4h Autonomous Dev Time
The Output
A Complete Feature, Built Without Human Touch
ArtifactViewer.tsx
617-line main popup shell with full-screen mode, keyboard shortcuts, and responsive layout
useArtifactStore.ts
462-line Zustand state management with file registry, version history, and selection logic
CodeView.tsx
PrismJS syntax highlighting with line numbers and 15+ language support
PreviewView.tsx
Multi-strategy HTML/image preview with iframe sandboxing and asset inlining
FileList.tsx
VS Code-style tree navigator with colored file-type badges and expand/collapse
SSE Detection
Real-time artifact detection from streaming tool_call events via Server-Sent Events
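The store and detection layer above aren't shown in the source, but the idea can be sketched. In this hypothetical TypeScript sketch, the names (`ArtifactRegistry`, `register`, `detectArtifact`) and the `tool_call` event shape are assumptions, and the real `useArtifactStore.ts` wraps equivalent logic in a Zustand store rather than a plain class:

```typescript
// Hypothetical sketch: a file registry with version history, fed by
// streamed tool_call events. Names and event shapes are assumptions.

interface ArtifactVersion {
  content: string;
  timestamp: number;
}

interface ArtifactEntry {
  path: string;
  versions: ArtifactVersion[]; // newest last
}

class ArtifactRegistry {
  private files = new Map<string, ArtifactEntry>();
  selectedPath: string | null = null;

  // Register a file; repeated writes to the same path grow its history.
  register(path: string, content: string): void {
    const entry = this.files.get(path) ?? { path, versions: [] };
    entry.versions.push({ content, timestamp: Date.now() });
    this.files.set(path, entry);
    this.selectedPath ??= path; // auto-select the first artifact seen
  }

  get(path: string): ArtifactEntry | undefined {
    return this.files.get(path);
  }
}

// Inspect one SSE data payload; register a file if it is a tool_call
// that carries a path (payload shape is an assumption).
function detectArtifact(registry: ArtifactRegistry, data: string): boolean {
  const event = JSON.parse(data);
  if (event.type !== "tool_call" || !event.arguments?.path) return false;
  registry.register(event.arguments.path, event.arguments.content ?? "");
  return true;
}
```

Feeding the same path twice appends a second version rather than overwriting, which is what makes the viewer's version-history tab possible.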
Act III
The Crash
The AI built the feature flawlessly. Then it tried to look at what it built.
The Failure
A 35+-Level Recursion Loop
The session launched a browser-operator to visually verify the UI. The operator couldn't process what it saw. It spawned another. Then another. Then another.
// What the session tried to do:
delegate browser-operator "Check if the UI looks right"
  -> delegate browser-operator "Navigate and verify"
    -> delegate browser-operator "Take screenshot"
      -> delegate browser-operator "Analyze screenshot"
        -> delegate browser-operator "..."  // 35+ levels deep
// Session crashed
// Context window exhausted
One simple question killed the session: "Does this look right?"
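The root cause is delegation with no depth limit: each sub-agent was free to spawn another. A minimal depth guard (hypothetical; not how Amplifier actually bounds delegation, and the limit of 3 is an assumption for illustration) makes the failure mode and its fix concrete:

```typescript
// Hypothetical sketch: bound delegation depth so an agent that keeps
// re-delegating fails fast instead of exhausting the context window.

const MAX_DELEGATION_DEPTH = 3; // assumed limit, for illustration

type Agent = (task: string, delegate: (task: string) => string) => string;

function runWithDepthGuard(agent: Agent, task: string, depth = 0): string {
  if (depth >= MAX_DELEGATION_DEPTH) {
    throw new Error(
      `delegation depth ${depth} hit the limit; aborting instead of recursing`
    );
  }
  // Each delegated subtask runs one level deeper than its parent.
  return agent(task, (subtask) => runWithDepthGuard(agent, subtask, depth + 1));
}
```

An agent that always re-delegates (like the browser-operator loop) now fails at depth 3 with a clear error, instead of at 35+ with a dead session.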
The Evidence
SESSION-HANDOFF.md
The session died before updating its handoff document. Every feature: PENDING. Every acceptance test: NOT TESTED. Every checkpoint: NOT REACHED.
Feature Progress
Zustand Artifact Registry - PENDING
SSE Artifact Detection - PENDING
ArtifactViewer Shell - PENDING
File List UI - PENDING
Code Tab - PENDING
Preview Tab - PENDING
Version History - PENDING
...11 more features - PENDING
10-Act Acceptance Test
Act 1: Project Setup - NOT TESTED
Act 2: File Creation - NOT TESTED
Act 3: Preview Testing - NOT TESTED
Act 4: Iteration - NOT TESTED
Act 5: Adding Images - NOT TESTED
Act 6: Cross-Project - NOT TESTED
Act 7: Snapshot - NOT TESTED
Act 8-9: Error & Resume - NOT TESTED
The Insight
AI agents need eyes, not just code.
Building UI without visual verification is building blind.
Act IV
Pain Drives Innovation
ui-vision
A purpose-built Amplifier bundle born from a crash. Built the next morning. Merged by lunch.
The Solution
Three Agents, Three Modes
Each agent answers a different question about your UI.
👁
Visual Auditor
"Does this look good?"
Quality
🔄
Regression Checker
"Did my changes break anything?"
Safety
♿
Accessibility Scanner
"Can everyone use this?"
Inclusion
Plus 3 matching recipes for repeatable, automated testing workflows.
How It Works
Playwright MCP + Vision Mode
The core innovation: screenshots returned as base64 images the model can literally see and analyze.
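Concretely, in the Model Context Protocol a tool result can carry an image content block of the form `{ type: "image", data, mimeType }`, where `data` is base64-encoded bytes. A small sketch of consuming such a block (the content shape follows the MCP spec; the tiny payload used here is a stand-in, not real Playwright output):

```typescript
// Sketch of handling an MCP image content block. The shape
// ({ type: "image", data, mimeType }) follows the MCP spec; the
// screenshot bytes are a stand-in for real Playwright output.

interface ImageContent {
  type: "image";
  data: string;     // base64-encoded bytes
  mimeType: string; // e.g. "image/png"
}

// Decode the base64 payload back into raw bytes.
function decodeScreenshot(block: ImageContent): Buffer {
  if (block.type !== "image") throw new Error("expected an image block");
  return Buffer.from(block.data, "base64");
}

// PNG files start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
function looksLikePng(bytes: Buffer): boolean {
  const sig = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  return bytes.subarray(0, 8).equals(sig);
}
```

Because the image travels inside the tool result itself, a vision-capable model receives the rendered pixels directly, with no file round-trip between "take screenshot" and "analyze screenshot".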
The entire capability is a composable behavior YAML. Add visual testing to any existing Amplifier bundle in two lines.
# As a full bundle
includes:
  - bundle: git+...amplifier-bundle-ui-vision@main

# As a behavior (compose into existing)
includes:
  - bundle: git+...ui-vision@main#subdirectory=behaviors/ui-vision.yaml
What You Get
● Playwright MCP with vision mode
● 3 specialist agents auto-registered
● 3 recipes ready to execute
● Context docs for delegation routing
Bundle Size
1 behavior YAML (31 lines)
3 agent definitions
3 recipe definitions
2 context documents
Act V
First Sight
February 13, 11:00 AM - The AI opens its eyes.
// The very first visual verification
browser_navigate("http://localhost:5173")
browser_take_screenshot()
// -> Returns base64 PNG
// -> Model SEES the rendered pixels
// -> Saved: vision-test-screenshot.png

// "I can see the dashboard layout with a sidebar
// on the left, a header at the top, and the main
// content area showing the artifact viewer..."
30 Screenshots Captured
11:25 AM - PR #31 Merged
The Transformation
Before & After
✕ Before ui-vision
AI builds UI code blindly, hoping it works
Visual verification requires a human in the loop
Generic browser-operator with no visual context
35+-level recursion crash trying to self-verify
SESSION-HANDOFF: every item PENDING
No accessibility awareness during development
✓ After ui-vision
AI sees base64 screenshots of rendered UI
Autonomous visual verification with structured reports
Purpose-built specialist agents for each concern
Clean delegation, focused analysis, no recursion
Visual evidence attached to every verification
WCAG 2.1 AA scanning built into the workflow
The Pattern
2-Step Recipe Architecture
Every recipe follows the same pattern: specialist analysis, then executive summary.
# visual-audit.yaml
name: visual-audit
steps:
  - id: audit
    agent: ui-vision:visual-auditor
    instruction: |
      Navigate to {{ url }}
      Screenshot each page in {{ pages }}
      Analyze: layout, typography, color, spacing
      Rate issues: Critical / Major / Minor / Nitpick
  - id: summary
    instruction: |
      From {{ steps.audit.result }}:
      1. Overall quality score (1-10)
      2. Top 3 issues to fix
      3. Top 3 things done well
      4. Next steps by effort vs impact
Velocity
From Crash to Capability
Feb 12, Morning
The Vision
1,021-line UX spec completed with 17 mockups
Feb 12, Afternoon (~4 hours)
The Build
2,106 lines of React autonomously written across 10 components
Feb 12, Evening
The Crash
35+ recursion loop. Session dead. All features marked PENDING.
Feb 13, Morning
The Build (ui-vision)
3 agents, 3 recipes, 1 behavior YAML. Purpose-built from pain.
Feb 13, 11:00 AM
First Sight
30 screenshots captured. AI verifies its own UI. vision-test-screenshot.png saved.
Feb 13, 11:25 AM
PR #31 Merged
The AI can see. The loop is closed.
Lessons
What This Teaches Us
AI builds its own tools
When an AI agent hits a wall, the response isn't a workaround - it's a new capability. The crash wasn't a failure. It was a requirements document.
Composability is leverage
A 31-line behavior YAML gives any Amplifier bundle the power of visual testing. Agents, recipes, and MCP tools compose like LEGO bricks.
Pain drives purpose
The most useful tools emerge from real failures. A 35+-level recursion crash at 10 PM led to an elegant solution by 11 AM the next day.
Specialist > generalist
A generic browser-operator crashed. Three focused specialists - auditor, regression checker, accessibility scanner - each do one thing brilliantly.
Try It
The AI Can See Now.
Add visual testing to your Amplifier bundle today.
# In your session:
"Check how localhost:5173 looks"
"Did my CSS changes break anything?"
"Run an accessibility scan on the signup form"
amplifier-bundle-ui-vision · Built Feb 13, 2026 · PR #31