When an AI agent built a feature but couldn't see what it built, it created its own eyes.
February 12-13, 2026
Act I
The Vision
The Canvas team set out to build an Artifact Viewer - a full-featured file browser for AI-generated code. The spec was meticulous.
1,021 Lines of Spec
25 Screenshots Studied
17 HTML Mockups
21 Spec Sections
CANVAS-ARTIFACT-VIEWER-SPEC.md
25 Kepler desktop screenshots analyzed. 17 interactive HTML mockups prototyped.
21 sections covering every detail from file tree styling to VS Code-inspired badges.
A 10-Act acceptance test scripting the exact verification sequence.
Act II
The Autonomous Build
A 528-line implementation prompt was handed to an Amplifier session. What followed was 4 hours of fully autonomous development.
Component             Lines
ArtifactViewer.tsx      617
useArtifactStore.ts     462
CodeView.tsx            272
PreviewView.tsx         255
FileList.tsx            227
+ 5 more files          273
2,106 Total Lines Written
10 React Components
~4h Autonomous Dev Time
The Output
A Complete Feature, Built Without Human Touch
ArtifactViewer.tsx
617-line main popup shell with full-screen mode, keyboard shortcuts, and responsive layout
useArtifactStore.ts
462-line Zustand state management with file registry, version history, and selection logic
CodeView.tsx
PrismJS syntax highlighting with line numbers and 15+ language support
PreviewView.tsx
Multi-strategy HTML/image preview with iframe sandboxing and asset inlining
FileList.tsx
VS Code-style tree navigator with colored file-type badges and expand/collapse
SSE Detection
Real-time artifact detection from streaming tool_call events via Server-Sent Events
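The store and detection layer above aren't shown in the source, but the idea can be sketched. In this hypothetical TypeScript sketch, the names (`ArtifactRegistry`, `register`, `detectArtifact`) and the `tool_call` event shape are assumptions, and the real `useArtifactStore.ts` wraps equivalent logic in a Zustand store rather than a plain class:

```typescript
// Hypothetical sketch: a file registry with version history, fed by
// streamed tool_call events. Names and event shapes are assumptions.

interface ArtifactVersion {
  content: string;
  timestamp: number;
}

interface ArtifactEntry {
  path: string;
  versions: ArtifactVersion[]; // newest last
}

class ArtifactRegistry {
  private files = new Map<string, ArtifactEntry>();
  selectedPath: string | null = null;

  // Register a file; repeated writes to the same path grow its history.
  register(path: string, content: string): void {
    const entry = this.files.get(path) ?? { path, versions: [] };
    entry.versions.push({ content, timestamp: Date.now() });
    this.files.set(path, entry);
    this.selectedPath ??= path; // auto-select the first artifact seen
  }

  get(path: string): ArtifactEntry | undefined {
    return this.files.get(path);
  }
}

// Inspect one SSE data payload; register a file if it is a tool_call
// that carries a path (payload shape is an assumption).
function detectArtifact(registry: ArtifactRegistry, data: string): boolean {
  const event = JSON.parse(data);
  if (event.type !== "tool_call" || !event.arguments?.path) return false;
  registry.register(event.arguments.path, event.arguments.content ?? "");
  return true;
}
```

Feeding the same path twice appends a second version rather than overwriting, which is what makes the viewer's version-history tab possible.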
Act III
The Crash
The AI built the feature flawlessly. Then it tried to look at what it built.
The Failure
A 35+-Level Recursion Loop
The session launched a browser-operator to visually verify the UI. The operator couldn't process what it saw. It spawned another. Then another. Then another.
// What the session tried to do:
delegate browser-operator "Check if the UI looks right"
  -> delegate browser-operator "Navigate and verify"
    -> delegate browser-operator "Take screenshot"
      -> delegate browser-operator "Analyze screenshot"
        -> delegate browser-operator "..."  // 35+ levels deep
// Session crashed
// Context window exhausted
One simple question killed the session: "Does this look right?"
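The root cause is delegation with no depth limit: each sub-agent was free to spawn another. A minimal depth guard (hypothetical; not how Amplifier actually bounds delegation, and the limit of 3 is an assumption for illustration) makes the failure mode and its fix concrete:

```typescript
// Hypothetical sketch: bound delegation depth so an agent that keeps
// re-delegating fails fast instead of exhausting the context window.

const MAX_DELEGATION_DEPTH = 3; // assumed limit, for illustration

type Agent = (task: string, delegate: (task: string) => string) => string;

function runWithDepthGuard(agent: Agent, task: string, depth = 0): string {
  if (depth >= MAX_DELEGATION_DEPTH) {
    throw new Error(
      `delegation depth ${depth} hit the limit; aborting instead of recursing`
    );
  }
  // Each delegated subtask runs one level deeper than its parent.
  return agent(task, (subtask) => runWithDepthGuard(agent, subtask, depth + 1));
}
```

An agent that always re-delegates (like the browser-operator loop) now fails at depth 3 with a clear error, instead of at 35+ with a dead session.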
The Evidence
SESSION-HANDOFF.md
The session died before updating its handoff document. Every feature: PENDING. Every acceptance test: NOT TESTED. Every checkpoint: NOT REACHED.
Feature Progress
Zustand Artifact Registry - PENDING
SSE Artifact Detection - PENDING
ArtifactViewer Shell - PENDING
File List UI - PENDING
Code Tab - PENDING
Preview Tab - PENDING
Version History - PENDING
...11 more features - PENDING
10-Act Acceptance Test
Act 1: Project Setup - NOT TESTED
Act 2: File Creation - NOT TESTED
Act 3: Preview Testing - NOT TESTED
Act 4: Iteration - NOT TESTED
Act 5: Adding Images - NOT TESTED
Act 6: Cross-Project - NOT TESTED
Act 7: Snapshot - NOT TESTED
Act 8-9: Error & Resume - NOT TESTED
The Insight
AI agents need eyes, not just code.
Building UI without visual verification is building blind.
Act IV
Pain Drives Innovation
ui-vision
A purpose-built Amplifier bundle born from a crash. Built the next morning. Merged by lunch.
The Solution
Three Agents, Three Modes
Each agent answers a different question about your UI.
👁
Visual Auditor
"Does this look good?"
Quality
🔄
Regression Checker
"Did my changes break anything?"
Safety
♿
Accessibility Scanner
"Can everyone use this?"
Inclusion
Plus 3 matching recipes for repeatable, automated testing workflows.
How It Works
Playwright MCP + Vision Mode
The core innovation: screenshots returned as base64 images the model can literally see and analyze.
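Concretely, in the Model Context Protocol a tool result can carry an image content block of the form `{ type: "image", data, mimeType }`, where `data` is base64-encoded bytes. A small sketch of consuming such a block (the content shape follows the MCP spec; the tiny payload used here is a stand-in, not real Playwright output):

```typescript
// Sketch of handling an MCP image content block. The shape
// ({ type: "image", data, mimeType }) follows the MCP spec; the
// screenshot bytes are a stand-in for real Playwright output.

interface ImageContent {
  type: "image";
  data: string;     // base64-encoded bytes
  mimeType: string; // e.g. "image/png"
}

// Decode the base64 payload back into raw bytes.
function decodeScreenshot(block: ImageContent): Buffer {
  if (block.type !== "image") throw new Error("expected an image block");
  return Buffer.from(block.data, "base64");
}

// PNG files start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
function looksLikePng(bytes: Buffer): boolean {
  const sig = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  return bytes.subarray(0, 8).equals(sig);
}
```

Because the image travels inside the tool result itself, a vision-capable model receives the rendered pixels directly, with no file round-trip between "take screenshot" and "analyze screenshot".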
The entire capability is a composable behavior YAML. Add visual testing to any existing Amplifier bundle in two lines.
# As a full bundle
includes:
  - bundle: git+...amplifier-bundle-ui-vision@main

# As a behavior (compose into existing)
includes:
  - bundle: git+...ui-vision@main#subdirectory=behaviors/ui-vision.yaml
What You Get
● Playwright MCP with vision mode
● 3 specialist agents auto-registered
● 3 recipes ready to execute
● Context docs for delegation routing
Bundle Size
1 behavior YAML (31 lines)
3 agent definitions
3 recipe definitions
2 context documents
Act V
First Sight
February 13, 11:00 AM - The AI opens its eyes.
// The very first visual verification
browser_navigate("http://localhost:5173")
browser_take_screenshot()
// -> Returns base64 PNG
// -> Model SEES the rendered pixels
// -> Saved: vision-test-screenshot.png

// "I can see the dashboard layout with a sidebar
// on the left, a header at the top, and the main
// content area showing the artifact viewer..."
30 Screenshots Captured
11:25 AM - PR #31 Merged
The Transformation
Before & After
✕ Before ui-vision
AI builds UI code blindly, hoping it works
Visual verification requires a human in the loop
Generic browser-operator with no visual context
35+-level recursion crash trying to self-verify
SESSION-HANDOFF: every item PENDING
No accessibility awareness during development
✓ After ui-vision
AI sees base64 screenshots of rendered UI
Autonomous visual verification with structured reports
Purpose-built specialist agents for each concern
Clean delegation, focused analysis, no recursion
Visual evidence attached to every verification
WCAG 2.1 AA scanning built into the workflow
The Pattern
2-Step Recipe Architecture
Every recipe follows the same pattern: specialist analysis, then executive summary.
# visual-audit.yaml
name: visual-audit
steps:
  - id: audit
    agent: ui-vision:visual-auditor
    instruction: |
      Navigate to {{ url }}
      Screenshot each page in {{ pages }}
      Analyze: layout, typography, color, spacing
      Rate issues: Critical / Major / Minor / Nitpick
  - id: summary
    instruction: |
      From {{ steps.audit.result }}:
      1. Overall quality score (1-10)
      2. Top 3 issues to fix
      3. Top 3 things done well
      4. Next steps by effort vs impact
Velocity
From Crash to Capability
Feb 12, Morning
The Vision
1,021-line UX spec completed with 17 mockups
Feb 12, Afternoon (~4 hours)
The Build
2,106 lines of React autonomously written across 10 components
Feb 12, Evening
The Crash
35+ recursion loop. Session dead. All features marked PENDING.
Feb 13, Morning
The Build (ui-vision)
3 agents, 3 recipes, 1 behavior YAML. Purpose-built from pain.
Feb 13, 11:00 AM
First Sight
30 screenshots captured. AI verifies its own UI. vision-test-screenshot.png saved.
Feb 13, 11:25 AM
PR #31 Merged
The AI can see. The loop is closed.
Lessons
What This Teaches Us
AI builds its own tools
When an AI agent hits a wall, the response isn't a workaround - it's a new capability. The crash wasn't a failure. It was a requirements document.
Composability is leverage
A 31-line behavior YAML gives any Amplifier bundle the power of visual testing. Agents, recipes, and MCP tools compose like LEGO bricks.
Pain drives purpose
The most useful tools emerge from real failures. A 35+-level recursion crash at 10 PM led to an elegant solution by 11 AM the next day.
Specialist > generalist
A generic browser-operator crashed. Three focused specialists - auditor, regression checker, accessibility scanner - each do one thing brilliantly.
Try It
The AI Can See Now.
Add visual testing to your Amplifier bundle today.
# In your session:
"Check how localhost:5173 looks"
"Did my CSS changes break anything?"
"Run an accessibility scan on the signup form"
amplifier-bundle-ui-vision · Built Feb 13, 2026 · PR #31