Amplifier Bundle

AI That Can See
Your Terminal

amplifier-bundle-terminal-tester
Spawn, inspect, and interact with TUI applications — from inside an AI agent.

Active | New | March 2026

The Problem

835 tests passing.
UI visibly broken.

"Let's pause and find a way for you to be able to inspect the interactive side of the TUI, since there are very obvious issues with it once you load it up, but since you couldn't see it you have no idea and I cannot be the one to test and report it all back to you."

— Turn 89 of an 89-turn Amplifier TUI session

🔇

Blind AI

The AI was building a Rust/Ratatui terminal interface, but had no way to see the rendered output. Unit tests passed. Integration tests passed. The actual screen? Broken.

🧑‍💻

Human bottleneck

Every visual bug required a human to launch the app, spot the issue, describe it in text, and relay it back. The developer became the sole QA tester for an AI-built application.

The Solution

One tool. Two capture modes.
Any terminal app.

📸

Screen-Dump Mode

Reads Ratatui's own render buffer via --screen-dump-path. Pixel-perfect, frame-numbered, zero ANSI parsing.

Ratatui | crossterm

🖥️

PTY Mode

Universal VT100 terminal emulation via pyte. Works with any terminal app — Textual, Bubble Tea, urwid, plain CLI tools.

Universal | pyte
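PTY mode depends on running the target app under a pseudo-terminal so that it behaves as it would in a real terminal. The sketch below shows only that capture half, using the Python standard library on a POSIX system. The function name `capture_pty_output` and its parameters are hypothetical and are not taken from the bundle; the VT100 emulation that the bundle delegates to pyte is not shown.

```python
import fcntl
import os
import pty
import select
import struct
import subprocess
import termios


def capture_pty_output(argv, cols=80, rows=24, timeout=2.0):
    """Run argv attached to a pseudo-terminal and return the raw bytes it wrote.

    Hypothetical helper for illustration; not the bundle's implementation.
    """
    master_fd, slave_fd = pty.openpty()
    # Set the window size so the child lays itself out for rows x cols.
    fcntl.ioctl(slave_fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))
    proc = subprocess.Popen(
        argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
    )
    os.close(slave_fd)  # The parent keeps only the master end.

    chunks = []
    try:
        while True:
            ready, _, _ = select.select([master_fd], [], [], timeout)
            if not ready:
                break  # Nothing written within the timeout window.
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                break  # Linux reports EIO once the child has closed its end.
            if not data:
                break  # Other platforms signal end-of-stream with empty bytes.
            chunks.append(data)
    finally:
        os.close(master_fd)
        if proc.poll() is None:
            proc.terminate()
        proc.wait()
    return b"".join(chunks)
```

The returned bytes still contain escape sequences and carriage returns. A terminal emulator would have to interpret them before any text search or screenshot could run against a stable screen grid.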

🧠

Auto-Detection

If the command includes --screen-dump-path, it uses dump mode. Otherwise, PTY. Zero config for the agent.

Zero-config
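The auto-detection rule is small enough to sketch directly. This is a minimal illustration, assuming the check works on shell tokens and that the `--screen-dump-path=VALUE` form should also count. The function name and the returned mode labels are hypothetical.

```python
import shlex

SCREEN_DUMP_FLAG = "--screen-dump-path"


def detect_capture_mode(command: str) -> str:
    """Return "screen-dump" if the command carries the dump flag, else "pty".

    The mode labels are illustrative names, not identifiers from the bundle.
    """
    for token in shlex.split(command):
        # Assumption: both "--screen-dump-path PATH" and "--screen-dump-path=PATH" count.
        if token == SCREEN_DUMP_FLAG or token.startswith(SCREEN_DUMP_FLAG + "="):
            return "screen-dump"
    return "pty"
```

Because the decision depends only on the command string, the agent never has to pass a mode argument.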

# 1 tool, 9 operations
terminal_inspector  spawn | screenshot | send_keys | send_text
                     find_text | wait_for_text | resize | close | list
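A session built from these operations might look like the sketch below. Only the nine operation names come from the listing above. The request shape, the `build_call` helper, and every parameter name (`command`, `text`, `keys`) are assumptions made for illustration.

```python
# The nine operation names exposed by terminal_inspector, as listed above.
OPERATIONS = (
    "spawn", "screenshot", "send_keys", "send_text",
    "find_text", "wait_for_text", "resize", "close", "list",
)


def build_call(operation, **params):
    """Build a request dict, rejecting operation names the tool does not list.

    Hypothetical request shape; the real tool's schema may differ.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    return {"operation": operation, **params}


# A plausible session: launch, wait until the UI is ready, interact, capture, clean up.
# The command, the awaited text, and the key name are placeholders.
session_plan = [
    build_call("spawn", command="my-tui"),
    build_call("wait_for_text", text="Ready"),
    build_call("send_keys", keys="F2"),
    build_call("screenshot"),
    build_call("close"),
]
```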

Architecture

Three specialist agents

🤖

terminal-operator

Launches apps, drives interaction, verifies results. Follows a 6-step workflow with a 3-attempt failure budget before escalating.

Launch | Interact | Verify
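The 3-attempt failure budget can be pictured as a bounded retry that hands control back once the budget is spent. This is a sketch under assumptions: the agent's real escalation mechanism is not described in the source, and `EscalationNeeded` and `run_with_budget` are hypothetical names.

```python
class EscalationNeeded(Exception):
    """Raised when a step has used up its failure budget (hypothetical name)."""


def run_with_budget(step, max_attempts=3):
    """Call step() up to max_attempts times and return its first successful result.

    If every attempt raises, collect the errors and escalate instead of looping forever.
    """
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            errors.append(f"attempt {attempt}: {exc}")
    raise EscalationNeeded("; ".join(errors))
```

A hard cap keeps an agent from burning turns on a step that cannot succeed, and the collected errors give whoever picks up the escalation a starting point.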

📐

terminal-visual-tester

Responsive layout testing across 6 breakpoints (60→200 columns). Runs visual quality checklists against every screen state.

6 breakpoints | Layout | Responsive
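One simple check that such a breakpoint sweep could run is whether any rendered row is wider than the terminal. In the sketch below only the 60 and 200 endpoints come from the text above. The four intermediate widths are assumed, and `render` stands in for whatever resizes the app and returns its screen lines.

```python
# Only 60 and 200 are stated in the source; the intermediate widths are assumptions.
BREAKPOINTS = (60, 80, 100, 120, 160, 200)


def overflowing_rows(screen_lines, cols):
    """Return indices of rows whose visible text is wider than cols.

    Uses len(), so wide characters (CJK, emoji) are undercounted in this sketch.
    """
    return [i for i, line in enumerate(screen_lines) if len(line.rstrip()) > cols]


def check_layout(render, breakpoints=BREAKPOINTS):
    """Render at each width and map failing widths to their overflowing rows.

    render(cols) is a hypothetical callable returning the screen as a list of strings.
    """
    failures = {}
    for cols in breakpoints:
        bad = overflowing_rows(render(cols), cols)
        if bad:
            failures[cols] = bad
    return failures
```

A status bar hard-coded to 70 characters, for example, would be reported only at the 60-column breakpoint.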

🔬

terminal-debugger

Frame-by-frame analysis, keystroke-response verification, root-cause identification with file:line citations and confidence ratings.

Root cause | Frame analysis

Repo: microsoft/amplifier-bundle-terminal-tester

By The Numbers

Real usage across two projects over 4 days

~1,700

terminal_inspector operations

across 37 sessions

37+

bugs discovered

invisible to unit tests

8+

autonomous fix-verify cycles

end-to-end, no human

254

send_keys

252

screenshots

33

wait_for_text gates

Used as the QA gate in an autonomous dev-machine: 53 iterations + 23 QA rounds

Impact

Bugs only terminal_inspector could find

#01

Response text doubling

AI responses rendered twice. Confirmed across 4 prompts + vision model analysis of screenshots.

#02

Ctrl+C inserting literal ^C

The keypress inserted a literal ^C instead of quitting the app. Regressed 3 times during development.

#03

ANSI escape sequence leak

"OQ" characters appearing from rapid ESC+F2 keystrokes.

#04

Stray border artifact

Visible at specific screen coordinates after the sidebar closes.

#05

Settings panel inaccessible

5 of 6 settings sections unreachable via keyboard navigation.

#06

Token counter frozen at zero

Despite real API usage, the token display never updated.

#07

Status bar stuck on "working"

Status indicator never reset after response completion.

#08

"Brutally honest" product audit

Session resume was cosmetic-only, and settings were disconnected from the runtime.

Autonomous QA

The AI verifies its own visual output

Build

→

Compile

→

Test visually

→

File bugs

→

Dev-machine

→

Rebuild

→

Retest

🔄

53 + 23

53 dev-machine iterations and 23 terminal-tester QA rounds. No human in the loop for visual verification. The terminal-tester became the quality gate in an autonomous build-test-fix cycle.

👁️

AI verifying AI

Found the response-doubling bug, confirmed it across 4 prompts, then called a vision model to analyze its own screenshots to prove it wasn't a PTY artifact.

Deep Debugging

From visual symptom
to exact line of Rust

🔬

terminal-debugger in action

The debugger agent combined live terminal inspection with Rust source code analysis:

▸ Traced a markdown rendering bug from its visual symptom to two specific lines of Rust code

▸ Used frame counter analysis to detect whether keys were received, handled, or just not re-rendering

▸ Produced structured root-cause reports with file:line citations, suggested fixes, and confidence ratings
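The frame-counter idea from the list above can be sketched as a comparison of two captures taken before and after a keystroke. The `Capture` type, the three labels, and the interpretation attached to each are assumptions about how such an analysis could work, not the agent's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """A screen capture plus the render frame number it was taken at (hypothetical type)."""
    frame: int
    lines: tuple


def classify_key_response(before: Capture, after: Capture) -> str:
    """Classify what happened between two captures around a keystroke."""
    if after.frame == before.frame:
        # No new frame: the app did not re-render after the key was sent.
        return "no-rerender"
    if after.lines == before.lines:
        # A frame was drawn but the screen is identical: the key may have been
        # received without changing any visible state.
        return "rerendered-unchanged"
    return "visible-change"
```

Separating these cases narrows the search: a missing re-render points toward the event loop or draw trigger, while an unchanged re-render points toward the key handler.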

// Debugger output structure
{
  "symptom": "Markdown not rendering",
  "root_cause": {
    "file": "src/ui/chat.rs",
    "lines": [142, 147],
    "issue": "Raw text passed instead of parsed spans"
  },
  "confidence": 0.92,
  "fix": "Replace raw & with markdown::parse()"
}

Visual bug → Frame analysis → Source trace → Fix with confidence rating
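A consumer of such a report could turn the root-cause block into the file:line citations mentioned above. This is a small sketch that assumes the field names shown in the example structure. The `citations` helper is hypothetical, and the sample data is copied from that example.

```python
import json

# Sample report using the field names from the example structure above.
REPORT_JSON = """
{
  "symptom": "Markdown not rendering",
  "root_cause": {
    "file": "src/ui/chat.rs",
    "lines": [142, 147],
    "issue": "Raw text passed instead of parsed spans"
  },
  "confidence": 0.92
}
"""


def citations(report: dict) -> list:
    """Format each cited line as "file:line" (hypothetical helper)."""
    root_cause = report["root_cause"]
    return [f'{root_cause["file"]}:{line}' for line in root_cause["lines"]]


report = json.loads(REPORT_JSON)
for citation in citations(report):
    print(citation)
```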

Credits

Standing on shoulders

🧪

PTY + pyte approach

Diego Colombo (@colombod) pioneered the PTY/pyte approach for AI terminal testing.

colombod/amplifier-bundle-tui-tester

📸

Screen-dump approach

Developed during Amplifier TUI Phase 6. Reads Ratatui's native render buffer for pixel-perfect captures.

🏗️

Bundle structure

Modeled after the browser-tester bundle pattern for consistent Amplifier ecosystem integration.

microsoft/amplifier-bundle-browser-tester

Diego Colombo is credited as co-author in the git history.

Sources

Data & Methodology

Data as of: March 2026

Feature status: Active — microsoft/amplifier-bundle-terminal-tester

Data sources:

Usage metrics (~1,700 operations, 37 sessions, 254 send_keys, 252 screenshots, 33 wait_for_text) from session logs across two projects over 4 days
Bug count (37+) from tracked issues found exclusively via terminal_inspector during Amplifier TUI development
Autonomous cycle counts (53 dev-machine iterations, 23 QA rounds, 8+ fix-verify cycles) from session records
The quoted conversation is from a real 89-turn Amplifier TUI development session

Qualifiers preserved:

"~1,700" (approximate — aggregated across multiple session logs)
"37+" and "8+" (minimum confirmed counts — actual numbers may be higher)

Prior art: Diego Colombo's amplifier-bundle-tui-tester (PTY/pyte approach)

Gaps: Exact per-session operation breakdowns not individually verified. Bug severity classification is qualitative.

Get Started

Try it now

Install

amplifier bundle add \
  git+https://github.com/microsoft/amplifier-bundle-terminal-tester@main#subdirectory=behaviors/terminal-tester.yaml \
  --app

tmux — required for screen-dump mode
pip install pyte Pillow — required for PTY mode

github.com/microsoft/amplifier-bundle-terminal-tester

Spawn. Screenshot. Ship.
