Amplifier Bundle

AI That Can See
Your Terminal

amplifier-bundle-terminal-tester
Spawn, inspect, and interact with TUI applications — from inside an AI agent.

Active · New · March 2026
The Problem

835 tests passing.
UI visibly broken.

"Let's pause and find a way for you to be able to inspect the interactive side of the TUI, since there are very obvious issues with it once you load it up, but since you couldn't see it you have no idea and I cannot be the one to test and report it all back to you."
— Turn 89 of an 89-turn Amplifier TUI session
🔇
Blind AI
The AI was building a Rust/Ratatui terminal interface, but had no way to see the rendered output. Unit tests passed. Integration tests passed. The actual screen? Broken.
🧑‍💻
Human bottleneck
Every visual bug required a human to launch the app, spot the issue, describe it in text, and relay it back. The developer became the sole QA tester for an AI-built application.
The Solution

One tool. Two capture modes.
Any terminal app.

📸
Screen-Dump Mode
Reads Ratatui's own render buffer via --screen-dump-path. Pixel-perfect, frame-numbered, zero ANSI parsing.
Ratatui crossterm
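The bundle's actual dump format is not described on this page. As a minimal sketch, assume the app overwrites the file at --screen-dump-path with a `frame=<N>` header line followed by one line per terminal row. `Frame`, `parse_screen_dump` and `read_screen_dump` are hypothetical names. Reading such a file needs no ANSI parsing because the app has already resolved every cell to plain text.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class Frame:
    number: int       # frame counter written by the app on each render
    rows: list[str]   # one string per terminal row, as rendered


def parse_screen_dump(text: str) -> Frame:
    """Parse the HYPOTHETICAL dump format: 'frame=<N>' header, then one line per row."""
    header, *rows = text.splitlines()
    key, _, value = header.partition("=")
    if key != "frame":
        raise ValueError(f"unexpected dump header: {header!r}")
    return Frame(number=int(value), rows=rows)


def read_screen_dump(path: Path) -> Frame:
    """Read the most recent frame the app wrote to its --screen-dump-path file."""
    return parse_screen_dump(path.read_text(encoding="utf-8"))
```

Because each dump carries a frame number, two reads that return the same number show that the app has not redrawn in between.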
🖥️
PTY Mode
Universal VT100 terminal emulation via pyte. Works with any terminal app — Textual, Bubble Tea, urwid, plain CLI tools.
Universal pyte
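PTY mode starts from the raw byte stream an app writes to its terminal. The sketch below uses only the Python standard library (POSIX only): it attaches a child process to a fresh pseudo-terminal and collects everything the child prints. `run_in_pty` is a hypothetical helper, not the bundle's code. A PTY-mode tool would then feed those bytes to a VT100 emulator such as pyte (`pyte.Screen` plus `pyte.ByteStream`) to rebuild the visible character grid; that step is left out so the example runs without third-party packages.

```python
import os
import select
import subprocess


def run_in_pty(argv: list, timeout: float = 5.0) -> str:
    """Run argv attached to a new pseudo-terminal and return everything it wrote."""
    master_fd, slave_fd = os.openpty()
    # The child sees a real terminal on stdin, stdout and stderr.
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # only the child should hold the slave end now
    chunks = []
    try:
        while True:
            ready, _, _ = select.select([master_fd], [], [], timeout)
            if not ready:
                break  # no output within the timeout
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                break  # Linux reports EIO once the child has closed the terminal
            if not data:
                break  # EOF, as reported on some other platforms
            chunks.append(data)
    finally:
        os.close(master_fd)
        if proc.poll() is None:
            proc.kill()  # do not leave a hung child behind
        proc.wait()
    # Raw bytes: may still contain escape sequences and \r\n line endings.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Output arrives with \r\n line endings because the PTY's line discipline translates newlines, one of the ways a PTY capture differs from reading a plain pipe.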
🧠
Auto-Detection
If the command includes --screen-dump-path, the tool uses dump mode; otherwise it falls back to PTY mode. The agent needs no configuration.
Zero-config
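The auto-detection rule is simple enough to state as code. This sketch assumes the rule is applied to the tokenized command line and accepts both `--screen-dump-path PATH` and `--screen-dump-path=PATH`. `choose_capture_mode` is a hypothetical name, and the bundle may match the flag differently.

```python
import shlex

DUMP_FLAG = "--screen-dump-path"


def choose_capture_mode(command: str) -> str:
    """Return 'dump' if the command passes --screen-dump-path, otherwise 'pty'."""
    for arg in shlex.split(command):
        # Accept both '--screen-dump-path PATH' and '--screen-dump-path=PATH'.
        if arg == DUMP_FLAG or arg.startswith(DUMP_FLAG + "="):
            return "dump"
    return "pty"
```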
# 1 tool, 9 operations
terminal_inspector
  spawn | screenshot | send_keys | send_text
  find_text | wait_for_text | resize | close | list
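`wait_for_text` turns timing-sensitive UI checks into deterministic gates. Instead of sleeping a fixed time after `send_keys`, the agent polls captures until the expected text appears. A minimal polling sketch, assuming a `capture` callable that returns the current screen as a string; the signature, defaults and timeout behavior are illustrative, not the tool's documented interface.

```python
import time
from typing import Callable


def wait_for_text(capture: Callable[[], str], needle: str,
                  timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll capture() until needle appears on screen or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if needle in capture():
            return True
        if time.monotonic() >= deadline:
            return False  # gate failed: the text never appeared
        time.sleep(interval)
```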
Architecture

Three specialist agents

🤖
terminal-operator
Launches apps, drives interaction, verifies results. Follows a 6-step workflow with a 3-attempt failure budget before escalating.
Launch Interact Verify
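The 3-attempt failure budget can be read as a bounded retry that escalates instead of looping forever. In this sketch, `with_failure_budget` and `Escalation` are hypothetical names, and the assumption is that any exception raised by a step counts as one failed attempt.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


class Escalation(Exception):
    """Raised when a step exhausts its attempt budget and must be escalated."""


def with_failure_budget(step: Callable[[], T], attempts: int = 3) -> T:
    """Run step() up to `attempts` times; escalate with the last error if every try fails."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Exception | None = None
    for _ in range(attempts):
        try:
            return step()
        except Exception as exc:  # each failure consumes one attempt
            last_error = exc
    raise Escalation(f"gave up after {attempts} attempts: {last_error!r}") from last_error
```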
📐
terminal-visual-tester
Responsive layout testing across 6 breakpoints (60→200 columns). Runs visual quality checklists against every screen state.
6 breakpoints Layout Responsive
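The page gives only the range (60 to 200 columns) and the count (6). This sketch assumes evenly spaced widths, which yields 60, 88, 116, 144, 172 and 200, and models one responsive check: resize to each width, capture, and report required text that has gone missing. `breakpoints`, `sweep`, and the injected `resize`/`capture` callables are hypothetical stand-ins for the `resize` and `screenshot` operations.

```python
from __future__ import annotations

from typing import Callable


def breakpoints(lo: int = 60, hi: int = 200, count: int = 6) -> list[int]:
    """Evenly spaced widths from lo to hi inclusive. The even spacing is an ASSUMPTION."""
    step = (hi - lo) / (count - 1)
    return [round(lo + i * step) for i in range(count)]


def sweep(resize: Callable[[int], None],
          capture: Callable[[], list[str]],
          must_show: list[str],
          widths: list[int] | None = None) -> dict[int, list[str]]:
    """Resize to each width, capture the screen, and report required text that is missing."""
    failures: dict[int, list[str]] = {}
    for width in widths or breakpoints():
        resize(width)
        screen = "\n".join(capture())
        missing = [text for text in must_show if text not in screen]
        if missing:
            failures[width] = missing
    return failures
```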
🔬
terminal-debugger
Frame-by-frame analysis, keystroke-response verification, root-cause identification with file:line citations and confidence ratings.
Root cause Frame analysis

Repo: microsoft/amplifier-bundle-terminal-tester

By The Numbers

Real usage across two projects over 4 days

~1,700
terminal_inspector operations
across 37 sessions
37+
bugs discovered
invisible to unit tests
8+
autonomous fix-verify cycles
end-to-end, no human
254
send_keys
252
screenshots
33
wait_for_text gates

Used as the QA gate in an autonomous dev-machine: 53 iterations + 23 QA rounds

Impact

Bugs only terminal_inspector could find

#01
Response text doubling
AI responses rendered twice. Confirmed across 4 prompts and by vision-model analysis of screenshots.
#02
Ctrl+C inserting literal ^C
Ctrl+C produced a literal ^C instead of quitting the app. The bug regressed 3 times during development.
#03
ANSI escape sequence leak
"OQ" characters appearing from rapid ESC+F2 keystrokes.
#04
Stray border artifact
Visible at specific screen coordinates after the sidebar closes.
#05
Settings panel inaccessible
5 of 6 settings sections unreachable via keyboard navigation.
#06
Token counter frozen at zero
Despite real API usage, the token display never updated.
#07
Status bar stuck on "working"
Status indicator never reset after response completion.
#08
"Brutally honest" product audit
Session resume was cosmetic only, and settings were disconnected from the runtime.
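The "OQ" leak in bug #03 has a plausible mechanism worth spelling out. xterm-compatible terminals typically send F2 as the three-character sequence ESC O Q (F1, F3 and F4 are ESC O P, ESC O R and ESC O S). If an input reader decodes each read in isolation, a sequence split across two reads decays into an Escape key plus the literal characters O and Q. This is an illustrative assumption, not the diagnosed cause; `decode_naive` is a hypothetical name.

```python
from __future__ import annotations

ESC = "\x1b"
# xterm-style SS3 sequences for F1-F4: ESC O P/Q/R/S.
SS3_KEYS = {"P": "F1", "Q": "F2", "R": "F3", "S": "F4"}


def decode_naive(chunks: list[str]) -> list[str]:
    """Turn raw input into key names, decoding each read() chunk in isolation.

    This is the flawed pattern: a sequence split across two reads is never reassembled.
    """
    events: list[str] = []
    for chunk in chunks:
        i = 0
        while i < len(chunk):
            seq = chunk[i:i + 3]
            if len(seq) == 3 and seq[0] == ESC and seq[1] == "O" and seq[2] in SS3_KEYS:
                events.append(SS3_KEYS[seq[2]])  # complete F-key sequence
                i += 3
            elif chunk[i] == ESC:
                events.append("Esc")  # lone ESC is treated as the Escape key
                i += 1
            else:
                events.append(chunk[i])  # ordinary printable character
                i += 1
    return events
```

A robust decoder keeps an incomplete ESC prefix buffered until the next read, usually with a short timeout to tell a lone Escape press from the start of a sequence.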
Autonomous QA

The AI verifies its own visual output

Build → Compile → Test visually → File bugs → Dev-machine → Rebuild → Retest
🔄
53 + 23
53 dev-machine iterations and 23 terminal-tester QA rounds. No human in the loop for visual verification. The terminal-tester became the quality gate in an autonomous build-test-fix cycle.
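The loop can be sketched with the dev-machine and QA steps injected as callables. The counts above (53 dev-machine iterations against 23 QA rounds) suggest that several dev iterations ran per QA round, so this sketch lets the dev-machine step repeat until it reports the build ready before each visual QA pass. `autonomous_loop`, `LoopStats` and the callable signatures are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopStats:
    dev_iterations: int = 0
    qa_rounds: int = 0
    open_bugs: list[str] = field(default_factory=list)


def autonomous_loop(dev_step: Callable[[list[str]], bool],
                    visual_qa: Callable[[], list[str]],
                    max_qa_rounds: int = 30,
                    max_dev_steps_per_round: int = 10) -> LoopStats:
    """Alternate dev-machine iterations with visual QA rounds until QA files no bugs.

    dev_step(bugs) performs one build/fix iteration and returns True when the
    build is ready for QA; visual_qa() returns the list of bugs it filed.
    """
    stats = LoopStats()
    bugs: list[str] = []
    for _ in range(max_qa_rounds):
        for _ in range(max_dev_steps_per_round):
            stats.dev_iterations += 1
            if dev_step(bugs):
                break
        stats.qa_rounds += 1
        bugs = visual_qa()
        stats.open_bugs = bugs
        if not bugs:
            break  # visual QA is clean: nothing left to fix
    return stats
```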
👁️
AI verifying AI
Found the response-doubling bug, confirmed it across 4 prompts, then called a vision model to analyze its own screenshots to prove it wasn't a PTY artifact.
Deep Debugging

From visual symptom
to exact line of Rust

🔬
terminal-debugger in action
The debugger agent combined live terminal inspection with Rust source code analysis:

▸ Traced a markdown rendering bug from its visual symptom to two specific lines of Rust code

▸ Used frame counter analysis to detect whether keys were received, handled, or just not re-rendering

▸ Produced structured root-cause reports with file:line citations, suggested fixes, and confidence ratings
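Frame-counter analysis compares two captures taken before and after a keystroke. This sketch shows one plausible decision table, assuming each capture yields a frame number and the screen rows (as a screen-dump frame does). `classify_keystroke` and its labels are hypothetical, and a frame counter alone cannot tell a dropped key from one that was handled without a redraw.

```python
from __future__ import annotations


def classify_keystroke(frame_before: int, frame_after: int,
                       screen_before: list[str], screen_after: list[str]) -> str:
    """Heuristic reading of one send_keys step from two captures (an ASSUMED mapping)."""
    if frame_after == frame_before:
        # No new frame at all: the key never reached the app, or it was
        # handled without requesting a redraw.
        return "no-redraw"
    if screen_after == screen_before:
        # A frame was drawn but nothing visible changed: the event loop woke
        # up, yet the key apparently had no effect on what is shown.
        return "redrawn-no-change"
    return "handled"
```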
// Debugger output structure
{
  "symptom": "Markdown not rendering",
  "root_cause": {
    "file": "src/ui/chat.rs",
    "lines": [142, 147],
    "issue": "Raw text passed instead of parsed spans"
  },
  "confidence": 0.92,
  "fix": "Replace raw & with markdown::parse()"
}
Visual bug → Frame analysis → Source trace → Fix with confidence rating
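A downstream consumer might load these reports into typed records and reject out-of-range confidence values. A minimal sketch, assuming the field names shown in the example output and a confidence expressed as a fraction between 0 and 1; `DebugReport` and `RootCause` are hypothetical types.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class RootCause:
    file: str
    lines: list[int]
    issue: str


@dataclass
class DebugReport:
    symptom: str
    root_cause: RootCause
    confidence: float  # assumed to be a fraction in [0, 1]
    fix: str

    @classmethod
    def from_json(cls, text: str) -> DebugReport:
        data = json.loads(text)
        report = cls(
            symptom=data["symptom"],
            root_cause=RootCause(**data["root_cause"]),
            confidence=float(data["confidence"]),
            fix=data["fix"],
        )
        if not 0.0 <= report.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {report.confidence}")
        return report

    def citation(self) -> str:
        """Render the root cause as a file:line citation."""
        lines = ",".join(str(n) for n in self.root_cause.lines)
        return f"{self.root_cause.file}:{lines}"
```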
Credits

Standing on shoulders

🧪
PTY + pyte approach
Diego Colombo (@colombod) pioneered the PTY/pyte approach for AI terminal testing.
colombod/amplifier-bundle-tui-tester
📸
Screen-dump approach
Developed during Amplifier TUI Phase 6. Reads Ratatui's native render buffer for pixel-perfect captures.
🏗️
Bundle structure
Modeled after the browser-tester bundle pattern for consistent Amplifier ecosystem integration.
microsoft/amplifier-bundle-browser-tester

Diego Colombo is credited as co-author in the git history.

Sources

Data & Methodology

Data as of: March 2026

Feature status: Active — microsoft/amplifier-bundle-terminal-tester


Data sources:


Qualifiers preserved:


Prior art: Diego Colombo's amplifier-bundle-tui-tester (PTY/pyte approach)

Gaps: Exact per-session operation breakdowns were not individually verified. Bug severity classification is qualitative.

Get Started

Try it now

Install
amplifier bundle add \
  git+https://github.com/microsoft/amplifier-bundle-terminal-tester@main#subdirectory=behaviors/terminal-tester.yaml \
  --app
github.com/microsoft/amplifier-bundle-terminal-tester

Spawn. Screenshot. Ship.
