Case Study

The Marathon Session

62 Hours. 9 Prompts. 147 Commits.

How an Amplifier session built an entire TUI application autonomously

Active
Session 653a396c • amplifier-tui • February 2026
Session architect: Sam Schillace
The Setup

Symphony night

A developer built testing tools, wrote a feature roadmap, pointed the agent at its memory file—then left for the evening.

The Project
amplifier-tui — a Textual-based terminal UI for Amplifier. A full interactive TUI for managing sessions, running commands, and browsing history.
The Prep Work
SVG-based visual verification tools built first. A feature list. A SESSION-HANDOFF.md with architecture decisions and conventions.
The Launch
“Work through as many features as you can. Commit after each one. Be ambitious. I have to leave for a while now.”
By the Numbers

What one session produced

Wall-clock duration: 62 hours
Active work time: ~22 hours
Human prompts: 9
Sub-sessions spawned: 440
Context compactions: 631
Git commits: 147
Feature commits: 138
Session errors logged*: 0
Events logged: 10,600
*Zero errors logged by the session orchestrator — does not imply zero bugs in output code
Compaction Survival

631 compactions. Zero lost context.

The session’s context was compacted 631 times, and it kept building.

103K
Post-Compaction Tokens
Stayed stable at ~103–105K throughout the entire session. Consistent working memory baseline.
1.2M
Peak Pre-Compaction
The pre-compaction peak grew from 111K tokens early in the session to 1.2M by the end. At that point, about 92% of the context was discarded each cycle.
The key insight: the orchestrator’s own state was tiny. It only tracked “what feature am I on?”—everything else lived in disposable sub-sessions. Small state survives aggressive compaction.
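To make the insight concrete, here is a minimal Python sketch of why small state survives. It assumes a compaction step that always preserves a small pinned state and drops all but the most recent transcript entries. `Context`, `compact`, and the word-count token estimate are hypothetical illustrations, not Amplifier's actual compaction logic.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    # Small orchestrator state ("what feature am I on?"); always survives compaction.
    pinned_state: str
    # Everything else: tool output, worker reports, chatter. Safe to discard.
    transcript: list[str] = field(default_factory=list)

    def tokens(self) -> int:
        # Crude estimate: one token per whitespace-separated word (an assumption).
        return len(self.pinned_state.split()) + sum(len(m.split()) for m in self.transcript)


def compact(ctx: Context, keep_recent: int) -> Context:
    """Hypothetical compaction: keep the pinned state and the last `keep_recent` messages."""
    kept = ctx.transcript[-keep_recent:] if keep_recent > 0 else []
    return Context(pinned_state=ctx.pinned_state, transcript=list(kept))


# Usage: a long run accumulates transcript; compaction shrinks it but keeps the state.
ctx = Context(pinned_state="feature_index=7 current=/theme")
for i in range(1000):
    ctx.transcript.append(f"worker {i} finished and committed its feature")
small = compact(ctx, keep_recent=5)
```

If the orchestrator's progress lived only in the transcript, the same compaction would erase it; keeping it pinned and small is what makes repeated compaction harmless.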
Architecture

Orchestrator + Disposable Workers

The main session never wrote a single line of code.

Orchestrator (main session): 145 × self-delegate, 159 × git-ops
Pure Orchestration
Main session only decided what to build next. Never touched code directly.
Disposable Workers
Each feature built in a fresh sub-session. Full context window for the task. Discarded after commit.
Tiny Footprint
Orchestration state = feature list + current position. Survives any amount of compaction.
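A minimal Python sketch of the orchestrator loop, assuming a hypothetical `build_in_fresh_worker` callable that stands in for spawning a sub-session that builds, tests, and commits one feature. `OrchestratorState` and `run` are illustrative names, not Amplifier APIs; the point is that the only mutable orchestrator state is a list and an index.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OrchestratorState:
    features: list[str]  # the feature list
    position: int = 0    # index of the next feature to build

    def done(self) -> bool:
        return self.position >= len(self.features)


def run(state: OrchestratorState, build_in_fresh_worker: Callable[[str], None]) -> None:
    """One feature per disposable worker, then advance the index."""
    while not state.done():
        feature = state.features[state.position]
        # Hypothetical worker: builds, tests, commits; its context is then dropped.
        build_in_fresh_worker(feature)
        state.position += 1  # advancing the index is the only orchestrator update


# Usage with a fake worker that just records what it was asked to build.
built: list[str] = []


def fake_worker(feature: str) -> None:
    built.append(feature)  # stand-in for a real sub-session plus commit


state = OrchestratorState(["/search", "/theme", "session history browser"])
run(state, fake_worker)
```

Because the state is just a list and an index, an interrupted run could resume by reloading those two values; nothing a worker saw is carried forward.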
The Launch Prompt

Four lines that defined the architecture

“Work through as many features as you can, one at a time with subagents so you don’t get exhausted.”

“Commit and push after each one.”

“Be ambitious.”

“I have to leave for a while now.”
— The human’s 3rd prompt, starting the autonomous run
Why this worked: “one at a time with subagents” established the compaction-survival architecture. “Commit after each one” created natural checkpoints. “Be ambitious” gave permission to keep going. “I have to leave” removed the expectation of feedback loops.
Domain Fitness

TUI features are naturally modular

Best Domains
Self-contained features. Each one is add-a-keybinding, implement-a-handler. New features don’t modify existing ones. 1–3 files per feature.

CLI features • UI components • API endpoints • Test suites
Worst Domains
Cross-cutting changes. Tasks where you need to understand the whole system. High dependency between pieces.

Refactoring • Architecture changes • Complex debugging
The pattern: /search doesn’t need to understand /theme. Each TUI feature has clear boundaries (a keybinding, a handler, maybe a new widget), so a sub-agent can hold everything one feature needs within a single session’s context.
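A minimal Python sketch of why such features stay independent, assuming a plain command registry. The `command` decorator, `COMMANDS`, and `dispatch` are hypothetical; amplifier-tui wires its commands through Textual and may look quite different. Each feature adds exactly one entry and never touches another.

```python
from typing import Callable

# Hypothetical registry: each feature registers one slash command and its handler.
COMMANDS: dict[str, Callable[[str], str]] = {}


def command(name: str):
    """Decorator that registers a handler under a slash-command name."""
    def register(handler: Callable[[str], str]) -> Callable[[str], str]:
        if name in COMMANDS:
            raise ValueError(f"{name} is already registered")
        COMMANDS[name] = handler
        return handler
    return register


# Feature 1 lives in its own file and knows nothing about feature 2.
@command("/search")
def search(args: str) -> str:
    return f"searching for {args!r}"


# Feature 2, equally self-contained.
@command("/theme")
def theme(args: str) -> str:
    return f"switching theme to {args}"


def dispatch(line: str) -> str:
    """Route '/name args' to its handler; unknown commands get a message."""
    name, _, args = line.partition(" ")
    handler = COMMANDS.get(name)
    return handler(args) if handler else f"unknown command: {name}"
```

Adding a third feature means one new decorated function; no existing handler changes, which is what lets a fresh sub-agent build it without reading the rest of the app.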
External Memory

SESSION-HANDOFF.md as lifeline

When context compacts, files on disk survive.

# SESSION-HANDOFF.md

## Architecture Decisions
- Textual framework, not curses
- Dark theme with accent colors
- All commands via command palette

## Conventions
- Feature per file in features/
- Tests mirror source structure
- Commit messages: "feat: description"

## Feature Roadmap
- [ ] /search command
- [ ] /theme switcher
- [ ] Session history browser
- [ ] ...
4
Times read during session
631
Compactions survived
1st
Prompt: “Read SESSION-HANDOFF.md”
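A minimal sketch of how a worker could treat the handoff file as its source of truth after a compaction. It assumes roadmap items get ticked (`- [x]`) as they ship; the case study does not say how progress was recorded, and `roadmap_items` and `next_feature` are hypothetical helpers.

```python
import re
from typing import Optional

# Matches "- [ ] title" or "- [x] title".
CHECKBOX = re.compile(r"^- \[( |x|X)\] (.+)$")


def roadmap_items(handoff_md: str) -> list[tuple[bool, str]]:
    """Return (done, title) pairs from the '## Feature Roadmap' section only."""
    items = []
    in_roadmap = False
    for raw in handoff_md.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Entering a new section; only the roadmap section is parsed.
            in_roadmap = line == "## Feature Roadmap"
            continue
        if in_roadmap:
            match = CHECKBOX.match(line)
            if match:
                items.append((match.group(1).lower() == "x", match.group(2)))
    return items


def next_feature(handoff_md: str) -> Optional[str]:
    """First unchecked roadmap item, or None when the roadmap is finished."""
    for done, title in roadmap_items(handoff_md):
        if not done:
            return title
    return None
```

Because the file lives on disk, this lookup gives the same answer before and after any number of compactions.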
Verification Pipeline

Build testing tools before the marathon

SVG-based visual verification gave sub-agents a way to self-check without a human.

tui_capture.py
Headless screenshot via Textual Pilot API. Captures the TUI as SVG without a terminal.
svg_parser.py
Extracts text content, colors, and positions from SVG. Structured data, not pixels.
ux_test.sh
One-command validation. Run the TUI, capture, parse, assert. Full pipeline in seconds.
SVG parsing beats OCR for TUI work. Textual can export screenshots as SVG natively, and parsing SVG for text and layout is deterministic: no model inference, no flakiness. Each sub-agent could verify its own work.
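A minimal sketch of the parse-and-assert step, using only Python's standard `xml.etree.ElementTree`. It is not the project's svg_parser.py, whose code is not shown here. It assumes text sits in `<text>` elements (possibly wrapped in `<tspan>` children); a real exporter may put colors in CSS classes rather than `fill` attributes, so `fill` may come back as `None`.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # ElementTree tags look like '{http://www.w3.org/2000/svg}text'; drop the namespace.
    return tag.rsplit("}", 1)[-1]


def extract_text(svg: str) -> list[dict]:
    """Collect each <text> run with its raw x/y and fill attributes."""
    root = ET.fromstring(svg)
    runs = []
    for el in root.iter():
        if local_name(el.tag) != "text":
            continue
        content = "".join(el.itertext())  # includes text inside <tspan> children
        if content.strip():
            runs.append({"text": content, "x": el.get("x"),
                         "y": el.get("y"), "fill": el.get("fill")})
    return runs


def assert_screen_contains(svg: str, expected: str) -> None:
    """Deterministic check: fail unless some text run contains `expected`."""
    texts = [run["text"] for run in extract_text(svg)]
    if not any(expected in t for t in texts):
        raise AssertionError(f"{expected!r} not found in {texts}")
```

Checks like these are exact string comparisons on structured data, which is why a sub-agent can rerun them after every change without a human looking at the screen.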
Honest Assessment

Feature reimplementation was real

~50–60
Truly Unique Features
Out of 138 feature commits. Later batches (15+) repeated features from batches 1–5. No deduplication mechanism existed.
Features 1–30: all unique. Features in between: mixed. Features 60–138: heavy overlap.
The fix: maintain a COMPLETED.md file or check git log before starting each feature. A simple deduplication gate.
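A minimal sketch of such a gate, reading `git log --oneline` output (which prints `<short-sha> <subject>`) and relying on the "feat: description" commit convention from SESSION-HANDOFF.md. The helper names are hypothetical, and matching on normalized names is a simplification: the same feature committed under a different name would still slip through.

```python
import re
from typing import Iterable


def normalize(name: str) -> str:
    """Lowercase and drop punctuation so '/Search command' matches 'search-command'."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


def completed_from_git_log(log_lines: Iterable[str]) -> set[str]:
    """Feature names taken from 'feat: ...' subjects in `git log --oneline` output."""
    done = set()
    for line in log_lines:
        _, _, subject = line.partition(" ")  # drop the short SHA
        if subject.startswith("feat:"):
            done.add(normalize(subject[len("feat:"):]))
    return done


def should_build(feature: str, completed: set[str]) -> bool:
    """The deduplication gate: skip anything already recorded as done."""
    return normalize(feature) not in completed
```

Run before each feature, the gate costs one `git log` call and would have stopped the later batches from rebuilding features from batches 1–5 under the same names.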
Patterns

Five patterns for long-running agents

Prompting Playbook

What the human did right

External Memory Pointers
“Read SESSION-HANDOFF.md” — point the agent at its own memory instead of repeating context.
Minimal Interventions
Most prompts were “yes, go ahead” or “keep going.” Only 9 prompts total across 62 hours.
Delegation Instruction
“One at a time with subagents” — a single sentence that defined the compaction-safe architecture.
Open-Ended Permission
“Be ambitious.” “Add as many as you can.” Giving scope and trust rather than prescriptive lists.
Specific When Needed
When intervening, gave exact text references and concrete direction—never vague descriptions.
Architect, Not Worker
The user’s role was to design the agent’s work architecture, not to do the work itself.
The Sequel

Two weeks later, the same pattern—refined

The developer had learned from Session 1. This time: plan first, then launch.

Session 1: “Be ambitious, go.”
Open-ended feature list. No specs. No test IDs. The agent decided what to build as it went. Brilliant improvisation—but duplication crept in after batch 5.
Session 2: “Here’s exactly what to build.”
Full health audit first: all 356 tests passing. 16 features brainstormed across 5 waves, specced with acceptance criteria, pre-assigned test IDs. An orchestration context file. The deduplication problem was solved before the session started.
The preparation: FEATURE-BACKLOG.md with full specs and pre-assigned test IDs. build-session.md as the orchestration context. All 356 existing tests verified (263 unit + 93 interactive) before launch. Committed at 5745df1—then one prompt to go.
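A minimal sketch of what a backlog with pre-assigned test IDs buys, assuming each entry carries a wave, acceptance criteria, and reserved test IDs. `BacklogItem`, `validate`, `next_item`, the placeholder feature names, and the `T-…` ID format are all hypothetical; the real FEATURE-BACKLOG.md format is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BacklogItem:
    name: str
    wave: int                # build-order group
    acceptance: list[str]    # acceptance criteria from the spec
    test_ids: list[str]      # test IDs reserved for this feature up front
    status: str = "pending"  # "pending" or "done"


def validate(backlog: list[BacklogItem]) -> None:
    """Reject duplicate feature names or test IDs before the run starts."""
    names, ids = set(), set()
    for item in backlog:
        if item.name in names:
            raise ValueError(f"duplicate feature: {item.name}")
        names.add(item.name)
        for tid in item.test_ids:
            if tid in ids:
                raise ValueError(f"test ID {tid} assigned twice")
            ids.add(tid)


def next_item(backlog: list[BacklogItem]) -> Optional[BacklogItem]:
    """Lowest-wave pending item; ties keep backlog order."""
    pending = [item for item in backlog if item.status == "pending"]
    return min(pending, key=lambda item: item.wave) if pending else None
```

Since every feature is named once and every test ID is owned by exactly one feature, duplication is caught at validation time instead of being discovered in the git history afterwards.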
Session 2 Results

14 features. 659 new tests. One prompt.

“Run the build session. Work through the feature backlog.”

Session Tags: 44 tests
Clipboard Ring: 52 tests
Inline Diff: 45 tests
Agent Tree View: 56 tests
Context Profiler: 47 tests
Tool Introspection: 50 tests
Recipe Pipeline Viz: 49 tests
Smart /include: 48 tests
Conversation Branching: 57 tests
Model A/B Testing: 52 tests
Session Replay: 91 tests
Plugin System: 53 tests
Heatmap Dashboard: 65 tests
263
Tests before
922
Tests after
3.5×
Growth
15
Commits pushed
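The headline numbers follow directly from the before and after counts; a short Python check:

```python
tests_before = 263
tests_after = 922

new_tests = tests_after - tests_before  # tests added during Session 2
growth = tests_after / tests_before     # multiplicative growth of the suite

print(new_tests, f"{growth:.1f}x")  # prints: 659 3.5x
```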
The Pattern

Each session makes the next one better

Session 1 Taught
Orchestrator + disposable workers. Commit after each feature. External memory files. Open-ended permission works—but needs a deduplication gate.
Session 2 Applied
Structured feature backlog with specs. Pre-assigned test IDs for every feature. Orchestration context file. The backlog is the deduplication gate.
The Result
14 features in one evening, zero duplication. Session 1’s missing pattern—a deduplication gate—was solved by the feature backlog itself.
Session 1
~50–60 unique features over 62 hours. 9 prompts. Creative but some duplication. Pioneered the architecture.
Session 2
14 features in one evening. 1 prompt. Zero duplication. 659 new tests. Refined the architecture.
The compounding insight: Session 1 was exploration—discover what works. Session 2 was exploitation—apply what you learned. The human’s role evolves from architect to systems designer.
Sources

Research Methodology

Data as of: February 20, 2026

Feature status: Active (amplifier-tui)

Research performed:

Gaps & estimates:

Session architect: Sam Schillace

Try It Yourself

Design the work.
Let the agent build.

Two sessions. One pattern refined. The human’s role is to design the agent’s work architecture, not to do the work.

2
Sessions
162
Commits
922
Tests
0
Session Errors*
*Orchestrator errors logged — does not imply zero bugs in output code

The TUI: github.com/ramparte/amplifier-tui
Amplifier: github.com/microsoft/amplifier

Orchestrate. Delegate. Ship.