Case Study
The Marathon Session
62 Hours. 9 Prompts. 147 Commits.
How an Amplifier session built an entire TUI application autonomously
Active
Session 653a396c • amplifier-tui • February 2026
Session architect: Sam Schillace
The Setup
Symphony night
A developer built testing tools, wrote a feature roadmap, pointed the agent at its memory file—then left for the evening.
The Project
amplifier-tui — a Textual-based terminal UI for Amplifier. A full interactive TUI for managing sessions, running commands, and browsing history.
The Prep Work
SVG-based visual verification tools built first. A feature list. A SESSION-HANDOFF.md with architecture decisions and conventions.
The Launch
“Work through as many features as you can. Commit after each one. Be ambitious. I have to leave for a while now.”
By the Numbers
What one session produced
| Wall-clock duration | 62 hours |
| Active work time | ~22 hours |
| Human prompts | 9 |
| Sub-sessions spawned | 440 |
| Context compactions | 631 |
| Git commits | 147 |
| Feature commits | 138 |
| Session errors logged* | 0 |
| Events logged | 10,600 |
*Zero errors logged by the session orchestrator — does not imply zero bugs in output code
Compaction Survival
631 compactions. Zero lost context.
The session’s memory was wiped 631 times—and it kept building.
103K
Post-Compaction Tokens
Stayed stable at ~103–105K throughout the entire session. Consistent working memory baseline.
1.2M
Peak Pre-Compaction
The pre-compaction peak grew from 111K tokens early in the session to 1.2M by the end. At that point, about 92% of the context was discarded in each cycle.
The key insight: the orchestrator’s own state was tiny. It only tracked “what feature am I on?”—everything else lived in disposable sub-sessions. Small state survives aggressive compaction.
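The discard share follows directly from the token counts above. This is a quick arithmetic check, assuming the rounded figures quoted in the text (111K and 1.2M pre-compaction, about 103K post-compaction); because the inputs are rounded, the late-session result lands near 91%, in line with the roughly 92% reported.

```python
def discarded_fraction(pre_tokens: int, post_tokens: int) -> float:
    """Share of the context window dropped by one compaction."""
    return 1 - post_tokens / pre_tokens


# Rounded values taken from the figures quoted above.
early = discarded_fraction(111_000, 103_000)    # early in the session
late = discarded_fraction(1_200_000, 103_000)   # by the end of the session

print(f"early: {early:.1%}, late: {late:.1%}")
```

The post-compaction baseline stays flat while the peak grows, so the share discarded per cycle rises over the session.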
Architecture
Orchestrator + Disposable Workers
The main session never wrote a single line of code.
Diagram: Orchestrator (Main Session) → sub-session workers
Pure Orchestration
Main session only decided what to build next. Never touched code directly.
Disposable Workers
Each feature built in a fresh sub-session. Full context window for the task. Discarded after commit.
Tiny Footprint
Orchestration state = feature list + current position. Survives any amount of compaction.
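The orchestration loop can be sketched in a few lines. This is a minimal illustration, assuming the state really is just a feature list plus a position; `run_worker` and `commit` are hypothetical callables standing in for "spawn a fresh sub-session" and "commit the result", not Amplifier APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OrchestratorState:
    """Everything the orchestrator must remember across compactions."""
    features: List[str]
    position: int = 0  # index of the next feature to build


def run_marathon(
    state: OrchestratorState,
    run_worker: Callable[[str], str],  # hypothetical: builds one feature in a fresh context
    commit: Callable[[str], None],     # hypothetical: records a checkpoint, e.g. a git commit
) -> OrchestratorState:
    """Delegate each remaining feature to a disposable worker, then checkpoint."""
    while state.position < len(state.features):
        feature = state.features[state.position]
        summary = run_worker(feature)  # the worker's context is not kept after this call
        commit(f"feat: {summary}")
        state.position += 1            # the only state that has to survive
    return state


# Usage with stand-in workers that only record what they were asked to do.
log: List[str] = []
final = run_marathon(
    OrchestratorState(features=["/search command", "/theme switcher"]),
    run_worker=lambda feature: f"add {feature}",
    commit=log.append,
)
```

Because the loop holds only `features` and `position`, losing everything else in a compaction costs nothing: the next iteration can resume from the index alone.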
The Launch Prompt
Four lines that defined the architecture
“Work through as many features as you can, one at a time with subagents so you don’t get exhausted.”
“Commit and push after each one.”
“Be ambitious.”
“I have to leave for a while now.”
— The human’s 3rd prompt, starting the autonomous run
Why this worked: “one at a time with subagents” taught the compaction-survival architecture. “Commit and push after each one” created natural checkpoints. “Be ambitious” gave permission to keep going. “I have to leave” removed the expectation of feedback loops.
Domain Fitness
TUI features are naturally modular
Best Domains
Self-contained features. Each one is add-a-keybinding, implement-a-handler. New features don’t modify existing ones. 1–3 files per feature.
CLI features • UI components • API endpoints • Test suites
Worst Domains
Cross-cutting changes. Tasks where you need to understand the whole system. High dependency between pieces.
Refactoring • Architecture changes • Complex debugging
The pattern: /search doesn’t need to understand /theme. Each TUI feature has clear boundaries—a keybinding, a handler, maybe a new widget. Sub-agents get full context in a single session.
External Memory
SESSION-HANDOFF.md as lifeline
When context compacts, files on disk survive.
## Architecture Decisions
- Textual framework, not curses
- Dark theme with accent colors
- All commands via command palette
## Conventions
- Feature per file in features/
- Tests mirror source structure
- Commit messages: "feat: description"
## Feature Roadmap
- [ ] /search command
- [ ] /theme switcher
- [ ] Session history browser
- [ ] ...
4
Times read during session
1st
Prompt: “Read SESSION-HANDOFF.md”
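One way an agent could recover its place from the file after a compaction is to re-read the roadmap checkboxes. The sketch below assumes the roadmap uses Markdown task-list syntax as in the excerpt above, and that finished items are ticked as `- [x]`; the excerpt only shows unchecked items, so the ticking convention is an assumption.

```python
import re
from typing import List, Optional

# Matches task-list items such as "- [ ] /search command" or "- [x] /search command".
TASK_RE = re.compile(r"^- \[( |x|X)\] (.+)$")


def pending_features(handoff_text: str) -> List[str]:
    """Return unchecked roadmap items in document order."""
    pending = []
    for line in handoff_text.splitlines():
        match = TASK_RE.match(line.strip())
        # Skip checked items and the literal "..." placeholder entry.
        if match and match.group(1) == " " and match.group(2) != "...":
            pending.append(match.group(2))
    return pending


def next_feature(handoff_text: str) -> Optional[str]:
    """Return the first unchecked item, or None when the roadmap is done."""
    items = pending_features(handoff_text)
    return items[0] if items else None


# Hypothetical file contents with one item already ticked.
sample = """## Feature Roadmap
- [x] /search command
- [ ] /theme switcher
- [ ] Session history browser
- [ ] ...
"""
```

In practice the text would come from disk, for example `Path("SESSION-HANDOFF.md").read_text()`, which is exactly the part that survives a compaction.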
Verification Pipeline
Build testing tools before the marathon
SVG-based visual verification gave sub-agents a way to self-check without a human.
tui_capture.py
Headless screenshot via Textual Pilot API. Captures the TUI as SVG without a terminal.
svg_parser.py
Extracts text content, colors, and positions from SVG. Structured data, not pixels.
ux_test.sh
One-command validation. Run the TUI, capture, parse, assert. Full pipeline in seconds.
SVG parsing beats OCR for TUI work. The Textual framework renders to SVG natively. Parsing SVG for text and layout is deterministic—no model inference, no flakiness. Each sub-agent could verify its own work.
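To illustrate why SVG parsing is deterministic, here is a minimal sketch of the text-extraction step using only the standard library. It is not the project's `svg_parser.py`; the sample SVG is hand-written and hypothetical, and real Textual screenshots are larger and typically style text through CSS classes rather than inline `fill` attributes.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SVG_NS = "{http://www.w3.org/2000/svg}"


def extract_text(svg_source: str) -> List[Dict[str, object]]:
    """Collect every <text> element with its content, position, and fill colour."""
    root = ET.fromstring(svg_source)
    items = []
    for node in root.iter(f"{SVG_NS}text"):
        # Non-breaking spaces are common padding in terminal screenshots.
        content = "".join(node.itertext()).replace("\u00a0", " ").strip()
        if not content:
            continue
        items.append({
            "text": content,
            "x": float(node.get("x", "0")),
            "y": float(node.get("y", "0")),
            "fill": node.get("fill"),  # None when colour comes from a CSS class
        })
    return items


# Hypothetical, hand-written SVG used only for this example.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="60">
  <text x="0" y="20" fill="#e0e0e0">amplifier-tui</text>
  <text x="0" y="40" fill="#ffaa00">/search</text>
</svg>"""

labels = [item["text"] for item in extract_text(SAMPLE_SVG)]
```

An assertion such as `"/search" in labels` gives a sub-agent a yes-or-no answer about whether its feature rendered, with no image comparison involved.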
Honest Assessment
Feature reimplementation was real
~50–60
Truly Unique Features
Out of 138 feature commits. Later batches (15+) repeated features from batches 1–5. No deduplication mechanism existed.
Features 1–30: all unique
60–138: heavy overlap
The fix: maintain a COMPLETED.md file or check git log before starting each feature. A simple deduplication gate.
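A deduplication gate along these lines could be as small as the sketch below. It assumes commit subjects follow the "feat: description" convention from SESSION-HANDOFF.md; the function names are illustrative, and the matching is exact after normalization, so a reworded duplicate would still slip through.

```python
import re
import subprocess
from typing import Iterable, List, Set


def normalize(name: str) -> str:
    """Reduce a feature name or commit subject to a comparable key."""
    name = re.sub(r"^feat:\s*", "", name.strip().lower())
    return re.sub(r"[^a-z0-9]+", " ", name).strip()


def completed_keys(commit_subjects: Iterable[str]) -> Set[str]:
    """Keys for every feature already recorded in the commit history."""
    return {normalize(s) for s in commit_subjects if s.lower().startswith("feat:")}


def filter_new(candidates: Iterable[str], commit_subjects: Iterable[str]) -> List[str]:
    """Keep only candidates whose key has not been committed yet."""
    done = completed_keys(commit_subjects)
    return [c for c in candidates if normalize(c) not in done]


def git_subjects(repo_dir: str = ".") -> List[str]:
    """Read commit subject lines from a repository (requires git on PATH)."""
    out = subprocess.run(
        ["git", "log", "--format=%s"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()


# Hypothetical history and candidate list.
history = ["feat: /search command", "fix: typo", "feat: Theme switcher"]
todo = filter_new(
    ["/search command", "theme switcher", "Session history browser"], history
)
```

Calling `filter_new(candidates, git_subjects())` before each batch would let the orchestrator drop features that already have a commit.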
Patterns
Five patterns for long-running agents
1. Orchestrator + Disposable Workers: Main session delegates everything, stays small. Survives compaction.
2. External Memory Files: SESSION-HANDOFF.md on disk survives any amount of compaction. The agent’s true long-term memory.
3. Commit-After-Each Cadence: Natural checkpoints. Easy rollback. Progress is always saved to git.
4. Build Verification First: Testing tools before the feature marathon. Sub-agents can self-check.
5. Deduplication Gate: Check what’s done before starting the next feature. The missing pattern from this session.
Prompting Playbook
What the human did right
External Memory Pointers
“Read SESSION-HANDOFF.md” — point the agent at its own memory instead of repeating context.
Minimal Interventions
Most prompts were “yes, go ahead” or “keep going.” Only 9 prompts total across 62 hours.
Delegation Instruction
“One at a time with subagents” — a single sentence that defined the compaction-safe architecture.
Open-Ended Permission
“Be ambitious.” “Add as many as you can.” Giving scope and trust rather than prescriptive lists.
Specific When Needed
When intervening, gave exact text references and concrete direction—never vague descriptions.
Architect, Not Worker
The user’s role was to design the agent’s work architecture, not to do the work itself.
The Sequel
Two weeks later, the same pattern—refined
The developer had learned from Session 1. This time: plan first, then launch.
Session 1: “Be ambitious, go.”
Open-ended feature list. No specs. No test IDs. The agent decided what to build as it went. Brilliant improvisation—but duplication crept in after batch 5.
Session 2: “Here’s exactly what to build.”
Full health audit first: all 356 tests passing. 16 features brainstormed across 5 waves, specced with acceptance criteria, pre-assigned test IDs. An orchestration context file. The deduplication problem was solved before the session started.
The preparation: FEATURE-BACKLOG.md with full specs and pre-assigned test IDs. build-session.md as the orchestration context. All 356 existing tests verified (263 unit + 93 interactive) before launch. Committed at 5745df1—then one prompt to go.
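The idea of a backlog that doubles as a deduplication gate can be sketched as a pre-launch validation step. The actual layout of FEATURE-BACKLOG.md is not shown in this case study, so the field names, the `T-001` style test IDs, and the acceptance criteria below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BacklogItem:
    """One specced feature; all field names here are illustrative."""
    name: str
    wave: int
    acceptance: List[str]
    test_ids: List[str] = field(default_factory=list)


def validate_backlog(items: List[BacklogItem]) -> List[str]:
    """Return human-readable problems; an empty list means the backlog is launchable."""
    problems: List[str] = []
    seen_names: Dict[str, int] = {}
    seen_tests: Dict[str, str] = {}
    for item in items:
        key = item.name.strip().lower()
        if key in seen_names:
            problems.append(f"duplicate feature: {item.name}")
        seen_names[key] = item.wave
        if not item.acceptance:
            problems.append(f"no acceptance criteria: {item.name}")
        for test_id in item.test_ids:
            if test_id in seen_tests:
                problems.append(f"test id {test_id} reused by {item.name}")
            seen_tests[test_id] = item.name
    return problems


# Hypothetical entries; the third one is deliberately broken in three ways.
backlog = [
    BacklogItem("Session Tags", 1, ["tags persist across restarts"], ["T-001", "T-002"]),
    BacklogItem("Clipboard Ring", 1, ["ring keeps recent copies"], ["T-003"]),
    BacklogItem("session tags", 2, [], ["T-003"]),
]
issues = validate_backlog(backlog)
```

Running a check like this before the single launch prompt moves duplicate detection from the agent's runtime to the human's preparation step.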
Session 2 Results
14 features. 659 new tests. One prompt.
“Run the build session. Work through the feature backlog.”
| Session Tags | 44 tests |
| Clipboard Ring | 52 tests |
| Inline Diff | 45 tests |
| Agent Tree View | 56 tests |
| Context Profiler | 47 tests |
| Tool Introspection | 50 tests |
| Recipe Pipeline Viz | 49 tests |
| Smart /include | 48 tests |
| Conversation Branching | 57 tests |
| Model A/B Testing | 52 tests |
| Session Replay | 91 tests |
| Plugin System | 53 tests |
| Heatmap Dashboard | 65 tests |
The Pattern
Each session makes the next one better
Session 1 Taught
Orchestrator + disposable workers. Commit after each feature. External memory files. Open-ended permission works—but needs a deduplication gate.
Session 2 Applied
Structured feature backlog with specs. Pre-assigned test IDs for every feature. Orchestration context file. The backlog is the deduplication gate.
The Result
14 features in one evening, zero duplication. Session 1’s missing pattern—a deduplication gate—was solved by the feature backlog itself.
Session 1
~50–60 unique features over 62 hours. 9 prompts. Creative but some duplication. Pioneered the architecture.
Session 2
14 features in one evening. 1 prompt. Zero duplication. 659 new tests. Refined the architecture.
The compounding insight: Session 1 was exploration—discover what works. Session 2 was exploitation—apply what you learned. The human’s role evolves from architect to systems designer.
Sources
Research Methodology
Data as of: February 20, 2026
Feature status: Active (amplifier-tui)
Research performed:
- Session verification: find ~/.amplifier -name "*.json" -path "*/653a396c*" (hundreds of sub-session files confirmed)
- Repo verification: amplifier-tui exists at ~/dev/ANext/amplifier-tui
- Sub-session types: self-delegate, foundation:git-ops, foundation:explorer
Gaps & estimates:
- Detailed metrics (62 hrs, 440 sub-sessions, 631 compactions, 10,600 events) from prior session event analysis, not re-verified
- "0 session errors" means zero logged errors in the orchestrator — does not imply zero bugs in output code
- Session 2 metrics from separate session analysis
Try It Yourself
Design the work.
Let the agent build.
Two sessions. One pattern refined. The human’s role is to design the agent’s work architecture, not to do the work.
The TUI: github.com/ramparte/amplifier-tui
Amplifier: github.com/microsoft/amplifier
Orchestrate. Delegate. Ship.