Case Study

The Marathon Session

62 Hours. 9 Prompts. 147 Commits.

How an Amplifier session built an entire TUI application autonomously

Active

Session 653a396c • amplifier-tui • February 2026
Session architect: Sam Schillace

The Setup

Symphony night

A developer built testing tools, wrote a feature roadmap, pointed the agent at its memory file—then left for the evening.

The Project

amplifier-tui — a Textual-based terminal UI for Amplifier. A full interactive TUI for managing sessions, running commands, and browsing history.

The Prep Work

SVG-based visual verification tools built first. A feature list. A SESSION-HANDOFF.md with architecture decisions and conventions.

The Launch

“Work through as many features as you can. Commit after each one. Be ambitious. I have to leave for a while now.”

By the Numbers

What one session produced

Wall-clock duration	62 hours
Active work time	~22 hours
Human prompts	9
Sub-sessions spawned	440
Context compactions	631
Git commits	147
Feature commits	138
Session errors logged*	0
Events logged	10,600

*Zero errors logged by the session orchestrator — does not imply zero bugs in output code

Compaction Survival

631 compactions. Zero lost context.

The session’s memory was wiped 631 times—and it kept building.

103K

Post-Compaction Tokens

Stayed stable at ~103–105K throughout the entire session. Consistent working memory baseline.

1.2M

Peak Pre-Compaction

Pre-compaction context grew from 111K tokens early in the session to 1.2M tokens by the end. At that point, about 92% of the context was discarded each cycle.
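The discard ratio can be checked with simple arithmetic. This is a minimal sketch that treats the rounded figures quoted on this slide as exact; the function name is hypothetical and not part of Amplifier.

```python
def discarded_fraction(pre_tokens: int, post_tokens: int) -> float:
    """Share of the context dropped by a single compaction."""
    if pre_tokens <= 0 or not 0 <= post_tokens <= pre_tokens:
        raise ValueError("expected pre_tokens > 0 and 0 <= post_tokens <= pre_tokens")
    return 1 - post_tokens / pre_tokens


# Rounded figures from this slide, assumed exact for illustration only.
EARLY = discarded_fraction(111_000, 103_000)    # first cycles of the session
LATE = discarded_fraction(1_200_000, 103_000)   # final cycles of the session

print(f"early cycle: {EARLY:.1%} discarded")
print(f"late cycle:  {LATE:.1%} discarded")
```

With these rounded inputs the early cycles drop only a few percent of the context, while the late cycles drop a little over 91%, in line with the reported ~92% once rounding of the 1.2M peak is taken into account.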

The key insight: the orchestrator’s own state was tiny. It only tracked “what feature am I on?” and everything else lived in disposable sub-sessions. Small state survives aggressive compaction.
        

Architecture

Orchestrator + Disposable Workers

The main session never wrote a single line of code.

Orchestrator (main session) → 145 × self-delegate → 159 × git-ops

Pure Orchestration

Main session only decided what to build next. Never touched code directly.

Disposable Workers

Each feature built in a fresh sub-session. Full context window for the task. Discarded after commit.

Tiny Footprint

Orchestration state = feature list + current position. Survives any amount of compaction.
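The shape of this loop can be sketched in a few lines. This is an illustration under stated assumptions, not Amplifier’s implementation: `OrchestratorState`, `run_orchestrator`, and the two callbacks are hypothetical names standing in for the main session, a fresh self-delegate sub-session, and a git-ops sub-session.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OrchestratorState:
    """Everything the main session must remember between compactions."""
    features: list[str]
    position: int = 0  # index of the next feature to build


def run_orchestrator(
    state: OrchestratorState,
    build_feature: Callable[[str], bool],  # hypothetical: one disposable worker
    commit: Callable[[str], None],         # hypothetical: one git-ops call
) -> list[str]:
    """Delegate one feature at a time; no worker context is kept afterwards."""
    shipped = []
    while state.position < len(state.features):
        feature = state.features[state.position]
        if build_feature(feature):        # worker reports success or failure
            commit(f"feat: {feature}")    # checkpoint after each feature
            shipped.append(feature)
        state.position += 1               # the only state that changes
    return shipped
```

A usage example: `run_orchestrator(OrchestratorState(["/search command", "/theme switcher"]), build_feature=lambda f: True, commit=print)` prints one commit message per feature. The point of the design is that `state` stays the same size no matter how much work the workers do.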

The Launch Prompt

Four lines that defined the architecture

“Work through as many features as you can, one at a time with subagents so you don’t get exhausted.”

“Commit and push after each one.”

“Be ambitious.”

“I have to leave for a while now.”

— The human’s 3rd prompt, starting the autonomous run

Why this worked: “one at a time with subagents” taught the compaction-survival architecture. “Commit after each one” created natural checkpoints. “Be ambitious” gave permission to keep going. “I have to leave” removed the expectation of feedback loops.
        

Domain Fitness

TUI features are naturally modular

Best Domains

Self-contained features. Each one is add-a-keybinding, implement-a-handler. New features don’t modify existing ones. 1–3 files per feature.

CLI features • UI components • API endpoints • Test suites

Worst Domains

Cross-cutting changes. Tasks where you need to understand the whole system. High dependency between pieces.

Refactoring • Architecture changes • Complex debugging

The pattern: /search doesn’t need to understand /theme. Each TUI feature has clear boundaries: a keybinding, a handler, maybe a new widget. Sub-agents get full context in a single session.
        

External Memory

SESSION-HANDOFF.md as lifeline

When context compacts, files on disk survive.

# SESSION-HANDOFF.md

## Architecture Decisions
- Textual framework, not curses
- Dark theme with accent colors
- All commands via command palette

## Conventions
- Feature per file in features/
- Tests mirror source structure
- Commit messages: "feat: description"

## Feature Roadmap
- [ ] /search command
- [ ] /theme switcher
- [ ] Session history browser
- [ ] ...

4

Times read during session

631

Compactions survived

1st

Prompt: “Read SESSION-HANDOFF.md”
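Because the roadmap in SESSION-HANDOFF.md is a plain Markdown checklist, an agent or a helper script can recover its position from disk after any compaction. The sketch below assumes the checklist layout shown above. `read_roadmap` and `mark_done` are hypothetical helpers, and the source does not say whether the session ever ticked boxes in the file.

```python
import re

# Matches "- [ ] title" and "- [x] title" checklist lines.
CHECKBOX = re.compile(r"^- \[(?P<mark>[ xX])\] (?P<title>.+)$")


def read_roadmap(markdown: str) -> list[tuple[str, bool]]:
    """Return (title, done) pairs from the '## Feature Roadmap' section."""
    items, in_roadmap = [], False
    for line in markdown.splitlines():
        if line.startswith("## "):
            in_roadmap = line.strip() == "## Feature Roadmap"
            continue
        match = CHECKBOX.match(line.strip())
        if in_roadmap and match:
            items.append((match["title"], match["mark"] != " "))
    return items


def mark_done(markdown: str, title: str) -> str:
    """Tick the checkbox for one feature and leave every other line untouched."""
    return markdown.replace(f"- [ ] {title}", f"- [x] {title}", 1)


# Shortened copy of the file shown above, used only as sample input.
HANDOFF = """\
## Conventions
- Feature per file in features/

## Feature Roadmap
- [ ] /search command
- [ ] /theme switcher
"""

print(read_roadmap(mark_done(HANDOFF, "/search command")))
```

Keeping progress in the file rather than in the conversation is what makes the memory compaction-proof: the next read returns the same answer regardless of what was discarded.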

Verification Pipeline

Build testing tools before the marathon

SVG-based visual verification gave sub-agents a way to self-check without a human.

tui_capture.py

Headless screenshot via Textual Pilot API. Captures the TUI as SVG without a terminal.

svg_parser.py

Extracts text content, colors, and positions from SVG. Structured data, not pixels.

ux_test.sh

One-command validation. Run the TUI, capture, parse, assert. Full pipeline in seconds.

SVG parsing beats OCR for TUI work. The Textual framework renders to SVG natively. Parsing SVG for text and layout is deterministic: no model inference, no flakiness. Each sub-agent could verify its own work.
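The parsing step can be approximated with the Python standard library alone. This is a minimal sketch, not the project’s svg_parser.py, which is not shown in this deck. The sample markup is simplified and hypothetical: real exports may carry color in CSS classes rather than `fill` attributes, so the color lookup here is an assumption.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a captured TUI screenshot.
SAMPLE_SVG = """\
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400 60">
  <rect width="400" height="60" fill="#1e1e1e"/>
  <text x="12" y="20" fill="#e0e0e0">amplifier-tui</text>
  <text x="12" y="44" fill="#ffa62b">/search</text>
</svg>
"""


def extract_text(svg: str) -> list[dict]:
    """Collect every <text> element with its position and fill attribute."""
    found = []
    for element in ET.fromstring(svg).iter():
        # ElementTree prefixes tags with the namespace: '{...}text'.
        if element.tag.rsplit("}", 1)[-1] != "text":
            continue
        found.append({
            # Normalize non-breaking spaces so string comparisons behave.
            "text": "".join(element.itertext()).replace("\xa0", " ").strip(),
            "x": float(element.get("x", 0)),
            "y": float(element.get("y", 0)),
            "fill": element.get("fill"),
        })
    return found


def assert_visible(svg: str, expected: str) -> None:
    """Fail loudly if the expected string is not rendered anywhere."""
    texts = [item["text"] for item in extract_text(svg)]
    assert any(expected in text for text in texts), f"{expected!r} not in {texts}"


assert_visible(SAMPLE_SVG, "/search")
```

An assertion such as `assert_visible(svg, "/search")` is the kind of check a sub-agent could run after adding a command, with the same input always producing the same verdict.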
        

Honest Assessment

Feature reimplementation was real

~50–60

Truly Unique Features

Out of 138 feature commits. Later batches (15+) repeated features from batches 1–5. No deduplication mechanism existed.

Features 1–30: all unique • Features 60–138: heavy overlap • in between: mixed

The fix: maintain a COMPLETED.md file or check git log before starting each feature. A simple deduplication gate.
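A deduplication gate of this kind might look like the sketch below. It assumes the list of completed work is available as plain strings, for example lines from a COMPLETED.md file or commit subjects from `git log --pretty=format:%s`. The function names are hypothetical, and matching on normalized titles is a deliberately crude stand-in: it will miss reimplementations that are described with different words.

```python
import re


def normalize(title: str) -> str:
    """Reduce a feature title or commit subject to a comparable key."""
    title = re.sub(r"^feat:\s*", "", title.strip().lower())
    return re.sub(r"[^a-z0-9]+", " ", title).strip()


def is_duplicate(candidate: str, completed: list[str]) -> bool:
    """True if the candidate matches something that has already shipped."""
    done = {normalize(entry) for entry in completed}
    return normalize(candidate) in done


def next_feature(roadmap: list[str], completed: list[str]):
    """Return the first roadmap entry that passes the gate, or None."""
    for feature in roadmap:
        if not is_duplicate(feature, completed):
            return feature
    return None


# Hypothetical commit subjects following the "feat: description" convention.
done = ["feat: /search command", "feat: Theme switcher"]
print(next_feature(["/search command", "theme-switcher", "Session history browser"], done))
```

Running the gate before each delegation costs one file read or one git call, which is small next to the cost of building a feature twice.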

Patterns

Five patterns for long-running agents

1. Orchestrator + Disposable Workers: Main session delegates everything, stays small. Survives compaction.
2. External Memory Files: SESSION-HANDOFF.md on disk survives any amount of compaction. The agent’s true long-term memory.
3. Commit-After-Each Cadence: Natural checkpoints. Easy rollback. Progress is always saved to git.
4. Build Verification First: Testing tools before the feature marathon. Sub-agents can self-check.
5. Deduplication Gate: Check what’s done before starting the next feature. The missing pattern from this session.

Prompting Playbook

What the human did right

External Memory Pointers

“Read SESSION-HANDOFF.md” — point the agent at its own memory instead of repeating context.

Minimal Interventions

Most prompts were “yes, go ahead” or “keep going.” Only 9 prompts total across 62 hours.

Delegation Instruction

“One at a time with subagents” — a single sentence that defined the compaction-safe architecture.

Open-Ended Permission

“Be ambitious.” “Add as many as you can.” Giving scope and trust rather than prescriptive lists.

Specific When Needed

When intervening, gave exact text references and concrete direction—never vague descriptions.

Architect, Not Worker

The user’s role was to design the agent’s work architecture, not to do the work itself.

The Sequel

Two weeks later, the same pattern—refined

The developer had learned from Session 1. This time: plan first, then launch.

Session 1: “Be ambitious, go.”

Open-ended feature list. No specs. No test IDs. The agent decided what to build as it went. Brilliant improvisation—but duplication crept in after batch 5.

Session 2: “Here’s exactly what to build.”

Full health audit first: all 356 tests passing. 16 features brainstormed across 5 waves, specced with acceptance criteria, pre-assigned test IDs. An orchestration context file. The deduplication problem was solved before the session started.

The preparation: FEATURE-BACKLOG.md with full specs and pre-assigned test IDs. build-session.md as the orchestration context. All 356 existing tests verified (263 unit + 93 interactive) before launch. Committed at 5745df1, then one prompt to go.
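One way to picture a backlog entry with pre-assigned test IDs is as a small record. The actual layout of FEATURE-BACKLOG.md is not shown in this deck, so everything below is an assumption made for illustration: the field names, the `T-001` ID format, the acceptance text, and the wave numbers attached to the feature names.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    wave: int               # hypothetical wave number
    acceptance: list[str]   # acceptance criteria from the spec
    test_ids: list[str]     # IDs reserved before the session starts
    done: bool = False


def overlapping_test_ids(backlog: list[BacklogItem]) -> set[str]:
    """Test IDs claimed by more than one feature (should be empty)."""
    seen, clashes = set(), set()
    for item in backlog:
        for test_id in item.test_ids:
            (clashes if test_id in seen else seen).add(test_id)
    return clashes


def next_item(backlog: list[BacklogItem]):
    """Earliest-wave feature that is not done yet, or None when finished."""
    pending = [item for item in backlog if not item.done]
    return min(pending, key=lambda item: item.wave, default=None)


# Sample data: names come from the results slide, all other values are invented.
backlog = [
    BacklogItem("Session Tags", 1, ["tags persist across restarts"], ["T-001", "T-002"]),
    BacklogItem("Clipboard Ring", 1, ["ring keeps recent copies"], ["T-003"]),
    BacklogItem("Plugin System", 5, ["plugins load at startup"], ["T-004"]),
]
assert not overlapping_test_ids(backlog)
print(next_item(backlog).name)
```

Reserving test IDs up front gives each worker a disjoint slice of the test suite, and a fixed list of named entries leaves no room for a later batch to rebuild an earlier feature under a new name.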
        

Session 2 Results

14 features. 659 new tests. One prompt.

“Run the build session. Work through the feature backlog.”

Session Tags	44 tests
Clipboard Ring	52 tests
Inline Diff	45 tests
Agent Tree View	56 tests
Context Profiler	47 tests
Tool Introspection	50 tests
Recipe Pipeline Viz	49 tests
Smart /include	48 tests
Conversation Branching	57 tests
Model A/B Testing	52 tests
Session Replay	91 tests
Plugin System	53 tests
Heatmap Dashboard	65 tests

263

Tests before

922

Tests after

3.5×

Growth

15

Commits pushed

The Pattern

Each session makes the next one better

Session 1 Taught

Orchestrator + disposable workers. Commit after each feature. External memory files. Open-ended permission works—but needs a deduplication gate.

Session 2 Applied

Structured feature backlog with specs. Pre-assigned test IDs for every feature. Orchestration context file. The backlog is the deduplication gate.

The Result

14 features in one evening, zero duplication. Session 1’s missing pattern—a deduplication gate—was solved by the feature backlog itself.

Session 1

~50–60 unique features over 62 hours. 9 prompts. Creative but some duplication. Pioneered the architecture.

Session 2

14 features in one evening. 1 prompt. Zero duplication. 659 new tests. Refined the architecture.

The compounding insight: Session 1 was exploration, discovering what works. Session 2 was exploitation, applying what was learned. The human’s role evolves from architect to systems designer.
        

Sources

Research Methodology

Data as of: February 20, 2026

Feature status: Active (amplifier-tui)

Research performed:

Session verification: find ~/.amplifier -name "*.json" -path "*/653a396c*" (hundreds of sub-session files confirmed)
Repo verification: amplifier-tui exists at ~/dev/ANext/amplifier-tui
Sub-session types: self-delegate, foundation:git-ops, foundation:explorer

Gaps & estimates:

Detailed metrics (62 hrs, 440 sub-sessions, 631 compactions, 10,600 events) from prior session event analysis, not re-verified
"0 session errors" means zero logged errors in the orchestrator — does not imply zero bugs in output code
Session 2 metrics from separate session analysis


Try It Yourself

Design the work.
Let the agent build.

Two sessions. One pattern refined. The human’s role is to design the agent’s work architecture, not to do the work.

2

Sessions

162

Commits

922

Tests

0

Session Errors*

*Orchestrator errors logged — does not imply zero bugs in output code

The TUI: github.com/ramparte/amplifier-tui
Amplifier: github.com/microsoft/amplifier

Orchestrate. Delegate. Ship.