Amplifier Development Story

Escaping Dark Alleys

How process discipline and design investment transformed AI development

7 days · 18 sessions · 4 repositories · A new foundation for orchestration

January 28 – February 4, 2026
The Through-Line

Two Key Lessons

01
Process Discipline
Rigorous validation, working memory, and phase gates keep the AI from wandering down dark alleys
02
Design Investment
Putting effort into design early produces powerful primitives that solve a whole class of problems
1
Part 1

The Foreman Troubles

January 28 – February 2

When the AI went down dark alleys

Part 1 · The Foreman Troubles
The Vision

The Foreman Pattern

An AI session spawning and managing worker sub-sessions through an issue queue

Fire-and-Forget Workers
Spawn workers that complete tasks independently, no synchronous waiting required
Typed Worker Pools
Different worker types for different tasks: researcher, implementer, reviewer
Issue-Based Coordination
Workers pick up tasks from a queue, post results back. The queue is the protocol.
Part 1 · The Foreman Troubles
The Confidence Gap

Tests Said: All Good

13
Unit Tests Passing
Every test green. All paths exercised. Clean coverage.
100%
Code Coverage
All branches covered. HIGH confidence declared.
"The prototype looks solid. Time for manual testing..."
Part 1 · The Foreman Troubles
The Confidence Gap

Reality Said: Catastrophic Failure

# The AI asserted "workers are running"
# But the session logs told a different story:

grep -r "spawn" events.jsonl

# No spawn events. Nothing was started.
# Workers weren't running. They never existed.

The AI claimed things worked without actually verifying them

Part 1 · The Foreman Troubles
What the User Saw

The Error Messages

ERROR: Required capability 'bundle.load' not available
ERROR: Session not found
# Workers spawned but couldn't load bundles
# Parent couldn't find child sessions
# Nothing worked outside the test environment
The Irony
The AI mocked bundle.load — a capability that doesn't exist. The real session.spawn was available the whole time.
The Gap
Tests ran against fantasy APIs. Manual testing hit reality. 100% coverage of code that couldn't work.
Part 1 · The Foreman Troubles
What Went Wrong

Down the Dark Alleys

These were AI practice failures, not missing primitives

Part 1 · The Foreman Troubles
Progressive Discovery

The Bug Cascade

Once manual testing started, each fix revealed the next layer.

Four bugs. Four assumptions. All wrong.

Part 1 · Bug Cascade
Bug 1 of 4

Status Vocabulary Mismatch

The Symptom
Workers completed their tasks, but the Foreman never saw them finish. Tasks stayed "in progress" forever.
The Root Cause
Workers posted status "completed". The tool expected "closed". The error was silently swallowed.
# Worker posted:
{ "status": "completed", "result": "Task done!" }

# Tool expected:
{ "status": "closed", ... }

# Result: Update silently ignored. No error. No log. Nothing.

Fix applied → Bug 2 discovered ↓

Part 1 · Bug Cascade
Bug 2 of 4

Session Storage Paths

The Symptom
Parent session couldn't find child sessions. "Session not found" errors everywhere.
The Root Cause
Project directory derived from cwd. Workers started in different directories → ended up in wrong session stores.
# Parent session:
~/.amplifier/projects/spawn-events-work/sessions/parent-123/

# Worker session (wrong!):
~/.amplifier/projects/amplifier-core/sessions/worker-456/

# Parent looks for child: "Session not found"
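The fix amounts to deriving the store from an explicitly inherited project identity, never from each process's cwd. A sketch under illustrative names (session_dir is hypothetical, not the shipped resolver):

```python
from pathlib import Path

def session_dir(
    project: str,
    session_id: str,
    root: Path = Path.home() / ".amplifier",
) -> Path:
    # Parent and child resolve the SAME store because `project` is passed
    # down explicitly at spawn time rather than re-derived from os.getcwd().
    return root / "projects" / project / "sessions" / session_id

parent = session_dir("spawn-events-work", "parent-123")
child = session_dir("spawn-events-work", "worker-456")
assert parent.parent == child.parent   # same session store, so lookups succeed
```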

Fix applied → Bug 3 discovered ↓

Part 1 · Bug Cascade
Bug 3 of 4

Tool Source Paths

The Symptom
Workers declared tools in their bundles but got "capability not available" at runtime.
The Root Cause
Git URLs missing #subdirectory= fragment. Tool resolver couldn't find the package inside the repo.
# Bundle declared (wrong):
source: git+https://github.com/org/tools.git

# Should have been:
source: git+https://github.com/org/tools.git#subdirectory=packages/my-tool

# Workers thought they had tools. They didn't.

Fix applied → Bug 4 discovered ↓

Part 1 · Bug Cascade
Bug 4 of 4

Concurrent File Access

The Symptom
3 workers completed successfully. But only 1 result appeared in the output file.
The Root Cause
Multiple workers writing to the same file simultaneously. Last writer wins — overwrote the others.
# Timeline:
Worker A: open("results.json", "w")   # t=0ms
Worker B: open("results.json", "w")   # t=5ms
Worker C: open("results.json", "w")   # t=10ms
Worker A: write(result_a)             # t=100ms
Worker B: write(result_b)             # t=105ms - overwrites A
Worker C: write(result_c)             # t=110ms - overwrites B

# Final file: only Worker C's result. A and B lost forever.
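One standard remedy is one-writer-per-file: each worker writes its own result file and the foreman aggregates after all workers close. A sketch with hypothetical names (write_result, aggregate), not the applied fix verbatim:

```python
import json
import tempfile
from pathlib import Path

outdir = Path(tempfile.mkdtemp())

def write_result(worker_id: str, result: dict) -> None:
    # Unique path per worker: no shared file, so no last-writer-wins race.
    (outdir / f"result-{worker_id}.json").write_text(json.dumps(result))

def aggregate() -> list[dict]:
    # Foreman merges once every worker has reported.
    return [json.loads(p.read_text()) for p in sorted(outdir.glob("result-*.json"))]

for wid, payload in [("a", {"n": 1}), ("b", {"n": 2}), ("c", {"n": 3})]:
    write_result(wid, payload)

results = aggregate()   # all three survive
```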
Part 1 · The Foreman Troubles
The Full Picture

Four Layers of Assumptions

1
Status Vocabulary
"completed" vs "closed" — silently ignored
↓ fix → discover
2
Session Storage
Wrong directories — parent couldn't find children
↓ fix → discover
3
Tool Source Paths
Missing #subdirectory — tools didn't load
↓ fix → discover
4
Race Conditions
Concurrent writes — results overwritten

Every assumption the AI made turned out to be wrong

Part 1 · The Foreman Troubles
The Breaking Point
"You assured me your testing strategy confirmed it would work with a high-degree of confidence."

The primitives existed. The process was broken.
The AI went down dark alleys — making assumptions, mocking fake APIs,
claiming things worked without verifying.

Part 1 · The Foreman Troubles
The Lesson

The Capabilities Were There

The problem was unvalidated AI assumptions.

session.spawn existed the whole time. The AI just never looked.

This wasn't a tooling problem. It was a process problem.

2
Part 2

Rigorous Design Validation

February 3

Understand first. Design second. Implement last.

Part 2 · Rigorous Design Validation
The Opposite of Part 1

Context First, Design Second

The design process started by studying the existing system

"Check out the following... and let's talk about the overall design of sessions spawning sessions"
# The user's prompt — study BEFORE designing:
amplifier-core/amplifier_core/session.py
    → How sessions actually work
amplifier-foundation/modules/tool-delegate/
    → How the existing delegate tool spawns agents
amplifier-app-cli/session_runner.py, session_spawner.py
    → How the CLI registers the spawner
amplifier-bundle-observers/
    → What was already built
amplifier-bundle-foreman/
    → What we learned from the troubles

In Part 1, the AI assumed and mocked. In Part 2, we looked first.

Part 2 · Rigorous Design Validation
The Difference

Two Approaches

Part 1: Dark Alley
Jumped into implementation

• Assumed bundle.load existed
• Mocked session.AmplifierSession
• Never studied tool-delegate
• Never looked at session.spawn
Part 2: Lit Path
Built context before designing

• Studied how sessions actually work
• Read the existing delegate tool
• Understood the CLI spawner
• Learned from foreman's failures

Same goal. Opposite process. Different outcome.

Part 2 · Rigorous Design Validation
The Shift

Trust But Verify

Before: Dark Alley
Design document read as "current architecture" — assumed it existed without checking code
After: Lit Path
Every proposed primitive validated against existing code. Gaps identified before implementation started.
# Code analysis reveals the truth

grep -r "spawn_bundle" amplifier-core/src/
# No results - doesn't exist yet

grep -r "EventRouter" amplifier-core/src/
# No results - needs to be built

grep -r "TriggerSource" amplifier-core/src/
# No results - proposed, not implemented

# Now we know exactly what needs to be built
Part 2 · Rigorous Design Validation
The Breakthrough

The Aha Moment

"What if agents were never a separate concept? They're just inline bundle definitions that inherit heavily from the parent session."

Agent spawning IS bundle spawning — just with different inheritance levels.

Inheritance    What You Get                        What You Override
Full Bundle    Nothing                             Everything — define it all yourself
Agent          Provider, settings, most context    Tools, specific context, instructions
Self           Everything                          Just fork — new instance, same config
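The three inheritance levels can be modeled as a simple config merge. A sketch under assumed names (SpawnLevel, spawn_config); the real spawn_bundle() signature lives in amplifier-core and may differ:

```python
from enum import Enum

class SpawnLevel(Enum):
    FULL_BUNDLE = "full_bundle"   # inherit nothing, define everything
    AGENT = "agent"               # inherit provider/settings, override the rest
    SELF = "self"                 # fork: inherit everything

def spawn_config(parent: dict, level: SpawnLevel, overrides: dict) -> dict:
    if level is SpawnLevel.FULL_BUNDLE:
        return dict(overrides)                # nothing inherited
    if level is SpawnLevel.SELF:
        return dict(parent)                   # everything inherited, same config
    # AGENT: inherit provider and settings; child overrides tools/context.
    inherited = {k: v for k, v in parent.items() if k in ("provider", "settings")}
    return {**inherited, **overrides}

parent = {"provider": "anthropic", "settings": {"temp": 0}, "tools": ["web"]}
agent = spawn_config(parent, SpawnLevel.AGENT, {"tools": ["grep"]})
fork = spawn_config(parent, SpawnLevel.SELF, {})
```

Under this model an agent really is just an inline bundle definition with heavy inheritance, which is the aha moment above.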
Part 2 · Rigorous Design Validation
The Architecture

Validated Design

Primitives that solve multi-agent orchestration generally, not just Foreman

spawn_bundle()
Unified spawning function. One primitive for all inheritance levels.
EventRouter
Cross-session event communication. Sessions can emit and listen.
TriggerSource
Reactive triggers protocol. Timer, SessionEvent, Manual.
BackgroundSessionManager
Lifecycle management. Health checks. Graceful shutdown.
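To make the emit/listen idea concrete, here is a minimal pub/sub sketch. The real EventRouter interface is in amplifier-core; treat these signatures as illustrative only:

```python
from collections import defaultdict
from typing import Callable

class EventRouter:
    """Toy cross-session event bus: sessions emit, subscribers listen."""

    def __init__(self):
        self._listeners: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def listen(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._listeners[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> None:
        # Deliver to every subscriber registered for this event type.
        for handler in self._listeners[event_type]:
            handler(payload)

received: list[dict] = []
router = EventRouter()
router.listen("worker.closed", received.append)
router.emit("worker.closed", {"session": "worker-456", "status": "closed"})
```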

Design investment produced primitives that solve a whole class of problems

3
Part 3

Implementation Infrastructure

February 3–4

Guardrails that prevent dark alleys

Part 3 · Implementation Infrastructure
The Challenge

How to Work
Without Wandering?

"How should I start this implementation process? What's the best way to ensure it requires as little intervention from me as possible until it is fully done?"

After the Foreman troubles, the user needed confidence that the AI could work autonomously without going down dark alleys.

Part 3 · Implementation Infrastructure
The Methodology

Four Pillars of
Process Discipline

AGENTS.md
Auto-loaded context for any session in the directory. No re-explaining the project. The AI always knows the mission.
Working Memory
Session state "save game" that persists. AI knows where it left off. No context loss between sessions.
PHASE-GATES.md
Checkbox tracking with explicit criteria. AI knows what "done" means. No subjective judgment calls.
Validation Recipes
Automated PASS/FAIL verification. No "looks good" — objective gates. Trust is earned by evidence.
Part 3 · Implementation Infrastructure
The Infrastructure

The Workspace Created

/data/repos/spawn-events-work/
├── .amplifier/
│   ├── AGENTS.md                     ← Context auto-loaded for any session
│   └── working-memory/
│       └── spawn-events-impl.md      ← Session "save game"
├── docs/
│   ├── PHASE-GATES.md                ← Checkbox progress tracking
│   └── integrated-design-impl.md     ← Full spec with exact code
├── recipes/
│   └── validate-phase-*.yaml         ← Automated PASS/FAIL validation
├── amplifier-core/                   → symlink
└── amplifier-foundation/             → symlink
Part 3 · Implementation Infrastructure
The Handoff
"I'll be away for awhile, so make sure you keep making progress so it will be completed and validated when I return."

The infrastructure enabled what came next:
The AI worked through ALL phases autonomously —
but this time with guardrails that prevented dark alleys.

4
Part 4

The Result

February 4

What process discipline + design investment produced

Part 4 · The Result
Autonomous Execution

The Implementation Sprint

5 phases executed autonomously — no dark alleys

Phase 1
SessionStorage Protocol — Abstract storage layer for session relationships
Phase 2
spawn_bundle() — Core spawning function with inheritance levels
Phase 3
EventRouter — Cross-session event communication
Phase 4
TriggerSource Protocol — Timer, SessionEvent, Manual triggers
Phase 5
BackgroundSessionManager — Lifecycle, health checks, shutdown

All phases completed — implementing, testing, fixing, committing —
without human intervention.

Part 4 · The Result
What We Built

New Primitives Delivered

spawn_bundle()
The unified spawning function. Creates child sessions with configurable inheritance.
SessionStorage Protocol
Abstract storage layer. Parent can find children. Children can find parent.
EventRouter
Cross-session communication. Sessions can emit events, others can listen.
TriggerSource Protocol
TimerTrigger, SessionEventTrigger, ManualTrigger. Fire sessions on conditions.
BackgroundSessionManager
Manages lifecycle of spawned sessions. Health checks. Graceful shutdown.
W3C Trace Context
Full trace lineage across parent-child-grandchild. Observability built in.
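The trigger protocol above can be sketched as a single should_fire() interface with three implementations. The class shapes here are hypothetical; the shipped TriggerSource protocol in amplifier-core may differ:

```python
import time
from typing import Protocol

class TriggerSource(Protocol):
    def should_fire(self) -> bool: ...

class TimerTrigger:
    """Fires when the configured interval has elapsed."""
    def __init__(self, interval_s: float):
        self.interval_s = interval_s
        self._last = time.monotonic()

    def should_fire(self) -> bool:
        if time.monotonic() - self._last >= self.interval_s:
            self._last = time.monotonic()
            return True
        return False

class SessionEventTrigger:
    """Fires once per matching session event."""
    def __init__(self, event_type: str):
        self.event_type = event_type
        self._pending = 0

    def on_event(self, event_type: str) -> None:
        if event_type == self.event_type:
            self._pending += 1

    def should_fire(self) -> bool:
        if self._pending:
            self._pending -= 1
            return True
        return False

class ManualTrigger:
    """Fires exactly once each time it is armed."""
    def __init__(self):
        self.armed = False

    def should_fire(self) -> bool:
        fired, self.armed = self.armed, False
        return fired

manual = ManualTrigger()
manual.armed = True
ev = SessionEventTrigger("worker.closed")
ev.on_event("worker.closed")
```

A BackgroundSessionManager can then poll a list of TriggerSource objects uniformly, without caring which kind each one is.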
Part 4 · The Result
The Numbers

What We Shipped

~1,200
Lines of Code
11
New Files
461
Tests Passing
4
PRs Merged
Part 4 · The Result
The Bigger Picture

From Catalyst
to General Infrastructure

The Foreman was just the catalyst.
These primitives now enable any multi-agent pattern.

Foreman + Workers
Manager spawns specialized workers via issue queue
Pipeline Stages
Sequential handoff with event-triggered transitions
Swarm Coordination
Many agents collaborating via EventRouter
The Transformation

Dark Alleys vs Lit Paths

Part 1: Dark Alleys
  • Mocked fake APIs without checking reality
  • Asserted "it works" without verification
  • 100% test coverage of fantasies
  • Catastrophic failure on manual test
Parts 2–4: Lit Paths
  • Validated every design claim against code
  • AGENTS.md + Working Memory for context
  • Phase gates with objective criteria
  • Autonomous completion — no intervention
The Two Lessons

What We Learned

01
Process Discipline Prevents Dark Alleys
AGENTS.md, working memory, phase gates, and validation recipes kept the AI on track. The infrastructure matters as much as the code.
02
Design Investment Pays Off
Rigorous validation produced primitives that solve a whole class of problems. Foreman was one pattern — the infrastructure enables all of them.