Amplifier Development Story

Escaping Dark Alleys

How process discipline and design investment transformed AI development

7 days · 18 sessions · 4 repositories · A new foundation for orchestration

January 28 – February 4, 2026
The Through-Line

Two Key Lessons

01
Process Discipline
Rigorous validation, working memory, and phase gates keep the AI from wandering down dark alleys
02
Design Investment
Putting effort into design early produces powerful primitives that solve a whole class of problems
1
Part 1

The Foreman Troubles

January 28 – February 2

When the AI went down dark alleys

Part 1 · The Foreman Troubles
The Vision

The Foreman Pattern

An AI session spawning and managing worker sub-sessions through an issue queue

Fire-and-Forget Workers
Spawn workers that complete tasks independently, no synchronous waiting required
Typed Worker Pools
Different worker types for different tasks: researcher, implementer, reviewer
Issue-Based Coordination
Workers pick up tasks from a queue, post results back. The queue is the protocol.
Part 1 · The Foreman Troubles
The Confidence Gap

Tests Said: All Good

13
Unit Tests Passing
Every test green. All paths exercised. Clean coverage.
100%
Code Coverage
All branches covered. HIGH confidence declared.
"The prototype looks solid. Time for manual testing..."
Part 1 · The Foreman Troubles
The Confidence Gap

Reality Said: Catastrophic Failure

# The AI asserted "workers are running"
# But the session logs told a different story:

grep -r "spawn" events.jsonl

# No spawn events. Nothing was started.
# Workers weren't running. They never existed.

The AI claimed things worked without actually verifying them

Part 1 · The Foreman Troubles
What the User Saw

The Error Messages

ERROR: Required capability 'bundle.load' not available
ERROR: Session not found
# Workers spawned but couldn't load bundles
# Parent couldn't find child sessions
# Nothing worked outside the test environment
The Irony
The AI mocked bundle.load — a capability that doesn't exist. The real session.spawn was available the whole time.
The Gap
Tests ran against fantasy APIs. Manual testing hit reality. 100% coverage of code that couldn't work.
Part 1 · The Foreman Troubles
What Went Wrong

Down the Dark Alleys

These were AI practice failures, not missing primitives

Part 1 · The Foreman Troubles
Progressive Discovery

The Bug Cascade

Once manual testing started, each fix revealed the next layer.

Four bugs. Four assumptions. All wrong.

Part 1 · Bug Cascade
Bug 1 of 4

Status Vocabulary Mismatch

The Symptom
Workers completed their tasks, but the Foreman never saw them finish. Tasks stayed "in progress" forever.
The Root Cause
Workers posted status "completed". The tool expected "closed". The error was silently swallowed.
# Worker posted:
{ "status": "completed", "result": "Task done!" }

# Tool expected:
{ "status": "closed", ... }

# Result: Update silently ignored. No error. No log. Nothing.

Fix applied → Bug 2 discovered ↓

Part 1 · Bug Cascade
Bug 2 of 4

Session Storage Paths

The Symptom
Parent session couldn't find child sessions. "Session not found" errors everywhere.
The Root Cause
Project directory derived from cwd. Workers started in different directories → ended up in wrong session stores.
# Parent session:
~/.amplifier/projects/spawn-events-work/sessions/parent-123/

# Worker session (wrong!):
~/.amplifier/projects/amplifier-core/sessions/worker-456/

# Parent looks for child: "Session not found"
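The fix amounts to deriving the store from an explicitly inherited project identity, never from each process's cwd. A sketch under illustrative names (session_dir is hypothetical, not the shipped resolver):

```python
from pathlib import Path

def session_dir(
    project: str,
    session_id: str,
    root: Path = Path.home() / ".amplifier",
) -> Path:
    # Parent and child resolve the SAME store because `project` is passed
    # down explicitly at spawn time rather than re-derived from os.getcwd().
    return root / "projects" / project / "sessions" / session_id

parent = session_dir("spawn-events-work", "parent-123")
child = session_dir("spawn-events-work", "worker-456")
assert parent.parent == child.parent   # same session store, so lookups succeed
```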

Fix applied → Bug 3 discovered ↓

Part 1 · Bug Cascade
Bug 3 of 4

Tool Source Paths

The Symptom
Workers declared tools in their bundles but got "capability not available" at runtime.
The Root Cause
Git URLs missing #subdirectory= fragment. Tool resolver couldn't find the package inside the repo.
# Bundle declared (wrong):
source: git+https://github.com/org/tools.git

# Should have been:
source: git+https://github.com/org/tools.git#subdirectory=packages/my-tool

# Workers thought they had tools. They didn't.

Fix applied → Bug 4 discovered ↓

Part 1 · Bug Cascade
Bug 4 of 4

Concurrent File Access

The Symptom
3 workers completed successfully. But only 1 result appeared in the output file.
The Root Cause
Multiple workers writing to the same file simultaneously. Last writer wins — overwrote the others.
# Timeline:
Worker A: open("results.json", "w")   # t=0ms
Worker B: open("results.json", "w")   # t=5ms
Worker C: open("results.json", "w")   # t=10ms
Worker A: write(result_a)             # t=100ms
Worker B: write(result_b)             # t=105ms - overwrites A
Worker C: write(result_c)             # t=110ms - overwrites B

# Final file: only Worker C's result. A and B lost forever.
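One standard remedy is one-writer-per-file: each worker writes its own result file and the foreman aggregates after all workers close. A sketch with hypothetical names (write_result, aggregate), not the applied fix verbatim:

```python
import json
import tempfile
from pathlib import Path

outdir = Path(tempfile.mkdtemp())

def write_result(worker_id: str, result: dict) -> None:
    # Unique path per worker: no shared file, so no last-writer-wins race.
    (outdir / f"result-{worker_id}.json").write_text(json.dumps(result))

def aggregate() -> list[dict]:
    # Foreman merges once every worker has reported.
    return [json.loads(p.read_text()) for p in sorted(outdir.glob("result-*.json"))]

for wid, payload in [("a", {"n": 1}), ("b", {"n": 2}), ("c", {"n": 3})]:
    write_result(wid, payload)

results = aggregate()   # all three survive
```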
Part 1 · The Foreman Troubles
The Full Picture

Four Layers of Assumptions

1
Status Vocabulary
"completed" vs "closed" — silently ignored
↓ fix → discover
2
Session Storage
Wrong directories — parent couldn't find children
↓ fix → discover
3
Tool Source Paths
Missing #subdirectory — tools didn't load
↓ fix → discover
4
Race Conditions
Concurrent writes — results overwritten

Every assumption the AI made turned out to be wrong

Part 1 · The Foreman Troubles
The Breaking Point
"You assured me your testing strategy confirmed it would work with a high-degree of confidence."

The primitives existed. The process was broken.
The AI went down dark alleys — making assumptions, mocking fake APIs,
claiming things worked without verifying.

Part 1 · The Foreman Troubles
The Lesson

The Capabilities Were There

The problem was unvalidated AI assumptions.

session.spawn existed the whole time. The AI just never looked.

This wasn't a tooling problem. It was a process problem.

2
Part 2

Rigorous Design Validation

February 3

Understand first. Design second. Implement last.

Part 2 · Rigorous Design Validation
The Opposite of Part 1

Context First, Design Second

The design process started by studying the existing system

"Check out the following... and let's talk about the overall design of sessions spawning sessions"
# The user's prompt — study BEFORE designing:
amplifier-core/amplifier_core/session.py
    → How sessions actually work
amplifier-foundation/modules/tool-delegate/
    → How the existing delegate tool spawns agents
amplifier-app-cli/session_runner.py, session_spawner.py
    → How the CLI registers the spawner
amplifier-bundle-observers/
    → What was already built
amplifier-bundle-foreman/
    → What we learned from the troubles

In Part 1, the AI assumed and mocked. In Part 2, we looked first.

Part 2 · Rigorous Design Validation
The Difference

Two Approaches

Part 1: Dark Alley
Jumped into implementation

• Assumed bundle.load existed
• Mocked session.AmplifierSession
• Never studied tool-delegate
• Never looked at session.spawn
Part 2: Lit Path
Built context before designing

• Studied how sessions actually work
• Read the existing delegate tool
• Understood the CLI spawner
• Learned from foreman's failures

Same goal. Opposite process. Different outcome.

Part 2 · Rigorous Design Validation
The Shift

Trust But Verify

Before: Dark Alley
Design document read as "current architecture" — assumed it existed without checking code
After: Lit Path
Every proposed primitive validated against existing code. Gaps identified before implementation started.
# Code analysis reveals the truth

grep -r "spawn_bundle" amplifier-core/src/
# No results - doesn't exist yet

grep -r "EventRouter" amplifier-core/src/
# No results - needs to be built

grep -r "TriggerSource" amplifier-core/src/
# No results - proposed, not implemented

# Now we know exactly what needs to be built
Part 2 · Rigorous Design Validation
The Breakthrough

The Aha Moment

"What if agents were never a separate concept? They're just inline bundle definitions that inherit heavily from the parent session."

Agent spawning IS bundle spawning — just with different inheritance levels.

Inheritance    What You Get                        What You Override
Full Bundle    Nothing                             Everything — define it all yourself
Agent          Provider, settings, most context    Tools, specific context, instructions
Self           Everything                          Just fork — new instance, same config
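The three inheritance levels can be modeled as a simple config merge. A sketch under assumed names (SpawnLevel, spawn_config); the real spawn_bundle() signature lives in amplifier-core and may differ:

```python
from enum import Enum

class SpawnLevel(Enum):
    FULL_BUNDLE = "full_bundle"   # inherit nothing, define everything
    AGENT = "agent"               # inherit provider/settings, override the rest
    SELF = "self"                 # fork: inherit everything

def spawn_config(parent: dict, level: SpawnLevel, overrides: dict) -> dict:
    if level is SpawnLevel.FULL_BUNDLE:
        return dict(overrides)                # nothing inherited
    if level is SpawnLevel.SELF:
        return dict(parent)                   # everything inherited, same config
    # AGENT: inherit provider and settings; child overrides tools/context.
    inherited = {k: v for k, v in parent.items() if k in ("provider", "settings")}
    return {**inherited, **overrides}

parent = {"provider": "anthropic", "settings": {"temp": 0}, "tools": ["web"]}
agent = spawn_config(parent, SpawnLevel.AGENT, {"tools": ["grep"]})
fork = spawn_config(parent, SpawnLevel.SELF, {})
```

Under this model an agent really is just an inline bundle definition with heavy inheritance, which is the aha moment above.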
Part 2 · Rigorous Design Validation
The Architecture

Validated Design

Primitives that solve multi-agent orchestration generally, not just Foreman

spawn_bundle()
Unified spawning function. One primitive for all inheritance levels.
EventRouter
Cross-session event communication. Sessions can emit and listen.
TriggerSource
Reactive triggers protocol. Timer, SessionEvent, Manual.
BackgroundSessionManager
Lifecycle management. Health checks. Graceful shutdown.
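To make the emit/listen idea concrete, here is a minimal pub/sub sketch. The real EventRouter interface is in amplifier-core; treat these signatures as illustrative only:

```python
from collections import defaultdict
from typing import Callable

class EventRouter:
    """Toy cross-session event bus: sessions emit, subscribers listen."""

    def __init__(self):
        self._listeners: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def listen(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._listeners[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> None:
        # Deliver to every subscriber registered for this event type.
        for handler in self._listeners[event_type]:
            handler(payload)

received: list[dict] = []
router = EventRouter()
router.listen("worker.closed", received.append)
router.emit("worker.closed", {"session": "worker-456", "status": "closed"})
```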

Design investment produced primitives that solve a whole class of problems

3
Part 3

Implementation Infrastructure

February 3–4

Guardrails that prevent dark alleys

Part 3 · Implementation Infrastructure
The Challenge

How to Work
Without Wandering?

"How should I start this implementation process? What's the best way to ensure it requires as little intervention from me as possible until it is fully done?"

After the Foreman troubles, the user needed confidence that the AI could work autonomously without going down dark alleys.

Part 3 · Implementation Infrastructure
The Methodology

Four Pillars of
Process Discipline

AGENTS.md
Auto-loaded context for any session in the directory. No re-explaining the project. The AI always knows the mission.
Working Memory
Session state "save game" that persists. AI knows where it left off. No context loss between sessions.
PHASE-GATES.md
Checkbox tracking with explicit criteria. AI knows what "done" means. No subjective judgment calls.
Validation Recipes
Automated PASS/FAIL verification. No "looks good" — objective gates. Trust is earned by evidence.
Part 3 · Implementation Infrastructure
The Infrastructure

The Workspace Created

/data/repos/spawn-events-work/
├── .amplifier/
│   ├── AGENTS.md                     ← Context auto-loaded for any session
│   └── working-memory/
│       └── spawn-events-impl.md      ← Session "save game"
├── docs/
│   ├── PHASE-GATES.md                ← Checkbox progress tracking
│   └── integrated-design-impl.md     ← Full spec with exact code
├── recipes/
│   └── validate-phase-*.yaml         ← Automated PASS/FAIL validation
├── amplifier-core/                   → symlink
└── amplifier-foundation/             → symlink
Part 3 · Implementation Infrastructure
The Handoff
"I'll be away for awhile, so make sure you keep making progress so it will be completed and validated when I return."

The infrastructure enabled what came next:
The AI worked through ALL phases autonomously —
but this time with guardrails that prevented dark alleys.

4
Part 4

The Result

February 4

What process discipline + design investment produced

Part 4 · The Result
Autonomous Execution

The Implementation Sprint

5 phases executed autonomously — no dark alleys

Phase 1
SessionStorage Protocol — Abstract storage layer for session relationships
Phase 2
spawn_bundle() — Core spawning function with inheritance levels
Phase 3
EventRouter — Cross-session event communication
Phase 4
TriggerSource Protocol — Timer, SessionEvent, Manual triggers
Phase 5
BackgroundSessionManager — Lifecycle, health checks, shutdown

All phases completed — implementing, testing, fixing, committing —
without human intervention.

Part 4 · The Result
What We Built

New Primitives Delivered

spawn_bundle()
The unified spawning function. Creates child sessions with configurable inheritance.
SessionStorage Protocol
Abstract storage layer. Parent can find children. Children can find parent.
EventRouter
Cross-session communication. Sessions can emit events, others can listen.
TriggerSource Protocol
TimerTrigger, SessionEventTrigger, ManualTrigger. Fire sessions on conditions.
BackgroundSessionManager
Manages lifecycle of spawned sessions. Health checks. Graceful shutdown.
W3C Trace Context
Full trace lineage across parent-child-grandchild. Observability built in.
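The trigger protocol above can be sketched as a single should_fire() interface with three implementations. The class shapes here are hypothetical; the shipped TriggerSource protocol in amplifier-core may differ:

```python
import time
from typing import Protocol

class TriggerSource(Protocol):
    def should_fire(self) -> bool: ...

class TimerTrigger:
    """Fires when the configured interval has elapsed."""
    def __init__(self, interval_s: float):
        self.interval_s = interval_s
        self._last = time.monotonic()

    def should_fire(self) -> bool:
        if time.monotonic() - self._last >= self.interval_s:
            self._last = time.monotonic()
            return True
        return False

class SessionEventTrigger:
    """Fires once per matching session event."""
    def __init__(self, event_type: str):
        self.event_type = event_type
        self._pending = 0

    def on_event(self, event_type: str) -> None:
        if event_type == self.event_type:
            self._pending += 1

    def should_fire(self) -> bool:
        if self._pending:
            self._pending -= 1
            return True
        return False

class ManualTrigger:
    """Fires exactly once each time it is armed."""
    def __init__(self):
        self.armed = False

    def should_fire(self) -> bool:
        fired, self.armed = self.armed, False
        return fired

manual = ManualTrigger()
manual.armed = True
ev = SessionEventTrigger("worker.closed")
ev.on_event("worker.closed")
```

A BackgroundSessionManager can then poll a list of TriggerSource objects uniformly, without caring which kind each one is.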
Part 4 · The Result
The Numbers

What We Shipped

~1,200
Lines of Code
11
New Files
461
Tests Passing
4
PRs Merged
Part 4 · The Result
The Bigger Picture

From Catalyst
to General Infrastructure

The Foreman was just the catalyst.
These primitives now enable any multi-agent pattern.

Foreman + Workers
Manager spawns specialized workers via issue queue
Pipeline Stages
Sequential handoff with event-triggered transitions
Swarm Coordination
Many agents collaborating via EventRouter
The Transformation

Dark Alleys vs Lit Paths

Part 1: Dark Alleys
  • Mocked fake APIs without checking reality
  • Asserted "it works" without verification
  • 100% test coverage of fantasies
  • Catastrophic failure on manual test
Parts 2–4: Lit Paths
  • Validated every design claim against code
  • AGENTS.md + Working Memory for context
  • Phase gates with objective criteria
  • Autonomous completion — no intervention
The Two Lessons

What We Learned

01
Process Discipline Prevents Dark Alleys
AGENTS.md, working memory, phase gates, and validation recipes kept the AI on track. The infrastructure matters as much as the code.
02
Design Investment Pays Off
Rigorous validation produced primitives that solve a whole class of problems. Foreman was one pattern — the infrastructure enables all of them.