How process discipline and design investment transformed AI development
7 days · 18 sessions · 4 repositories · A new foundation for orchestration
January 28 – February 4, 2026
The Through-Line
Two Key Lessons
01
Process Discipline
Using rigorous validation, working memory, and phase gates prevents the AI from wandering down dark alleys
02
Design Investment
Putting effort into design early produces powerful primitives that solve a whole class of problems
1
Part 1
The Foreman Troubles
January 28 – February 2
When the AI went down dark alleys
Part 1 · The Foreman Troubles
The Vision
The Foreman Pattern
An AI session spawning and managing worker sub-sessions through an issue queue
Fire-and-Forget Workers
Spawn workers that complete tasks independently, no synchronous waiting required
Typed Worker Pools
Different worker types for different tasks: researcher, implementer, reviewer
Issue-Based Coordination
Workers pick up tasks from a queue, post results back. The queue is the protocol.
Part 1 · The Foreman Troubles
The Confidence Gap
Tests Said: All Good
13
Unit Tests Passing
Every test green. All paths exercised. Clean coverage.
100%
Code Coverage
All branches covered. HIGH confidence declared.
"The prototype looks solid. Time for manual testing..."
Part 1 · The Foreman Troubles
The Confidence Gap
Reality Said: Catastrophic Failure
# The AI asserted "workers are running"
# But the session logs told a different story:
grep -r "spawn" events.jsonl
# No spawn events. Nothing was started.
# Workers weren't running. They never existed.
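The cure for this class of claim is mechanical: read the log before asserting. A minimal sketch of that check, assuming a hypothetical JSONL session log where each event is one JSON object with a `type` field (the real event schema is not shown in the slides):

```python
import json
from pathlib import Path

def spawn_events(log_path):
    """Yield events whose type mentions 'spawn' from a JSONL session log."""
    for line in Path(log_path).read_text().splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if "spawn" in event.get("type", ""):
            yield event

def verify_workers_running(log_path):
    """Return True only if at least one spawn event was actually recorded."""
    return any(spawn_events(log_path))
```

A claim like "workers are running" then becomes a one-line check against evidence instead of an assertion.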
The AI claimed things worked without actually verifying them
Part 1 · The Foreman Troubles
What the User Saw
The Error Messages
ERROR: Required capability 'bundle.load' not available
ERROR: Session not found
# Workers spawned but couldn't load bundles
# Parent couldn't find child sessions
# Nothing worked outside the test environment
The Irony
The AI mocked bundle.load — a capability that doesn't exist. The real session.spawn was available the whole time.
The Gap
Tests ran against fantasy APIs. Manual testing hit reality. 100% coverage of code that couldn't work.
Part 1 · The Foreman Troubles
What Went Wrong
Down the Dark Alleys
These were AI practice failures, not missing primitives
Mocked fake capabilities — Created bundle.load and session.AmplifierSession that didn't exist. The real session.spawn was there the whole time.
Didn't study reference implementations — The task tool showed the correct pattern. It was never consulted.
Asserted without verifying — "Workers are running" when spawn events never appeared in logs
Tests tested nothing — 13 tests, 100% coverage → all against mocked APIs that didn't match reality
Part 1 · The Foreman Troubles
Progressive Discovery
The Bug Cascade
Once manual testing started, each fix revealed the next layer.
Four bugs. Four assumptions. All wrong.
Part 1 · Bug Cascade
Bug 1 of 4
Status Vocabulary Mismatch
The Symptom
Workers completed their tasks, but the Foreman never saw them finish. Tasks stayed "in progress" forever.
The Root Cause
Workers posted status "completed". The tool expected "closed". The error was silently swallowed.
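Silently swallowing the unknown status is what made this bug invisible. A defensive sketch, using a hypothetical `post_status` helper and an assumed status vocabulary (the tool's real API is not shown in the slides), that turns the mismatch into an immediate error:

```python
# Assumed vocabulary -- the slides only confirm that "closed" was expected
# and "completed" was not.
ACCEPTED_STATUSES = {"open", "in_progress", "closed"}

def post_status(issue_id: str, status: str) -> dict:
    """Reject unknown statuses loudly instead of dropping them silently."""
    if status not in ACCEPTED_STATUSES:
        raise ValueError(
            f"Unknown status {status!r} for issue {issue_id}; "
            f"expected one of {sorted(ACCEPTED_STATUSES)}"
        )
    return {"issue": issue_id, "status": status}
```

With this shape, the worker posting `"completed"` fails on the first attempt, not after hours of tasks stuck "in progress".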
Fix applied → Bug 2 discovered ↓
Part 1 · Bug Cascade
Bug 2 of 4
Session Storage
The Symptom
Workers spawned, but the parent got "Session not found" when it looked for its children.
The Root Cause
Project directory derived from cwd. Workers started in different directories → ended up in wrong session stores.
# Parent session:
~/.amplifier/projects/spawn-events-work/sessions/parent-123/
# Worker session (wrong!):
~/.amplifier/projects/amplifier-core/sessions/worker-456/
# Parent looks for child: "Session not found"
Fix applied → Bug 3 discovered ↓
Part 1 · Bug Cascade
Bug 3 of 4
Tool Source Paths
The Symptom
Workers declared tools in their bundles but got "capability not available" at runtime.
The Root Cause
Git URLs missing #subdirectory= fragment. Tool resolver couldn't find the package inside the repo.
# Bundle declared (wrong):
source: git+https://github.com/org/tools.git
# Should have been:
source: git+https://github.com/org/tools.git#subdirectory=packages/my-tool
# Workers thought they had tools. They didn't.
Fix applied → Bug 4 discovered ↓
Part 1 · Bug Cascade
Bug 4 of 4
Concurrent File Access
The Symptom
3 workers completed successfully. But only 1 result appeared in the output file.
The Root Cause
Multiple workers writing to the same file simultaneously. Last writer wins — overwrote the others.
# Timeline:
Worker A: open("results.json", "w") # t=0ms
Worker B: open("results.json", "w") # t=5ms
Worker C: open("results.json", "w") # t=10ms
Worker A: write(result_a) # t=100ms
Worker B: write(result_b) # t=105ms - overwrites A
Worker C: write(result_c) # t=110ms - overwrites B
# Final file: only Worker C's result. A and B lost forever.
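One standard cure for last-writer-wins is to stop sharing the file: each worker writes its own result, and the parent merges after everyone reports in. A sketch under that assumption (the filenames and layout are illustrative, not the project's actual fix):

```python
import json
from pathlib import Path

def write_result(results_dir: Path, worker_id: str, result: dict) -> Path:
    """Each worker writes its own file -- no worker can overwrite another."""
    results_dir.mkdir(parents=True, exist_ok=True)
    path = results_dir / f"{worker_id}.json"
    path.write_text(json.dumps(result))
    return path

def collect_results(results_dir: Path) -> dict:
    """The parent merges once all workers have reported in."""
    return {p.stem: json.loads(p.read_text())
            for p in sorted(results_dir.glob("*.json"))}
```

Alternatives include append-only logs with file locking, but per-worker files are the simplest design that makes the race impossible rather than merely unlikely.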
Part 1 · The Foreman Troubles
The Full Picture
Four Layers of Assumptions
1
Status Vocabulary
"completed" vs "closed" — silently ignored
↓ fix → discover
2
Session Storage
Wrong directories — parent couldn't find children
↓ fix → discover
3
Tool Source Paths
Missing #subdirectory — tools didn't load
↓ fix → discover
4
Race Conditions
Concurrent writes — results overwritten
Every assumption the AI made turned out to be wrong
Part 1 · The Foreman Troubles
The Breaking Point
"You assured me your testing strategy confirmed it would work with a high-degree of confidence."
The primitives existed. The process was broken. The AI went down dark alleys — making assumptions, mocking fake APIs, claiming things worked without verifying.
Part 1 · The Foreman Troubles
The Lesson
The Capabilities Were There
The problem was unvalidated AI assumptions.
session.spawn existed the whole time. The AI just never looked.
This wasn't a tooling problem. It was a process problem.
2
Part 2
Rigorous Design Validation
February 3
Understand first. Design second. Implement last.
Part 2 · Rigorous Design Validation
The Opposite of Part 1
Context First, Design Second
The design process started by studying the existing system
"Check out the following... and let's talk about the overall design of sessions spawning sessions"
# The user's prompt — study BEFORE designing:
amplifier-core/amplifier_core/session.py
→ How sessions actually work
amplifier-foundation/modules/tool-delegate/
→ How the existing delegate tool spawns agents
amplifier-app-cli/session_runner.py, session_spawner.py
→ How the CLI registers the spawner
amplifier-bundle-observers/
→ What was already built
amplifier-bundle-foreman/
→ What we learned from the troubles
In Part 1, the AI assumed and mocked. In Part 2, we looked first.
Part 2 · Rigorous Design Validation
The Difference
Two Approaches
Part 1: Dark Alley
Jumped into implementation
• Assumed bundle.load existed
• Mocked session.AmplifierSession
• Never studied tool-delegate
• Never looked at session.spawn
Part 2: Lit Path
Built context before designing
• Studied how sessions actually work
• Read the existing delegate tool
• Understood the CLI spawner
• Learned from foreman's failures
Same goal. Opposite process. Different outcome.
Part 2 · Rigorous Design Validation
The Shift
Trust But Verify
Before: Dark Alley
Design document read as "current architecture" — assumed it existed without checking code
After: Lit Path
Every proposed primitive validated against existing code. Gaps identified before implementation started.
# Code analysis reveals the truth
grep -r "spawn_bundle" amplifier-core/src/
# No results - doesn't exist yet
grep -r "EventRouter" amplifier-core/src/
# No results - needs to be built
grep -r "TriggerSource" amplifier-core/src/
# No results - proposed, not implemented
# Now we know exactly what needs to be built
Part 2 · Rigorous Design Validation
The Breakthrough
The Aha Moment
"What if agents were never a separate concept? They're just inline bundle definitions that inherit heavily from the parent session."
Agent spawning IS bundle spawning — just with different inheritance levels.
Inheritance
What You Get
What You Override
Full Bundle
Nothing
Everything — define it all yourself
Agent
Provider, settings, most context
Tools, specific context, instructions
Self
Everything
Just fork — new instance, same config
Part 2 · Rigorous Design Validation
The Architecture
Validated Design
Primitives that solve multi-agent orchestration generally, not just Foreman
spawn_bundle()
Unified spawning function. One primitive for all inheritance levels.
EventRouter
Cross-session event communication. Sessions can emit and listen.
Lifecycle management. Health checks. Graceful shutdown.
Design investment produced primitives that solve a whole class of problems
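The unified-spawning idea can be sketched in a few lines. This is hypothetical: the real `spawn_bundle()` signature is not shown in the slides, and the inheritance sets here simply follow the table above ("full" defines everything itself, "agent" inherits provider, settings, and context while overriding tools and instructions, "self" forks the parent config unchanged):

```python
def spawn_bundle(parent: dict, level: str = "agent", **overrides) -> dict:
    """One primitive for all inheritance levels (illustrative sketch)."""
    if level == "full":
        config = {}                      # define everything yourself
    elif level == "agent":
        config = {k: parent[k] for k in ("provider", "settings", "context")}
    elif level == "self":
        config = dict(parent)            # new instance, same config
    else:
        raise ValueError(f"unknown inheritance level: {level!r}")
    config.update(overrides)
    return config
```

The point of the design is visible even in the sketch: "agent" stops being a separate concept and becomes one setting of a general spawning primitive.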
3
Part 3
Implementation Infrastructure
February 3–4
Guardrails that prevent dark alleys
Part 3 · Implementation Infrastructure
The Challenge
How to Work Without Wandering?
"How should I start this implementation process? What's the best way to ensure it requires as little intervention from me as possible until it is fully done?"
After the Foreman troubles, confidence was needed that the AI could work autonomously without going down dark alleys.
Part 3 · Implementation Infrastructure
The Methodology
Four Pillars of Process Discipline
AGENTS.md
Auto-loaded context for any session in the directory. No re-explaining the project. The AI always knows the mission.
Working Memory
Session state "save game" that persists. AI knows where it left off. No context loss between sessions.
PHASE-GATES.md
Checkbox tracking with explicit criteria. AI knows what "done" means. No subjective judgment calls.
Validation Recipes
Automated PASS/FAIL verification. No "looks good" — objective gates. Trust is earned by evidence.
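The validation-recipe pillar can be sketched as a tiny runner. The slides describe the idea (objective PASS/FAIL gates, no "looks good") but not a concrete format, so the recipe shape here is assumed: a list of named shell commands, where the gate passes only if every command exits zero:

```python
import subprocess

def run_recipe(steps: list[tuple[str, list[str]]]) -> bool:
    """Run each named check; print PASS/FAIL; the gate is the AND of all."""
    all_ok = True
    for name, cmd in steps:
        ok = subprocess.run(cmd, capture_output=True).returncode == 0
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
        all_ok &= ok
    return all_ok
```

Because the result is a single boolean computed from exit codes, "done" becomes something a script decides, not something the AI asserts.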