Multi-Agent Deep Investigation for Complex Systems
Not a quick answer. Not a plausible narrative. True understanding—with proof.
You send an agent to investigate a complex system. It comes back with findings that seem reasonable—but are incomplete. You send a second agent. It finds different things. Even the union of their findings is never the full picture.
Documentation describes the intended design. Code reveals the actual design. They diverge. Assumptions compound silently—each agent inherits the last one's blind spots.
LLM sessions produce plausible, confident narratives that can be confidently wrong. Without adversarial verification, you can't tell the difference between insight and illusion.
We ported patterns from a successful autonomous coding system—the dev-machine bundle, which had built a 166K-line TypeScript word processor across 692 features and 255 sessions. The port looked right on paper.
- The review loop couldn't converge: it oscillated endlessly, finding new P0 issues on every pass.
- It required human intervention every time.
- Single-agent investigation produced analysis that seemed reasonable but was incomplete, and couldn't tell us why convergence failed.
In astronomy, parallax determines a star's true position by observing it from multiple vantage points. The apparent shift between observations reveals the actual distance.
When independent agents arrive at the same finding from different angles, confidence is high. This is corroboration through isolation.
When agents find different things, that's not a problem to resolve—it's a priority to investigate. Discrepancies are signal, not failure.
Each wave is more targeted and more rigorous than the last, informed by the discrepancies from the wave before it.
Every topic gets 3 independent agents with `context_depth="none"`—genuinely fresh context.
Traces actual code execution paths. File:line citations. LSP navigation.
Unique find: The admissions-advisor agent is NOT invoked by the admissions mode—they're parallel paths through 7 files.
Examines 10+ real artifacts. Quantifies patterns. Reveals reality vs. documentation.
Unique find: Word4 diverged 11.4x from its template. CONTEXT-TRANSFER.md grew 391x (28→10,967 lines).
Maps cross-boundary integrations. Finds what happens where neither mechanism's docs describe the interaction.
Unique find: A confirmed deep_merge bug silently dropping config during bundle composition. Two live collision sites.
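The triplicate-and-reconcile step can be sketched in a few lines. Everything below (the `Finding` type, `reconcile`, `next_wave_topics`) is a hypothetical illustration, not Amplifier's actual API: findings reported by two or more independent agents count as corroborated, while unique findings become the next wave's verification targets instead of being discarded.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    topic: str
    claim: str  # normalized statement, e.g. "deep_merge drops config keys"

def reconcile(findings_per_agent: list[set[Finding]]) -> tuple[set[Finding], set[Finding]]:
    """Split findings into corroborated (seen by 2+ agents) and discrepancies."""
    counts = Counter(f for agent in findings_per_agent for f in agent)
    corroborated = {f for f, n in counts.items() if n >= 2}
    discrepancies = {f for f, n in counts.items() if n == 1}
    return corroborated, discrepancies

def next_wave_topics(discrepancies: set[Finding]) -> list[str]:
    # Discrepancies are signal: each unique finding becomes a targeted
    # verification topic for the next, more rigorous wave.
    return sorted(f"verify: {f.claim}" for f in discrepancies)
```

Corroboration through isolation falls out of the counting: a claim only reaches the "high confidence" bucket if agents with no shared context arrived at it independently.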
Not all evidence is equal. The methodology enforces a strict quality hierarchy.
Code reading identifies mechanisms. Execution proves impact.
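One way to picture that hierarchy is as a strict ordering, where a conflict between two findings is settled by the stronger evidence class. This is an illustrative sketch with hypothetical names, not code from the methodology itself:

```python
from enum import IntEnum

class Evidence(IntEnum):
    """Hypothetical ranking: higher values outrank lower when findings conflict."""
    SPECULATION = 0      # "this is probably how it works"
    DOC_CLAIM = 1        # what the documentation asserts
    CODE_READING = 2     # mechanism identified at a file:line citation
    EXECUTION_PROOF = 3  # behavior observed in an actual run

def stronger(a: Evidence, b: Evidence) -> Evidence:
    # When two findings disagree, the one backed by stronger evidence wins.
    return max(a, b)
```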
Built on real Amplifier mechanisms—validated by first investigating those mechanisms with 18 agents:
Skills for progressive-reveal JIT context ·
Modes for phase enforcement via hooks ·
Recipes for staged orchestration with approval gates ·
Agents with `context_depth="none"` for cognitive isolation ·
Bundles via the thin bundle pattern.
Validated end-to-end through the recipe engine.
Understand how the dev-machine bundle truly works so we could port its patterns correctly.
"The termination condition of an iterative loop must be more deterministic than the loop body."
The dev-machine reviewed one feature's diff against one feature's spec, gated by deterministic exit codes. Our resolver reviewed the entire codebase with probabilistic termination. Six boundary failures confirmed it.
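A minimal sketch of the principle, with hypothetical names: the loop body can be as probabilistic as you like, but the exit test must be a deterministic signal—such as a test suite's exit code—and the pass count must be bounded so failure escalates to a human instead of oscillating.

```python
from typing import Callable

def review_until_converged(
    review_pass: Callable[[], None],  # probabilistic body, e.g. an LLM review pass
    gate: Callable[[], int],          # deterministic check, e.g. a test-suite exit code
    max_passes: int = 10,
) -> bool:
    """Terminate on a deterministic signal (exit code 0), never on the
    model's own judgment that it is 'done'."""
    for _ in range(max_passes):
        review_pass()
        if gate() == 0:   # deterministic termination condition
            return True
    return False          # bounded: escalate instead of looping forever
```

In practice `gate` would be something like the return code of `pytest` on the one feature's tests—scoped to one diff against one spec, as the dev-machine did, rather than the entire codebase.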
Produced a deployable fix—shipped as PR #45. Briefs now converge autonomously in 7–27 minutes.
Understand how amplifier-resolve truly works today—not the docs, the reality.
- The consensus pipeline silently running 1-of-3 models.
- DotGraph silencing 23 pipeline engine events.
- Resume creating an infinite escalation loop.
~50 lines fixed 3 CRITICAL + 2 HIGH bugs. ~700 lines of verified dead code removed. Test suite: 850 → 1,145 (34% growth, zero failures).
Wave 3 doesn't just find bugs in the target—it corrects mistakes from Waves 1 and 2.
A 53% undercount that only execution-based measurement could reveal.
4 items flagged as dead code turned out to be live production code.
The context= parameter was flagged for removal. Removing it passes the full test suite. But it breaks 8 production callers at runtime—callers not covered by any test. Wave 3's execution-based verification caught this. Code reading alone cannot.
A 50% correction. The difference between "probably fine" and "needs architectural attention."
Designed, built, validated, and used to ship real production fixes—all in a single session.
| Data Source | Method | Evidence Type |
|---|---|---|
| Dev-machine investigation (22 agents) | Parallax Discovery 3-wave funnel against amplifier-bundle-dev-machine + word4 repos | 97 artifact files, 22,433 lines, reconciled across triplicate teams |
| Amplifier-resolve investigation (24 agents) | Parallax Discovery 3-wave funnel against amplifier-resolve source | 110 artifact files, 26,691 lines, 18 bugs adversarially verified |
| Amplifier mechanisms study (18 agents) | Triplicate teams across skills, recipes, bundles, agents, modes, hooks | 80+ artifact files, ~15,000 lines, 5 discrepancies resolved |
| Recipe validation run | Full 4-wave recipe execution via Amplifier recipe engine | 26 artifact files, 2 bugs caught and fixed live |
| E2E verification | Real brief submissions through the running resolver service | PR #45, PR #2, PR #3 — live convergence in 7.5, 26.9, 21.3 min |
| Test suite | `uv run pytest tests/ -q` on amplifier-resolve main | 1,145 passed, 0 failures |
Data as of: March 10, 2026 · Session: b174ff67-755d-43c3-a2b3-fe6acb4e7eb0 · All metrics from actual git commits, agent outputs, and test runs—not estimates.
```shell
amplifier bundle add github:bkrabach/amplifier-bundle-parallax-discovery
```
```
load_skill(skill_name="parallax-methodology")
/discovery
```
Load the skill, enter discovery mode, and follow the guidance. Define your topics, dispatch agents, and steer through approval gates.
```shell
amplifier run "execute @parallax-discovery:recipes/parallax-discovery.yaml \
  with target='my-system' \
  topics='api, auth, storage'"
```
The recipe automates the multi-wave flow. Approve between waves to steer the investigation.
github.com/bkrabach/amplifier-bundle-parallax-discovery
microsoft/amplifier#232
Complex systems · Outdated docs · Architectural decisions needing ground truth · Previous investigations that were confident-but-wrong
`load_skill("parallax-methodology")`