Multi-Agent Deep Investigation for Complex Systems
Not a quick answer. Not a plausible narrative. True understanding—with proof.
You send an agent to investigate a complex system. It comes back with findings that seem reasonable—but are incomplete. You send a second agent. It finds different things. Even the union of their findings is never the full picture.
Documentation describes the intended design. Code reveals the actual design. They diverge. Assumptions compound silently—each agent inherits the last one's blind spots.
LLM sessions produce plausible, confident narratives that can be confidently wrong. Without adversarial verification, you can't tell the difference between insight and illusion.
We ported patterns from a successful autonomous coding system—the dev-machine bundle, which had built a 166K-line TypeScript word processor across 692 features and 255 sessions. The port looked right on paper.
- The review loop couldn't converge: it oscillated endlessly, finding new P0 issues on every pass.
- It required human intervention every time.
- Single-agent investigation produced analysis that seemed reasonable but was incomplete, and couldn't tell us why convergence failed.
In astronomy, parallax determines a star's true position by observing it from multiple vantage points. The apparent shift between observations reveals the actual distance.
When independent agents arrive at the same finding from different angles, confidence is high. This is corroboration through isolation.
When agents find different things, that's not a problem to resolve—it's a priority to investigate. Discrepancies are signal, not failure.
Each wave is more targeted and more rigorous than the last, informed by the discrepancies from the wave before it.
Every topic gets 3 independent agents with `context_depth="none"`—genuinely fresh context.
Traces actual code execution paths. File:line citations. LSP navigation.
Unique find: The admissions-advisor agent is NOT invoked by the admissions mode—they're parallel paths through 7 files.
Examines 10+ real artifacts. Quantifies patterns. Reveals reality vs. documentation.
Unique find: Word4 diverged 11.4x from its template. CONTEXT-TRANSFER.md grew 391x (28→10,967 lines).
Maps cross-boundary integrations. Finds what happens where neither mechanism's docs describe the interaction.
Unique find: A confirmed deep_merge bug silently dropping config during bundle composition. Two live collision sites.
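The triplicate-and-reconcile step can be sketched in a few lines. Everything below (the `Finding` type, `reconcile`, `next_wave_topics`) is a hypothetical illustration, not Amplifier's actual API: findings reported by two or more independent agents count as corroborated, while unique findings become the next wave's verification targets instead of being discarded.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    topic: str
    claim: str  # normalized statement, e.g. "deep_merge drops config keys"

def reconcile(findings_per_agent: list[set[Finding]]) -> tuple[set[Finding], set[Finding]]:
    """Split findings into corroborated (seen by 2+ agents) and discrepancies."""
    counts = Counter(f for agent in findings_per_agent for f in agent)
    corroborated = {f for f, n in counts.items() if n >= 2}
    discrepancies = {f for f, n in counts.items() if n == 1}
    return corroborated, discrepancies

def next_wave_topics(discrepancies: set[Finding]) -> list[str]:
    # Discrepancies are signal: each unique finding becomes a targeted
    # verification topic for the next, more rigorous wave.
    return sorted(f"verify: {f.claim}" for f in discrepancies)
```

Corroboration through isolation falls out of the counting: a claim only reaches the "high confidence" bucket if agents with no shared context arrived at it independently.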
Not all evidence is equal. The methodology enforces a strict quality hierarchy.
Code reading identifies mechanisms. Execution proves impact.
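One way to picture that hierarchy is as a strict ordering, where a conflict between two findings is settled by the stronger evidence class. This is an illustrative sketch with hypothetical names, not code from the methodology itself:

```python
from enum import IntEnum

class Evidence(IntEnum):
    """Hypothetical ranking: higher values outrank lower when findings conflict."""
    SPECULATION = 0      # "this is probably how it works"
    DOC_CLAIM = 1        # what the documentation asserts
    CODE_READING = 2     # mechanism identified at a file:line citation
    EXECUTION_PROOF = 3  # behavior observed in an actual run

def stronger(a: Evidence, b: Evidence) -> Evidence:
    # When two findings disagree, the one backed by stronger evidence wins.
    return max(a, b)
```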
Built on real Amplifier mechanisms—validated by first investigating those mechanisms with 18 agents:
Skills for progressive-reveal JIT context ·
Modes for phase enforcement via hooks ·
Recipes for staged orchestration with approval gates ·
Agents with `context_depth="none"` for cognitive isolation ·
Bundles via the thin bundle pattern.
Validated end-to-end through the recipe engine.
Understand how the dev-machine bundle truly works so we could port its patterns correctly.
"The termination condition of an iterative loop must be more deterministic than the loop body."
The dev-machine reviewed one feature's diff against one feature's spec, gated by deterministic exit codes. Our resolver reviewed the entire codebase with probabilistic termination. Six boundary failures confirmed it.
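A minimal sketch of the principle, with hypothetical names: the loop body can be as probabilistic as you like, but the exit test must be a deterministic signal—such as a test suite's exit code—and the pass count must be bounded so failure escalates to a human instead of oscillating.

```python
from typing import Callable

def review_until_converged(
    review_pass: Callable[[], None],  # probabilistic body, e.g. an LLM review pass
    gate: Callable[[], int],          # deterministic check, e.g. a test-suite exit code
    max_passes: int = 10,
) -> bool:
    """Terminate on a deterministic signal (exit code 0), never on the
    model's own judgment that it is 'done'."""
    for _ in range(max_passes):
        review_pass()
        if gate() == 0:   # deterministic termination condition
            return True
    return False          # bounded: escalate instead of looping forever
```

In practice `gate` would be something like the return code of `pytest` on the one feature's tests—scoped to one diff against one spec, as the dev-machine did, rather than the entire codebase.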
Produced a deployable fix—shipped as PR #45. Briefs now converge autonomously in 7–27 minutes.
Understand how amplifier-resolve truly works today—not the docs, the reality.
- The consensus pipeline silently running 1-of-3 models.
- DotGraph silencing 23 pipeline engine events.
- Resume creating an infinite escalation loop.
~50 lines fixed 3 CRITICAL + 2 HIGH bugs. ~700 lines of verified dead code removed. Test suite: 850 → 1,145 (34% growth, zero failures).
Wave 3 doesn't just find bugs in the target—it corrects mistakes from Waves 1 and 2.
A 53% undercount that only execution-based measurement could reveal.
4 items flagged as dead code turned out to be live production code.
The context= parameter was flagged for removal. Removing it passes the full test suite. But it breaks 8 production callers at runtime—callers not covered by any test. Wave 3's execution-based verification caught this. Code reading alone cannot.
A 50% correction. The difference between "probably fine" and "needs architectural attention."
Designed, built, validated, and used to ship real production fixes—all in a single session.
| Data Source | Method | Evidence Type |
|---|---|---|
| Dev-machine investigation (22 agents) | Parallax Discovery 3-wave funnel against amplifier-bundle-dev-machine + word4 repos | 97 artifact files, 22,433 lines, reconciled across triplicate teams |
| Amplifier-resolve investigation (24 agents) | Parallax Discovery 3-wave funnel against amplifier-resolve source | 110 artifact files, 26,691 lines, 18 bugs adversarially verified |
| Amplifier mechanisms study (18 agents) | Triplicate teams across skills, recipes, bundles, agents, modes, hooks | 80+ artifact files, ~15,000 lines, 5 discrepancies resolved |
| Recipe validation run | Full 4-wave recipe execution via Amplifier recipe engine | 26 artifact files, 2 bugs caught and fixed live |
| E2E verification | Real brief submissions through the running resolver service | PR #45, PR #2, PR #3 — live convergence in 7.5, 26.9, 21.3 min |
| Test suite | `uv run pytest tests/ -q` on amplifier-resolve main | 1,145 passed, 0 failures |
Data as of: March 10, 2026 · Session: b174ff67-755d-43c3-a2b3-fe6acb4e7eb0 · All metrics from actual git commits, agent outputs, and test runs—not estimates.
```shell
amplifier bundle add github:bkrabach/amplifier-bundle-parallax-discovery
```
```
load_skill(skill_name="parallax-methodology")
/discovery
```
Load the skill, enter discovery mode, and follow the guidance. Define your topics, dispatch agents, and steer through approval gates.
```shell
amplifier run "execute @parallax-discovery:recipes/parallax-discovery.yaml \
  with target='my-system' \
  topics='api, auth, storage'"
```
The recipe automates the multi-wave flow. Approve between waves to steer the investigation.
github.com/bkrabach/amplifier-bundle-parallax-discovery
microsoft/amplifier#232
Complex systems · Outdated docs · Architectural decisions needing ground truth · Previous investigations that were confident-but-wrong
`load_skill("parallax-methodology")`