The Seventh Layer
Four test runs.
Three topics.
Zero compliant outputs.
Not close. Not almost. Zero.
Format Failure · 7-Stage Pipeline · Agent Architecture
02 — The Contract
Downstream specialists depended on a strict schema contract.
The Researcher produced beautiful, detailed research output. Downstream specialists couldn't parse any of it: no schema, no chain. Every run broke the pipeline.
✓  Expected schema
// confidence
confidence: "high"      ← ≥ 0.8
confidence: "medium"    ← 0.5–0.79
confidence: "low"       ← < 0.5

// tier
tier: "primary"
tier: "secondary"
tier: "tertiary"
✗  What actually arrived — every run
// confidence
confidence: 0.97        ← numeric, not categorical

// tier
tier: "T1"              ← shorthand, not full word

// plus: headers, tables, prose sections
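The contract above boils down to a membership check on two fields. A minimal sketch of what "compliant" meant, in Python — the field names come from the schema above, but the function itself is illustrative, not the pipeline's actual validator:

```python
# Categorical values the downstream specialists can parse.
ALLOWED_CONFIDENCE = {"high", "medium", "low"}
ALLOWED_TIER = {"primary", "secondary", "tertiary"}

def is_compliant(claim: dict) -> bool:
    """Return True only when a claim matches the schema contract."""
    return (
        claim.get("confidence") in ALLOWED_CONFIDENCE
        and claim.get("tier") in ALLOWED_TIER
    )
```

What the Researcher actually emitted — `{"confidence": 0.97, "tier": "T1"}` — fails both checks, on every run.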
03 — The Instinct
150 / 393
lines devoted to format enforcement — more than a third of the entire prompt
Enforce harder.
Everyone who builds agents has had this instinct. The format failed — so add more format instructions.
The format still failed.
Every time.
Six enforcement layers
  1. OUTPUT CONTRACT block, written in capitals at the top of the prompt
  2. Stage 0 pre-commit: write the output format before any research begins
  3. Rigid Stage 7 template with exact field names and structure
  4. COMPLIANCE NOTE reminder embedded mid-prompt
  5. FINAL SELF-CHECK gate at the very end
  6. Annotated wrong/right examples, side by side, in the prompt
04 — The Format Spec Gets Buried
[Diagram: the Researcher's context window at synthesis time. The format spec is read first, then buried under 10–20 web searches · thousands of tokens of real-world evidence, before the model reaches Synthesize → Write.]
By synthesis time, the format spec is buried under everything that came after it.
05 — The Wrong Cognitive Mode
The model's cognitive flow through a research task:
Gather evidence → Assess credibility → Identify patterns → Synthesize findings → Write narrative → Fill form  ← cold switch, wrong mode
The cognitive mode always wins.
Every time. No matter how many enforcement layers you add.
Adding a seventh enforcement layer was never going to change this. The task itself was wrong for the stage.
06 — The Reframe
The model wasn't being disobedient.
It was doing exactly what it was built to do.

Research
Gather evidence. Assess credibility. Identify patterns. Build an argument. Produce narrative. This is what synthesis is — the cognitive mode is inseparable from the task.
Formatting
Receive content in any form. Apply deterministic transformation rules. Emit canonical schema. A completely different task — narrow, mechanical, exact.
These are not the same cognitive task. Treating them as one task — for one stage — is the architecture error. The prompt isn't the problem.
The pipeline design is the problem.
07 — The Fix
Strip all 150 lines. Add one stage.
Accept that the Researcher produces narrative. Stop treating that as a failure. Promote the Formatter to a mandatory stage — no bypass, no exceptions.
Researcher
free-form narrative — always
Formatter (new · mandatory)
any input → canonical schema
Specialists
see clean structured data — always
Formatter — transformation rules
// Confidence: numeric → categorical
0.97 → "high"       ← ≥ 0.8
0.62 → "medium"     ← 0.5–0.79
0.34 → "low"        ← < 0.5

// Tier: shorthand → full word
"T1" → "primary"
"T2" → "secondary"
"T3" → "tertiary"

// Sources: partial → validated URL
name only → "unknown"
partial → full URL, https:// or bust
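Because the rules are deterministic, the Formatter's core is ordinary branch-free lookup code, not prompting. A minimal sketch in Python — function names are illustrative, not the actual Formatter implementation:

```python
def bucket_confidence(value: float) -> str:
    """Map a numeric confidence score onto the categorical labels the schema expects."""
    if value >= 0.8:
        return "high"
    if value >= 0.5:
        return "medium"
    return "low"

# Shorthand tier labels → full words; full words pass through unchanged.
TIER_MAP = {"T1": "primary", "T2": "secondary", "T3": "tertiary"}

def normalize_tier(tier: str) -> str:
    """Expand shorthand tier labels to the canonical full words."""
    return TIER_MAP.get(tier, tier)

def validate_source(source: str) -> str:
    """Keep only fully qualified URLs; name-only or partial sources become 'unknown'."""
    return source if source.startswith("https://") else "unknown"
```

Narrow, mechanical, exact — exactly the cognitive mode the Researcher could never hold at synthesis time.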
Each specialist does exactly one thing, in the cognitive mode that thing actually requires.
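The routing itself is the whole fix: every run passes through the Formatter, with no conditional bypass. A sketch of the chain, with stub stage functions standing in for the real agents (all three stubs are hypothetical stand-ins, not Amplifier code):

```python
# Hypothetical stand-ins for the three pipeline stages.
def researcher(topic: str) -> str:
    # Free-form narrative, always — no format obligations.
    return f"Narrative findings on {topic}, confidence 0.97, tier T1."

def formatter(narrative: str) -> dict:
    # Deterministic transformation: any input → canonical schema.
    return {"confidence": "high", "tier": "primary", "summary": narrative}

def specialists(structured: dict) -> dict:
    # Specialists see clean structured data, always.
    assert structured["confidence"] in {"high", "medium", "low"}
    assert structured["tier"] in {"primary", "secondary", "tertiary"}
    return structured

def run_pipeline(topic: str) -> dict:
    """Mandatory routing: Researcher → Formatter → Specialists, no bypass."""
    return specialists(formatter(researcher(topic)))
```

The Researcher never sees a format instruction; the Formatter never sees a research task.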
08 — Validation
43
claims extracted
One run. Clean.
Researcher → Formatter → Writer
"What is Jina AI?"  ·  first run after architectural change
  • Every confidence value categorical — no numerics
  • Every tier label a full word — no shorthand
  • Every source a valid URL — no partials, no names
  • Writer produced a structured brief with 10 citations
  • Chain ran end-to-end without intervention
Prior architecture: 4 runs, 3 topics, 0 compliant outputs.
One architectural change. One run. Done.
09 — The Principle
The seventh layer was never going to work.
The architecture was the fix all along.
When a prompt-level instruction fails consistently — not occasionally, not in edge cases, but every single time — that pattern is a diagnostic signal: stop enforcing. Start routing.
Source: first-person account — all metrics (150/393 lines, 43 claims, 6 enforcement layers) are author-reported from direct observation. Validation run: "What is Jina AI?" via Researcher → Formatter → Writer chain.