The 50% Rule

Building AI Validators That Actually Work

microsoft/amplifier-foundation

The Validation Paradox

AI can generate validation rules quickly. But who validates the validators?

📋
Manual Review Doesn't Scale
As the ecosystem grows, hand-checking every bundle configuration against every rule becomes unsustainable.
🤖
AI Rules Need Verification
LLM-generated rules look plausible. They read well. But plausibility is not correctness.
🔄
The Trust Gap
Teams need to trust validator output. If results are unreliable, that trust erodes quickly and permanently.

The Discovery

~50% Were Wrong

When initial validator rules were verified against actual code, roughly half needed correction.

"This is a rough observation from hands-on development — not a precise measurement. But the direction was unmistakable: the gap between 'sounds right' and 'is right' was far larger than expected."

Based on development experience building amplifier-foundation validators, Jan–Feb 2026

Fix #1

Verify Against Actual Code

The first rule of building validators: the codebase is the source of truth.

# Before: assume agents need descriptions
rule: every agent must have a description field

# After: verify against actual bundles
$ grep -r "description:" bundles/*/agents/*.md
# → Found agents without descriptions that work fine
# → Rule was wrong — description is recommended, not required
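Verification like this can also be scripted. A minimal sketch, in Python for illustration: the `bundles/*/agents/*.md` layout follows the grep above, and treating `description:` as a top-level key is an assumption, not the actual bundle schema.

```python
# Sketch: before hard-requiring a field, measure how the real codebase behaves.
# The bundles/*/agents/*.md layout follows the grep above; matching a
# top-level "description:" key is an illustrative assumption.
import glob
import re

def agents_missing_description(pattern="bundles/*/agents/*.md"):
    """Return paths of agent files with no top-level description field."""
    missing = []
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as f:
            text = f.read()
        if not re.search(r"^description:", text, flags=re.MULTILINE):
            missing.append(path)
    return missing
```

If working agents turn up in the missing list, the rule is advisory rather than required, which is exactly the conclusion the grep above reached.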

The Second Trap

"Always Finds Something"

A validator that never returns a clean pass is not a validator — it's noise.

⚠ Anti-Pattern

Tuning validators to always produce findings feels productive. Every run generates a report. But teams learn to ignore the results.

When everything is flagged, nothing is actionable.

"The 'Always Finds Something' anti-pattern is explicitly documented in the DOMAIN_VALIDATOR_GUIDE.md as a known failure mode to avoid."

Source: amplifier-foundation/docs/DOMAIN_VALIDATOR_GUIDE.md (974 lines, per wc -l)

Fix #2

Explicit PASS Thresholds

Define clear, objective criteria for what constitutes a passing result.

Clear PASS Criteria
Every validator must define what "passing" looks like. No ambiguity. If the criteria are met, it passes — full stop.
🔒
Deterministic Classification
If a file meets objective, measurable criteria, it passes without LLM analysis. No AI second-guessing of clear results.
🎯
Actionable Output
When something fails, the output explains exactly what failed and why. Teams can act on the result immediately.
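The three properties above can be made concrete in a few lines. A minimal sketch, assuming a simple dict-shaped bundle; the field names and criteria are illustrative, not the schema the amplifier-foundation recipes actually use.

```python
# Sketch: explicit PASS thresholds as deterministic checks.
# The bundle shape and field names are illustrative assumptions,
# not the actual schema used by the validator recipes.
def classify(bundle):
    """Return ("PASS", []) or ("FAIL", reasons). No LLM involved."""
    reasons = []
    if not bundle.get("name"):
        reasons.append("missing required field: name")
    if not bundle.get("agents"):
        reasons.append("bundle defines no agents")
    for agent in bundle.get("agents", []):
        if not agent.get("instructions"):
            reasons.append("agent %r has no instructions" % agent.get("name", "?"))
    # Objective criteria met means PASS, with no AI second-guessing.
    return ("PASS", reasons) if not reasons else ("FAIL", reasons)
```

Every failure reason names the exact problem, so the output stays actionable; a clean result is an unambiguous PASS.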

The Architecture

Deterministic First, AI Second

A two-layer approach that is faster, cheaper, and more predictable.

01
Deterministic Checks
File exists? Required fields present? Format valid? YAML parses? These have definitive answers — no LLM needed.
↓ Only items requiring judgment continue ↓
02
AI Analysis
Quality of descriptions, alignment with patterns, consistency with conventions. These require judgment — this is where LLMs add genuine value.
Why This Order Matters
Deterministic checks are instant, free, and reproducible. By filtering first, you reduce LLM calls, lower cost, increase speed, and make results auditable. The AI layer focuses only on what actually requires intelligence.
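The ordering can be sketched as a small dispatch loop. Here `llm_review` is a hypothetical stand-in for whatever model call the recipes make, and the three-way verdict from `deterministic_check` is an illustrative convention, not the recipes' actual interface.

```python
# Sketch: deterministic-first validation pipeline.
# `deterministic_check` returns "PASS", "FAIL", or "NEEDS_REVIEW";
# `llm_review` is a hypothetical stand-in for the AI layer.
def validate(items, deterministic_check, llm_review):
    results = {}
    for name, item in items.items():
        verdict = deterministic_check(item)
        if verdict in ("PASS", "FAIL"):
            # Definitive answer: instant, free, reproducible. No model call.
            results[name] = verdict
        else:
            # Only items that genuinely require judgment reach the model.
            results[name] = llm_review(item)
    return results
```

Because clear-cut items never reach `llm_review`, the cost of the AI layer scales with the number of judgment calls, not with the size of the repository.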

What Was Built

3 Recipes + 1 Guide

Concrete artifacts in microsoft/amplifier-foundation:

validate-bundle.yaml 276 lines · v2.0.0
validate-agents.yaml 1,074 lines
validate-bundle-repo.yaml 2,817 lines
DOMAIN_VALIDATOR_GUIDE.md 974 lines
4,167
lines across 3 validator recipes
find recipes/ -name "validate-*.yaml" | xargs wc -l
974
lines in the validator guide
wc -l docs/DOMAIN_VALIDATOR_GUIDE.md

Impact

Validators That Earn Trust

📖
Documented Anti-Patterns
The 974-line guide captures failure modes so future validator authors avoid repeating them.
wc -l docs/DOMAIN_VALIDATOR_GUIDE.md → 974
Reduced Noise
Explicit PASS thresholds mean validators produce clear signals — pass or fail with reasons, not endless advisory findings.
Pattern documented in DOMAIN_VALIDATOR_GUIDE.md
💰
Fewer Unnecessary LLM Calls
Deterministic-first architecture skips AI analysis for items with clear pass/fail criteria.
Architectural improvement; exact savings not measured

Note: Specific time or cost savings have not been formally measured. These are architectural improvements whose benefits are qualitative, based on development experience.

Velocity

Built in ~11 Days

18
validator-related commits
git log --oneline --all --grep="valid"
~11
days · Jan 28 – Feb 7, 2026
First and last validator commit dates via git log
1
repository
microsoft/amplifier-foundation
Contributor
Brian Krabach — 18 of 18 validator commits (100%). Primary author of all three validator recipes and the domain validator guide.
git log --format="%an" --grep="valid" | sort | uniq -c | sort -rn

Transparency

Sources & Methodology

Every claim in this deck is traceable to a specific command or stated as a qualitative observation.

Commands Run
  • git log --oneline --all --grep="valid" → 18 commits
  • git log --format="%an" --grep="valid" | sort | uniq -c
  • find recipes/ -name "validate-*.yaml" | xargs wc -l → 4,167
  • wc -l docs/DOMAIN_VALIDATOR_GUIDE.md → 974
  • git log --format="%ai" --grep="valid" | sort | head/tail
Data As Of
February 2026. Commit counts and line counts reflect the state of microsoft/amplifier-foundation at that date.
Known Gaps
The "50% Rule" is a qualitative observation from development experience, not a formally measured metric. No time or cost savings data was collected. Commit count uses --grep="valid" which may include tangentially related commits.

Feature Status: Active — validator recipes are in active use as of Feb 2026

Build Validators
That Actually Work

microsoft/amplifier-foundation