The 50% Rule

Building AI Validators That Actually Work

microsoft/amplifier-foundation

The Validation Paradox

AI can generate validation rules quickly. But who validates the validators?

📋
Manual Review Doesn't Scale
As the ecosystem grows, hand-checking every bundle configuration against every rule becomes unsustainable.
🤖
AI Rules Need Verification
LLM-generated rules look plausible. They read well. But plausibility is not correctness.
🔄
The Trust Gap
Teams need to trust validator output. If results are unreliable, that trust erodes quickly and permanently.

The Discovery

~50% Were Wrong

When initial validator rules were verified against actual code, roughly half needed correction.

"This is a rough observation from hands-on development — not a precise measurement. But the direction was unmistakable: the gap between 'sounds right' and 'is right' was far larger than expected."

Based on development experience building amplifier-foundation validators, Jan–Feb 2026

Fix #1

Verify Against Actual Code

The first rule of building validators: the codebase is the source of truth.

# Before: assume agents need descriptions
rule: every agent must have a description field

# After: verify against actual bundles
$ grep -r "description:" bundles/*/agents/*.md
# → Found agents without descriptions that work fine
# → Rule was wrong — description is recommended, not required
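Verification like this can also be scripted. A minimal sketch, in Python for illustration: the `bundles/*/agents/*.md` layout follows the grep above, and treating `description:` as a top-level key is an assumption, not the actual bundle schema.

```python
# Sketch: before hard-requiring a field, measure how the real codebase behaves.
# The bundles/*/agents/*.md layout follows the grep above; matching a
# top-level "description:" key is an illustrative assumption.
import glob
import re

def agents_missing_description(pattern="bundles/*/agents/*.md"):
    """Return paths of agent files with no top-level description field."""
    missing = []
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as f:
            text = f.read()
        if not re.search(r"^description:", text, flags=re.MULTILINE):
            missing.append(path)
    return missing
```

If working agents turn up in the missing list, the rule is advisory rather than required, which is exactly the conclusion the grep above reached.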

The Second Trap

"Always Finds Something"

A validator that never returns a clean pass is not a validator — it's noise.

⚠ Anti-Pattern

Tuning validators to always produce findings feels productive. Every run generates a report. But teams learn to ignore the results.

When everything is flagged, nothing is actionable.

"The 'Always Finds Something' anti-pattern is explicitly documented in the DOMAIN_VALIDATOR_GUIDE.md as a known failure mode to avoid."

Source: amplifier-foundation/docs/DOMAIN_VALIDATOR_GUIDE.md (974 lines, per wc -l)

Fix #2

Explicit PASS Thresholds

Define clear, objective criteria for what constitutes a passing result.

Clear PASS Criteria
Every validator must define what "passing" looks like. No ambiguity. If the criteria are met, it passes — full stop.
🔒
Deterministic Classification
If a file meets objective, measurable criteria, it passes without LLM analysis. No AI second-guessing of clear results.
🎯
Actionable Output
When something fails, the output explains exactly what failed and why. Teams can act on the result immediately.
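The three properties above can be made concrete in a few lines. A minimal sketch, assuming a simple dict-shaped bundle; the field names and criteria are illustrative, not the schema the amplifier-foundation recipes actually use.

```python
# Sketch: explicit PASS thresholds as deterministic checks.
# The bundle shape and field names are illustrative assumptions,
# not the actual schema used by the validator recipes.
def classify(bundle):
    """Return ("PASS", []) or ("FAIL", reasons). No LLM involved."""
    reasons = []
    if not bundle.get("name"):
        reasons.append("missing required field: name")
    if not bundle.get("agents"):
        reasons.append("bundle defines no agents")
    for agent in bundle.get("agents", []):
        if not agent.get("instructions"):
            reasons.append("agent %r has no instructions" % agent.get("name", "?"))
    # Objective criteria met means PASS, with no AI second-guessing.
    return ("PASS", reasons) if not reasons else ("FAIL", reasons)
```

Every failure reason names the exact problem, so the output stays actionable; a clean result is an unambiguous PASS.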

The Architecture

Deterministic First, AI Second

A two-layer approach that is faster, cheaper, and more predictable.

01
Deterministic Checks
File exists? Required fields present? Format valid? YAML parses? These have definitive answers — no LLM needed.
↓ Only items requiring judgment continue ↓
02
AI Analysis
Quality of descriptions, alignment with patterns, consistency with conventions. These require judgment — this is where LLMs add genuine value.
Why This Order Matters
Deterministic checks are instant, free, and reproducible. By filtering first, you reduce LLM calls, lower cost, increase speed, and make results auditable. The AI layer focuses only on what actually requires intelligence.
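The ordering can be sketched as a small dispatch loop. Here `llm_review` is a hypothetical stand-in for whatever model call the recipes make, and the three-way verdict from `deterministic_check` is an illustrative convention, not the recipes' actual interface.

```python
# Sketch: deterministic-first validation pipeline.
# `deterministic_check` returns "PASS", "FAIL", or "NEEDS_REVIEW";
# `llm_review` is a hypothetical stand-in for the AI layer.
def validate(items, deterministic_check, llm_review):
    results = {}
    for name, item in items.items():
        verdict = deterministic_check(item)
        if verdict in ("PASS", "FAIL"):
            # Definitive answer: instant, free, reproducible. No model call.
            results[name] = verdict
        else:
            # Only items that genuinely require judgment reach the model.
            results[name] = llm_review(item)
    return results
```

Because clear-cut items never reach `llm_review`, the cost of the AI layer scales with the number of judgment calls, not with the size of the repository.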

What Was Built

3 Recipes + 1 Guide

Concrete artifacts in microsoft/amplifier-foundation:

validate-bundle.yaml 276 lines · v2.0.0
validate-agents.yaml 1,074 lines
validate-bundle-repo.yaml 2,817 lines
DOMAIN_VALIDATOR_GUIDE.md 974 lines
4,167
lines across 3 validator recipes
find recipes/ -name "validate-*.yaml" | xargs wc -l
974
lines in the validator guide
wc -l docs/DOMAIN_VALIDATOR_GUIDE.md

Impact

Validators That Earn Trust

📖
Documented Anti-Patterns
The 974-line guide captures failure modes so future validator authors avoid repeating them.
wc -l docs/DOMAIN_VALIDATOR_GUIDE.md → 974
Reduced Noise
Explicit PASS thresholds mean validators produce clear signals — pass or fail with reasons, not endless advisory findings.
Pattern documented in DOMAIN_VALIDATOR_GUIDE.md
💰
Fewer Unnecessary LLM Calls
Deterministic-first architecture skips AI analysis for items with clear pass/fail criteria.
Architectural improvement; exact savings not measured

Note: Specific time or cost savings have not been formally measured. These are architectural improvements whose benefits are qualitative, based on development experience.

Velocity

Built in ~11 Days

18
validator-related commits
git log --oneline --all --grep="valid"
~11
days · Jan 28 – Feb 7, 2026
First and last validator commit dates via git log
1
repository
microsoft/amplifier-foundation
Contributor
Brian Krabach — 18 of 18 validator commits (100%). Primary author of all three validator recipes and the domain validator guide.
git log --format="%an" --grep="valid" | sort | uniq -c | sort -rn

Transparency

Sources & Methodology

Every claim in this deck is traceable to a specific command or stated as a qualitative observation.

Commands Run
  • git log --oneline --all --grep="valid" → 18 commits
  • git log --format="%an" --grep="valid" | sort | uniq -c
  • find recipes/ -name "validate-*.yaml" | xargs wc -l → 4,167
  • wc -l docs/DOMAIN_VALIDATOR_GUIDE.md → 974
  • git log --format="%ai" --grep="valid" | sort | head/tail
Data As Of
February 2026. Commit counts and line counts reflect the state of microsoft/amplifier-foundation at that date.
Known Gaps
The "50% Rule" is a qualitative observation from development experience, not a formally measured metric. No time or cost savings data was collected. Commit count uses --grep="valid" which may include tangentially related commits.

Feature Status: Active — validator recipes are in active use as of Feb 2026

Build Validators
That Actually Work

microsoft/amplifier-foundation