A Real Story

Your Teammates Already Solved Half Your Problem

You just didn't know it yet.

Team Knowledge — April 2026
The Starting Point

“I need to build an eval system that measures whether team-knowledge actually helps people work.”

Clear goal. Specific scope. Before designing anything from scratch — search what the team already has.

What Usually Happens

The default workflow is expensive

Design from scratch

You don't know what exists, so you build everything yourself. Weeks of work that might duplicate what a teammate already built.

Ask around

“Hey, does anyone have anything for eval?” — Post in Teams. Hope the right person sees it before it scrolls away.

Schedule a sync

Book time with 3–4 people to compare approaches. Wait for calendars to align. Context-switch everyone involved.

What Happened Instead

Search first. Design second.

1,300+ capabilities indexed across the team
16 teammates' work searchable
5 eval-related tools surfaced

Three targeted searches surfaced existing work from multiple teammates. The design conversation started with what already exists — not what to build from zero.

The Turning Point

“How do I build an eval system?”

“Which of my teammates' work do I compose?”

Discovery 1
MJ
sage-eval

Rubric-based LLM scoring with exactly the right interface. Takes a session, applies scoring criteria, returns structured results. Built for a different project — but the scoring engine is general-purpose.

✓ Use directly
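
To make that interface concrete, here is a minimal sketch of a rubric-based scorer of the kind described above. The names (score_session, RubricResult, the judge callable) are illustrative assumptions, not sage-eval's actual API.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class RubricResult:
        dimension: str   # e.g. "correctness"
        score: int       # e.g. 1-5
        rationale: str   # the judge's justification

    def score_session(
        transcript: str,
        rubric: dict[str, str],                        # dimension -> criterion
        judge: Callable[[str, str], tuple[int, str]],  # wraps the LLM call
    ) -> list[RubricResult]:
        """Apply each rubric criterion to a session transcript.

        `judge` takes (transcript, criterion) and returns (score, rationale);
        it stands in for whatever model call the real tool makes.
        """
        return [
            RubricResult(dim, *judge(transcript, criterion))
            for dim, criterion in rubric.items()
        ]
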
Discovery 2
Manoj
session-drift-report

A two-phase prompt pattern: gather evidence first, then score. This prevents the LLM judge from deciding a verdict and backfilling a justification. The recipe itself was the wrong scope for eval, but the prompt architecture is gold.

✓ Steal the pattern
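
The pattern is easy to sketch. Both prompts below are invented for illustration, assuming a generic llm completion function; they are not Manoj's actual prompts.

    from typing import Callable

    LLM = Callable[[str], str]  # stand-in for any text-completion call

    def gather_evidence(llm: LLM, transcript: str, criterion: str) -> str:
        # Phase 1: the judge may only collect verbatim quotes. It is not
        # allowed to score yet, so it cannot pick a verdict and backfill.
        prompt = (
            f"Criterion: {criterion}\n"
            f"Transcript:\n{transcript}\n\n"
            "List verbatim quotes relevant to this criterion. Do NOT score."
        )
        return llm(prompt)

    def score_from_evidence(llm: LLM, criterion: str, evidence: str) -> str:
        # Phase 2: the judge sees only the extracted evidence, not the full
        # transcript, and must ground its score in those quotes.
        prompt = (
            f"Criterion: {criterion}\n"
            f"Evidence:\n{evidence}\n\n"
            "Based only on this evidence, give a 1-5 score and a one-line reason."
        )
        return llm(prompt)
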
Discovery 3
David
amplifier-tester

Digital Twin Universe (DTU) — isolated containers that spin up a complete Amplifier environment for testing. The hardest infrastructure problem for eval — running truly isolated A/B sessions with no filesystem bleed-through — was already solved.

✓ Use directly
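
As a rough sketch of what isolated A/B runs look like, the snippet below uses throwaway Docker containers: --rm discards each container's filesystem after the run, so paired sessions cannot bleed into each other. The image name and command flags are placeholders, not amplifier-tester's real interface.

    import subprocess

    def run_isolated(image: str, command: list[str]) -> str:
        """Run one session in an ephemeral container and return its output."""
        result = subprocess.run(
            ["docker", "run", "--rm", image, *command],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    # Paired sessions on the same task, each in a fresh environment.
    with_kb = run_isolated("amplifier-dtu", ["run-session", "--kb", "on"])
    without_kb = run_isolated("amplifier-dtu", ["run-session", "--kb", "off"])
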
Discovery 4
David
reality-check-pipeline

Looked relevant at first glance — “reality check” sounds like evaluation. After investigation: wrong use case entirely. It validates prompt outputs against known ground truth, not session-level quality.

✗ Don't use

An honest “no” is just as valuable. It saved days that would have been wasted trying to adapt an incompatible tool.

The Result

Three teammates' work. One eval system.

MJ · Scoring engine · Grades each session on 5 dimensions
Manoj · Prompt pattern · Evidence first, prevents rationalization
David · Isolation infrastructure · Clean A/B environments via DTU

The Eval System

Run paired sessions (with KB vs. without), score each on 5 dimensions, and produce a clear verdict.

Verdicts: KB_HELPED · KB_NEUTRAL · KB_HURT
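
A sketch of how paired scores could collapse into one of those verdicts. The mean-delta rule and the 0.5 threshold are assumptions for illustration; the real system's aggregation may differ.

    def verdict(with_kb: dict[str, int], without_kb: dict[str, int],
                threshold: float = 0.5) -> str:
        """Compare per-dimension scores from the paired sessions."""
        # Assumes both dicts share the same 5 dimension keys.
        deltas = [with_kb[d] - without_kb[d] for d in with_kb]
        mean_delta = sum(deltas) / len(deltas)
        if mean_delta > threshold:
            return "KB_HELPED"
        if mean_delta < -threshold:
            return "KB_HURT"
        return "KB_NEUTRAL"
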
The Real Metric
0 meetings scheduled
0 Teams threads asking “does anyone have…”
0 interruptions to contributors

MJ, Manoj, and David weren't context-switched. Their work was discovered and composed without interrupting anyone's day.

Why This Matters

Two audiences. One insight.

Work compounds instead of being reinvented.

Three teammates' prior work was leveraged into a new design — no duplication, no wasted effort. When a team's knowledge is searchable, every investment in tooling pays dividends beyond the original project.

Your work gets found.

Build something good and your teammates will build on it — even when you don't know it's happening. You don't need to memorize everyone's repos. The knowledge base knows them for you.

The Recursive Proof

This brainstorming session proved the very thing it set out to measure.

The eval system design measures whether team knowledge changes how people work. In this session, team knowledge surfaced three teammates' work and turned a build-from-scratch design into a composition. The question was already answered before any code was written.

Sources & Methodology

How we got these numbers

Data as of: April 2026

Feature status: Active

Basis: A real brainstorming session conducted using team-knowledge search capabilities.

Data sources: the team-knowledge capability index and the brainstorming session itself.

Gaps: Exact capability count varies as teams publish new work. The 1,300+ figure is approximate, reflecting the index state during the session.

Get Started

Build on what your teammates built.

Search before you design from scratch.
The next breakthrough might already exist in a teammate's repo.

team_knowledge(operation="search", query="...")
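
The session's exact queries aren't recorded here, but the three targeted searches looked something like this (queries are illustrative):

    team_knowledge(operation="search", query="eval rubric scoring")
    team_knowledge(operation="search", query="LLM judge prompt pattern")
    team_knowledge(operation="search", query="isolated test environment containers")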

MADE team: microsoft/amplifier-bundle-team-knowledge-base
