You just didn't know it yet.
“I need to build an eval system that measures whether team knowledge actually helps people work.”
Clear goal. Specific scope. Before designing anything from scratch — search what the team already has.
You don't know what exists, so you build everything yourself. Weeks of work that might duplicate a teammate's.
“Hey, does anyone have anything for eval?” — Post in Teams. Hope the right person sees it before it scrolls away.
Book time with 3–4 people to compare approaches. Wait for calendars to align. Context-switch everyone involved.
Three targeted searches surfaced existing work from multiple teammates. The design conversation started with what already exists — not what to build from zero.
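A minimal sketch of what that discovery step looks like, using the `team_knowledge` search call shown at the end of this piece. The query strings are illustrative, not the session's actual searches, and the tool is stubbed so the snippet runs standalone:

```python
# The real team_knowledge tool searches the team's indexed capabilities;
# this stub stands in for it so the shape of the step is runnable here.
def team_knowledge(operation: str, query: str) -> list[str]:
    assert operation == "search"
    return [f"indexed capability matching: {query}"]

# Hypothetical queries; the session's real search strings are not recorded here.
queries = [
    "rubric-based LLM scoring",
    "LLM judge prompt pattern",
    "isolated A/B test environment",
]
hits = [hit for q in queries for hit in team_knowledge(operation="search", query=q)]
```

Three targeted queries, one pass over the index, and the design conversation starts from the results rather than a blank page.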
“How do I build an eval system?”
“Which of my teammates' work do I compose?”
Rubric-based LLM scoring with exactly the right interface. Takes a session, applies scoring criteria, returns structured results. Built for a different project — but the scoring engine is general-purpose.
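One plausible shape of that interface, assuming a session transcript in and structured per-criterion results out. The names and the 1–5 scale are illustrative, and a stub stands in for the LLM judge:

```python
from dataclasses import dataclass

@dataclass
class CriterionResult:
    criterion: str
    score: int           # illustrative 1-5 scale
    justification: str

def score_session(transcript: str, rubric: dict[str, str]) -> list[CriterionResult]:
    """Apply each rubric criterion to a session, return structured results.

    A real implementation would call an LLM judge per criterion; the
    placeholder below only demonstrates the interface shape."""
    results = []
    for name, description in rubric.items():
        results.append(
            CriterionResult(
                criterion=name,
                score=3,  # placeholder; a judge model would produce this
                justification=f"placeholder judgment against: {description}",
            )
        )
    return results
```

Because the scoring engine is general-purpose, swapping in a different rubric is a data change, not a code change.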
A two-phase prompt pattern: gather evidence first, then score. Prevents the LLM judge from deciding a verdict and backfilling justification. The recipe itself was wrong scope for eval — but the prompt architecture is gold.
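The pattern can be sketched as two separate calls, where the scoring prompt sees only the extracted evidence, never the raw session. `call_llm` is a stand-in for a real model call, and the prompt wording is illustrative:

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs standalone.
    return "stubbed model response"

def two_phase_judge(session: str, criterion: str) -> dict[str, str]:
    # Phase 1: gather evidence only; explicitly forbid a verdict.
    evidence = call_llm(
        f"List concrete evidence from this session relevant to '{criterion}'. "
        f"Do NOT assign a score or verdict.\n\n{session}"
    )
    # Phase 2: score from the evidence alone, so the judge cannot
    # decide a verdict first and backfill the justification.
    score = call_llm(
        f"Given ONLY the evidence below, score '{criterion}' from 1 to 5 "
        f"and justify the score using that evidence.\n\n{evidence}"
    )
    return {"evidence": evidence, "score": score}
```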
Digital Twin Universe (DTU) — isolated containers that spin up a complete Amplifier environment for testing. The hardest infrastructure problem for eval — running truly isolated A/B sessions with no filesystem bleed-through — was already solved.
Looked relevant at first glance — “reality check” sounds like evaluation. After investigation: wrong use case entirely. It validates prompt outputs against known ground truth, not session-level quality.
An honest “no” is just as valuable. It saved days that would have been wasted trying to adapt an incompatible tool.
Run paired sessions (with KB vs. without), score each on five dimensions, and produce a clear verdict.
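Under stated assumptions (hypothetical dimension names, a 1–5 scale, a simple score delta), the paired design reduces to something like:

```python
# Illustrative dimension names; the actual five dimensions are not listed here.
DIMENSIONS = ["correctness", "completeness", "efficiency", "grounding", "clarity"]

def verdict(with_kb: dict[str, int], without_kb: dict[str, int]) -> str:
    """Compare paired sessions dimension by dimension, return a verdict."""
    delta = sum(with_kb[d] - without_kb[d] for d in DIMENSIONS)
    if delta > 0:
        return "KB helps"
    if delta < 0:
        return "KB hurts"
    return "no clear difference"
```

A real system would likely aggregate over many paired runs and weight dimensions; the point is only that the verdict falls out of the paired scores.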
MJ, Manoj, and David weren't context-switched. Their work was discovered and composed without interrupting anyone's day.
Three teammates' prior work was leveraged into a new design — no duplication, no wasted effort. When a team's knowledge is searchable, every investment in tooling pays dividends beyond the original project.
Build something good and your teammates will build on it — even when you don't know it's happening. You don't need to memorize everyone's repos. The knowledge base knows them for you.
The eval system design measures whether team knowledge changes how people work. In this session, team knowledge surfaced three teammates' work and turned a build-from-scratch design into a composition. The question was already answered before any code was written.
Data as of: April 2026
Feature status: Active
Basis: A real brainstorming session conducted using team-knowledge search capabilities.
Data sources:
Gaps: Exact capability count varies as teams publish new work. The 1,300+ figure is approximate, reflecting the index state during the session.
Search before you design from scratch.
The next breakthrough might already exist in a teammate's repo.
team_knowledge(operation="search", query="...")