You describe images with words. Your AI generates code it cannot see. The gap between mockup and implementation is pure guesswork.
February 2026
When you hand an AI a design mockup, everything that matters — spacing, hierarchy, weight, color, proportion — survives only as approximation in your description.
Describing a design in words is the most common move in UI work, and also the most ambiguous. Font weights, spacing rhythm, and shadow depth don't survive the translation.
Describe → implement → screenshot → describe again. Each round trip through language bleeds fidelity. Small deviations compound silently.
Without a way to actually see both images and name every difference, iteration is infinite. You're polishing by feel, not by fact.
Image generation tools can make things. But they can't analyze what they made. They can't compare it to what you wanted. They can't close the loop.
You prompt. You get an image. You look at it. You describe what's wrong. You prompt again. Your AI still hasn't seen either image.
Visual intelligence means reading images with precision, comparing them with rigor, and generating with real visual context — not just words about images.
Three operations. One tool. The complete visual loop.
nano-banana is a lean Amplifier tool module that wraps Gemini's vision capabilities into three precise, composable operations.
Point at any image. Get a precise, structured description of exactly what is there — components, typography, spacing, hierarchy, color. No approximation. No verbal relay.
Font families, weights, sizes, line heights — extracted and named. "Inter 600 at 28px with 1.2 line-height" not "a large heading."
ArticleCard, not just Card. PrimaryActionButton, not just button. Semantic names your code can match.
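As a concrete sketch of what such a structured analysis could be parsed into: the field names and classes below are illustrative assumptions, not the module's actual output schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TextStyle:
    family: str          # e.g. "Inter"
    weight: int          # 600, not "semi-bold-ish"
    size_px: float
    line_height: float

@dataclass
class Component:
    name: str                        # semantic: "ArticleCard", not "Card"
    bounds: tuple                    # (x, y, width, height) in pixels
    style: Optional[TextStyle] = None
    children: List["Component"] = field(default_factory=list)

# A heading described precisely instead of as "a large heading":
heading = Component(
    name="ArticleCardTitle",
    bounds=(24, 32, 480, 34),
    style=TextStyle(family="Inter", weight=600, size_px=28, line_height=1.2),
)
print(f"{heading.style.family} {heading.style.weight} at "
      f"{heading.style.size_px:g}px with {heading.style.line_height} line-height")
```

The point of the structure is that every field is something an implementation can be checked against mechanically, rather than a verbal impression.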
Two images in. A precise diff out. Every spacing deviation, color drift, weight mismatch, and missing element — named and located. This is the operation that closes the loop.
Used for mockup vs implementation, before vs after, font option A vs B — any two-image comparison where precision beats approximation.
Text-to-image generation that understands what you've already built. Pass a reference image and the generator doesn't start from nothing — it starts from your visual language.
Your existing visual language shapes the output. Style, composition, and tone carry forward — not just the prompt.
Generate multiple options in one shot. Compare them with analyze. Pick, refine, iterate — all inside your Amplifier session.
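A minimal sketch of what reference-guided generation involves, assuming the model name cited later in this story. The helper below is hypothetical (the real module wires this up internally via its reference_image_path parameter); the commented-out call shows roughly how Google's public google-genai SDK would be invoked with it.

```python
from pathlib import Path
from typing import List, Optional, Union

def build_generation_request(prompt: str,
                             reference_image_path: Optional[str] = None
                             ) -> List[Union[dict, str]]:
    """Assemble multimodal contents for a reference-guided generation call.

    Hypothetical helper for illustration, not the module's actual code.
    """
    contents: List[Union[dict, str]] = []
    if reference_image_path is not None:
        # The reference image rides along with the prompt, so generation
        # starts from your visual language rather than from nothing.
        contents.append({
            "mime_type": "image/png",
            "data": Path(reference_image_path).read_bytes(),
        })
    contents.append(prompt)
    return contents

# With the google-genai SDK installed and GOOGLE_API_KEY set, the actual
# call would look roughly like this (not executed here):
#
#   from google import genai
#   client = genai.Client()
#   response = client.models.generate_content(
#       model="gemini-3-pro-image-preview",  # model binding named in tool.py
#       contents=build_generation_request(
#           "A hero banner in our house style", "brand_reference.png"),
#   )
```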
Three operations compose into a tight, repeatable workflow. Each step is precise. Nothing is lost in verbal translation.
"The button looks slightly too big and the font might be wrong." Four iterations to converge. Maybe.
"Button padding is 12px, target is 8px. Font weight is 400, should be 600." One iteration. Done.
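A toy sketch of how a structured comparison result becomes that kind of targeted fix list. The dictionary shape here is an assumed one for illustration, not the module's real compare schema.

```python
def render_fix_list(deviations):
    """Turn structured compare output into one-line, actionable fixes.

    The field names are an assumed shape, not the module's schema.
    """
    return [
        f"{d['element']} {d['property']} is {d['actual']}, "
        f"target is {d['expected']}."
        for d in deviations
    ]

diff = [
    {"element": "Button", "property": "padding",
     "actual": "12px", "expected": "8px"},
    {"element": "Button", "property": "font weight",
     "actual": "400", "expected": "600"},
]
for line in render_fix_list(diff):
    print(line)
```

Each line is directly actionable: a property, a measured value, and a target, with nothing left to interpretation.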
Every time a comparison replaces a description, every time a reference image guides generation, every time analysis names a component — iteration cycles collapse.
Semantic component names from analyze feed directly into implementation prompts. The AI builds what was described, not what was interpreted.
compare turns a 5-round visual review into one targeted fix list. Differences are named, not guessed. You implement once.
Reference-guided generate keeps new assets inside your existing visual language. No style drift across an evolving project.
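The collapse from a multi-round review to a single pass can be sketched as a toy loop. Here the mockup and implementation are reduced to property dictionaries, and compare/apply_fixes are simple stand-ins for the vision-based operations:

```python
def compare(target, actual):
    # Stand-in for the vision-based compare: name every deviation.
    return [
        {"property": k, "actual": actual.get(k), "expected": v}
        for k, v in target.items()
        if actual.get(k) != v
    ]

def apply_fixes(impl, diff):
    # Once deviations are named and located, the fix is mechanical.
    for d in diff:
        impl[d["property"]] = d["expected"]
    return impl

mockup = {"button.padding": "8px", "button.font-weight": "600"}
impl = {"button.padding": "12px", "button.font-weight": "400"}

rounds = 0
while (diff := compare(mockup, impl)):
    impl = apply_fixes(impl, diff)
    rounds += 1

print(rounds)  # one pass, because every difference was named up front
```

When the diff is complete and precise, the loop terminates after one round; with vague verbal feedback, each round fixes only what the description happened to capture.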
One bundle include. One GOOGLE_API_KEY.
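Setup is minimal. The exact bundle-include line is Amplifier-specific and not reproduced in this story; the credential side is a single environment variable (placeholder value shown):

```shell
# One environment variable is all the module needs at runtime.
# The value below is a placeholder, not a real key.
export GOOGLE_API_KEY="your-api-key"
```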
Three operations that change how you build with AI.
microsoft/amplifier-module-tool-nano-banana — primary source for all capability descriptions, code line counts, and feature status.
amplifier_module_tool_nano_banana/tool.py — tool schema, operation implementations, model binding (gemini-3-pro-image-preview), reference_image_path parameter, hook events.
USAGE_EXAMPLES.md and tests/ — prompt patterns, operation contract, session transcript sample.
Code line counts are verified against tool.py and __init__.py directly. No inflation.
Story authored February 27, 2026 · amplifier-bundle-stories archetype: Problem-Solution-Impact