Tool Module · nano-banana

Your AI has been
flying blind.

You describe images with words. Your AI generates code it cannot see. The gap between mockup and implementation is pure guesswork.

analyze compare generate

February 2026

The Problem

Text is a lossy codec
for visual information.

When you hand an AI a design mockup, everything that matters — spacing, hierarchy, weight, color, proportion — survives only as approximation in your description.

🔤

"It should look like the mockup"

The most common instruction in UI work. Also the most ambiguous. Font weights, spacing rhythm, and shadow depth don't survive this translation.

🎯

Precision lost at every step

Describe → implement → screenshot → describe again. Each round trip through language bleeds fidelity. Small deviations compound silently.

🔁

The cycle never closes

Without a way to actually see both images and name every difference, iteration is infinite. You're polishing by feel, not by fact.

The Problem, Continued

Generation without
sight is decoration.

Image generation tools can make things. But they can't analyze what they made. They can't compare it to what you wanted. They can't close the loop.

Tools that only generate

You prompt. You get an image. You look at it. You describe what's wrong. You prompt again. Your AI still hasn't seen either image.

The missing half

Visual intelligence means reading images with precision, comparing them with rigor, and generating with real visual context — not just words about images.

Three operations. One tool. The complete visual loop.

The Solution

Give your AI
three new senses.

nano-banana is a lean Amplifier tool module that wraps Gemini's vision capabilities into three precise, composable operations.

3 operations
~150 lines of core code
1 tool, zero drift
Operation 1

analyze Give your AI eyes.

Point at any image. Get a precise, structured description of exactly what is there — components, typography, spacing, hierarchy, color. No approximation. No verbal relay.

# Identify every UI component with semantic names
nano-banana operation="analyze" \
  image_path="design/dashboard-mockup.png" \
  prompt="List all UI components with semantic names and exact layout positions. Note font weights, spacing units, and color values."

Typography forensics

Font families, weights, sizes, line heights — extracted and named. "Inter 600 at 28px with 1.2 line-height" not "a large heading."

Component inventory

ArticleCard, not just Card. PrimaryActionButton, not just button. Semantic names your code can match.

Operation 2

compare See every difference.

Two images in. A precise diff out. Every spacing deviation, color drift, weight mismatch, and missing element — named and located. This is the operation that closes the loop.

# Validate implementation against mockup
nano-banana operation="compare" \
  image1_path="design/mockup.png" \
  image2_path="screenshots/current.png" \
  image1_label="DESIGN TARGET" \
  image2_label="CURRENT IMPLEMENTATION" \
  prompt="Identify every visual discrepancy. Be specific: exact element, what differs, and estimated magnitude of the difference."

Used for mockup vs implementation, before vs after, font option A vs B — any two-image comparison where precision beats approximation.

Operation 3

generate Create with context.

Text-to-image generation that understands what you've already built. Pass a reference image and the generator doesn't start from nothing — it starts from your visual language.

# Generate with reference image to guide style
nano-banana operation="generate" \
  output_path="assets/hero-illustration.png" \
  reference_image_path="assets/existing-style.png" \
  prompt="A dashboard overview screen in the same visual style. Warm tones, clean sans-serif, data cards with subtle shadows." \
  number_of_images=3

Reference-guided generation

Your existing visual language shapes the output. Style, composition, and tone carry forward — not just the prompt.

Up to 4 variants per call

Generate multiple options in one shot. Compare them with analyze. Pick, refine, iterate — all inside your Amplifier session.
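That pick-and-refine flow can be sketched as two chained calls. This is an illustrative sequence, not canonical usage: the numbered-variant file name (`hero-illustration-1.png`) is an assumption about how multiple outputs are written to disk.

```shell
# Sketch: generate 3 variants, then read one back with analyze.
# File naming for variants is assumed, not documented here.
nano-banana operation="generate" \
  output_path="assets/hero-illustration.png" \
  number_of_images=3 \
  prompt="A dashboard overview screen. Warm tones, clean sans-serif."

nano-banana operation="analyze" \
  image_path="assets/hero-illustration-1.png" \
  prompt="Describe composition, palette, and tone so the variants can be compared."
```

The point of the pairing: the same session that generated the options can also read them, so selection happens by named properties rather than by eyeballing.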

The Complete Loop

The visual feedback
loop, finally closed.

Three operations compose into a tight, repeatable workflow. Each step is precise. Nothing is lost in verbal translation.

analyze mockup
implement
screenshot
compare
fix precisely
done ✓
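The six steps above can be sketched as one session loop. Paths, labels, and prompts here are illustrative placeholders, and the implement/screenshot steps happen outside the tool:

```shell
# 1. analyze mockup — read the target with precision
nano-banana operation="analyze" \
  image_path="design/mockup.png" \
  prompt="List every component with semantic names, font weights, spacing units, and color values."

# 2. implement, 3. screenshot — done by you / your AI, outside nano-banana

# 4. compare — diff the screenshot against the target
nano-banana operation="compare" \
  image1_path="design/mockup.png" \
  image2_path="screenshots/current.png" \
  image1_label="DESIGN TARGET" \
  image2_label="CURRENT IMPLEMENTATION" \
  prompt="Name every discrepancy: exact element, what differs, estimated magnitude."

# 5. fix precisely, re-screenshot, and re-run compare until no discrepancies remain
```

Each pass through step 4 replaces a verbal review with a named diff, which is what lets the loop terminate instead of drifting.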

Before: verbal approximation

"The button looks slightly too big and the font might be wrong." Four iterations to converge. Maybe.

After: named differences

"Button padding is 12px, target is 8px. Font weight is 400, should be 600." One iteration. Done.

Impact

Precision has
a compounding effect.

Every time a comparison replaces a description, every time a reference image guides generation, every time analysis names a component — iteration cycles collapse.

🎯

First-pass accuracy

Semantic component names from analyze feed directly into implementation prompts. The AI builds what was described, not what was interpreted.

Collapse the diff cycle

compare turns a 5-round visual review into one targeted fix list. Differences are named, not guessed. You implement once.

🔗

Style stays coherent

Reference-guided generate keeps new assets inside your existing visual language. No style drift across an evolving project.

Get Started

Add vision to
your Amplifier session.

One bundle include. One GOOGLE_API_KEY. Three operations that change how you build with AI.

# .amplifier/bundle.md
includes:
  - bundle: git+https://github.com/microsoft/amplifier-module-tool-nano-banana@main
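The only other prerequisite named above is the API key. A minimal setup sketch, assuming the key is read from the standard GOOGLE_API_KEY environment variable (the value shown is a placeholder):

```shell
# Make your Gemini API key available to the session.
# Replace the placeholder with your own key.
export GOOGLE_API_KEY="your-gemini-api-key"
```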
Sources & Methodology

Research trail.

Story authored February 27, 2026 · amplifier-bundle-stories archetype: Problem-Solution-Impact
