You describe images with words. Your AI generates code it cannot see. The gap between mockup and implementation is pure guesswork.
February 2026
When you hand an AI a design mockup, everything that matters — spacing, hierarchy, weight, color, proportion — survives only as approximation in your description.
Describing a design in words is the most common move in UI work, and also the most ambiguous. Font weights, spacing rhythm, and shadow depth don't survive the translation.
Describe → implement → screenshot → describe again. Each round trip through language bleeds fidelity. Small deviations compound silently.
Without a way to actually see both images and name every difference, iteration is infinite. You're polishing by feel, not by fact.
Image generation tools can make things. But they can't analyze what they made. They can't compare it to what you wanted. They can't close the loop.
You prompt. You get an image. You look at it. You describe what's wrong. You prompt again. Your AI still hasn't seen either image.
Visual intelligence means reading images with precision, comparing them with rigor, and generating with real visual context — not just words about images.
Three operations. One tool. The complete visual loop.
nano-banana is a lean Amplifier tool module that wraps Gemini's vision capabilities into three precise, composable operations.
Point at any image. Get a precise, structured description of exactly what is there — components, typography, spacing, hierarchy, color. No approximation. No verbal relay.
Font families, weights, sizes, line heights — extracted and named. "Inter 600 at 28px with 1.2 line-height" not "a large heading."
ArticleCard, not just Card. PrimaryActionButton, not just button. Semantic names your code can match.
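As a concrete sketch of what such a structured analysis could be parsed into: the field names and classes below are illustrative assumptions, not the module's actual output schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TextStyle:
    family: str          # e.g. "Inter"
    weight: int          # 600, not "semi-bold-ish"
    size_px: float
    line_height: float

@dataclass
class Component:
    name: str                        # semantic: "ArticleCard", not "Card"
    bounds: tuple                    # (x, y, width, height) in pixels
    style: Optional[TextStyle] = None
    children: List["Component"] = field(default_factory=list)

# A heading described precisely instead of as "a large heading":
heading = Component(
    name="ArticleCardTitle",
    bounds=(24, 32, 480, 34),
    style=TextStyle(family="Inter", weight=600, size_px=28, line_height=1.2),
)
print(f"{heading.style.family} {heading.style.weight} at "
      f"{heading.style.size_px:g}px with {heading.style.line_height} line-height")
```

The point of the structure is that every field is something an implementation can be checked against mechanically, rather than a verbal impression.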
Two images in. A precise diff out. Every spacing deviation, color drift, weight mismatch, and missing element — named and located. This is the operation that closes the loop.
Used for mockup vs implementation, before vs after, font option A vs B — any two-image comparison where precision beats approximation.
Text-to-image generation that understands what you've already built. Pass a reference image and the generator doesn't start from nothing — it starts from your visual language.
Your existing visual language shapes the output. Style, composition, and tone carry forward — not just the prompt.
Generate multiple options in one shot. Compare them with analyze. Pick, refine, iterate — all inside your Amplifier session.
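A minimal sketch of what reference-guided generation involves, assuming the model name cited later in this story. The helper below is hypothetical (the real module wires this up internally via its reference_image_path parameter); the commented-out call shows roughly how Google's public google-genai SDK would be invoked with it.

```python
from pathlib import Path
from typing import List, Optional, Union

def build_generation_request(prompt: str,
                             reference_image_path: Optional[str] = None
                             ) -> List[Union[dict, str]]:
    """Assemble multimodal contents for a reference-guided generation call.

    Hypothetical helper for illustration, not the module's actual code.
    """
    contents: List[Union[dict, str]] = []
    if reference_image_path is not None:
        # The reference image rides along with the prompt, so generation
        # starts from your visual language rather than from nothing.
        contents.append({
            "mime_type": "image/png",
            "data": Path(reference_image_path).read_bytes(),
        })
    contents.append(prompt)
    return contents

# With the google-genai SDK installed and GOOGLE_API_KEY set, the actual
# call would look roughly like this (not executed here):
#
#   from google import genai
#   client = genai.Client()
#   response = client.models.generate_content(
#       model="gemini-3-pro-image-preview",  # model binding named in tool.py
#       contents=build_generation_request(
#           "A hero banner in our house style", "brand_reference.png"),
#   )
```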
Three operations compose into a tight, repeatable workflow. Each step is precise. Nothing is lost in verbal translation.
"The button looks slightly too big and the font might be wrong." Four iterations to converge. Maybe.
"Button padding is 12px, target is 8px. Font weight is 400, should be 600." One iteration. Done.
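A toy sketch of how a structured comparison result becomes that kind of targeted fix list. The dictionary shape here is an assumed one for illustration, not the module's real compare schema.

```python
def render_fix_list(deviations):
    """Turn structured compare output into one-line, actionable fixes.

    The field names are an assumed shape, not the module's schema.
    """
    return [
        f"{d['element']} {d['property']} is {d['actual']}, "
        f"target is {d['expected']}."
        for d in deviations
    ]

diff = [
    {"element": "Button", "property": "padding",
     "actual": "12px", "expected": "8px"},
    {"element": "Button", "property": "font weight",
     "actual": "400", "expected": "600"},
]
for line in render_fix_list(diff):
    print(line)
```

Each line is directly actionable: a property, a measured value, and a target, with nothing left to interpretation.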
Every time a comparison replaces a description, every time a reference image guides generation, every time analysis names a component — iteration cycles collapse.
Semantic component names from analyze feed directly into implementation prompts. The AI builds what was described, not what was interpreted.
compare turns a 5-round visual review into one targeted fix list. Differences are named, not guessed. You implement once.
Reference-guided generate keeps new assets inside your existing visual language. No style drift across an evolving project.
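The collapse from a multi-round review to a single pass can be sketched as a toy loop. Here the mockup and implementation are reduced to property dictionaries, and compare/apply_fixes are simple stand-ins for the vision-based operations:

```python
def compare(target, actual):
    # Stand-in for the vision-based compare: name every deviation.
    return [
        {"property": k, "actual": actual.get(k), "expected": v}
        for k, v in target.items()
        if actual.get(k) != v
    ]

def apply_fixes(impl, diff):
    # Once deviations are named and located, the fix is mechanical.
    for d in diff:
        impl[d["property"]] = d["expected"]
    return impl

mockup = {"button.padding": "8px", "button.font-weight": "600"}
impl = {"button.padding": "12px", "button.font-weight": "400"}

rounds = 0
while (diff := compare(mockup, impl)):
    impl = apply_fixes(impl, diff)
    rounds += 1

print(rounds)  # one pass, because every difference was named up front
```

When the diff is complete and precise, the loop terminates after one round; with vague verbal feedback, each round fixes only what the description happened to capture.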
One bundle include. One GOOGLE_API_KEY.
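Setup is minimal. The exact bundle-include line is Amplifier-specific and not reproduced in this story; the credential side is a single environment variable (placeholder value shown):

```shell
# One environment variable is all the module needs at runtime.
# The value below is a placeholder, not a real key.
export GOOGLE_API_KEY="your-api-key"
```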
Three operations that change how you build with AI.
microsoft/amplifier-module-tool-nano-banana — primary source for all capability descriptions, code line counts, and feature status.
amplifier_module_tool_nano_banana/tool.py — tool schema, operation implementations, model binding (gemini-3-pro-image-preview), reference_image_path parameter, hook events.
USAGE_EXAMPLES.md and tests/ — prompt patterns, operation contract, session transcript sample.
Code line counts are verified against tool.py and __init__.py directly. No inflation.
Story authored February 27, 2026 · amplifier-bundle-stories archetype: Problem-Solution-Impact