Cost Optimization

Three Ways Amplifier Controls AI Costs

Prompt caching, per-step model selection, and loop architecture.
Real infrastructure with measurable impact.

Feature Status: Active
February 2026
The Bottom Line

Three optimizations, compounding returns

Prompt Caching
90% savings
~90% savings on cached input tokens. Automatic, zero config.
Per Anthropic published pricing
Model Routing
12x cheaper
Haiku vs Sonnet per Anthropic pricing. Per-step selection in recipes.
Used in code-review-recipe.yaml today
Loop Architecture
~10x projected
Cheap models in loops, smart models for synthesis. Status: Design Pattern
Optimization 1 — Prompt Caching

System prompts cached server-side by Anthropic

Provider-anthropic automatically adds cache_control markers. No configuration required.

Turn 1: 43,433 tokens (caching...)
Turn 2: 43,891 tokens (92% cached)
Turn 3: 44,205 tokens (95% cached)
Turn 4: 44,672 tokens (98% cached)
Anthropic pricing (Sonnet): cached input = $0.30/M tokens; fresh input = $3/M tokens. That's a 10x price difference, or ~90% savings on cached tokens.
Pricing as of early 2026. Check docs.anthropic.com for current rates.
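Under the rates quoted above, the per-turn savings reduce to simple arithmetic. A minimal sketch, assuming the $3/M fresh and $0.30/M cached Sonnet rates and the cache-hit percentages from the trace:

```python
# Sketch: input-token cost with vs. without prompt caching,
# using the Sonnet rates quoted above (assumed current as of writing).
FRESH_PER_M = 3.00    # $/M fresh input tokens
CACHED_PER_M = 0.30   # $/M cached input tokens (10x cheaper)

def input_cost(total_tokens: int, cached_fraction: float) -> float:
    """Dollar cost of one turn's input tokens at the rates above."""
    cached = total_tokens * cached_fraction
    fresh = total_tokens - cached
    return (fresh * FRESH_PER_M + cached * CACHED_PER_M) / 1_000_000

# Turn 4 from the trace above: ~44.7k tokens, 98% cached.
with_cache = input_cost(44_672, 0.98)
no_cache = input_cost(44_672, 0.0)
print(f"${with_cache:.4f} vs ${no_cache:.4f}")
```

At a 98% hit rate the input cost lands at roughly 12% of the uncached figure, which is where the "~90% savings" headline comes from.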
Optimization 1 — Prompt Caching

Three benefits, not just cost

Cost: ~90% reduction
Long conversations with 40k+ token context become economically viable. Savings based on Anthropic's 10x cached-vs-fresh pricing ratio.
Speed: faster TTFT
Time-to-first-token drops on cache hits. Per Anthropic documentation, cached prompts see significantly reduced latency.
Capacity: rate limit relief
Cached tokens count less against TPM limits. More parallel agent work within the same rate budget.
Implementation: provider-anthropic _format_system_with_cache(), _apply_message_cache_control(), _apply_tool_cache_control()
Core: cache_read_tokens and cache_write_tokens fields in TokenUsage model
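The kernel's TokenUsage fields make per-turn cost attribution straightforward. The field names cache_read_tokens and cache_write_tokens come from amplifier-core; the dataclass shape and the rate table below are illustrative assumptions, not the real API (note that Anthropic bills cache writes at a premium over fresh input):

```python
from dataclasses import dataclass

# Sketch of cost attribution from the kernel's TokenUsage fields.
# Field names match amplifier-core; the dataclass shape and the rate
# table are illustrative assumptions, not the real implementation.
@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0

# Assumed Sonnet rates in $/M tokens; cache writes bill at a premium.
RATES = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}

def turn_cost(u: TokenUsage) -> float:
    """Dollar cost of one turn, splitting cached from fresh input."""
    return (u.input_tokens * RATES["input"]
            + u.output_tokens * RATES["output"]
            + u.cache_read_tokens * RATES["cache_read"]
            + u.cache_write_tokens * RATES["cache_write"]) / 1_000_000

# A warm-cache turn: small fresh input, large cache read.
usage = TokenUsage(input_tokens=900, output_tokens=500, cache_read_tokens=43_000)
print(f"${turn_cost(usage):.4f}")
```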
Optimization 2 — Per-Step Model Selection

Recipe steps specify their own model

Haiku (~12x cheaper): date parsing, classification, simple extraction, formatting
Sonnet (baseline): code analysis, synthesis, report writing, exploration
Opus (~5x more): architecture design, security review, deep reasoning
12x between Haiku and Sonnet. ~60x between Haiku and Opus.
Per Anthropic published API pricing. Glob patterns (claude-sonnet-*) auto-resolve to latest version.
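The glob auto-resolution mentioned above can be sketched with stdlib fnmatch. The model list and the "latest version wins" sort rule here are assumptions for illustration, not Amplifier's actual resolver:

```python
from fnmatch import fnmatch

# Illustrative sketch of glob-pattern model resolution. The model names
# and the "lexicographically latest wins" rule are assumptions, not
# Amplifier's real resolver logic.
AVAILABLE = ["claude-haiku-3-5", "claude-sonnet-4-0",
             "claude-sonnet-4-5", "claude-opus-4-1"]

def resolve(pattern: str) -> str:
    """Return the newest available model matching the glob pattern."""
    matches = sorted(m for m in AVAILABLE if fnmatch(m, pattern))
    if not matches:
        raise ValueError(f"no model matches {pattern!r}")
    return matches[-1]

print(resolve("claude-sonnet-*"))
```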
Optimization 2 — Per-Step Model Selection

Real usage: code-review-recipe.yaml

# From amplifier-bundle-recipes/examples/code-review-recipe.yaml
steps:
  - id: "assess-severity"
    model: "claude-haiku"  # ~12x cheaper - simple classification
    prompt: "Classify this change as low/medium/high severity"
  - id: "deep-review"
    provider: "anthropic"
    model: "claude-sonnet-*"  # needs reasoning for analysis
    prompt: "Review this code for issues..."
  - id: "format-summary"
    model: "claude-haiku"  # simple formatting, no reasoning
    prompt: "Summarize the review findings"
Schema: provider, model, and provider_preferences (with ordered fallbacks) per step
Source: amplifier-bundle-recipes RECIPE_SCHEMA.md, code-review-recipe.yaml
Optimization 3 — Loop Architecture Design Pattern

Cheap models in loops, smart models for synthesis

Pipeline: Parse date (Haiku) → Chunk 1 (Haiku) → Chunk 2 (Haiku) → ... ×100 (Haiku) → Synthesize (Sonnet)
Sonnet for everything (projected): ~$50 (100 chunks × $3/M each at Sonnet pricing)
Haiku loops + Sonnet synthesis (projected): ~$5 (100 × $0.25/M + 1 × $3/M at current pricing)
Infrastructure: foreach loops (with parallel: N) + per-step model are both active in recipe schema.
Cost projection based on Anthropic API pricing applied to foreach pattern. No production recipe uses this exact pattern yet.
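The $50-to-$5 projection is plain arithmetic on the rates quoted in this document. The chunk and synthesis token counts below are assumed workload sizes chosen to illustrate the ratio, not measured figures:

```python
# Sketch of the loop-architecture cost projection. Rates are the ones
# quoted in this document; CHUNK_TOKENS and SYNTH_TOKENS are assumed
# workload sizes, not measurements from a production run.
SONNET_PER_M = 3.00   # $/M input tokens
HAIKU_PER_M = 0.25    # $/M input tokens
CHUNKS = 100
CHUNK_TOKENS = 160_000   # assumed input tokens per chunk
SYNTH_TOKENS = 200_000   # assumed input tokens for the synthesis step

def cost(per_m: float, tokens: int) -> float:
    return per_m * tokens / 1_000_000

# Baseline: Sonnet for every chunk plus the synthesis pass.
all_sonnet = CHUNKS * cost(SONNET_PER_M, CHUNK_TOKENS) + cost(SONNET_PER_M, SYNTH_TOKENS)
# Pattern: Haiku for the loop, Sonnet only for synthesis.
mixed = CHUNKS * cost(HAIKU_PER_M, CHUNK_TOKENS) + cost(SONNET_PER_M, SYNTH_TOKENS)
print(f"all-Sonnet ~${all_sonnet:.0f}, Haiku loops + Sonnet synthesis ~${mixed:.0f}")
```

The ratio holds at roughly 10x as long as the synthesis step stays small relative to the loop volume.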
Bonus — Token-Safe Git Status

Prevent accidental token DoS

The Problem
Large repos with many uncommitted changes can produce 100k+ token git status output.

Status context hook injects this every turn. One bad repo = context overflow.
The Fix: Tiered Filtering
Tier 1: Always-ignore patterns (build artifacts, caches)
Tier 2: Limit to N files (default 10) per pattern
Tier 3: All other files shown up to hard limit

Hard limits: max 20 untracked, 100 total lines, 50 tracked files
Implementation: hooks-status-context with test_token_safety.py test suite
Core: injection_budget_per_turn in coordinator provides additional budget guardrail
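The three tiers above can be sketched as a single filter pass. The patterns and limit values mirror the slide, but the function itself (and the grouping-by-suffix stand-in for "per pattern") is an assumption, not the hooks-status-context implementation:

```python
import fnmatch
from collections import defaultdict
from pathlib import PurePath

# Illustrative sketch of the three-tier git-status filtering described
# above. Patterns and limits mirror the slide; grouping by file suffix
# is a stand-in for "per pattern", not the real hook code.
ALWAYS_IGNORE = ["*.pyc", "node_modules/*", "__pycache__/*"]  # Tier 1
PER_PATTERN_LIMIT = 10                                        # Tier 2
MAX_UNTRACKED = 20                                            # Tier 3

def filter_untracked(files: list[str]) -> list[str]:
    # Tier 1: drop build artifacts and caches outright.
    files = [f for f in files
             if not any(fnmatch.fnmatch(f, p) for p in ALWAYS_IGNORE)]
    # Tier 2: keep at most N files per group (grouped by suffix here).
    groups: dict[str, list[str]] = defaultdict(list)
    for f in files:
        groups[PurePath(f).suffix].append(f)
    kept = [f for g in groups.values() for f in g[:PER_PATTERN_LIMIT]]
    # Tier 3: hard cap, with an elision marker instead of the overflow.
    if len(kept) > MAX_UNTRACKED:
        kept = kept[:MAX_UNTRACKED] + [f"... ({len(kept) - MAX_UNTRACKED} more)"]
    return kept

files = ["app.py", "junk.pyc"] + [f"logs/run{i}.log" for i in range(30)]
print(filter_untracked(files))
```

A pathological repo with thousands of untracked files collapses to a bounded, predictable context injection.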

Summary: Three optimizations, compounding returns

Optimization      | Savings         | Status    | Effort to use
Prompt Caching    | ~90%            | Active    | None (automatic)
Model Selection   | 12–60x per step | Active    | Add model: per recipe step
Loop Architecture | ~10x at scale   | Projected | foreach + model routing pattern
Combined potential: A workflow that costs ~$50 could cost ~$5. Prompt caching is automatic and active today. Model routing is proven in code-review-recipe. Loop architecture awaits adoption.

Development velocity

Infrastructure shipped across the Amplifier ecosystem

4 repositories
~18 cost-related commits (core)
3 kernel features added
1 recipe using Haiku today
Repositories: microsoft/amplifier-core, microsoft/amplifier-module-provider-anthropic,
microsoft/amplifier-module-hooks-status-context, microsoft/amplifier-bundle-recipes

Primary contributors: Brian Krabach (core ~97%, hooks-status-context, recipes), Salil Das (provider-anthropic)
Sources

Research Methodology

Data as of: February 20, 2026

Feature status: Active (prompt caching, model selection, token safety); Design Pattern (loop architecture cost projection)

Research performed:

Pricing source: Anthropic published API pricing. "~12x cheaper" = Haiku vs Sonnet input token ratio. "90% savings" = 10x cached-vs-fresh pricing.

Gaps: Exact PR count and development timeline not independently verified from cached repos (local cache shows release commits only). The "$50→$5" projection is mathematical, not measured from a production run. TTFT improvement attributed to Anthropic benchmarks, not measured by Amplifier.

Get Started

Try it yourself

Prompt caching works automatically with provider-anthropic.
Model routing takes one line per recipe step.

# Add to any recipe step:
model: "claude-haiku"     # for simple tasks
model: "claude-sonnet-*"  # for reasoning tasks
model: "claude-opus-*"    # for deep analysis
github.com/microsoft/amplifier-bundle-recipes — Recipe schema and examples
github.com/microsoft/amplifier-module-provider-anthropic — Caching implementation
github.com/microsoft/amplifier-core — Kernel cost infrastructure
More Amplifier Stories