MADE Explorations · April 2026

code-mode

write a program, not a prompt

MADE · All-hands

Based on Cloudflare Code Mode · kenotron-ms/amplifier-code-mode

Key point: This is the first announcement for the code-mode bundle. The shift is significant: instead of the LLM calling tools one at a time and accumulating results in context, it writes a Python program that runs all the tools — and only the program's output enters context.

If challenged: "Is this just another LLM wrapper?" No — Python executes in-process (no subprocess, no TCP bridge), every mounted Amplifier tool is available as an async function, and only print() output returns to the LLM.

Transition: Let's start with the problem it solves — most of us have hit it.

Confidence: High — implementation is live and committed. README has the full technical breakdown.

The Problem · LLM Orchestration

Sequential LLM orchestration burns N round-trips, floods context with data nobody asked for, and sometimes never stops

Every multi-step task today requires the LLM to call a tool, wait, receive the full result into context, then call the next tool. Each result stays in the context window — permanently.

without code-mode — sequential execution

→ bash("git log --oneline -10") full output → context

→ read_file("README.md") full file → context

→ read_file("pyproject.toml") full file → context

→ synthesize(all three…) ← finally done

3 LLM round-trips · serial execution · raw data in context permanently
If the LLM finds something interesting, it might keep exploring — Ken had a run that read every .md in a vault for 17 minutes.

3×

wall clock vs parallel

17 min

vault run — LLM kept reading .md files

∞

intermediate data in context once it's there

Key point: The sequential model has three compounding costs: wall clock (each call waits), context (every result stays in the window), and non-determinism (the LLM decides whether to keep going). The 17-minute vault run is a real example from Ken's session.

If challenged: "Can't you just prompt the LLM to stop?" Sometimes. But a program terminates by construction — you don't need to prompt it. That's the structural difference.

Transition: The fix isn't to optimize the orchestration loop. It's to change what the LLM is doing entirely — from orchestrator to programmer.

Confidence: High — sequential model is how current Amplifier tool use works without code-mode. The 17-minute run is Ken's direct experience.

The Idea · LLM as Programmer

The LLM writes one Python program — tools run in parallel inside it, only the final print() returns to context

Before and after diagram: sequential LLM tool calls vs code-mode parallel execution with asyncio.gather

BEFORE: N round-trips · context bloat · serial execution · AFTER: 1 LLM call · asyncio.gather() · data stays out of context

Key point: The LLM is no longer the orchestrator that dispatches tools one at a time. It writes a program once, the Python runtime handles parallelism via asyncio.gather(), and only the program's print() output returns. The LLM never sees intermediate state.

If challenged: "Doesn't the LLM still have to understand all the results?" Yes — but it sees a filtered summary (what you print()), not all the raw data. It's the difference between a function's return value and a function's internal heap.

Transition: Brian Krabach had a sharp pushback when he saw this: "How is it different from parallel tool calls, which the orchestrator already supports?" Let's address that directly.

Confidence: High — the infographic is from the repo README. The before/after is an accurate representation of how code-mode works.

Brian's Question · vs Parallel Tool Calls

Parallel tool calls fix latency — code-mode fixes three things parallel calls can't touch

Parallel tool calls solve the latency problem for a single batch. code-mode solves: context pollution from intermediate results, LLM roundtrips between dependent stages, nondeterministic orchestrator behavior.

— Brian Krabach · MADE Explorations

01 · Context Pollution

Raw results flood context — code-mode filters in Python before returning

Parallel tool calls dump all N results verbatim into history. 20 files in parallel = 20 file contents permanently in context for every future call.

40 msgs → 2 msgs per batch of 20 parallel reads

02 · Dependent Stages

Multi-stage pipelines need LLM roundtrips — programs don't

Read 20 files → analyze → re-query 8 specific files. With the orchestrator, each stage needs an LLM call. In code-mode, one program handles all stages.

N calls → 0 calls LLM roundtrips between stages

03 · Determinism

LLM orchestrators drift — programs terminate exactly as written

An LLM can decide to keep exploring if it finds something interesting. A program runs exactly as written, terminates on schedule, no 17-minute vault explorations.

nondeterministic vs terminates orchestrator vs program

Key point: Brian's question was exactly right. Parallel tool calls are a protocol-level optimization — the orchestrator still receives all N results. code-mode is a programmability optimization — results never leave Python unless you print them. These are fundamentally different layers.

If challenged: "Can't the orchestrator be designed to not dump all results into context?" Theoretically yes, but that requires custom orchestrator changes per tool. code-mode works with any tool combination, with any filtering logic you can write in Python, today.

Transition: Enough theory. Here's what this looks like in practice — the example Brian specifically called out as the clearest illustration of the value.

Confidence: High — Brian's exact phrasing is preserved from the Teams conversation. The 40→2 calculation: N tool_use + N tool_result vs 1 code + 1 result.

The Payload · The Aha Moment

Three parallel reads, one filtered summary — the program decides what context ever sees

pipeline.py — tool_code_mode

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

git_log, readme, package = await asyncio.gather( bash(command="git log --oneline -10"), read_file(file_path="README.md"), read_file(file_path="pyproject.toml"), ) commits = git_log['stdout'].strip().splitlines() version_match = re.search(r'version = "(.+?)"', package['content']) version = version_match.group(1) if version_match else "unknown" print(f"Version: {version}") print(f"Last {len(commits)} commits:") for c in commits: print(f" {c}") # Only this filtered summary enters LLM context

What this program does

✓ 3 tool calls run in parallel via asyncio.gather()

✓ Python filters: extract version, count commits

✓ Only the print() summary returns to context

✕ 3 raw file contents never enter context

Context cost

Without code-mode 6messages

With code-mode 2messages

Runtime

⚡ In-process — no subprocess, no TCP, no temp files

⚡ Each await tool_name() is a direct method call

Key point: This exact payload is what Brian called out. The three reads happen simultaneously. The LLM never sees the README, the pyproject.toml, or the raw git log — only a 10-line version summary. Filtering happens inside the program before any result is returned.

If challenged: "What if I want the full file contents back?" Just print() them. The filtering is optional and under your control. code-mode is a capability, not a constraint.

Transition: That's the concept. What we shipped today makes this programming model even cleaner to write.

Confidence: High — this exact code is directly runnable in any session with the bundle installed. Context math: 3 tool_use + 3 tool_result = 6 vs 1 code + 1 result = 2.

Shipped Today · Three Improvements

Three additions make the programming model cleaner out of the box — no boilerplate, no import errors, no guessing at patterns

New · Pre-injected

gather_limited() — controlled concurrency in one line

Replaces 8+ lines of asyncio.Semaphore setup. Pre-injected into every code-mode execution — no import needed. Backpressure starts the next coroutine as each slot frees, not all at once when a batch completes.

results = await gather_limited([tool(x) for x in items], limit=20)

New · Auto-cleanup

Unused imports removed before compile() — generated code just works

LLM-generated code often imports things it doesn't use. AST walk removes them in-process before compile() — no subprocess, no ruff, no temp files. ~40 lines, graceful fallback on syntax errors. 8 new tests.

import json # generated but unused → removed silently

New · Documented

8 pipeline patterns — injected into the tool description automatically

Sequential chains, fan-out/fan-in, DAG, conditional, pagination, error recovery, data transformation, controlled concurrency. The LLM sees them every time it invokes code-mode — no prompting required.

sequential · fan-out · dag · conditional · paginate · recover · transform · concurrency

Key point: These three additions solve the three most common friction points in practice. gather_limited handles the "don't hammer an API" problem in one line. The AST cleanup means generated code doesn't fail on unused imports. The 8 patterns mean the LLM knows what's possible without being told.

If challenged: "Why not asyncio.Semaphore directly?" You can — but gather_limited handles backpressure correctly (starts next as each slot frees, not all at once at batch completion). That's the subtle difference that matters for rate-limited APIs.

Transition: All of this is in the repo and installable right now. Here's the command.

Confidence: High — all three features are committed. Commits: 58acd33, c4bb641, 7866f5c, ac7bb18, a7949ef.

Get It · One Command

Add it to any Amplifier session — every tool you have becomes an async function the LLM can call from a program

terminal

$ amplifier bundle add git+https://github.com/kenotron-ms/amplifier-code-mode@main --app

Repo github.com/kenotron-ms/amplifier-code-mode

Commits 58acd33 · c4bb641 · 7866f5c · ac7bb18

After install, ask the LLM to write a pipeline for any multi-step task.
bash, read_file, web_fetch, LSP, delegate — whatever is mounted becomes await tool_name(...).

Key point: One command and it works with any existing Amplifier session. The LLM immediately has every session tool as an async function. The tool description regenerates dynamically every time with the current session's tool signatures — so it always knows exactly what's available.

If challenged: "Does this work with custom tools?" Yes — any tool mounted in the session gets its async stub injected. The dynamic description generation handles this automatically via modules/tool-code-mode/__init__.py.

Transition: That's code-mode. Write a program, not a prompt.

Confidence: High — install command is in the README, tested. Dynamic description generation is live in the current commit.