MADE Explorations · April 2026
write a program, not a prompt
MADE · All-hands
Based on Cloudflare Code Mode · kenotron-ms/amplifier-code-mode
Key point: This is the first announcement for the code-mode bundle. The shift is significant: instead of the LLM calling tools one at a time and accumulating results in context, it writes a Python program that runs all the tools — and only the program's output enters context.
If challenged: "Is this just another LLM wrapper?" No — Python executes in-process (no subprocess, no TCP bridge), every mounted Amplifier tool is available as an async function, and only print() output returns to the LLM.
Transition: Let's start with the problem it solves — most of us have hit it.
Confidence: High — implementation is live and committed. README has the full technical breakdown.
The Problem · LLM Orchestration
Every multi-step task today requires the LLM to call a tool, wait, receive the full result into context, then call the next tool. Each result stays in the context window — permanently.
Key point: The sequential model has three compounding costs: wall-clock time (each call must finish before the next can start), context (every result stays in the window), and non-determinism (the LLM decides whether to keep going). The 17-minute vault run is a real example from Ken's session.
If challenged: "Can't you just prompt the LLM to stop?" Sometimes. But a program terminates by construction — you don't need to prompt it. That's the structural difference.
Transition: The fix isn't to optimize the orchestration loop. It's to change what the LLM is doing entirely — from orchestrator to programmer.
Confidence: High — sequential model is how current Amplifier tool use works without code-mode. The 17-minute run is Ken's direct experience.
The Idea · LLM as Programmer
BEFORE: N round-trips · context bloat · serial execution · AFTER: 1 LLM call · asyncio.gather() · data stays out of context
Key point: The LLM is no longer the orchestrator that dispatches tools one at a time. It writes a program once, the Python runtime handles parallelism via asyncio.gather(), and only the program's print() output returns. The LLM never sees intermediate state.
If challenged: "Doesn't the LLM still have to understand all the results?" Yes — but it sees a filtered summary (what you print()), not all the raw data. It's the difference between a function's return value and a function's internal heap.
Transition: Brian Krabach had a sharp pushback when he saw this: "How is it different from parallel tool calls, which the orchestrator already supports?" Let's address that directly.
Confidence: High — the infographic is from the repo README. The before/after is an accurate representation of how code-mode works.
Brian's Question · vs Parallel Tool Calls
Parallel tool calls solve the latency problem for a single batch. code-mode solves what they don't: context pollution from intermediate results, LLM roundtrips between dependent stages, and nondeterministic orchestrator behavior.
— Brian Krabach · MADE Explorations
Raw results flood context — code-mode filters in Python before returning
Parallel tool calls dump all N results verbatim into history. 20 files in parallel = 20 file contents permanently in context for every future call.
Multi-stage pipelines need LLM roundtrips — programs don't
Read 20 files → analyze → re-query 8 specific files. With the orchestrator, each stage needs an LLM call. In code-mode, one program handles all stages (a sketch follows below).
LLM orchestrators drift — programs terminate exactly as written
An LLM can decide to keep exploring if it finds something interesting. A program runs exactly as written, terminates on schedule, no 17-minute vault explorations.
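A rough sketch of the multi-stage case above, under the same assumptions about mounted tool names and signatures. All three stages run inside one program, with no LLM call between them:

```python
import asyncio

# Illustrative sketch only: bash/read_file are stand-ins for mounted session tools,
# and their signatures are assumptions. Top-level await assumed available in code-mode.

# Stage 1: list and read up to 20 files, all reads in parallel.
listing = await bash(command="ls src/*.py")
paths = listing.split()[:20]
contents = await asyncio.gather(*(read_file(path=p) for p in paths))

# Stage 2: analyze in plain Python; no LLM roundtrip between stages.
suspects = [p for p, c in zip(paths, contents) if "deprecated" in c]

# Stage 3: re-query only the interesting files, again in parallel.
details = await asyncio.gather(*(read_file(path=p) for p in suspects[:8]))

# Only this filtered summary returns to the LLM's context.
print(f"{len(suspects)}/{len(paths)} files mention 'deprecated'")
for p, d in zip(suspects[:8], details):
    print(f"  {p}: {len(d.splitlines())} lines")
```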
Key point: Brian's question was exactly right. Parallel tool calls are a protocol-level optimization — the orchestrator still receives all N results. code-mode is a programmability optimization — results never leave Python unless you print them. These are fundamentally different layers.
If challenged: "Can't the orchestrator be designed to not dump all results into context?" Theoretically yes, but that requires custom orchestrator changes per tool. code-mode works with any tool combination, with any filtering logic you can write in Python, today.
Transition: Enough theory. Here's what this looks like in practice — the example Brian specifically called out as the clearest illustration of the value.
Confidence: High — Brian's exact phrasing is preserved from the Teams conversation. The 40→2 calculation: N tool_use + N tool_result vs 1 code + 1 result.
The Payload · The Aha Moment
Key point: This exact payload is what Brian called out. The three reads happen simultaneously. The LLM never sees the README, the pyproject.toml, or the raw git log — only a 10-line version summary. Filtering happens inside the program before any result is returned.
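For reference, the shape of that payload is roughly the following: an approximation with assumed tool names and signatures, not the exact code Brian saw.

```python
import asyncio

# Approximation of the payload's shape, not the exact code; tool names and
# signatures are assumptions, and top-level await is assumed available.
readme, pyproject, log = await asyncio.gather(
    read_file(path="README.md"),
    read_file(path="pyproject.toml"),
    bash(command="git log --oneline -10"),
)

# Filtering happens here, inside the program: none of the raw contents go back.
version = next((l for l in pyproject.splitlines() if l.startswith("version")),
               "version: unknown")
print(f"{version} | README: {len(readme.splitlines())} lines")
print(log)
```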
If challenged: "What if I want the full file contents back?" Just print() them. The filtering is optional and under your control. code-mode is a capability, not a constraint.
Transition: That's the concept. What we shipped today makes this programming model even cleaner to write.
Confidence: High — this exact code is directly runnable in any session with the bundle installed. Context math: 3 tool_use + 3 tool_result = 6 vs 1 code + 1 result = 2.
Shipped Today · Three Improvements
gather_limited() — controlled concurrency in one line
Replaces 8+ lines of asyncio.Semaphore setup. Pre-injected into every code-mode execution — no import needed. Backpressure is handled correctly: the next coroutine starts as soon as a slot frees, rather than all at once when a batch completes.
Unused imports removed before compile() — generated code just works
LLM-generated code often imports things it doesn't use. An AST walk removes them in-process before compile() — no subprocess, no ruff, no temp files. ~40 lines, with a graceful fallback on syntax errors. 8 new tests. (A sketch of the technique appears below.)
8 pipeline patterns — injected into the tool description automatically
Sequential chains, fan-out/fan-in, DAG, conditional, pagination, error recovery, data transformation, controlled concurrency. The LLM sees them every time it invokes code-mode — no prompting required.
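For the import cleanup, a minimal sketch of the technique, assuming a plain parse/walk/unparse pass; the shipped ~40-line version differs in detail:

```python
import ast

def strip_unused_imports(source: str) -> str:
    """Sketch of the technique only; the shipped version differs in detail."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return source  # graceful fallback: leave code we can't parse untouched

    # Every name the code actually references (includes the roots of attribute chains).
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

    def alias_is_used(alias: ast.alias) -> bool:
        bound = (alias.asname or alias.name).split(".")[0]
        return bound in used

    kept = []
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            node.names = [a for a in node.names if alias_is_used(a)]
            if not node.names:
                continue  # the whole import statement is unused: drop it
        kept.append(node)
    tree.body = kept

    # ast.unparse drops comments and formatting, which is acceptable for code
    # that is headed straight into compile().
    return ast.unparse(tree)
```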
Key point: These three additions solve the three most common friction points in practice. gather_limited handles the "don't hammer an API" problem in one line. The AST cleanup means generated code doesn't fail on unused imports. The 8 patterns mean the LLM knows what's possible without being told.
If challenged: "Why not asyncio.Semaphore directly?" You can — but gather_limited handles backpressure correctly (starts next as each slot frees, not all at once at batch completion). That's the subtle difference that matters for rate-limited APIs.
Transition: All of this is in the repo and installable right now. Here's the command.
Confidence: High — all three features are committed. Commits: 58acd33, c4bb641, 7866f5c, ac7bb18, a7949ef.
Get It · One Command
After install, ask the LLM to write a pipeline for any multi-step task.
bash, read_file, web_fetch, LSP, delegate — whatever is mounted becomes await tool_name(...).
Key point: One command and it works with any existing Amplifier session. The LLM immediately has every session tool as an async function. The tool description is regenerated dynamically from the current session's tool signatures each time code-mode is invoked — so the LLM always knows exactly what's available.
If challenged: "Does this work with custom tools?" Yes — any tool mounted in the session gets its async stub injected. The dynamic description generation handles this automatically via modules/tool-code-mode/__init__.py.
Transition: That's code-mode. Write a program, not a prompt.
Confidence: High — install command is in the README, tested. Dynamic description generation is live in the current commit.