Agent design patterns: what's actually working

Two years into agents-everywhere, a small set of design patterns has separated from the noise. Worth being explicit about which shapes are actually paying back in production and which are still mostly aspirational.


The agent-everywhere pitch from Build and I/O the past two weeks is the loud version of a longer-running quieter trend. The quieter version is that the practitioners actually shipping production agentic systems have converged on a small set of design patterns that work, and a similarly small set of anti-patterns that look promising in demos and fail predictably in real workloads.

Worth being explicit about both lists, because the keynote framing isn’t going to get specific and the practitioner conversation has moved well past where the marketing is.


The patterns that are actually paying back

Five shapes have separated from the noise in the last six months:

Planner-executor split. The single most important architectural choice in any non-trivial agentic system. One model (usually a slower, more expensive one) produces a plan. A second cheaper model executes the plan step by step. The split lets you spend reasoning-tier money on the part of the workflow where reasoning quality matters and workhorse-tier money on the rest. The naive shape (one model that plans and executes) burns reasoning tokens on every turn and costs 3-5× what the split costs at equivalent quality.
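The economics of the split fall out of the call pattern: one expensive call up front, cheap calls per step. A minimal sketch, with the model calls stubbed out (a real system would put a reasoning-tier model behind `call_planner` and a workhorse-tier model behind `call_executor`; both names are illustrative, not any vendor's API):

```python
# Sketch of a planner-executor split. Model calls are stubbed; the point
# is the call pattern: one reasoning-tier call, n workhorse-tier calls.

def call_planner(task: str) -> list[str]:
    # Expensive model: called ONCE to produce a step-by-step plan.
    return [f"step {i} of {task!r}" for i in range(1, 4)]

def call_executor(step: str, context: list[str]) -> str:
    # Cheap model: called once per step, sees prior step results.
    return f"done({step})"

def run(task: str) -> list[str]:
    plan = call_planner(task)      # reasoning-tier spend: 1 call
    results: list[str] = []
    for step in plan:              # workhorse-tier spend: len(plan) calls
        results.append(call_executor(step, results))
    return results
```

The naive shape would route every one of those per-step calls through the planner-tier model, which is where the 3-5× cost delta comes from.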

Tool-scoped subagents. Instead of giving one agent access to twenty tools, give a parent agent access to five subagents, each with access to the four tools relevant to their domain. Reduces the parent’s tool-selection failure rate (fewer choices per decision), reduces context overhead (the parent doesn’t see the subagent’s tool outputs unless they bubble up), and makes failure modes more isolated. The pattern adds latency on the routing hop and is almost always worth it.
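The structure is just a two-level lookup: the parent chooses among domains, each domain owns a small tool set. A sketch with stub tools (the keyword routing stands in for model-driven selection; all names are hypothetical):

```python
# Sketch of tool-scoped subagents: the parent sees domains, not raw tools.

def search_docs(q: str) -> str: return f"docs:{q}"
def search_code(q: str) -> str: return f"code:{q}"
def send_mail(q: str) -> str:   return f"mail:{q}"
def read_cal(q: str) -> str:    return f"cal:{q}"

SUBAGENTS = {
    # Each subagent exposes only the tools for its domain.
    "research": {"search_docs": search_docs, "search_code": search_code},
    "comms":    {"send_mail": send_mail, "read_cal": read_cal},
}

def parent_route(task: str) -> str:
    # Parent picks a subagent (2 choices), not a tool (many choices).
    domain = "comms" if "email" in task else "research"
    tools = SUBAGENTS[domain]
    # The subagent selects from its own small tool set; only the final
    # result bubbles up to the parent, not the raw tool traces.
    tool = next(iter(tools.values()))
    return tool(task)
```

Shrinking the branching factor at each decision is what cuts the tool-selection failure rate; the isolation of tool outputs inside the subagent is what cuts the context overhead.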

Retry-with-reflection. When a tool call fails or produces an unexpected result, don’t just retry the same prompt. Have the agent reflect on what went wrong, adjust the plan, and try a different approach. The naive retry pattern just hits the same failure mode again; the reflection pattern recovers gracefully more often. Costs more per retry; eliminates the infinite-loop failure mode that plain retry frequently produces.
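The difference from plain retry is a single extra step between attempts. A sketch with a stubbed tool and a stubbed reflection step (`reflect` stands in for a model call that reads the error and rewrites the approach rather than replaying the same prompt):

```python
# Sketch of retry-with-reflection. Plain retry would call flaky_tool with
# the same query forever; the reflection step mutates the approach.

def flaky_tool(query: str) -> str:
    if "v2" not in query:
        raise ValueError("endpoint moved to v2")
    return f"ok:{query}"

def reflect(query: str, error: str) -> str:
    # A model would reason about the failure; here we just apply the hint.
    return query + " v2" if "v2" in error else query

def call_with_reflection(query: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        try:
            return flaky_tool(query)
        except ValueError as err:
            query = reflect(query, str(err))  # adjust, don't just replay
    raise RuntimeError("gave up after reflection attempts")
```

The `max_attempts` bound matters as much as the reflection itself: it converts the infinite-loop failure mode into a clean error the caller can handle.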

The human-in-the-loop checkpoint. For any action with non-trivial blast radius (file deletion, an external API call with cost or commitment, anything writing to a system of record), the agent presents the action and waits for explicit confirmation rather than executing autonomously. The pattern shows up everywhere governance matters, which is everywhere. Adds friction; eliminates a class of failure mode that pure autonomy doesn't.
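Structurally this is a gate keyed on a side-effect classification, with the confirmation callback injected so the same agent code works under a test harness, a CLI, or a UI. A minimal sketch with illustrative action names:

```python
# Sketch of a human-in-the-loop checkpoint. Actions with blast radius
# are gated on an injected confirm callback; everything else runs free.

from typing import Callable

SIDE_EFFECTING = {"delete_file", "charge_card", "write_record"}

def execute(action: str, arg: str,
            confirm: Callable[[str], bool]) -> str:
    if action in SIDE_EFFECTING:
        # Non-trivial blast radius: present the action and wait.
        if not confirm(f"about to run {action}({arg}) - proceed?"):
            return "skipped"
    return f"executed {action}({arg})"
```

In production the `confirm` callback is a UI prompt or an approval queue; in tests it's a lambda, which is also how you verify the gate actually fires.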

Bounded autonomy with rollback. When the agent does have execution authority, it operates within a bounded sandbox (git branch, copy of the data, isolated filesystem) and the user can roll back to the pre-agent state if the result is wrong. The pattern lets you keep the productivity benefit of autonomous execution while limiting the downside. Most production-grade agentic IDEs implement some version of this; the ones that don’t are the ones that produce horror stories.
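The in-memory analogue of the git-branch version is simple: the agent works on a copy, and the caller keeps the pre-agent state as the rollback point. A sketch (the dict-as-filesystem model is illustrative):

```python
# Sketch of bounded autonomy with rollback: the agent edits a deep copy,
# never the original, and the user chooses which state survives.

import copy
from typing import Callable

def run_bounded(state: dict, agent_step: Callable[[dict], None]) -> tuple[dict, dict]:
    sandbox = copy.deepcopy(state)   # agent never touches the original
    agent_step(sandbox)
    return state, sandbox            # (rollback point, proposed result)

def accept(original: dict, sandbox: dict, ok: bool) -> dict:
    # Keep the agent's work only if the user accepts it.
    return sandbox if ok else original
```

The agentic IDEs that do this well just swap the deep copy for a branch or a filesystem snapshot; the contract is the same, which is why rollback has to be designed in rather than bolted on.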

The anti-patterns that keep appearing

Three shapes that look reasonable in demos and fail in production:

The single mega-prompt. One enormous system prompt that tries to encode all the context, all the rules, all the tools, all the personas. Doesn’t survive contact with edge cases because the model can’t hold the structure under any but the simplest queries. The pattern persists because demo workloads happen to be the simple ones; real workloads break it within a week of launch.

Unbounded recursion / agent-spawns-agent. An agent that can spawn child agents to handle subtasks, which can spawn their own children, and so on. Looks elegant in a diagram. In practice it tends either to explode in cost or to get stuck in loops, because the depth limits aren't well thought out. The constrained version (planner-executor with a fixed-depth subagent layer) works; the unconstrained version doesn't.
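The fix is mechanical: thread a hard depth limit through every spawn, so delegation is a privilege that runs out. A sketch with the actual work stubbed:

```python
# Sketch of the constrained version: subagents may spawn children, but a
# hard depth limit is threaded through every spawn. Work is a stub.

MAX_DEPTH = 2

def solve(task: str, depth: int = 0) -> str:
    if depth >= MAX_DEPTH:
        # At the limit: must solve directly, no further delegation.
        return f"direct({task})"
    # Below the limit: may split into subtasks, each one level deeper.
    subtasks = [f"{task}.{i}" for i in (1, 2)]
    return " + ".join(solve(s, depth + 1) for s in subtasks)
```

With a branching factor of b and depth limit d, the worst case is a known b^d leaf calls; without the limit there is no worst case to budget for, which is the whole problem.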

Tool-and-pray. Connecting the agent to every available tool and trusting it to figure out which one to use. Sometimes works; often doesn’t; failure modes are unpredictable. The tool-scoped subagent pattern is what this anti-pattern matures into when teams figure out the failures.

What the patterns share

The five patterns that work share a common structure: each limits the agent's degrees of freedom in a way that trades flexibility for reliability. The mega-prompt is maximally flexible and minimally reliable; the planner-executor split is less flexible and much more reliable. Unbounded recursion gives the agent maximum freedom and predictably explodes; bounded recursion gives it less freedom and works.

The pattern of patterns: production agentic systems are about deliberately limiting what the agent can do, not maximally enabling it. The shop building the most capable agentic surface isn’t the one that gives the agent the most tools and the most autonomy; it’s the one that gives it the right tools, the right autonomy, with the right checkpoints. The discipline is the product.

This is the same lesson the prompt-architecting framing pointed at, scaled up to the agent layer: the right thinking is structural rather than tactical. The shape of the system matters more than the cleverness of any individual prompt.

The governance overlay nobody loves but everyone needs

Underneath the design patterns sits the governance question that the keynotes are still ducking. Even with the right patterns, an agent in production needs answers to: who can deploy it, what is it allowed to do, who audits its actions, and what gets revoked when the use case changes? The patterns don't answer those; they make the system more reliable, not more accountable. Both are needed.

Deployments doing this well have a governance layer wrapped around the agentic patterns: a deployment registry, action-level audit trails, scope policies, lifecycle management. The vendor-provided versions of all of these are still thin. Most of the production-grade governance visible in public reporting is custom-built on top of the platform, the way the treat-the-AI-like-an-employee discipline suggested it would have to be.

What I’d recommend if you’re starting today

For a team building their first non-trivial agentic system in the second half of 2025:

  • Start with planner-executor. Don’t try to build the unified agent. The split makes everything downstream easier and is the cheapest pattern to get right.
  • Define tool scopes deliberately. The agent’s tool surface should be the smallest one that can do the job, not the largest one your platform exposes.
  • Add the human-in-the-loop checkpoint for anything with side effects. Friction now is cheaper than incident response later.
  • Build the audit log before you need it. Retrofitting audit onto a deployed agent is much harder than adding it from day one.
  • Set bounded autonomy as the default, with explicit per-action escalations to less-bounded modes when the user wants them.

None of these are exotic. The reason they’re worth listing is that the marketing layer of the agent conversation tends to skip past them in favor of “look, the agent did the thing autonomously.” The patterns that actually scale start from a more constrained place and earn the autonomy back as the system proves itself. That’s the version of the agent story that’s actually shipping.