Prompt engineering is dying. Prompt architecting isn't.

The job of crafting clever prompts to coax better answers from a frontier model is mostly over. The job of designing how prompts compose into systems is just beginning.

The genre of writing where someone discovers that adding "Let's think step by step" to a prompt makes the model better at math has aged out. Modern frontier models don't need the magic-words tricks the way GPT-3.5 did, and the residual cases where prompt fiddling yields meaningful gains are narrowing fast. The "prompt engineer" job title that emerged around 2023 is on its way out, at least in the form it took then.

What's replacing it is a different discipline that's getting called by the same name and shouldn't be. The work of designing how prompts compose into systems (how context is assembled, how budgets are allocated, how roles are scoped, how outputs are validated) is real, distinct, and growing. Calling that "prompt engineering" buries what's actually changed.

[Figure: the 2023 prompt-engineering job (clever phrasing for one model) versus the 2025 prompt-architecture job (designing how prompts compose into systems with context, budgets, tools, and validation).]

What "prompt engineering" used to mean

Through 2023 and into 2024, prompt engineering was a fairly concrete craft. Given a model that responded unevenly to phrasing, the work was finding the right phrasing. Few-shot examples that matched the output format you wanted. Role-playing instructions ("you are an expert X"). Chain-of-thought prompts. Output formatting tricks ("respond in JSON with the following schema"). Adversarial prompts to expose model behavior. Each of these was a real lever, and the difference between a well-prompted model and a naively-prompted one was often dramatic.

The work was bounded enough to be a job. People got hired with "prompt engineer" in the title. There were prompt-engineering courses, prompt-engineering newsletters, prompt-engineering competitions. A whole cottage industry formed around the gap between what models could do and what they did with the prompts most users were giving them.

That gap has closed.

What changed

A few things in combination collapsed the prompt-engineering surface area:

Models got better at instruction-following. GPT-4o, Claude 3.5 Sonnet, the Gemini 2 line: all of them respond well to plain instructions. The difference between "you are a senior software engineer reviewing this code" and just "review this code" is small to none on a frontier model in 2025. The carefully crafted role-play prompts that made GPT-3.5 perform stopped mattering.

Reasoning models internalize the chain-of-thought. When extended thinking became a default behavior, the user-side trick of "let's think step by step" stopped being necessary. The model thinks. You don't have to ask it to. That collapses a meaningful fraction of the prompt-engineering tricks of the prior era into a feature flag.

Structured output got first-class support. OpenAI shipped JSON-mode and function-calling well over a year ago; everyone else has followed. The "respond in JSON with this schema" prompt-engineering trick is now an API parameter. You don't engineer the prompt to coerce structure; you tell the API what shape you want and the model produces it.
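
As a concrete illustration (API details vary by vendor and SDK version; this sketch assumes the OpenAI Python SDK's structured-output support, and the triage schema is a hypothetical example):

```python
from openai import OpenAI

client = OpenAI()

# The schema is an API parameter, not a prompt-engineering trick.
response = client.chat.completions.create(
    model="gpt-4o",  # any structured-output-capable model
    messages=[{"role": "user", "content": "Triage this bug report: ..."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "triage_result",  # hypothetical schema for illustration
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "severity": {"type": "string", "enum": ["low", "medium", "high"]},
                    "summary": {"type": "string"},
                },
                "required": ["severity", "summary"],
                "additionalProperties": False,
            },
        },
    },
)
```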

Tool use replaced a category of prompt engineering altogether. A lot of the cleverness used to go into prompts that worked around the model's limitations: "if you don't know, say so," "look up the current date before answering," "show your work." Tools handle most of these directly now. The model calls a tool to get the current date instead of guessing. The model calls a tool to verify before asserting. The cleverness moved from the prompt to the tool surface.
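
For instance, the "look up the current date" workaround becomes a one-line tool declaration (shown here in OpenAI's function-calling format; other vendors' APIs have equivalents):

```python
# Declare the tool once; the model calls it instead of guessing the date.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_date",
        "description": "Return today's date as an ISO 8601 string.",
        "parameters": {"type": "object", "properties": {}},
    },
}]
```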

The sum of these is that the surface where careful prompting used to win meaningful capability has shrunk to a small specialist tier. There's still real work in prompting frontier models for very particular workloads. There's a much smaller market for it than there was two years ago.

What's growing instead

The work that's growing (and is being mislabeled as the same job) is meaningfully different. It's about how prompts compose into systems.

A few specific shapes of that work, in increasing order of how much architecture they actually require:

Prompt patterns as reusable artifacts. Not a single prompt for a single problem; a library of system prompts scoped to roles, with versioning, with tests, with metrics on how each one performs across the workload that uses it. This is the treat-the-AI-like-an-employee discipline at the artifact level: the role gets defined, written down, and version-controlled.
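
A minimal sketch of what that artifact discipline can look like, with hypothetical names throughout:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptArtifact:
    role: str            # the scoped job this prompt defines
    version: str         # bumped and reviewed like any other artifact
    system_prompt: str

# Hypothetical registry; in practice this lives in version control,
# with tests and per-role performance metrics alongside it.
REGISTRY = {
    ("code-reviewer", "2.1"): PromptArtifact(
        role="code-reviewer",
        version="2.1",
        system_prompt="You review diffs for correctness, style, and risk...",
    ),
}

def get_prompt(role: str, version: str) -> PromptArtifact:
    return REGISTRY[(role, version)]
```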

Context assembly. Deciding what to put in front of the model on a given turn. Which documents to retrieve. Which user history to include. Which prior assistant messages to keep, which to summarize, which to drop. This is mostly an engineering problem with a prompt-shaped output, and the context-window math constraint forces you to be deliberate about it.
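
One deliberately simple version of that decision, sketched as a greedy packer under a fixed token budget (the priority scores and token counter are assumptions, stand-ins for whatever retrieval scoring and tokenizer the system actually uses):

```python
from typing import Callable

def assemble_context(
    candidates: list[tuple[float, str]],  # (priority, text), any order
    budget_tokens: int,
    count_tokens: Callable[[str], int],   # e.g. a tokenizer's length function
) -> str:
    """Keep the highest-priority items that fit the budget; drop the rest."""
    chosen, used = [], 0
    for priority, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = count_tokens(text)
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return "\n\n".join(chosen)
```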

Budget allocation. When and where to spend reasoning tokens, when to spend context, when to spend tool calls. A pipeline that runs many model calls per user request is a budget-allocation problem dressed up as a prompt problem. The system prompts are tiny artifacts; the architecture decisions about which model handles which step, with which budget, are the actual work.
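
Made concrete, the decision often ends up as something like a routing table (step names and model tiers here are hypothetical; the point is that the budgets, not the prompts, carry the design):

```python
# Hypothetical per-step plan for a pipeline that runs several
# model calls per user request.
PIPELINE_BUDGETS = {
    "triage":     {"model": "small-fast",      "max_output_tokens": 256,  "thinking": False},
    "retrieve":   {"model": "small-fast",      "max_output_tokens": 512,  "thinking": False},
    "synthesize": {"model": "large-reasoning", "max_output_tokens": 2048, "thinking": True},
}
```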

Tool surface design. What tools the model has access to, how they're named, what they return, how their results compose. The agent's behavior depends as much on the tools available as on the prompts. Designing the tool surface is the part of "prompt architecture" that actually moves the needle.
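
A small illustration of why return shape matters (a hypothetical search tool; the structured variant composes because downstream steps can reference doc_id without re-parsing prose):

```python
# Shape A: an opaque blob the model must re-read and re-parse every turn.
def search_v1(query: str) -> str:
    return "Found 3 documents. The most relevant is the Q3 report from 2024..."

# Shape B: typed fields that later tool calls and validators consume directly.
def search_v2(query: str) -> list[dict]:
    return [
        {"doc_id": "q3-2024", "title": "Q3 report", "snippet": "Revenue grew..."},
    ]
```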

Output validation and recovery. What happens when the model returns something that doesn't fit the expected shape. Retries with adjusted prompts. Fallback to a different model. Graceful degradation to a human. None of this is prompt engineering in the old sense; all of it is part of a prompted-system architecture.
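
A compressed sketch of that recovery loop (the helper names are hypothetical, and real pipelines usually validate against a schema rather than a bare callable):

```python
import json
from typing import Callable

def call_with_recovery(
    call_model: Callable[[str], str],   # one model invocation: prompt -> raw text
    prompt: str,
    is_valid: Callable[[dict], bool],   # stand-in for schema validation
    max_retries: int = 2,
) -> dict:
    """Validate model output; retry with an adjusted prompt; escalate on failure."""
    for _ in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
            if is_valid(data):
                return data
        except json.JSONDecodeError:
            pass
        # Retry with the failure folded back into the prompt.
        prompt = f"{prompt}\n\nYour previous reply was invalid. Return only valid JSON."
    # Last resort: a different model, or graceful degradation to a human.
    raise RuntimeError("output never validated; escalate to fallback")
```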

A rough estimate, from the AI workloads I've shipped or reviewed in the last year: maybe 5% of the engineering work is the prompt itself. The other 95% is the architecture around the prompts.

Why the language matters

If we keep calling the new work "prompt engineering," the people doing it inherit the credibility deficit the original term has earned. "Prompt engineer" became a punchline around the same time it became a job title, and not entirely undeservedly: a lot of what was published under the label was tricks that worked on one model and broke on the next.

The architectural work is more durable. Context-assembly patterns, budget-allocation strategies, tool-surface designs, validation pipelines: these aren't tied to a specific model's quirks. They survive model upgrades. They generalize across vendors. They're closer to systems engineering than to prompt magic. Calling them by a different name (prompt architecture, prompted-system design, AI workflow engineering, whatever) would help the field stop conflating two different things and start hiring for the right one.

The old prompt engineering job is dying because the gap it filled is closing. The new architectural job is growing because the systems being built around prompts are getting more complex faster than the models themselves are. Worth being clear about which one you mean when you use the word.