DSPy + MCP: the combination I've been waiting for

DSPy optimizes the prompt-and-program layer. MCP standardizes the tool-and-data layer. Put them together and you have the primitives for maintainable agents: the stack I've been arguing for since the start of the year, finally usable.

I've been arguing for this combination, in pieces, for the better part of a year. It's worth pulling the argument together into one statement now, because the pieces have finally caught up to each other in a way that lets me say it cleanly: DSPy plus MCP is the agent stack I've been waiting for.

Two layers. One handles the prompts and the program; the other handles the tools and the data. Both are open. Both are composable. Both treat the LLM as a component you reason about, not a black box you incant at.

That's the headline. Now the actual argument.

The two problems agents actually have

If you've built agents in production, you already know the two recurring pain points. They sit at different layers of the stack and they have different shapes.

The first is the prompt-and-program layer. You have an LLM call (or several), wrapped in some glue, doing some task. The prompt drifts. Small changes break behavior in non-obvious ways. You can't tell whether a regression came from the model, the prompt, the data, or the upstream code. The whole thing is a soup of strings and you're stirring it with hope.

The second is the tool-and-data layer. Your agent needs to read something, call something, write something, talk to a database, hit an API, look in a vector store, manipulate a file. Every integration is bespoke. Every team builds the same five connectors slightly differently. The agent's reach into the world is hand-wired, brittle, and not portable across hosts.

For most of 2024 and 2025 these two problems were treated as one big "agents are hard" problem, and people threw all sorts of things at them: agent frameworks that tried to solve both, monoliths that tried to abstract both, vendor stacks that hid both behind a UI. None of it stuck for me. The frameworks that solved one well usually made the other worse.

The reason I've landed where I have is that these are different problems and they need different primitives.

What DSPy is solving

DSPy is a framework for writing LLM-driven programs as programs (typed signatures, modules, declarative pipelines) and then optimizing the prompts that drive them automatically. You define what the program is supposed to do; the framework figures out the prompts that get the model to do it, against examples and a metric you specify.

The thing that makes DSPy matter, for me, is not the syntax. It's the shift in posture. You stop hand-tuning prompts and start optimizing them the way you'd optimize any other piece of a system, with a data set, a metric, and a search procedure. The prompt becomes a compiled artifact rather than a magic string. The program is the thing you maintain.
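
To make the posture concrete, here's a minimal sketch of what that shape looks like. The task, field names, model identifier, and examples are all illustrative, not from any particular project; the point is the structure: a typed signature, a module, a metric, and a compile step.

```python
import dspy

# Illustrative model choice; any LM DSPy supports works here.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class TriageTicket(dspy.Signature):
    """Route a support ticket to the right queue."""
    ticket: str = dspy.InputField()
    queue: str = dspy.OutputField(desc="one of: billing, bug, how-to")

triage = dspy.ChainOfThought(TriageTicket)

# The metric and examples are yours; the optimizer searches for prompts
# that maximize the metric against them.
def exact_match(example, prediction, trace=None):
    return example.queue == prediction.queue

trainset = [
    dspy.Example(ticket="I was charged twice this month.",
                 queue="billing").with_inputs("ticket"),
    # ... more labeled examples
]

optimizer = dspy.BootstrapFewShot(metric=exact_match)
compiled_triage = optimizer.compile(triage, trainset=trainset)
```

Nothing in that snippet is a hand-written prompt. The prompt that actually hits the model is whatever the optimizer found; `compiled_triage` is the artifact you keep.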

I wrote about this in the case for local DSPy, specifically about running the optimization against your own examples without shipping them out, and again in DSPy in real life, where I tried to be honest about the rough edges. The rough edges are real. The posture is the right posture anyway.

What DSPy doesn't solve, and was never trying to solve, is the tool-and-data layer. Where your agent gets its inputs from. What APIs it can reach. How that reach is described to it. Where the data it acts on actually lives.

That's the gap MCP fills.

What MCP is solving

MCP is a protocol for exposing tools, resources, and prompts to language models in a standard way. A server speaks MCP; a host that speaks MCP can use the server's capabilities without bespoke glue. The agent doesn't care whether the database is Postgres or Snowflake; it sees an MCP server with a query tool and a schema. The agent doesn't care whether the file store is S3 or local disk; it sees an MCP server with a read_file tool.
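
For a sense of scale, here's roughly what a tiny MCP server looks like using the official Python SDK's FastMCP helper. The server name and tool are made up for illustration; the point is that one decorated function becomes a capability any MCP host can discover.

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical read-only file server; name and tool are illustrative.
mcp = FastMCP("files")

@mcp.tool()
def read_file(path: str) -> str:
    """Return the contents of a text file."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio by default
```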

The win is not just convenience. The win is composability. Once tools live behind a standard interface, you can swap them, version them, scope them, audit them, and reason about them as a layer. You can write an agent against a set of capabilities rather than against a particular vendor's SDK. You can give the agent a smaller set of tools at runtime than the host knows about, without recompiling anything.

I made the broader case in why MCP over everything else for tool integration in 2026, and the practical case in building agents inside an MCP-only architecture. Short version: the connector sprawl was never going to be solved by another framework. It was always going to be solved by a protocol. MCP is the protocol that won, and it won fast enough that 2026 is the year I stopped writing bespoke tool integrations.

What MCP doesn't solve, and was never trying to solve, is the prompt-and-program layer. The agent still has to decide what to do with the tools. The prompts that drive that deciding still have to come from somewhere. The program that orchestrates the calls still has to be written, tested, and maintained.

That's the gap DSPy fills.

Why the combination is the thing

This is the part I've been waiting to say without qualifying it.

DSPy and MCP are not competitors. They are not alternatives. They sit at different layers and they do not overlap. DSPy is about how the model reasons; MCP is about what the model can reach. You need both, and you need them clean.

In a DSPy-plus-MCP stack:

  • The agent's behavior is a DSPy program: typed, declarative, optimizable against your data and metrics.
  • The agent's capabilities are MCP servers: standardized, swappable, scopeable per environment.
  • The two layers talk to each other through a small, boring interface. The DSPy program calls tools; the tools come from the MCP host the program is running inside.
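
Here's roughly what that boring interface looks like in practice, using the MCP Python client and DSPy's MCP tool wrapper. The server command, model, and question are placeholders; the shape is what matters: list the server's tools, wrap them, hand them to an agent module.

```python
import asyncio
import dspy
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Illustrative model choice; any LM DSPy supports works here.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Placeholder: launch whatever MCP server you actually use.
server = StdioServerParameters(command="python", args=["file_server.py"])

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = (await session.list_tools()).tools
            # Wrap each MCP tool so the DSPy program can call it; filter
            # here if the agent should see fewer tools than the host has.
            dspy_tools = [dspy.Tool.from_mcp_tool(session, t) for t in tools]
            agent = dspy.ReAct("question -> answer", tools=dspy_tools)
            result = await agent.acall(question="What does README.md say?")
            print(result.answer)

asyncio.run(main())
```

The DSPy program never learns whose server it's talking to, and the server never learns what program is calling it. That ignorance is the interface working.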

What you get out of this, that you don't get from any single-layer solution:

You can change the model without rewriting the tool layer. You can change the tools without retraining the prompts. You can optimize a prompt against a new metric without touching anything below it. You can add a new MCP server and have it show up to existing programs without code changes in the programs themselves. The layers are genuinely separable.

That separability is what "maintainable agents" actually means. Not "elegant code." Not "clean architecture." The ability, in a year, when one of the layers needs to change, to change only that layer.

I keep coming back to an analogy from my own career. Infrastructure became operable when configuration and compute were separated, when you stopped editing servers by hand and started declaring what they should look like, with the orchestration handling the rest. Agents are at the same turning point. DSPy is the declarative layer for the reasoning; MCP is the declarative layer for the capabilities. Once both are in place, the work shifts from gluing strings to defining systems.

The honest qualifications

A few things to be straight about, because the version of this argument without qualifications would be a cheerleader version and I'm trying not to write that.

DSPy is still rough in places. The optimizer is sensitive to the metric. The compiled programs are not always portable across model versions without recompilation. The learning curve is steeper than it should be if you've spent the last two years writing prompts by hand. The community is small enough that a lot of best practices are not yet written down. I think all of these get better; they are not solved today.

MCP is broadly deployed at this point, but the tooling around it still has gaps. Server quality varies. The auth and scoping story is better than it was a year ago, but still has rough edges around fine-grained permissions. Long-running tools are awkward. The MCP-only architecture works; it just costs you some patience around the edges.

Neither of these qualifications changes the position. They are the qualifications you'd expect from any stack that just hit "actually usable." The point is that the stack is now actually usable.

What I'm building on top of it

Concretely, here's what I now do that I wasn't doing this time last year.

I write the agent's program in DSPy. I expose the agent's tools through MCP servers, some that I run, some that already exist for the systems I care about. I keep a small evaluation set for each program with a metric I trust, and I recompile the prompts when the model or the metric changes. I scope the MCP servers per environment so the production agent and the dev agent can't see the same things. I treat the prompts as build artifacts, not source.
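
The "build artifacts, not source" part is literal. Compiled DSPy programs serialize to disk, so the recompile step fits wherever your builds already happen. A sketch, continuing the triage example from earlier, with an assumed artifact path:

```python
# At build time: recompile when the model, metric, or examples change.
compiled = optimizer.compile(triage, trainset=trainset)
compiled.save("artifacts/triage.json")  # ships with the build, not the repo

# At runtime: load the compiled prompts like any other build output.
program = dspy.ChainOfThought(TriageTicket)
program.load("artifacts/triage.json")
```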

That's it. That's the whole stack. It is dramatically less code than the equivalent setup eighteen months ago, and dramatically more legible.

The cleanest test of whether a stack is right is whether you stop thinking about it. I've stopped thinking about how the agent talks to tools; that's MCP's problem now. I've stopped thinking about how the prompts get tuned; that's DSPy's problem now. What I think about instead is the thing I should have been thinking about all along: what the agent is supposed to do, how I'd know if it did it well, and what data I'd use to tell.

That's the combination I was waiting for. It's here. The argument is over; the building is the point.