Why "fully autonomous" is the wrong target

The discourse keeps pushing toward agents that operate alone. The discipline pushes back. Full autonomy is a category error; it's never what production wants. The right target is bounded autonomy: as much rope as you've earned, no more.

Why "fully autonomous" is the wrong target

Most of the agent discourse I read in 2025 frames autonomy as a slope you climb. You start with copilots that suggest. You graduate to agents that act with confirmation. Eventually you reach the summit: fully autonomous agents that operate without a human in the loop, on their own goals, on their own schedules, with their own judgment.

The framing assumes the summit is the destination. I want to argue that the summit is the wrong target, not because we can't get there technically, but because production never asked for it and never will.

The right target is bounded autonomy. As much rope as the agent has earned, calibrated to the operation, the blast radius, and the recovery story. No more. The discipline isn't "make the agent more autonomous." It's "make the boundaries of autonomy explicit and enforce them."

This is a framing piece for a longer thread. There's a six-rung ladder coming in a later article that pins down what "calibrated autonomy" looks like in practice. This piece is about why the ladder is the right shape and the summit is a mirage.

Why the summit is a category error

The most clarifying way I've found to talk about this is to look at the equivalent pattern in adjacent fields. We don't ship "fully autonomous" deployment systems. We ship deployment systems with clearly defined autonomy at every step: the build is automatic; the test gate is automatic; the canary push is automatic; the production cutover requires a confirmation, or a rollout window, or a gating signal from a check that someone configured. Every successful deployment platform I've worked with has been a carefully constructed gradient of autonomy across the pipeline, with explicit boundaries at the points where the cost of being wrong is high.

We don't ship "fully autonomous" payment systems. We ship payment systems that can clear a thousand-dollar transaction without a human looking at it, hold a hundred-thousand-dollar transaction for a review, and refuse to clear a million-dollar transaction without a second-factor approval and an out-of-band confirmation. The autonomy is bounded by amount, by counterparty risk, by velocity, by the recoverability of the operation. Nobody who runs a payment system seriously believes the goal is "no humans, ever."

We don't ship "fully autonomous" surgery systems, or fully autonomous airliners, or fully autonomous drug-prescription systems. We ship systems where the machine handles the parts the machine handles well, and human judgment gates the parts where the cost of a wrong machine answer is large or irreversible. The frontier is "what's the right autonomy boundary," not "how do we delete the boundary."

The agent discourse keeps treating AI as if it's the exception to this entire pattern. It isn't. Production AI looks like every other production system: a careful gradient of automation calibrated to the cost of being wrong, with the boundary located at the cost transition. The "fully autonomous" version is a thought experiment with no production analog, because no production environment has ever been designed around the assumption that the cost of a wrong action is zero.

What the summit-chasers are actually optimizing for

To be fair to the position I'm arguing against: there is a real thing the autonomy-maxxers are pointing at, and it isn't crazy. The thing is the cost of human time. Humans in the loop add latency. Humans in the loop add cost. Humans in the loop don't scale linearly with workload. The dream of the fully autonomous agent is the dream of a system that processes a workload at machine speed without the bottleneck of human attention.

The dream is real. The frame is wrong. The right answer to "humans in the loop are a bottleneck" isn't "remove the humans." It's "put the humans where they're load-bearing and remove them where they aren't." Some operations the agent should do alone. Some operations the agent should do with a confirmation. Some operations the agent should do under continuous supervision. The interesting design question is which is which, not how to collapse the gradient into one extreme.

Most teams I've watched build agent systems are doing this implicitly already. They have agents that run linting and formatting alone. They have agents that draft PRs and request a human review. They have agents that propose schema changes and require explicit approval. They've built a gradient by intuition. The "fully autonomous" framing pushes them to flatten the gradient by removing the gates, which is exactly the wrong direction.

What "bounded autonomy" actually means

Bounded autonomy is the working name for the gradient. It's the design discipline of pinning down, for every kind of operation an agent might take, exactly how much autonomy the agent has earned for that operation. The bound isn't a global setting. It's per-operation, calibrated to:

Blast radius. What's the worst case if the agent is wrong? An agent that runs a kubectl get is operating at zero blast radius: it can't break anything by reading. An agent that runs a kubectl delete namespace is operating at high blast radius: it can take production down. The autonomy bound for the second one is much tighter than for the first, regardless of how clever the agent is.

Recoverability. If the agent does the wrong thing, how expensive is the recovery? A typo in a draft PR is recoverable in seconds. A git push --force against main is recoverable in hours and probably with help. A wrong production database mutation is recoverable in days, sometimes not at all. The autonomy bound tracks the recovery cost.

Reversibility window. Some operations are recoverable for an hour and then they aren't. A dropped column with a backup taken two minutes ago is recoverable. The same dropped column three weeks later, after the backup retention window passed, isn't. Bounded autonomy considers the time horizon, not just the immediate undo path.

Verifiability. Some operations have an obvious correct answer and you can check it. "Did the test suite pass?" is verifiable. Other operations don't have a correct answer until much later. "Is this architectural choice the right one?" isn't verifiable for months. The autonomy bound is tighter where the verification is harder, because the agent's confidence in its own work is less informative when the work resists checking.

Stakes ratio. What does the agent gain by acting alone vs. what does it gain by waiting for confirmation? If the latency cost of the human-in-the-loop is small and the cost of being wrong is large, the human stays in the loop. If the latency cost is large and the cost of being wrong is small, the agent acts alone. The ratio sets the bound.

These are the dimensions a real autonomy bound is calibrated against. None of them is "how smart is the model." None of them is "what year is it." The bound is set by the operation's properties, not by the agent's capabilities. A stronger model doesn't change the blast radius of kubectl delete namespace. It just makes the agent more confident about running it.
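
Here's a sketch of what calibrating against those properties could look like in code. The field names, scores, and thresholds are illustrative assumptions, not a spec; the point is that every input is a property of the operation, and the model's capability appears nowhere.

    from dataclasses import dataclass
    from enum import Enum

    class Rung(Enum):
        UNATTENDED = "act alone"
        CONFIRM = "act with human confirmation"
        BLOCKED = "don't act"

    @dataclass
    class OperationProfile:
        blast_radius: int        # 0 = read-only .. 3 = can take production down
        recovery_cost: int       # 0 = seconds .. 3 = days or never
        reversible_hours: float  # time until the undo path closes
        verifiable: bool         # can correctness be checked promptly?
        review_latency_cost: int # 0 = cheap to wait for a human .. 3 = expensive

    def autonomy_bound(op: OperationProfile) -> Rung:
        # The bound is a function of the operation, never of the model.
        if op.blast_radius == 0:
            return Rung.UNATTENDED
        if op.recovery_cost >= 3 or op.reversible_hours < 1:
            return Rung.BLOCKED if op.blast_radius >= 3 else Rung.CONFIRM
        if not op.verifiable:
            return Rung.CONFIRM
        # Cheap to be wrong and expensive to wait: the agent acts alone.
        return Rung.UNATTENDED if op.review_latency_cost > op.recovery_cost else Rung.CONFIRM

    read_pods = OperationProfile(0, 0, float("inf"), True, 0)  # kubectl get
    drop_ns = OperationProfile(3, 3, 0.5, True, 0)             # kubectl delete namespace
    assert autonomy_bound(read_pods) is Rung.UNATTENDED
    assert autonomy_bound(drop_ns) is Rung.BLOCKED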

What this looks like in production

In the AI platform layer of an enterprise stack, bounded autonomy shows up as a set of explicit rope-lengths per operation class. Some examples I keep seeing land:

  • An agent that drafts code can do so unattended. The PR is the artifact. The human review is the gate.
  • An agent that runs the test suite can do so unattended. A failed test is the gate.
  • An agent that proposes a schema migration can do so unattended. The migration plan is the artifact. The human approval is the gate.
  • An agent that proposes destroying a resource cannot do so unattended. The plan is the artifact; the destruction requires explicit confirmation. (This is the agent version of Article 3: double-check, never delete.)
  • An agent that triages an alert can resolve low-severity alerts unattended, escalate medium-severity alerts to a human, and page on high-severity alerts. The severity bands are the autonomy boundary.
  • An agent that operates on customer data has its operations bound by the data classification. Public data: agent acts. Internal: agent acts with audit log. Confidential: agent acts under continuous supervision. Restricted: agent doesn't act.

Each of those is a different rung on the same ladder, and each of them is calibrated to the operation, not to the agent. Same agent, different rung depending on what it's about to do. The rope length is a function of the operation's properties.
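
The enforcement wrapper itself can stay small. A minimal sketch, assuming a hypothetical policy table and confirmation hook (both are stand-ins for whatever your platform provides); the shape is what matters: the agent never calls the tool directly, and the wrapper consults the bound first.

    from typing import Callable

    # Hypothetical rope-lengths per operation class. In practice this table
    # comes from the centralized policy layer, not from the agent's own code.
    POLICY = {
        "draft_code":        "unattended",
        "run_tests":         "unattended",
        "propose_migration": "unattended",  # proposing is safe; applying is gated
        "apply_migration":   "confirm",
        "delete_resource":   "confirm",
    }

    def guarded_tool_call(op_class: str,
                          execute: Callable[[], str],
                          confirm: Callable[[str], bool]) -> str:
        rung = POLICY.get(op_class, "confirm")  # unknown ops get the tighter bound
        if rung == "unattended":
            return execute()
        if confirm(f"Agent requests {op_class}. Allow?"):
            return execute()
        return f"{op_class}: blocked pending human approval"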

The DaC connection

The approach I keep coming back to for everything platform-shaped is Decisions as Code: extract the business decisions out of the platform's full configuration surface into a small curated layer, project them onto every consumer, validate at the boundary, enforce at runtime. Bounded autonomy is the same shape applied to the agent layer.

The "decision" in the bounded autonomy case is the rope-length for each operation class. The "centralized layer" is the autonomy policy, a small, curated set of bounds keyed off operation properties (blast radius, recoverability, etc.). The "projection" is the per-agent enforcement, the wrapper around the tool call that consults the policy and decides whether to execute, request confirmation, or escalate. The "boundary validation" is the agent runtime checking the policy before acting. The "admission enforcement" is the platform layer (OPA / a tool-call gateway / whatever you've built) blocking violating actions.

DaC was always meant to extend to operational decisions. The autonomy bound for an agent operation is just another decision the business has to make once and project everywhere. The centralized policy says "destructive operations on production require human confirmation." Every agent in every application picks that up. Change the policy, every agent picks it up on the next call. Same shape, different consumer.
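
A minimal sketch of that projection, assuming the policy lives in one central document (in a real stack it might sit in OPA, a config service, or a tool-call gateway; the path and schema below are made up for illustration). Because the policy is re-read at call time, changing the central decision changes every agent's behavior on its next call:

    import json
    from pathlib import Path

    # Hypothetical central policy document, e.g.:
    #   {"destructive": {"production": "confirm", "sandbox": "unattended"}}
    POLICY_PATH = Path("/etc/agents/autonomy-policy.json")

    def current_rung(op_class: str, environment: str) -> str:
        policy = json.loads(POLICY_PATH.read_text())  # consulted on every call
        return policy.get(op_class, {}).get(environment, "confirm")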

The teaser

There's a six-rung ladder of bounded autonomy that I'll spell out in a later piece. The rungs run from "agent reads only" through "agent acts unattended within a sandbox" through "agent acts in production with confirmation" through "agent acts in production unattended within a tight scope," and so on. Each rung is calibrated to the operation properties above. Each rung has a different gating mechanism. The point of the ladder is to give platform teams a vocabulary for talking about agent autonomy that's more granular than "how autonomous is your agent", which is the wrong question, because the right answer depends entirely on what the agent is about to do.
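
As a preview of the vocabulary only (the rung names are placeholders until that piece, and the remaining rungs stay unspecified here):

    from enum import IntEnum

    class AutonomyRung(IntEnum):
        READS_ONLY = 1              # agent reads, never writes
        SANDBOX_UNATTENDED = 2      # agent acts alone inside a sandbox
        PROD_WITH_CONFIRMATION = 3  # agent acts in production, human confirms
        PROD_UNATTENDED_SCOPED = 4  # agent acts in production within a tight scope
        # ...remaining rungs spelled out in the later piece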

For now, the framing piece is enough. Full autonomy is a category error. Bounded autonomy is the design target. The bound is set by the operation, not by the agent. The approach is the same approach that's been working for everything else: centralize the decision, project it everywhere, enforce it at the boundary.

The summit isn't the destination. The ladder is.

- Sid