OPA + AI agent: policy-as-code for AI workloads

A month past the OPA renaissance piece, with more shops actually shipping production agent-policy stacks rather than just experimenting. Let me get concrete about what those deployments look like, the policy bundle structure, the evaluation patterns, the integration with the agent surface, the operational discipline that turns "we're using OPA" into "we have a working agent-governance story."

Less editorial than the prior piece; more shaped by the practitioners actually using this. Written for anyone thinking about building this in their own org.

[Diagram: every agent action passes through policy. The agent proposes an action; OPA evaluates it as Rego policy-as-code; an allow executes and logs, a deny rejects and logs with a reason. Every decision is recorded, allow and deny.]

The starting point

The deployments that are working share a common starting shape:

  • An agent gateway between the agent runtime and the model/tool layer. Every agent request goes through it.
  • An OPA sidecar running alongside the gateway. Receives policy decision requests; returns allow/deny + reasoning.
  • A policy bundle store (an OCI registry, a git repo, or a bundle service endpoint) that holds the versioned Rego policies. The OPA sidecar pulls bundles on a defined cadence.
  • A structured input contract: the gateway sends a well-defined JSON shape to OPA describing the request (agent identity, user identity, tool, args, environment context).
  • A logging path: every policy decision (input, output, decision, version, latency) lands in a queryable store.

That's the foundation. Most of the complexity is in the policy authoring; the infrastructure is straightforward.

The policy bundle structure that works

The naive version is "one Rego policy that decides everything." That works at small scale and falls over fast. The mature version is structured.

Identity policies. Rego rules that resolve whether this agent / user / service account is a known principal and what it's allowed to be in scope for. Output: the agent's capability set as structured data the other policies use.

Resource policies. Rules that govern access to specific resources (data sources, tools, model surfaces). "Is this agent allowed to call this tool with these args?" The bulk of the everyday decisions land here.

Workflow policies. Rules about higher-order behavior: "is this agent allowed to call this many tools in a row, take this many tokens of context, run for longer than N seconds." Catches the runaway-tool-call class of failures at the policy layer.

Compliance policies. Rules tied to regulatory or contractual requirements: "this data class can only go to these models in these regions." Often the most-revised category, because compliance requirements change.

Org-specific override policies. The "we know better than the defaults for these specific cases" rules that orgs accumulate. Should be small and well-justified; tend to grow if not policed.

Each category lives in its own bundle, gets reviewed by appropriate stakeholders, and gets deployed on its own cadence. Decoupling these prevents the "one giant policy file that nobody understands" antipattern.
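As a concrete sketch, an identity-bundle rule might look like the following. This is illustrative only: `data.agent_registry` and its fields are an assumption about how an org might ship agent metadata alongside the bundle, not anything OPA defines.

```rego
# identity/agents.rego -- illustrative sketch, not a canonical layout.
package identity.agents

import rego.v1

# The agent is a known principal only if it appears in the registry
# shipped with the bundle and claims the scope the registry assigns it.
# data.agent_registry is an assumed data document, not an OPA built-in.
known_agent if {
    registry_entry := data.agent_registry[input.agent.id]
    registry_entry.scope == input.agent.scope
}

# Expose the capability set as structured data for downstream bundles.
capabilities := data.agent_registry[input.agent.id].capabilities if {
    known_agent
}
```

The point of the second rule is the decoupling: resource and workflow bundles consume `capabilities` without needing to know how identity was resolved.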

The evaluation pattern

The gateway → OPA → decision → action loop, in detail:

  1. Agent issues a request. "I want to call this tool with these arguments."
  2. Gateway extracts the context. Agent identity, user identity, tool, args, current conversation state, environmental flags (region, time-of-day, session metadata).
  3. Gateway calls OPA with a structured input bundle.
  4. OPA evaluates the input against the loaded policy bundles. Returns: allow/deny, reasoning, optional metadata (per-call rate-limit info, suggested alternatives, audit metadata).
  5. Gateway acts on the decision. Allowed → forwards to the tool; denied → returns the policy decision to the agent with the reasoning.
  6. Decision is logged with the full input/output, the bundle version that decided it, and the decision latency.

The cycle target is sub-50ms per decision. OPA at the workload sizes most orgs run hits this comfortably. The few orgs running at very high volumes need optimization (caching, partial evaluation) but the patterns are well-understood.

What the input bundle looks like

A typical input bundle structure that's working in production:

{
  "agent": {
    "id": "agent_id",
    "scope": "domain_scope",
    "owner_team": "team_id"
  },
  "user": {
    "id": "user_id",
    "groups": ["group1", "group2"],
    "data_classifications_authorized": ["public", "internal"]
  },
  "action": {
    "type": "tool_call",
    "tool": "tool_name",
    "args": { ... },
    "data_classes_referenced": ["customer_data"]
  },
  "context": {
    "conversation_id": "...",
    "turn_count": 7,
    "cumulative_tokens": 4521,
    "session_age_seconds": 142
  },
  "environment": {
    "region": "us-east",
    "time": "2025-09-12T13:45:00Z",
    "is_production": true
  }
}

That's the decision surface. Rego policies operate on this structure cleanly. The fields that matter most in practice are data_classes_referenced (drives the compliance decisions), agent.scope (drives the capability decisions), and context.turn_count / cumulative_tokens (drives the workflow-protection decisions).
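To make that concrete, here is a hedged sketch of a resource-policy rule written over exactly these fields. `data.scope_tools` and `data.limits` are hypothetical data documents, not a canonical schema.

```rego
# resource/tools.rego -- illustrative sketch over the input contract above.
package resource.tools

import rego.v1

default allow := false

# Allow a tool call when the agent's scope grants the tool, every data
# class the call references is authorized for the user, and the
# conversation is still under the workflow ceiling.
# data.scope_tools and data.limits are assumed documents, not built-ins.
allow if {
    input.action.type == "tool_call"
    input.action.tool in data.scope_tools[input.agent.scope]
    every dc in input.action.data_classes_referenced {
        dc in input.user.data_classifications_authorized
    }
    input.context.turn_count <= data.limits.max_turns
}
```

Note how the rule touches all three high-value fields without any custom glue: that is the payoff of a well-defined input contract.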

What the response looks like

OPA's response back to the gateway:

{
  "allow": true,
  "reasons": ["agent_in_scope", "data_class_authorized"],
  "metadata": {
    "policy_version": "1.4.7",
    "matched_rules": ["agent.scope.tool_access", "compliance.us_east_data"],
    "rate_limit_remaining": 47,
    "audit_required": true
  }
}

Or, on a denial:

{
  "allow": false,
  "reasons": ["data_class_unauthorized_for_user"],
  "remediation": "request_elevated_data_access",
  "metadata": {
    "policy_version": "1.4.7",
    "matched_rules": ["compliance.user_data_access"],
    "appeal_path": "ticket_url"
  }
}

The structured response is the part that makes this usable. The agent gets a denial it can explain to the user; the human reviewer gets enough context to understand what happened; the audit log gets enough metadata to query later. Compare that to "decision: deny" with no context: the latter is a frustrating user experience and an unauditable system.
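One way to produce that shape from Rego is a document-valued rule. A minimal, hedged sketch; the package name, the reason string, and `data.bundle.version` are all assumptions for illustration:

```rego
# system/main.rego -- sketch of a decision document the gateway reads.
package system.main

import rego.v1

default allow := false

allow if count(deny_reasons) == 0

# Each firing denial contributes a reason string, so a deny always
# carries an explanation back to the gateway.
deny_reasons contains "data_class_unauthorized_for_user" if {
    some dc in input.action.data_classes_referenced
    not dc in input.user.data_classifications_authorized
}

# The full decision document. object.get keeps the rule defined even
# if the (assumed) version stamp is missing from the bundle.
decision := {
    "allow": allow,
    "reasons": deny_reasons,
    "metadata": {
        "policy_version": object.get(data, ["bundle", "version"], "unknown"),
    },
}
```

In this layout the gateway queries `data.system.main.decision` rather than a bare boolean, which is what makes the reasoning and metadata travel with every answer.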

The operational discipline

A few practices that separate working OPA-AI deployments from failed ones.

Treat policies as deployment units alongside agents. A new agent or tool ships with its associated policies. Updates to either ship together. Version both. Roll back together when needed.

Run policy changes through a real test suite. Each policy change should be tested against a corpus of historical decision inputs to verify the change has the intended effect. OPA's testing tools handle this; teams that skip it ship policies that surprise them in production.
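OPA's built-in runner (`opa test`) discovers rules prefixed `test_`. A hedged sketch of such a test, assuming a resource-policy package `resource.tools` with an `allow` rule over the input contract shown earlier; every name and fixture value here is invented for illustration:

```rego
# resource/tools_test.rego -- run with `opa test .`
package resource.tools_test

import rego.v1

# Fixture mirroring a historical decision input; values are invented.
unauthorized_input := {
    "agent": {"scope": "support"},
    "user": {"data_classifications_authorized": ["public"]},
    "action": {
        "type": "tool_call",
        "tool": "export_report",
        "data_classes_referenced": ["customer_data"],
    },
    "context": {"turn_count": 3},
}

# with-mocking pins input and data, so the test is hermetic.
test_denies_unauthorized_data_class if {
    not data.resource.tools.allow with input as unauthorized_input
        with data.scope_tools as {"support": ["export_report"]}
        with data.limits as {"max_turns": 20}
}
```

Replaying a corpus of logged production inputs through tests like this is what turns "we changed a policy" into "we know what the change does."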

Monitor decision latency, denial rate, and reason distribution. Three metrics that catch most policy problems early. Latency creep means the policies are getting too complex; denial-rate spikes mean a recent policy change is too aggressive; reason distribution tells you which rules are actually firing in production.

Build a policy authoring practice. Rego is a real language with a learning curve. Either invest in growing in-house Rego expertise or stay limited to simple policies. The middle path (engineers writing complex Rego without enough fluency) produces fragile policies that misbehave in subtle ways.

Ship denial reasoning to the agent and the user. A denied request that just says "no" trains agents and users to work around the policy rather than to operate within it. A denied request with reasoning produces compliance via understanding rather than via friction.

The integration with MCP

The current sweet-spot pattern: each MCP server ships with an associated policy bundle that governs its use. When the gateway routes a request to a particular MCP server, the policy bundle for that server is automatically loaded into the OPA evaluation. This keeps the policy modular (one bundle per tool surface) and keeps the deployment unit coherent (the tool and its governing policy ship together).

The pattern lines up with the broader agent design patterns: the tool-scoped subagents pattern naturally extends to per-tool policies. The agent's scope determines which subagents are reachable; each subagent's policy bundle governs what arguments are valid and what side-effects are allowed; the human-in-the-loop checkpoint pattern shows up at the policy layer for high-blast-radius actions.

What's still hard

A few categories where the OPA-AI integration still has rough edges.

Cross-policy reasoning. When a decision depends on multiple policy bundles (identity + resource + compliance), the conflict-resolution semantics need to be defined plainly. OPA gives you the foundation; the conflict-resolution conventions are still org-by-org.
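One common convention is deny-overrides: the aggregate decision allows only if every category bundle allows, so any single deny wins. A hedged sketch, with all package paths assumed:

```rego
# system/authz.rego -- deny-overrides combination sketch.
package system.authz

import rego.v1

default allow := false

# The aggregate allows only when every category bundle allows; a deny
# (or an undefined result) in any one bundle denies the request.
# The referenced package paths are illustrative, not a standard.
allow if {
    data.identity.agents.allow
    data.resource.tools.allow
    data.compliance.regions.allow
}
```

The sketch also shows the sharp edge: an undefined result is treated the same as an explicit deny, which is safe but makes debugging "why was this denied" depend on good reason reporting.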

Stateful policy. Some decisions depend on state that's not easily expressed in the input bundle: "this agent has used 80% of its weekly budget already." The state has to come from somewhere; integrating it cleanly with OPA's mostly-stateless evaluation is non-trivial.
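The usual workaround is to push that state into OPA's data document out of band, via bundle refreshes or OPA's REST data API, and have the rule read the snapshot. A hedged sketch; `data.spend` and its fields are an assumed shape:

```rego
# workflow/budget.rego -- stateful check against externally pushed data.
package workflow.budget

import rego.v1

default within_budget := false

# data.spend is refreshed out of band (bundle poll or a write to the
# data API); the policy stays stateless and just reads the snapshot.
within_budget if {
    record := data.spend[input.agent.id]
    record.week_to_date < 0.8 * record.weekly_limit
}
```

The catch is staleness: between refreshes the policy is deciding on yesterday's numbers, which is exactly why this category stays on the "still hard" list.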

Policy authoring at scale. Once you have dozens of policies across dozens of teams, the policy-as-code base needs the same governance the code base needs: review, test, deploy. The teams that scale this well treat the policy repo with the same rigor as the application code.

Decision explainability for non-technical reviewers. A compliance officer wants to know "what does our policy say about X." Reading Rego is not the answer for them. The decision-explainability tools that translate Rego into human-readable summaries are improving; not yet at the polish level the use case demands.

Where this goes next

The OPA-AI integration is moving fast. The patterns I've described are stable; the surface area around them is still growing. Standard input-schema conventions, common policy libraries, better authoring tools, tighter MCP integration, all of these are in active development. The next twelve months should produce more standardization here than the prior twelve did.

For teams getting started: the foundation is mature enough to commit to. The patterns work in production. The cost of building it now beats the cost of building it later, when the policies you're enforcing have already been violated for a year.

The governance gap doesn't close by itself. OPA + AI agent is the most-mature open-source path to closing it. Worth being deliberate about adopting it well.