An auditor walks into an AI shop
The 2026 audit conversation about AI usage has gotten sharp. The questions are sophisticated, the evidence asks are specific, and most shops can't produce what's being asked for. Here's what the conversation actually sounds like and where the gap sits.
The setup writes itself. An auditor walks into an AI shop and asks a question that would have been a footnote two years ago and is now a forty-five-minute conversation. Both sides are nervous. The auditor has a checklist that didn't exist in the 2024 cycle. The shop has a story about responsible AI usage that has never been stress-tested by someone whose job is to verify it.
I've read deeply into this conversation, talked to people on both sides of it, and watched how the question set has been forming in audit-readiness discussions over the last six months. The audit profession got serious about AI evidence faster than most people who actually use this stuff noticed. The shops on the receiving end are mostly underprepared, even the ones that consider themselves governance-mature. The gap is real and it's wider than the AI-readiness narratives admit.
I'm pro-governance. I keep saying that because the AI-doomer framing and the AI-acceleration framing both treat governance as an obstacle to the thing they actually care about. It isn't. The audit cycle is the place where the responsible-AI claim meets the receipts, and the receipts are what determine whether the claim is anything more than marketing.
What the questions actually sound like in 2026
The 2024 SOC 2 conversation about AI was usually one or two questions, both of them shallow. Do you use AI? What's your acceptable-use policy? The auditor checked the box, the customer signed off, the report shipped. Nobody had clarity on what the right questions even were.
The 2026 conversation is different. The questions below are surfacing in audit-readiness discussions: auditors writing about their own playbooks, engineers describing what their assessors asked, and the small slice of public reporting on AI audit outcomes. Paraphrased lightly:
On model and vendor selection. Which AI vendors are in scope for the system being audited? What due diligence did you perform on each? Where are their data-processing agreements and what do they say about training, retention, and subprocessor disclosure? When you switched from vendor A to vendor B last quarter, what was the change-management evidence?
On data flows. What categories of data are sent to which AI systems? How do you know? Show me the data-flow diagram and the most recent date it was reviewed. Where are the controls that prevent customer PII or regulated data from leaving the boundary you've defined? Show me the test that verifies those controls work. (A sketch of what that test can look like appears after this list.)
On logging. Do you log AI tool calls and prompts, and for how long? Who has access to those logs? When the customer asks "did you ever send our data to a model," what's the query you run and how long does it take? Show me an example log entry.
On human oversight. Which AI-mediated decisions have human-in-the-loop controls? Which don't? Where's the documented threshold that determines which category a given workflow falls into? When was that threshold last reviewed and who reviewed it?
On change management. When a model version changes (vendor-driven or on your side), what's the evaluation that runs before the change is accepted? Where are the eval results? Who approved the change? What's the rollback process?
On shadow AI. What's your discovery process for AI usage outside the sanctioned tooling? When was the last shadow-AI sweep? Show me the inventory before and after. What's your incident response when a new tool shows up?
These are not gotcha questions. These are the questions a serious auditor asks because the AICPA and ISO updates around AI governance gave them the mandate to ask. The questions are sophisticated and the evidence asks are specific. That combination is what's new.
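To make "show me the test that verifies those controls work" concrete, here's a minimal sketch of the kind of executable control test an auditor means. The redact_pii() filter and its two regexes are illustrative assumptions, not any real product's API; production filters use classifiers and allowlists, not a pair of patterns.

```python
# Minimal sketch of a control test for the data-flow question above.
# redact_pii() stands in for a hypothetical outbound filter that runs
# before any prompt leaves the boundary; the patterns are illustrative.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_pii(text: str) -> str:
    """Stand-in for the pipeline's outbound redaction filter."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

def test_no_pii_reaches_model():
    prompt = "Summarize the ticket from jane@example.com, SSN 123-45-6789."
    outbound = redact_pii(prompt)
    assert "jane@example.com" not in outbound
    assert "123-45-6789" not in outbound
```

The point isn't the regexes. The point is that the control is executable, and every dated test run is evidence you can hand over.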
What most shops can actually produce
The honest assessment of what most AI-using shops have ready when these questions hit, in March 2026:
Policy documents, yes. Acceptable-use policies, AI ethics principles, vendor-evaluation checklists. The paper is largely there, and it has gotten significantly better since 2024.
Vendor inventory, partially. Most shops have a list of sanctioned AI vendors. Few have it tied to the specific systems and data flows in scope. Fewer still have current DPAs and subprocessor disclosures filed in the same place as the inventory.
Data-flow documentation, weakly. This is the first place the conversation usually breaks down. The diagram exists or it doesn't. If it exists, it's usually six months stale. The controls that would prevent regulated data from going to the wrong model are often described in policy but not implemented in the pipeline.
Logging, badly. The single biggest gap I see in the public reporting and the practitioner conversations. Most shops log API calls at the infrastructure level and don't log prompts, completions, or tool calls at the application level. When the auditor asks "show me what was sent," the answer is some combination of "we don't capture that," "we capture it but can't query it," and "we'd need engineering to write a script." None of those are good answers. (A sketch of the good answer follows this list.)
Human-in-the-loop documentation, inconsistently. Some workflows have it documented; many have it implied. The threshold that determines whether a workflow needs HITL is rarely written down. When asked who decides and how, the answer is usually a person rather than a process.
Change management for models, almost nothing. Vendor-driven model changes happen weekly and are rarely tracked. The shop's own model changes (fine-tunes, prompt-template updates, agent-tool changes) are usually deployed without formal eval gates. The "what changed and what was the impact" question lands hard.
Shadow AI discovery, almost never. Most shops don't run periodic discovery for AI tools used outside the sanctioned set. The conversation usually surfaces that engineering is using three coding assistants the security team didn't know about and customer success has standardized on a meeting tool that uploads transcripts to a vendor nobody approved.
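What does the good answer to "did you ever send our data" look like? A minimal sketch, assuming application-layer logs land in a queryable store; the model_calls schema and its field names are hypothetical, not any vendor's export format.

```python
# Sketch of the "did you ever send our data to a model" query,
# assuming application-layer logs in a queryable store.
# The model_calls table and its fields are hypothetical.
import sqlite3

def prompts_mentioning(db_path: str, customer_marker: str) -> list:
    """Return every logged model call whose prompt contained the marker."""
    with sqlite3.connect(db_path) as db:
        return db.execute(
            """SELECT timestamp, model, model_version, user_id, prompt
               FROM model_calls
               WHERE prompt LIKE ?
               ORDER BY timestamp""",
            (f"%{customer_marker}%",),
        ).fetchall()

# The auditor-facing answer: one query, minutes rather than an
# engineering ticket.
rows = prompts_mentioning("ai_audit.db", "acme-corp")
```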
The gap between what's being asked and what can be produced is large: roughly the gap between an AI program designed for marketing pages and one designed to survive audit.
Why the gap is what it is
A few honest reasons:
The foundation moved faster than the controls. I've been writing about this for two years now. The pace at which AI capabilities entered production exceeded the pace at which the surrounding control plane could be built. Logging, eval gates, change management, data-flow tracking: these all need to be built deliberately, and they were not the priority during the rush to ship AI features. Now they are, and there's catch-up to do.
The pace question, again. I keep coming back to this. Companies are deploying AI fast because the market rewards the deployment story. Short-term incentives drive the rush. Those incentives don't pay for the boring work of instrumenting tool calls and writing eval gates. The same dynamic that's accelerating displacement is producing under-instrumented AI systems. The audit cycle is where the bill comes due.
The auditor profession got better fast. The Big Four published AI-audit playbooks during 2025. The independent assessor community shared notes. The ISO 42001 work gave the audit community something to point at. By Q1 2026 the median auditor's question quality is materially higher than the median AI shop's evidence quality, a reversal of where things stood a year ago.
The vendor side moved more than people noticed. The major AI providers shipped real audit-evidence features through 2025: usage logs, data-residency controls, customer-managed encryption, subprocessor notification flows. The evidence surface vendors expose is better than what most customers actually use. Part of the gap is shops not turning on the controls their vendors already offer.
What actually works
I don't have a tidy framework. I have a set of behaviors that, in the reading and the conversations, correlate with shops that survive these conversations:
Log everything that goes to a model. Prompts, completions, tool calls, system prompts, model versions, latency. Application-layer, queryable, retained for the period your data-retention policy specifies. The single highest-leverage thing you can do for AI-audit readiness is fix the logging problem first.
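A minimal sketch of what application-layer logging can look like, as a wrapper around whatever client you actually use. call_model() is a placeholder, not a vendor API; the logged fields mirror the ones auditors ask for.

```python
# Sketch: log every model call at the application layer.
# call_model() stands in for the real vendor client; the log schema
# mirrors the fields an auditor asks for. All names are illustrative.
import json, time, uuid

def call_model(prompt: str, model: str) -> dict:
    """Placeholder for the real vendor client."""
    return {"completion": "...", "model_version": f"{model}-2026-03-01"}

def logged_call(prompt: str, model: str,
                log_path: str = "model_calls.jsonl") -> dict:
    start = time.monotonic()
    response = call_model(prompt, model)
    entry = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model": model,
        "model_version": response["model_version"],
        "prompt": prompt,          # redact before logging if policy requires
        "completion": response["completion"],
        "latency_ms": round((time.monotonic() - start) * 1000),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return response
```

JSONL into whatever store you already run is fine. The transport doesn't matter; what matters is that the fields exist and are queryable within your retention window.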
Keep the data-flow diagram fresh. Quarterly review minimum, with the AI systems plainly in the diagram. Make it a real artifact rather than a slide that gets updated for the audit.
Write down the HITL thresholds: which workflows require a human checkpoint and which don't, why the line is where it is, and who can change it. The threshold being written down is more important than where the line gets drawn.
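Writing the threshold down can be as plain as a reviewed artifact in the repo. A sketch, with made-up workflow names and rationale; the useful property is that unknown workflows fail closed to human review.

```python
# Sketch: HITL thresholds as a reviewed artifact, not tribal knowledge.
# Workflow names, owners, and rationale are made up for illustration.
HITL_POLICY = {
    "support-reply-draft": {
        "hitl": False, "why": "a human sends the final reply anyway",
        "owner": "support-eng", "reviewed": "2026-02-10",
    },
    "refund-approval": {
        "hitl": True, "why": "moves money; hard to reverse",
        "owner": "risk", "reviewed": "2026-02-10",
    },
    "contract-summarization": {
        "hitl": True, "why": "legal exposure on misreads",
        "owner": "legal-ops", "reviewed": "2026-01-22",
    },
}

def requires_human(workflow: str) -> bool:
    # Unknown workflows fail closed: a human reviews until the workflow
    # is added to the policy and the change is approved.
    return HITL_POLICY.get(workflow, {"hitl": True})["hitl"]
```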
Treat model versions as configuration changes. Run an eval before adopting, document the eval, approve the change, and notify downstream consumers. Same rigor as a database schema change, because the impact is in the same league.
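A sketch of the eval gate, assuming you keep a small golden set of prompt/expected pairs. run_model() is a placeholder and the pass bar is illustrative; the point is that the score, the threshold, and the approval all become change-record evidence.

```python
# Sketch: gate a model-version change on a golden-set eval.
# run_model() is a placeholder; the golden set and pass bar are
# illustrative. Real golden sets are larger and versioned.
GOLDEN_SET = [
    {"prompt": "Classify: 'refund my order'", "expected": "refund_request"},
    {"prompt": "Classify: 'where is my package'", "expected": "shipping_status"},
]
PASS_BAR = 0.95  # agreed in advance with whoever approves the change

def run_model(prompt: str, model_version: str) -> str:
    """Placeholder for the real client pinned to a candidate version."""
    ...

def eval_gate(candidate_version: str) -> bool:
    hits = sum(
        run_model(case["prompt"], candidate_version) == case["expected"]
        for case in GOLDEN_SET
    )
    score = hits / len(GOLDEN_SET)
    print(f"{candidate_version}: {score:.2%} on {len(GOLDEN_SET)} cases")
    return score >= PASS_BAR  # score + approval go in the change record
```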
Run shadow-AI discovery quarterly. Network egress patterns, browser-extension inventories, expense reports for AI tools. Surface the unsanctioned usage, then triage it: sanction or remove. Don't pretend it isn't there.
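Discovery can start embarrassingly simple: diff AI-looking egress destinations against the sanctioned list. The domains and hints below are illustrative assumptions; a real sweep also pulls browser-extension inventories and expense-report line items.

```python
# Sketch: quarterly shadow-AI sweep over egress logs.
# The sanctioned list and domain hints are illustrative; a real sweep
# also covers browser extensions and expense reports.
SANCTIONED = {"api.openai.com", "api.anthropic.com"}
AI_DOMAIN_HINTS = ("openai", "anthropic", "cohere", "mistral", "gemini")

def shadow_ai_sweep(egress_domains: set[str]) -> set[str]:
    """Return AI-looking destinations that aren't on the sanctioned list."""
    return {
        d for d in egress_domains
        if any(hint in d for hint in AI_DOMAIN_HINTS) and d not in SANCTIONED
    }

# Feed it the quarter's unique egress destinations; triage what comes back.
findings = shadow_ai_sweep(
    {"api.openai.com", "api.mistral.ai", "transcripts.gemini-notes.example"}
)
```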
Read your vendors' audit-evidence documentation. Most major AI providers have a SOC 2 report, a security and trust portal, and a list of available audit-evidence exports. Use them. Half the evidence the auditor wants is sitting in the vendor's trust portal waiting for you to pull it.
None of this is novel. All of it is doable. Most of it isn't being done.
The honest summary
The auditor walks into the AI shop and asks the questions a serious assessor asks in 2026. The shop, mostly, can't answer. The gap is wide, the gap is real, the gap was predictable, and the gap is closeable with a quarter or two of focused work by anyone who decides to close it.
I'm pro-governance because the audit cycle is the only place where the responsible-AI claim has to be defensible. The companies that take the cycle seriously will have a real story to tell. The companies that don't will have an AI program that exists in marketing copy and dissolves under questioning. That difference is going to matter more in 2027 than it does now.
I'd rather be in the first category. Most of the work to get there is boring. None of it is optional anymore. The auditor walked in. The questions are real. The receipts are what get you through.