Capturing the secret sauce: what actually trains the AI

Captured judgment is not a model and it's not a knowledge base. It's the small set of things the consultant has personally learned from being wrong.

If you've ever shadowed an experienced person at their job, you know the thing they do that's hard to write down. The senior IT ops person who looks at three lines of a stack trace and says "it's the connection pool, restart the auth service" and is right. The legal pro who reads two paragraphs of a contract and circles one word, and that one word is what the deal will hinge on six months later. The career coach who reads a resume for nine seconds and asks one question that completely reframes how the candidate thinks about themselves.

That thing they do is not magic. It's not even, mostly, knowledge. It's captured judgment, a small set of patterns, learned mostly from being wrong, that they've internalized so thoroughly they don't notice they're applying them.

[Figure: Captured judgment, five pieces. Annotated examples (what good looks like), decision rules (if X then Y), escalation paths (when to stop), voice + tone (how it sounds), failure modes (what to never do). Not a model. Not a knowledge base. A way of working, captured.]

The hard part of building an AI product on top of a consultant's expertise is getting that captured judgment out of their head and into a form a system can use.

The piece before this one, the MVP question, argued that the model is a rented commodity and the secret sauce is what makes the product worth paying for. This piece is about what that secret sauce actually looks like as captured material, and why almost everyone gets it wrong on the first try.

What it isn't

Before I describe what captured judgment looks like, here's what people try first and abandon.

It isn't a knowledge base. When a consultant sits down to "write down what they know," they almost always start by writing a long document that sounds like a textbook chapter. "There are three main approaches to portfolio diagnosis. The first is..." This is fine for orienting a junior employee, and it is almost completely useless for training an AI system. The AI already knows the textbook. The textbook is not what makes this consultant's work worth paying for.

It isn't a model. A lot of people hear "fine-tune a model on the consultant's data" and assume that's the deliverable. It isn't. The fine-tuned model is a byproduct of having captured the judgment well. If the captured material is good, the fine-tune comes out good almost incidentally. If the captured material is bad, no amount of fine-tuning saves you.

It isn't a prompt. Yes, prompts matter and yes, the consultant's voice and tone go into them (more on that below). But "the prompt" is not where the value lives. The prompt is the wrapper. The captured material is what the wrapper points at.

So what is it.

What it actually is

Five things, and you need all five.

1. Annotated examples

The richest source of captured judgment is examples the consultant has personally worked through, with notes on why they did what they did. Not the polished case study. The messy version where they considered three options and chose one and were wrong, or right, and noticed something the second time around.

For a sales consultant: thirty discovery-call transcripts, each annotated with "this is where the prospect actually told me their real problem, even though they were officially asking about something else." For an HR consultant building an interview rubric: a hundred resumes, each tagged with "this is the line that made me move them to the yes pile" or "this is the line that should have made me move them to the no pile but I missed it." For a financial advisor: portfolio reviews from real clients, with the diagnosis the advisor wrote and the reasoning trace, the three things they checked first, the test that ruled out the other diagnoses, the recommendation and what the client actually did with it.

Not five examples. A few hundred at minimum. And not the consultant's best examples, the representative spread. The wins, the losses, the boring middle. The AI needs to see what normal looks like in this consultant's world, not just the highlight reel.
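
If it helps to picture what "annotated" means in practice, here's a minimal sketch of one example as a structured record. The field names are mine, not a fixed format; the point is that every example carries the raw input, what the consultant did, the why, and the outcome.

```python
from dataclasses import dataclass, field

# Hypothetical schema for one annotated example. Field names are
# illustrative: what matters is that the input, the action, the
# consultant's "why", and the outcome travel together.
@dataclass
class AnnotatedExample:
    source: str              # e.g. "discovery-call transcript", "resume", "portfolio review"
    raw_input: str           # the material the consultant worked from
    consultant_action: str   # what they actually did or recommended
    annotation: str          # the why: what they noticed, what they checked first
    outcome: str             # what happened afterwards (win, loss, boring middle)
    tags: list[str] = field(default_factory=list)

example = AnnotatedExample(
    source="discovery-call transcript",
    raw_input="...full transcript text...",
    consultant_action="Reframed the conversation around hiring, not tooling.",
    annotation="This is where the prospect told me their real problem, "
               "even though they were officially asking about something else.",
    outcome="Deal closed two months later on the reframed problem.",
    tags=["discovery", "real-problem-surfaced"],
)
```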

2. Decision rules

The patterns that are explicit enough to write down, the consultant's "if you see X, the move is Y" rules. These are the easiest to capture because they sound like rules. Most consultants have ten to twenty of these. They've usually never written them down because they feel obvious.

For an IT ops consultant doing triage: "If the alert is a slow-query alarm and the deploy was within the last four hours, the deploy is the cause until proven otherwise." For a legal pro reviewing vendor contracts: "If the indemnity clause caps at fees paid, that's the first thing we negotiate, every time, no exceptions." For a marketing strategist on positioning calls: "If the founder uses the word 'platform' more than twice, the actual product is buried under the platform story and we have to dig it out."

These rules are gold. They turn into deterministic checks in the system, the AI doesn't need to reason about them, it just needs to recognize the pattern and fire the rule. Capturing them is mostly an exercise in interviewing the consultant well enough to get them out.
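
To make "deterministic check" concrete, here's a sketch of the IT ops rule above as a plain function. The alert fields are invented for illustration; what matters is that the rule fires on a pattern match, with no model call involved.

```python
from datetime import datetime, timedelta

# Hypothetical alert record; field names are illustrative.
alert = {
    "type": "slow_query",
    "raised_at": datetime(2024, 5, 1, 14, 30),
    "last_deploy_at": datetime(2024, 5, 1, 12, 10),
}

def deploy_is_prime_suspect(alert: dict) -> bool:
    """Encodes: 'if the alert is a slow-query alarm and the deploy was
    within the last four hours, the deploy is the cause until proven
    otherwise.' No reasoning needed: recognize the pattern, fire the rule."""
    if alert["type"] != "slow_query":
        return False
    return alert["raised_at"] - alert["last_deploy_at"] <= timedelta(hours=4)

if deploy_is_prime_suspect(alert):
    print("Rule fired: investigate the recent deploy before anything else.")
```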

3. Escalation paths

Just as important as the "what to do" rules are the "this is when I stop and ask a human" rules. Every experienced consultant has these, and almost no one writes them down.

For a medical specialist doing second-opinion review: "If the imaging shows X and the patient is under 30, I do not respond by message. That's a phone call." For a financial advisor: "If the client's expressed risk tolerance and their actual portfolio behavior diverge by more than this much, I do not adjust the portfolio, I schedule a conversation." For a career coach: "If the resume is in a field I haven't recruited for in the last two years, I refer out, I don't review."

These become the boundary conditions in the system. The AI's job in those cases is not to give an answer, it's to flag the case as out-of-scope and route it to the consultant, with the reason clearly stated. The audit trail of "the system correctly recognized this was outside its lane" is one of the highest-value parts of the product, because it's how the consultant trusts the system not to embarrass them.
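
Here's a sketch of how one escalation rule could sit in front of the model, with invented names and a made-up threshold. The only requirement is that the check returns an out-of-scope result with the reason attached, so the reason lands in the audit trail.

```python
from dataclasses import dataclass

@dataclass
class Escalation:
    escalate: bool
    reason: str = ""

def check_escalation(client: dict) -> Escalation:
    """Hypothetical boundary check for a financial-advisor product.
    If the rule matches, the system does not answer; it routes the case
    to the consultant and records why."""
    # "If expressed risk tolerance and actual portfolio behavior diverge
    #  by more than this much, I schedule a conversation." Threshold invented.
    if abs(client["stated_risk"] - client["observed_risk"]) > 0.3:
        return Escalation(True, "Stated vs. observed risk tolerance diverge beyond threshold.")
    return Escalation(False)

result = check_escalation({"stated_risk": 0.2, "observed_risk": 0.7})
if result.escalate:
    # The audit-trail entry: the system recognized this was outside its lane.
    print(f"Routed to consultant: {result.reason}")
```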

4. Voice and tone

Almost separate from the substance: how the consultant talks. The AI's output has to sound like the consultant, or at least sound like the consultant's brand, because the customer is paying for that voice.

This is the part that prompt engineering is for. Capturing it means three things: a few exemplar pieces of the consultant's writing (emails, reports, summaries, follow-ups), tagged for the situation each was used in; a short style guide they may or may not have already (probably not, most consultants have their voice in their head); and the specific phrases they use a lot that nobody else uses quite the same way. The PM who always says "what are we actually optimizing for here." The marketing strategist who reframes every brand question as "if your customer told a friend about you in one sentence, what's the sentence." Those phrases become part of the prompt. They're how the system signals to the customer that this is that consultant's product, not generic AI sludge.
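
Here's a rough sketch of how that voice material might get assembled into a prompt. The style guide, phrases, and exemplars below are placeholders, not a real consultant's; the structure (style guide, situation-tagged exemplars, signature phrases) is the point.

```python
# Hypothetical voice material for one consultant; the real thing comes
# out of the capture interviews, not from this file.
STYLE_GUIDE = "Short sentences. No hedging. Always end with one concrete next step."
SIGNATURE_PHRASES = [
    "What are we actually optimizing for here?",
]
EXEMPLARS = {
    "follow_up_email": "Thanks for the call. Three things stood out...",
    "positioning_summary": "If your customer told a friend about you in one sentence...",
}

def build_prompt(situation: str, task_input: str) -> str:
    """Assemble the voice wrapper around the task. The captured material
    (exemplar for this situation, style guide, phrases) is what the
    wrapper points at."""
    exemplar = EXEMPLARS.get(situation, "")
    return (
        f"Write in this consultant's voice.\n"
        f"Style guide: {STYLE_GUIDE}\n"
        f"Phrases they actually use: {'; '.join(SIGNATURE_PHRASES)}\n"
        f"Example of their writing in this situation:\n{exemplar}\n\n"
        f"Task:\n{task_input}"
    )

print(build_prompt("follow_up_email", "Summarize yesterday's discovery call for the client."))
```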

5. Failure modes the consultant has personally seen

The last piece is the one almost everyone forgets. The consultant's scar tissue. The mistakes they've watched themselves or other people make, that they're now looking out for, that the AI also needs to look out for.

For an IT ops consultant: "Once I restarted the database during a long-running migration and I was wrong about whether it was safe. Now the rule is, if the migration table has any rows in 'in_progress' state, don't touch the database." For a legal pro: "I once approved a vendor agreement that had a quietly-broad assignment clause and the vendor was acquired six months later by a competitor of my client. Now I read every assignment clause first." For a financial advisor: "I once recommended a sector tilt right before a regulatory change I should have seen coming. Now I check the regulatory calendar before any sector recommendation."

These are the patterns that make the consultant's work trustworthy. They turn into negative checks in the system, the AI flags the situations the consultant has been bitten by, even when nothing else is going wrong. This is the secret-sauce-est part of the secret sauce. It's what the customer is actually paying for: not just expertise, but experience.
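
As a sketch, the IT ops scar above becomes a negative check something like this. The field names are invented; the behavior to notice is that it's a hard stop, not something the AI weighs against other evidence.

```python
# Hypothetical negative check derived from the scar:
# "if the migration table has any rows in 'in_progress' state,
#  don't touch the database."
def safe_to_restart_database(migration_rows: list[dict]) -> bool:
    """Returns False whenever any migration is still in progress.
    The AI doesn't reason about this; it's a hard stop."""
    return not any(row["state"] == "in_progress" for row in migration_rows)

rows = [{"id": 17, "state": "in_progress"}]
if not safe_to_restart_database(rows):
    print("Blocked: a migration is in progress. Scar-tissue rule, not a judgment call.")
```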

Curious how this captured material gets into the live product? The retrieval mechanics, how the system pulls the right captured judgment into the right prompt at the right time, are covered in the piece on retrieval as the secret-sauce surface. And the local stack that does the actual fine-tune from this material lives in the Mac Studio side of the stack.

How you actually get it out of the consultant's head

The interview is the hard part. Consultants are terrible at this on their own, not because they don't know what they know, but because they don't notice it. The patterns that make them good are the ones they've stopped consciously thinking about.

Three things have worked for me.

Walk through real cases together, slowly. Pick five recent client engagements. Sit down with the consultant. Have them narrate what they did and why, while you ask "why" annoyingly often. Every "I just kind of know" is a captured-judgment item that hasn't been written down yet. Drag it out.

Have them mark up an AI's first draft. Run a generic AI on a representative input, get a draft answer, and have the consultant red-pen it. Where they cross things out and write something different, that's a captured-judgment delta. Where they leave it alone, the generic AI was already good enough. The deltas are your training material; a sketch of what one delta record might look like follows below.

Catch the verbal tells. Listen for "the thing I always check first is..." or "the way I know it's actually X and not Y is..." or "I once made the mistake of..." Those phrases are gold. Write them down literally as the consultant says them. Don't paraphrase. The literal phrasing often becomes the rule.
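
The red-pen deltas from the second technique are worth keeping as structured records. A sketch, with field names I've made up: each delta pairs the generic draft with what the consultant wrote instead, plus their reason in their literal phrasing.

```python
from dataclasses import dataclass

@dataclass
class JudgmentDelta:
    """One place where the consultant's red pen diverged from a generic
    AI draft. The deltas, not the untouched passages, are the training material."""
    task_input: str       # the representative input the draft was run on
    generic_draft: str    # what the off-the-shelf model produced
    consultant_edit: str  # what the consultant wrote instead
    reason_verbatim: str  # their explanation, kept in their literal phrasing

delta = JudgmentDelta(
    task_input="Vendor contract, section 9 (assignment).",
    generic_draft="The assignment clause is standard and acceptable.",
    consultant_edit="Flag the assignment clause: it allows assignment on change of control.",
    reason_verbatim="I once approved one of these and the vendor was acquired by a competitor.",
)
```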

A reasonable first capture pass for one consultant is somewhere between 200 and 500 annotated examples, ten to thirty decision rules, five to ten escalation rules, a small style guide, and a list of fifteen to thirty failure modes they've personally seen. That's enough to start. You'll add to it forever.
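
If you want to track that first pass as a checklist, the targets fit in a few lines. A sketch; the counts come from the ranges above, except the style-guide size, which is my guess.

```python
# First capture pass targets for one consultant (ranges from the text above).
FIRST_PASS_TARGETS = {
    "annotated_examples": (200, 500),
    "decision_rules": (10, 30),
    "escalation_rules": (5, 10),
    "style_guide_pages": (1, 3),   # "a small style guide" -- size is an assumption
    "failure_modes": (15, 30),
}

def pass_is_complete(counts: dict) -> bool:
    """True once every category hits at least the low end of its range."""
    return all(counts.get(k, 0) >= low for k, (low, _) in FIRST_PASS_TARGETS.items())

print(pass_is_complete({"annotated_examples": 250, "decision_rules": 12,
                        "escalation_rules": 6, "style_guide_pages": 1,
                        "failure_modes": 18}))
```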

The bit that bites people

The single most common mistake: trying to capture the consultant's judgment all at once, in a single document, before building anything. Don't do that. The captured material gets better when it's used. Get a tiny version into the live loop, run it on real customer queries, and watch where it falls down. Each failure is a new piece of captured judgment the consultant didn't think to write down because they didn't know they did it.

The product is the loop. The captured judgment grows inside the loop. Trying to perfect it in a vacuum is the same mistake as trying to perfect the model in a vacuum, you're polishing something that hasn't met reality yet.

The next piece is on the cloud-vs-local split. Where the captured material lives, where the model runs, where the live customer queries hit, and why I split it the way I do. Same architecture, different boxes for different jobs.