Triage, diagnose, resolve: the three loops of an AI product
Triage is cheap and fast. Diagnose is slow and smart. Resolve is gated by a human until it isn't. The three loops of an AI product, with different cost profiles and different SLAs.
Here's the layman version. Picture a good IT helpdesk. Someone files a ticket. The first person to look at it doesn't try to solve it. They glance at it and decide: password reset, network problem, printer thing, or "wake the engineer at 2am" thing. They route it. That's all they do. Cheap, fast, a hundred tickets an hour.
The next person (the diagnostician) picks up the routed ticket and actually thinks about it. Pulls up the account, checks the logs, looks at recent changes, maybe asks a clarifying question. Five or ten minutes per ticket. Costs more.
Then somebody decides what to actually do about it. Reset the password. Reboot the switch. Open a vendor case. Some are routine; others touch something sensitive and need a second pair of eyes.
Three jobs. Three costs. Three time pressures.
An AI product that wraps a consultant's expertise has the exact same three jobs: triage, diagnose, resolve. Those are the three loops, and they have different cost profiles and different SLAs (service-level agreements: how fast you promise an answer). The most common mistake I see is teams building one giant AI loop that does all three at once. Doesn't work. Costs too much. Too slow. Hard to govern.
Let me walk through what each loop does, what fits each, and where the human-in-the-loop gate sits.
Why one loop isn't enough
The instinct is: give the big model the request and let it figure out routing, diagnosis, and resolution all at once. One prompt, one call, one answer. Done.
The problem isn't quality. It's economics and operations.
Every request pays the full cost of the most expensive model. Every request waits the full latency. Every request runs the same pipeline whether the user asked "where's the login link" or "review this 47-page contract for any clauses that conflict with our standard terms." The first should cost a fraction of a cent and return in 200ms. The second should cost a few cents, take 30 seconds, and probably involve a human approval before anything ships. Treating them the same is wasteful both ways.
The three-loop pattern routes each request to the right amount of compute, context, and human oversight. Cheap stays cheap. Smart gets smart. Nothing pays for what it doesn't need.
Loop 1: triage, intake and route
Triage is the part that looks at every incoming request and decides what kind of thing it is. It does not try to answer the request. It just classifies and routes.
The properties of a triage loop:
- Cheap. Fractions of a cent per call. This loop runs on every request, so the unit cost compounds.
- Fast. Sub-second. The user is waiting and they should not feel a delay before something starts happening.
- Narrow. Output is a small, well-defined classification, usually a single label or a small JSON object. Not free-form text.
- Stateless. Each request is classified on its own. No long context. No retrieval.
- High-throughput. The model needs to handle many requests in parallel cheaply.
On Bedrock, triage runs on Haiku, the cheapest and fastest Claude model. With a tight prompt that says "classify this request into one of these N categories and return the label", the per-call cost is tiny and the latency is fine. For the highest-volume categories I'll sometimes use a fine-tuned local classifier on the Mac Studio, but for most MVPs Haiku is the right tool.
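A minimal sketch of that tight prompt and the parsing around it. The category names, the JSON reply shape, and the `call_model` parameter are assumptions for illustration; `call_model` stands in for whatever actually invokes Haiku (for example, a thin wrapper around the `bedrock-runtime` client's `converse` call), so the sketch runs without AWS. The point is the narrow contract: one label from a fixed list, and anything unparseable or off-list goes to a human instead of being guessed at.

```python
import json
from typing import Callable

# Hypothetical category set for the IT ops example; every product defines its own.
CATEGORIES = ["auth", "network", "identity", "hardware", "app_specific",
              "needs_human_now", "out_of_scope"]
FALLBACK = {"category": "needs_human_now", "confidence": 0.0}


def build_triage_prompt(request_text: str, categories: list[str]) -> str:
    """A narrow prompt: exactly one label from a fixed list, JSON only, no free text."""
    lines = [
        "Classify the request into exactly one of these categories: " + ", ".join(categories) + ".",
        'Reply with JSON only: {"category": <label>, "confidence": <0 to 1>, "summary": <one line>}.',
        "",
        "Request:",
        request_text,
    ]
    return "\n".join(lines)


def triage(request_text: str, call_model: Callable[[str], str]) -> dict:
    """One triage call. `call_model` is injected so this sketch runs without Bedrock."""
    raw = call_model(build_triage_prompt(request_text, CATEGORIES))
    try:
        result = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
        category = result["category"]
        confidence = float(result["confidence"])
        summary = str(result.get("summary", ""))
    except (KeyError, TypeError, ValueError):
        # Unparseable output is never guessed at; it goes to a human.
        return {**FALLBACK, "summary": "unparseable triage output"}
    if category not in CATEGORIES:
        return {**FALLBACK, "summary": f"unknown label: {category}"}
    return {"category": category, "confidence": confidence, "summary": summary}
```

Keeping the model behind a plain function also makes triage easy to evaluate offline: replay recorded requests through it and compare the labels.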
For the IT ops consultant productizing their triage tree: every ticket runs through Haiku with a prompt that knows the category structure (auth, network, identity, hardware, app-specific, "needs a human now"). Output is a category, a confidence score, a one-line summary. Less than a tenth of a cent per call. The category determines which downstream queue the ticket lands in.
For the financial advisor productizing portfolio diagnosis: every uploaded portfolio gets classified by size, complexity, and which playbook applies (retirement-heavy, taxable-focused, business-owner with concentrated stock). The category routes to the right diagnostic prompt downstream.
For the marketing strategist productizing brand-positioning advice: the incoming request (product, audience, competitive context) gets classified by industry, stage, and the kind of positioning move (differentiation, repositioning, launch). The triage label decides which body of past work diagnose will draw from.
Triage is also where you catch out-of-bounds cases. The IT ops product gets a ticket about HR policy? Triage flags it and bounces it. The triage prompt should know what's in-bounds and what isn't.
The output of triage is a routing decision and a small payload. Anything fancier and you've built two loops, not one.
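One way to shape that routing decision, assuming a confidence threshold, queue names, and an `out_of_scope` label that are all made up here: out-of-bounds requests bounce, explicit escalations and low-confidence calls go to a person, and everything else lands in a per-category diagnose queue.

```python
from dataclasses import dataclass

# Hypothetical threshold and queue names, for illustration only.
MIN_CONFIDENCE = 0.7
HUMAN_QUEUE = "human-review"
BOUNCE_QUEUE = "bounced-out-of-scope"


@dataclass(frozen=True)
class RoutingDecision:
    queue: str         # the downstream queue the request lands in
    category: str
    confidence: float
    summary: str       # the one-line summary from triage


def route(category: str, confidence: float, summary: str) -> RoutingDecision:
    if category == "out_of_scope":
        queue = BOUNCE_QUEUE             # e.g. an HR-policy ticket sent to the IT ops product
    elif category == "needs_human_now" or confidence < MIN_CONFIDENCE:
        queue = HUMAN_QUEUE              # explicit escalations and unsure calls go to a person
    else:
        queue = f"diagnose-{category}"   # one diagnose queue per category
    return RoutingDecision(queue, category, confidence, summary)
```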
Loop 2: diagnose, apply judgment via RAG and a real model
Once triage has routed, the diagnose loop picks up. This is the loop that does the actual thinking.
The properties of diagnose:
- Slower. Seconds, not milliseconds. A few seconds is fine. Twenty is fine for some categories.
- More expensive. Cents per call, not fractions. You're invoking a bigger model with a bigger context window.
- Context-rich. This is where retrieval lives. The chunks of the consultant's body of work that match the request get pulled in here.
- Stateful in a careful way. The loop knows the request, the conversation so far, the retrieved context, and the consultant's playbook. It does not need to know about every other request the system has ever seen.
- Generates structured output. A diagnosis with confidence, a recommended action, the evidence trail. Not just a free-form paragraph.
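A sketch of what "structured output" can mean in practice: the diagnose model is asked for JSON, and the loop refuses anything that lacks a root cause, an evidence trail, a recommended action, or a sane confidence. The field names are assumptions, not a fixed schema.

```python
import json
from dataclasses import dataclass


@dataclass
class Diagnosis:
    root_cause: str
    evidence: list[str]        # the trail a human reviewer will check
    recommended_action: str    # a proposal only; resolve decides what happens
    confidence: float          # 0.0 to 1.0


REQUIRED_FIELDS = ("root_cause", "evidence", "recommended_action", "confidence")


def parse_diagnosis(raw: str) -> Diagnosis:
    """Parse the model's JSON reply, raising ValueError if it doesn't fit the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("diagnosis must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"diagnosis missing fields: {missing}")
    evidence = data["evidence"]
    if not isinstance(evidence, list) or not evidence:
        raise ValueError("diagnosis needs a non-empty evidence list")
    try:
        confidence = float(data["confidence"])
    except (TypeError, ValueError):
        raise ValueError("confidence must be a number") from None
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return Diagnosis(str(data["root_cause"]), [str(item) for item in evidence],
                     str(data["recommended_action"]), confidence)
```

Rejecting a malformed diagnosis early (and retrying or escalating) is cheaper than letting it flow into resolve, where a human would have to notice the gap.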
On Bedrock, diagnose typically runs on Sonnet. Sometimes Opus for the hard categories, the ones where the cost of being wrong is high enough that you want the better model even at a multiple of the price. The triage label decides which model the diagnose loop uses.
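Letting the triage label pick the diagnose model can be as simple as a lookup table. The tier and category names below are hypothetical placeholders, not real Bedrock model IDs; you'd map them to the model IDs your account actually has access to.

```python
# Placeholder model names, not Bedrock model IDs.
MODEL_BY_TIER = {"standard": "sonnet", "high_stakes": "opus"}

# Hypothetical: categories where being wrong costs enough to justify the bigger model.
TIER_BY_CATEGORY = {
    "retirement_heavy": "standard",
    "taxable_focused": "standard",
    "business_owner_concentrated_stock": "high_stakes",
}


def pick_diagnose_model(category: str) -> str:
    # Unknown categories fall back to the standard tier rather than failing.
    tier = TIER_BY_CATEGORY.get(category, "standard")
    return MODEL_BY_TIER[tier]
```

Keeping this as data rather than branching logic means moving a category between tiers is a one-line change you can justify with eval results.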
For the IT ops product: a ticket triaged as "auth, Okta-related" lands in a diagnose loop that retrieves the Okta runbook, the customer's recent auth logs, and relevant past tickets. Sonnet produces a structured diagnosis: root cause, evidence, recommended action, confidence score.
For the financial advisor product: a portfolio classified as "retirement-heavy, late-career" gets routed to a diagnose loop that pulls the advisor's late-career playbook, tax guidance, and notes on similar past portfolios. Sonnet produces a diagnosis flagging concentration risk and a recommended sequence of actions with rationale.
For the marketing strategist product: a positioning request classified as "differentiation, B2B SaaS, mid-stage" gets routed to a diagnose loop with the strategist's differentiation playbook, positioning history of similar past clients, and the competitive context. Sonnet produces a positioning diagnosis with three drafted statements ranked by fit.
Retrieval is what makes diagnose work. I went deep on the retrieval side in "Retrieval is the secret-sauce surface." The diagnose loop is the consumer of that pipeline: it eats the chunks that pipeline produces.
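On the consuming side, diagnose has to fit retrieved chunks into a finite context window. A minimal sketch, assuming the retrieval pipeline hands back chunks with a source name and a relevance score, and using characters as a rough stand-in for tokens: take the best-scoring chunks that fit the budget and tag each with its source so the diagnosis can cite it as evidence.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str    # e.g. "okta-runbook.md" (hypothetical)
    text: str
    score: float   # relevance from the retrieval pipeline; higher is better


def pack_context(chunks: list[Chunk], max_chars: int) -> str:
    """Greedily pack the highest-scoring chunks into a character budget."""
    parts: list[str] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        block = f"[source: {chunk.source}]\n{chunk.text}"
        sep = 2 if parts else 0  # the "\n\n" joining blocks counts against the budget
        if used + sep + len(block) > max_chars:
            continue  # skip a chunk that overflows; a smaller one may still fit
        parts.append(block)
        used += sep + len(block)
    return "\n\n".join(parts)
```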
The diagnose loop's output is a proposal, not an action. It says "here's what I think is going on and here's what I'd do about it." It does not actually do the thing. That's the next loop.
Loop 3: resolve, propose action, with a human gate
Resolve is where things actually happen in the world. A password gets reset. A trade gets executed (or at least proposed for execution). A positioning recommendation gets sent to a client.
This is the loop where the cost of a mistake is highest and the SLA is most flexible. Customers will wait for a human approval if the alternative is the AI doing something irreversible badly.
The properties of resolve:
- Action-bearing. Resolve produces side effects in the real world. Tickets get updated. Money moves. Documents get sent.
- Gated. A human approves it on day one, every time. No exceptions. The human reviews the diagnosis, the proposed action, the evidence. They click approve or deny.
- Audited. Every approve/deny is recorded with the actor, the time, the evidence presented, the action taken or not taken, and the outcome.
- Reversible where possible. When the action is reversible, the resolve loop also captures what would be needed to reverse it. Sent the wrong email? Here's the undo.
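A sketch of one audit record per approve/deny, with an optional `undo` field for reversible actions. The field names, and the in-memory list standing in for a durable, append-only store, are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditRecord:
    request_id: str
    actor: str                    # who approved or denied
    decision: str                 # "approve" or "deny"
    evidence: list[str]           # what the human was shown
    action: Optional[str]         # what was done; None when denied
    outcome: str                  # e.g. "applied", "failed", "sent_to_manual"
    undo: Optional[str] = None    # how to reverse the action, if it is reversible
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def append_audit(log: list[str], record: AuditRecord) -> None:
    """Append one JSON line; records are never updated or deleted."""
    log.append(json.dumps(asdict(record), sort_keys=True))
```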
The model in this loop is often the same as in diagnose (Sonnet). What changes is the prompt template, which is now framed around "given this diagnosis and these proposed actions, present them in a form a human can approve in 30 seconds." Compact summary at the top. Evidence below. Action button at the bottom.
For the IT ops product: the diagnose loop has produced "re-add user X to group Y." The resolve loop renders that as an approval card: user, group, change, reason, one-click approve. Click approve and the ticket updates, the group change happens, the audit log captures it. Click deny and it goes to manual handling with the AI's analysis attached.
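The approve/deny handling for that card can be sketched like this. `apply_action` and `send_to_manual` are hypothetical callables standing in for the real integration (the group change, the ticket update) and the manual-handling queue. The property that matters: the side effect runs only on approve, and every decision, including a failed action, lands in the audit trail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ApprovalCard:
    request_id: str
    summary: str          # compact one-liner at the top
    evidence: list[str]   # shown below the summary
    action: str           # the proposed change, e.g. "re-add user X to group Y"


def handle_decision(card: ApprovalCard, approved: bool, actor: str,
                    apply_action: Callable[[str], None],
                    send_to_manual: Callable[[ApprovalCard], None],
                    audit: list[dict]) -> str:
    outcome = "sent_to_manual"
    if approved:
        try:
            apply_action(card.action)      # the only place a side effect happens
            outcome = "applied"
        except Exception:
            outcome = "failed"             # recorded below; a real system would also alert
    else:
        send_to_manual(card)               # the AI's analysis travels with the card
    audit.append({"request_id": card.request_id, "actor": actor,
                  "decision": "approve" if approved else "deny",
                  "action": card.action if approved else None, "outcome": outcome})
    return outcome
```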
For the financial advisor product: the diagnose loop has produced a recommended sequence of trades. The resolve loop does NOT execute trades. It produces a recommendation document the advisor reviews, signs off on, and either sends to the client or hands to whatever execution system sits behind it. Money never moves without a human in the chain.
For the marketing strategist product: the resolve loop drafts the actual deliverable (a positioning brief, a messaging matrix, a deck outline) that the strategist reviews, edits, and sends. Nothing goes to a client without the strategist's name on it.
The gate doesn't stay on every category forever. I'll get into when and how to relax it in "The approve/deny gate and when it goes away." Short version: you mine the approvals, you find the safe classes, you let those auto-resolve, you keep the audit trail through the transition.
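A rough sketch of "mine the approvals, find the safe classes": count human decisions per category and flag the ones with enough history and a near-perfect approval rate as candidates for auto-resolve. The thresholds are made-up defaults, and approval rate alone is a simplification; a real policy would also weigh how costly a wrong call is in each category.

```python
from collections import defaultdict


def auto_resolve_candidates(decisions, min_samples=200, min_approval_rate=0.98):
    """decisions: iterable of (category, approved) pairs from the audit log.

    Returns categories whose approval history clears both (hypothetical) thresholds.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [approved, total]
    for category, approved in decisions:
        counts[category][1] += 1
        if approved:
            counts[category][0] += 1
    return sorted(
        category for category, (ok, total) in counts.items()
        if total >= min_samples and ok / total >= min_approval_rate
    )
```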
How the three loops actually wire together
In the AWS shape, each loop has a clear home.
Triage is a Lambda behind API Gateway. The request comes in, the Lambda calls Bedrock (Haiku) with the triage prompt, gets back a category and confidence, writes a row to the requests table, and emits an EventBridge event with the routing decision. A couple hundred milliseconds end to end.
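The event the triage Lambda emits could look like the entry below. The source, detail type, and bus name are hypothetical; the keys (`Source`, `DetailType`, `Detail` as a JSON string, `EventBusName`) are the shape EventBridge's PutEvents expects for each entry. The rule pattern is what the diagnose side would match on.

```python
import json

# Hypothetical names; use your own source, detail type, and bus.
TRIAGE_SOURCE = "app.triage"
TRIAGE_DETAIL_TYPE = "RequestTriaged"

# Event pattern the diagnose Lambda's EventBridge rule could match on.
DIAGNOSE_RULE_PATTERN = {"source": [TRIAGE_SOURCE], "detail-type": [TRIAGE_DETAIL_TYPE]}


def routing_event(request_id: str, category: str, confidence: float,
                  bus_name: str = "ai-product-bus") -> dict:
    """Build one PutEvents entry carrying the triage routing decision."""
    return {
        "Source": TRIAGE_SOURCE,
        "DetailType": TRIAGE_DETAIL_TYPE,
        # Detail must be a JSON-encoded string, not a dict.
        "Detail": json.dumps({"request_id": request_id, "category": category,
                              "confidence": confidence}),
        "EventBusName": bus_name,
    }
```

In the Lambda you'd pass the entry to `boto3.client("events").put_events(Entries=[entry])`. PutEvents can fail per entry, so check `FailedEntryCount` in the response rather than assuming success.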
Diagnose is triggered by the EventBridge event. A separate Lambda (or a Step Function for the longer ones) runs the retrieval pipeline against pgvector, calls Bedrock (Sonnet, sometimes Opus), writes a structured diagnosis, and emits another event. Usually a few seconds.
Resolve is triggered by the diagnose event. The resolve Lambda renders the approval card, writes a pending-approval row, and pushes a notification to the consultant's queue UI. The side effect runs only after the approve event lands. The audit log is written at every step.
Everything goes through the event bus. Each loop is its own Lambda with its own metrics and retry behavior. When Bedrock rate-limits or a downstream integration is down, the failure is contained to one loop and the system degrades gracefully instead of failing as a unit.
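To illustrate the containment property without AWS, here's an in-memory stand-in for the bus: each subscriber gets its own retry budget, and an event that keeps failing in one loop is parked in a dead-letter list instead of taking down the publisher or the other subscribers. This is a toy model, not how EventBridge works internally; in AWS the equivalent knobs are the retry policy and dead-letter queue you configure on each rule target.

```python
from collections import defaultdict
from typing import Callable


class LocalBus:
    """In-memory event bus: per-subscriber retries, failures isolated per loop."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[tuple[Callable[[dict], None], int]]] = defaultdict(list)
        self.dead_letters: list[tuple[str, dict]] = []

    def subscribe(self, detail_type: str, handler: Callable[[dict], None],
                  max_attempts: int = 3) -> None:
        self.handlers[detail_type].append((handler, max_attempts))

    def publish(self, detail_type: str, detail: dict) -> None:
        for handler, max_attempts in self.handlers[detail_type]:
            for attempt in range(1, max_attempts + 1):
                try:
                    handler(detail)
                    break  # this subscriber succeeded; move on to the next one
                except Exception:
                    if attempt == max_attempts:
                        # Park the event rather than crash the publisher or other loops.
                        self.dead_letters.append((detail_type, detail))
```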
For the cloud architecture under all of this (the Lambdas, EventBridge wiring, and queue patterns), see "The AWS-native shape I actually start with."
The cost story
The three-loop pattern is also the cost story.
Triage runs on every request. Haiku, small prompt, small output. Fractions of a cent.
Diagnose runs on most requests; some get bounced at triage as out of scope. Cents per call. Sonnet, real context, real retrieval.
Resolve runs on the diagnose subset that actually proposed an action. Sometimes the customer abandons. Sometimes the human denies. Resolve cost is similar to diagnose, but volume is lower.
If you're shipping 10,000 requests a day, triage dominates your call volume but diagnose dominates your bill. Knowing this lets you tune the right thing. Want to cut the bill? Either move triage to a fine-tuned local classifier, or push more diagnose categories from Sonnet to Haiku where the eval data says it's safe. Don't tune until you know which loop is actually costing you.
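The volume-versus-dollars split is easy to see with a back-of-the-envelope model. Every number below is hypothetical (the per-call costs and the share of requests that reach each loop), chosen only to show the shape of the calculation: which loop makes the most calls, and which one spends the most money.

```python
# Every number here is hypothetical; substitute your own traffic and pricing.
REQUESTS_PER_DAY = 10_000
LOOPS = {
    # loop: (share of requests that reach it, cost per call in dollars)
    "triage": (1.00, 0.0005),
    "diagnose": (0.90, 0.03),   # assumes ~10% bounced at triage as out of scope
    "resolve": (0.30, 0.02),    # assumes only some diagnoses propose an action
}


def daily_costs(requests: int, loops: dict) -> dict:
    """Return {loop: (calls per day, dollars per day)}."""
    return {
        name: (round(requests * share), round(requests * share * unit_cost, 2))
        for name, (share, unit_cost) in loops.items()
    }


costs = daily_costs(REQUESTS_PER_DAY, LOOPS)
busiest = max(costs, key=lambda name: costs[name][0])   # most calls
priciest = max(costs, key=lambda name: costs[name][1])  # most dollars
```

With these made-up inputs, triage makes 10,000 calls for about $5 a day while diagnose makes 9,000 calls for about $270, which is why squeezing triage barely moves the bill in this scenario.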
What I'd build first
If you're shipping one of these next month, build the loops in this order:
- Triage first. Even if everything just routes to "human, please handle" on the other side. A working triage loop tells you what your real category distribution looks like, which is information you'll need for everything else.
- One diagnose category second. Pick the highest-volume triage category and build the diagnose loop for just that one. Get the retrieval pipeline working for that one corpus. Ship it. The other categories can wait.
- Resolve with human approval, every time. No auto-resolution on day one. Even for the categories that look obviously safe. You don't have the data yet to know they're safe.
- Eval suite covering all three loops. Triage gets classification accuracy metrics. Diagnose gets per-category quality metrics on golden examples. Resolve gets approval-rate and override-rate metrics. The eval harness is the next piece in this series.
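A sketch of the simplest metrics for the triage and resolve loops. Treating an "override" as a human editing the proposed action before approving it is an assumption about how you'd define the term, and the input shapes are hypothetical.

```python
def triage_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of golden examples where triage picked the expected category."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def resolve_rates(decisions: list[dict]) -> dict:
    """decisions: one {"approved": bool, "edited": bool} per resolve proposal.

    approval_rate: approved / total.
    override_rate: approved-but-edited / total (assumed definition of "override").
    """
    total = len(decisions)
    if total == 0:
        return {"approval_rate": 0.0, "override_rate": 0.0}
    approved = sum(1 for d in decisions if d["approved"])
    overridden = sum(1 for d in decisions if d["approved"] and d["edited"])
    return {"approval_rate": approved / total, "override_rate": overridden / total}
```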
The three loops are the shape of the product. Build them once and the rest of the work (adding categories, swapping models, tuning costs) happens inside a frame that already works. Build them tangled together and you'll spend the next year unbraiding them in production while customers watch.