Pricing the service: subscription, per-resolution, outcome-based

Subscription, per-resolution, outcome-based: the pricing decision tree, the cost-of-goods math, and the free-tier question. The closer for the operate series.


This is the closer of the four-piece year-one series. After this, the blog goes back to the regular weekly mix: news roundups on Sundays, deeper-dive pieces midweek, and the architecture stuff when something is worth writing about. Thank you for reading 22 in a row.

For the closer, the topic that everyone in the MVP series quietly wanted me to get to: how do you actually charge for any of this?

[Figure: Pricing decision tree. How predictable is per-customer cost? High (fixed seats): subscription, $X / month / seat. Medium (variable): per-resolution, $Y / triaged ticket. Low (paid only on a win): outcome-based, $Z / approved deal. Beneath all three sits the COGS stack (Bedrock tokens + RDS + Mac Studio amortized + ops), which whichever model you pick must cover with margin. Free tier so consultants try; paid tier when they ship.]

Quick recap of where we are. A consultant signs up, picks a starter pack, uploads their material, and gets a working surface in five minutes (the onboarding piece, two weeks back). They run their AI from a supervisor view while customers ask questions through a customer view (last week's piece), and inference cost is managed by routing each loop across Haiku / Sonnet / Opus / Llama on Bedrock (the piece that started this series). Now somebody has to pay for it, and the pricing model is part of the product, not separate from it.

Three pricing shapes are realistic for this kind of product: subscription, per-resolution, outcome-based. Each one fits some verticals beautifully and some terribly. The cost-of-goods math behind each one is actually knowable, which is the unglamorous gift of this architecture.

Let me walk through how I think about it.

The three shapes

Subscription. Flat monthly fee, fair-use cap. The consultant pays you, you give them a tenant, customers use the surface, and the meter never runs in front of the consultant or the customer. Predictable revenue for you, predictable cost for them. The downside: if the consultant's customer base grows 10x, you eat the cost overage. If it never grows, you charge them the same as you would a tenant doing 100x the volume.

Per-resolution. You charge per query that gets answered (or per ticket closed, per contract reviewed, per resume marked up; the unit varies by vertical). The meter runs in proportion to the work the AI does, which aligns cost with value almost perfectly. The downside: customers and consultants both hate watching meters, and "what counts as a resolution" becomes a definitional argument that eats into trust.

Outcome-based. You charge a fee tied to a measurable outcome the AI produced. A successfully placed candidate (HR consultant). A signed deal a sales-discovery brief contributed to (sales). A contract issue caught that would otherwise have leaked through (legal). Highest possible alignment, highest possible price tag, highest possible measurement and dispute risk.

None of these is right or wrong. They're trade-offs across four dimensions: revenue predictability, cost alignment, sales friction, and dispute risk. The right shape depends on the consultant's vertical and the consultant's own customer relationships.

The decision tree I actually use

Three questions I ask, in order, before suggesting a pricing model to a vertical.

Is the unit of work clearly definable from outside the system?

Per-resolution and outcome-based both depend on having a unit that the customer and consultant agree counts. Some verticals have this naturally. Contract review: a contract is a contract. Resume coaching: a resume is a resume. Interview rubric: a candidate writeup is a candidate writeup. You can charge per unit and nobody argues.

Other verticals don't have a clear external unit. Sales discovery: what's a unit? A call prep brief? A research session? A whole pursuit? The consultant and customer might define it three different ways, and the product can't enforce any of them without irritating somebody. In those verticals, subscription is the safe default because the meter problem doesn't exist.

How variable is per-tenant volume?

If your tenant base is going to span "consultant doing 30 customer queries a month" to "consultant doing 30,000," subscription pricing breaks for one of them, usually you, on the high end. Per-resolution scales with use, which is what you want when the spread is wide.

For verticals with a naturally narrow spread (say, medical second-opinion review, where each specialist's volume is bounded by their own throughput), subscription works fine.

How directly attributable is the AI's work to a measurable outcome?

Outcome-based pricing only works when you can prove the AI moved the needle. A legal-pro product that catches a clause that would have cost the customer $50k is straightforwardly outcome-attributable. A career-coach product that helped someone get a job is somewhat attributable but lots of other things contributed. A marketing-positioning advisor whose AI-assisted brief contributed to a quarter's better revenue is barely attributable at all without a much bigger measurement apparatus.

If attribution is clean, outcome-based gets you the highest revenue per customer. If it's muddy, don't bother: you'll spend all your effort defending the bill.
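To make the three questions concrete, here is a tiny function that walks them in order. It is a hypothetical encoding, not a formula: the `Level` scale, the field names, and the way the answers combine (Q1 as a gate, Q2 choosing between flat and metered, Q3 upgrading to outcome-based only when attribution is clean) are my assumptions. The inputs are judgment calls about a vertical, not something the system can measure.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Vertical:
    name: str
    unit_is_external: bool  # Q1: do consultant and customer agree on what one unit is?
    volume_spread: Level    # Q2: how variable is per-tenant volume?
    attribution: Level      # Q3: how directly is the AI's work tied to a measurable outcome?


def suggest_pricing(v: Vertical) -> str:
    """Apply the three questions in order and return a suggested starting shape."""
    # Q1: with no agreed external unit there is nothing fair to meter.
    if not v.unit_is_external:
        return "subscription"
    # Q2: the spread of per-tenant volume picks between flat and metered.
    if v.volume_spread is Level.HIGH:
        shape = "per-resolution"
    elif v.volume_spread is Level.MEDIUM:
        shape = "subscription + per-unit usage"
    else:
        shape = "subscription"
    # Q3: clean attribution upgrades to outcome-based; muddy attribution keeps the Q2 answer.
    if v.attribution is Level.HIGH:
        return "outcome-based"
    return shape
```

With these assumptions, a sales-discovery vertical fails Q1 and lands on subscription, and a contract-review vertical with a middling volume spread and mostly fuzzy attribution lands on the hybrid.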

The cost-of-goods math, honestly

Here's where the architecture from the MVP series pays back. Because you have observability and audit on day one (piece #13) and you've thought about cost as a design input from the start (piece #15), you can actually compute COGS per query. Most AI products can't.

Per-query cost has three layers.

Layer 1: Bedrock tokens. Variable per query, varies dramatically by which model the router picked (see the model-selection piece). Triage with Haiku is fractions of a cent. Diagnose with Sonnet is low single-digit cents. The 5-10% of cases that escalate to Opus are 5-10x that. Llama batch work is low. If you log per-query model selection (you should be), you can compute exact Bedrock cost per query and roll it up to per-tenant per-month.
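A minimal sketch of that roll-up, assuming you log one row per model call with the model the router picked and its input/output token counts. The prices in `PRICE_PER_1K` are placeholders, not real Bedrock rates; load current per-model, per-region prices from your own config. The `ModelCall` record and `rollup` helper are hypothetical names for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable

# PLACEHOLDER prices in USD per 1,000 tokens. Not real Bedrock rates.
PRICE_PER_1K = {
    "haiku": {"input": 0.001, "output": 0.005},
    "sonnet": {"input": 0.003, "output": 0.015},
    "opus": {"input": 0.015, "output": 0.075},
    "llama": {"input": 0.0005, "output": 0.0007},
}


@dataclass(frozen=True)
class ModelCall:
    query_id: str       # one query can span several loops / model calls
    tenant_id: str
    month: str          # e.g. "2025-03"
    model: str          # which model the router picked for this call
    input_tokens: int
    output_tokens: int


def call_cost(c: ModelCall) -> float:
    p = PRICE_PER_1K[c.model]
    return (c.input_tokens * p["input"] + c.output_tokens * p["output"]) / 1000


def rollup(calls: Iterable[ModelCall], key: Callable[[ModelCall], Hashable]) -> dict:
    """Sum call costs grouped by any key: per query, per tenant-month, per model."""
    totals: dict = defaultdict(float)
    for c in calls:
        totals[key(c)] += call_cost(c)
    return dict(totals)
```

Grouping by `lambda c: c.query_id` gives exact cost per query (a Haiku triage that escalated to Sonnet sums both calls); grouping by `lambda c: (c.tenant_id, c.month)` gives the per-tenant-per-month figure.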

Layer 2: RDS + storage + bandwidth. Per-tenant overhead. The pgvector store grows with the consultant's corpus. The audit table grows with usage. RDS instance cost is shared across tenants. CloudWatch logs are real money at scale. Plus S3 for artifacts. This layer is harder to attribute exactly per query, but you can attribute it per tenant per month with reasonable accuracy.

Layer 3: Mac Studio amortized. The local stack from piece #5 (fine-tuning, batch inference, transcription, image gen) has a fixed capital cost and an electricity bill. Turn that into a monthly number, then allocate it across tenants in proportion to the share of work each tenant pushes through the local pipeline. For most products this layer is small per tenant per month if your tenant base is healthy. If you have three tenants, the Mac Studio is expensive per query. If you have 300, it's basically free.
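A sketch of that allocation, assuming straight-line amortization over a period you choose and some measure of local work per tenant (job-minutes, say). The function names, the amortization period, and the dollar figures in the example are hypothetical.

```python
def monthly_local_cost(capital_cost: float, amortization_months: int,
                       monthly_electricity: float) -> float:
    """Straight-line amortization of the hardware plus the monthly power bill."""
    return capital_cost / amortization_months + monthly_electricity


def allocate_local_cost(monthly_cost: float,
                        local_work: dict[str, float]) -> dict[str, float]:
    """Split the monthly local-stack cost by each tenant's share of local work."""
    total = sum(local_work.values())
    if total == 0:
        # Nobody used the local stack this month; the cost is unallocated overhead.
        return {tenant: 0.0 for tenant in local_work}
    return {tenant: monthly_cost * work / total
            for tenant, work in local_work.items()}
```

With a hypothetical $7,200 machine amortized over 36 months plus $25 of power, the stack costs $225 a month. Split evenly across three tenants that is $75 each; across 300 it is $0.75 each, which is the "basically free" end of the range.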

Add the three layers, and you have a per-tenant-per-month COGS number you can put up against any of the three pricing models and check whether your margin is real.

The number that matters: what's the gross margin at typical tenant volume? If subscription pricing puts you at 40% margin on a typical tenant and 5% margin on a heavy-use tenant, your subscription tier needs a usage cap or a heavy-use overage rate. If per-resolution pricing puts you at a consistent 60% margin across tenant sizes, that's the right shape for that vertical.
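The margin check can be sketched as adding the three layers and comparing against revenue. The `TenantMonth` record, the 30% margin floor, and the dollar figures in the test are all hypothetical; they are chosen only to reproduce the 40% typical / 5% heavy example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantMonth:
    revenue: float  # what this tenant paid this month
    bedrock: float  # layer 1: token cost from the per-query roll-up
    infra: float    # layer 2: RDS, storage, bandwidth, CloudWatch, S3 share
    local: float    # layer 3: amortized local-stack share

    @property
    def cogs(self) -> float:
        return self.bedrock + self.infra + self.local

    @property
    def gross_margin(self) -> float:
        if self.revenue <= 0:
            raise ValueError("margin is undefined at zero revenue; model free tenants separately")
        return (self.revenue - self.cogs) / self.revenue


def needs_cap_or_overage(heavy: TenantMonth, margin_floor: float = 0.30) -> bool:
    """Flag a tier whose heavy-use tenants fall below the (assumed) margin floor."""
    return heavy.gross_margin < margin_floor
```

A hypothetical $200 subscription with $120 of COGS on a typical tenant clears the floor at 40%; the same $200 against $190 of COGS on a heavy tenant lands at 5% and trips the flag.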

How the three shapes play across verticals

Three quick walkthroughs, then a fourth on the cross-vertical pattern.

An IT-ops consultant doing infrastructure triage and resolution. Volume per tenant is highly variable: small managed-services shops do 50 tickets a week, large ones do 5,000. The unit of work (a resolved ticket) is naturally well-defined. Outcome attribution is direct (a ticket either got resolved or it didn't). Per-resolution wins. Meter the resolved-ticket count, charge per ticket, and set a tiny baseline subscription so you have a predictable revenue floor plus the per-unit upside.

A career coach doing resume + positioning review. Volume is narrower (a coach only has so many candidates per month), the unit is clear (a resume), and the outcome is muddy (job offers come from many sources). Subscription per coach with a fair-use cap is the cleanest shape, maybe with a small per-extra-resume overage for coaches who go over the cap. Outcome-based is a tar pit here: too many factors contribute to landing a role.

A legal pro auto-reviewing contracts against their playbook. Volume varies but is mostly bounded by the lawyer's own bandwidth. Unit (a clause flagged, a contract reviewed) is well-defined. Outcome attribution is occasionally crisp ("this clause would have cost the client $X if it shipped, we caught it") but mostly fuzzy. Hybrid: subscription floor plus a per-contract-reviewed line item. The lawyer knows monthly cost will be in a band. The product gets paid more when used more.

The pattern across verticals. Most consultant-AI products end up at subscription with a usage component on top. Pure subscription undercharges high-volume tenants, which leaves money on the table and erodes margin on exactly the tenants who use the product most. Pure per-resolution puts a meter in front of the customer that nobody enjoys watching. Hybrid is boring and right.
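Because the three walkthroughs converge on the same shape, one hypothetical plan structure can express all of them: a base fee, an included allowance, and a per-unit rate beyond the allowance. Every price and allowance below is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridPlan:
    base_fee: float      # predictable monthly floor
    included_units: int  # fair-use allowance covered by the base fee
    per_unit: float      # price per unit beyond the allowance

    def invoice(self, units: int) -> float:
        billable = max(0, units - self.included_units)
        return round(self.base_fee + billable * self.per_unit, 2)


# Hypothetical numbers: one plan shape, three verticals.
IT_OPS = HybridPlan(base_fee=29.0, included_units=0, per_unit=1.25)        # tiny floor, meter every resolved ticket
CAREER_COACH = HybridPlan(base_fee=99.0, included_units=40, per_unit=2.0)  # flat fee, overage past the cap
LEGAL = HybridPlan(base_fee=299.0, included_units=0, per_unit=15.0)        # floor plus per-contract line item
```

The design choice is that the verticals differ only in parameters, so the metering, display, and audit code stays the same no matter which vertical a tenant belongs to.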

The cost-of-goods math here is only possible because the architecture from the MVP series logs everything per query in a structured way. If you skipped the audit-on-day-one investment from piece #13, you cannot price a hybrid model honestly. You'll be guessing at margin.

The free-tier question

Yes. Have one. Here's why and how.

A free tier in this product isn't "free chat with no value." It's "let the consultant use the onboarding flow and get to the five-minute moment, then let them try a small number of real customer queries before committing." The five-minute moment from the onboarding piece is the trial.

The free tier exists to let the consultant prove the product works on their material before they pay you. That's the highest-leverage demo you can run, and it scales: every signup runs it themselves, and you don't have to take a sales call.

The cost of the free tier is real. A free tenant takes RDS rows, embeds documents (storage), runs Bedrock calls (per-token cost), generates audit rows. So you cap it. The numbers I've seen work:

  • 50 customer-side queries total in the free tier (lifetime, not monthly).
  • Limited corpus size (say 50 MB of uploads or 200 documents).
  • Full feature access, no neutered functionality: if you make the trial weak, the conversion will be weak.
  • Auto-suspend after the cap until they convert; don't auto-bill, don't surprise-charge.
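A minimal sketch of enforcing those caps, assuming the free tier's state lives on a per-tenant record. The names are hypothetical, as is the choice to enforce both corpus limits (whichever is hit first). The important property is the last bullet: hitting the cap suspends the tenant; it never triggers a charge.

```python
from dataclasses import dataclass

FREE_QUERY_CAP = 50             # lifetime customer-side queries, not monthly
FREE_CORPUS_BYTES = 50_000_000  # "50 MB of uploads"
FREE_DOC_CAP = 200              # "200 documents"


@dataclass
class FreeTenant:
    queries_used: int = 0
    corpus_bytes: int = 0
    documents: int = 0
    suspended: bool = False
    converted: bool = False  # set only when the consultant picks a paid plan


def allow_query(t: FreeTenant) -> bool:
    """Gate one customer-side query. Never bills; suspends at the cap instead."""
    if t.converted:
        return True
    if t.suspended or t.queries_used >= FREE_QUERY_CAP:
        t.suspended = True
        return False
    t.queries_used += 1
    if t.queries_used >= FREE_QUERY_CAP:
        t.suspended = True  # the 50th query runs; the 51st is blocked
    return True


def allow_upload(t: FreeTenant, size_bytes: int) -> bool:
    """Accept an upload only if it keeps the tenant under both corpus limits."""
    if t.converted:
        return True
    if t.corpus_bytes + size_bytes > FREE_CORPUS_BYTES or t.documents + 1 > FREE_DOC_CAP:
        return False
    t.corpus_bytes += size_bytes
    t.documents += 1
    return True


def convert(t: FreeTenant) -> None:
    """Lift the suspension only on an explicit opt-in to a paid plan."""
    t.converted = True
    t.suspended = False
```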

Cost per free tenant works out to a manageable number of dollars per signup. Conversion from free to paid in this kind of product, when the onboarding actually delivers the five-minute moment, lands somewhere in the 8-15% range based on what I've seen elsewhere. At those rates you carry roughly 7-12 free signups (1 divided by the conversion rate) for every tenant that converts, so the economics work if your paid-plan margin can absorb that. For most of the verticals here, it can.
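The arithmetic behind that check fits in a few lines. This is a sketch under stated assumptions: the per-free-tenant cost and paid margin you pass in are your own numbers, and `payback_months` (how many months of paid margin you are willing to spend recovering trial costs) is a hypothetical parameter, not something from the text.

```python
def signups_per_paid(conversion_rate: float) -> float:
    """Free signups carried per converting tenant, counting that tenant's own trial."""
    if not 0 < conversion_rate <= 1:
        raise ValueError("conversion_rate must be in (0, 1]")
    return 1 / conversion_rate


def free_tier_pays_off(conversion_rate: float, cost_per_free_tenant: float,
                       paid_margin_per_month: float, payback_months: float = 3.0) -> bool:
    """True if trial spend per paid tenant is recovered within the payback window."""
    trial_cost_per_paid = signups_per_paid(conversion_rate) * cost_per_free_tenant
    return trial_cost_per_paid <= paid_margin_per_month * payback_months
```

At 8% conversion that is 12.5 signups per paid tenant; at 15% it is about 6.7, which is where the 7-12 range comes from.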

Don't fall into "free tier with rate limits per day." That just teaches the consultant that your product feels stingy. A trial that is generous but bounded beats one that is stingy every day and only generous over a long time horizon.

Pricing changes are product changes

One thing I want to leave you with as the operate series wraps.

A pricing change in this kind of product isn't a marketing change. It's a product change, because the unit you're charging on (per resolution, per contract, per resume) has to be measured by the system, displayed in the consultant view, defended in the audit trail, and capped or metered in the customer view. Switching from subscription to per-resolution means engineering work in five places.
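One way to keep those places consistent is to record each billable unit exactly once, in one place, and have every view read the same count. The sketch below is a hypothetical design, not a prescribed one: `MeterEvent`, `Meter`, and the idempotency-key format are illustrative, and the in-memory list stands in for the audit table.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class MeterEvent:
    idempotency_key: str  # hypothetical format, e.g. "<tenant>:<ticket-id>:resolved"
    tenant_id: str
    unit: str             # "resolution", "contract_reviewed", "resume", ...
    at: datetime


@dataclass
class Meter:
    audit_log: list[MeterEvent] = field(default_factory=list)  # stands in for the audit table
    seen_keys: set[str] = field(default_factory=set, repr=False)

    def record(self, event: MeterEvent) -> bool:
        """Write each billable unit once; retries with the same key are ignored."""
        if event.idempotency_key in self.seen_keys:
            return False
        self.seen_keys.add(event.idempotency_key)
        self.audit_log.append(event)
        return True

    def count(self, tenant_id: str, unit: str) -> int:
        """The one number the invoice, consultant view, and customer-view cap all read."""
        return sum(1 for e in self.audit_log if e.tenant_id == tenant_id and e.unit == unit)
```

The idempotency key is what keeps a retried or re-closed ticket from being billed twice, which is exactly the kind of definitional argument a per-resolution meter invites.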

Which means: pick one model to launch with, run it for at least a quarter, watch where it breaks, and only then think about adding the second. The biggest pricing mistake I see startups make in this space isn't picking the wrong model. It's flipping the model after three months because revenue isn't where they hoped, and creating a billing dumpster fire that takes another quarter to clean up.

Pick deliberately. Wire the meter end-to-end. Watch what the data tells you. And be prepared to defend whatever you charge against the COGS math, because the customer who challenges you on it is doing you a favor: they're telling you they care.

What's next

This wraps the four-piece operate series. The full 22-article arc (18 MVP pieces plus these 4) was built around one idea: the architecture that turns any consultant's secret sauce into a working AI-powered product is a known shape. What changes is what's yours.

If you've followed the whole run, the next ask I'd put to you is the one I keep putting to myself: what's the smallest version you'd actually ship? Not the version you'd build if you had a year. The version you'd put in front of one consultant tomorrow. That's the cut line.

Back to the regular cadence from here. News roundups on Sundays, deep-dives midweek, whatever bites me hard enough to write about as it happens. Thanks for reading, and if you're shipping anything in this space, drop me a line. I want to hear what cracks.