Stop letting AI vendors handle your data: the on-prem case

The default for AI in 2025 is hosted. For most workloads, most teams, and most users, the path of least resistance is whichever vendor’s API is easiest to call. The default makes sense for some workloads. For others it’s structurally wrong, and the structural wrongness gets harder to fix the longer the inertia accumulates.

The on-prem case isn’t “everything should run locally.” It’s “for these specific workloads, on-prem is the right answer and the hosted default is producing real costs you might not be measuring.” Worth being explicit about which workloads and why.

When hosted is the right call

Worth opening with the cases where the hosted default is correct, because the on-prem maximalist version of this conversation is wrong:

  • Frontier-tier capability workloads. Hosted Opus 4 / GPT-5 / Gemini 2.5 Pro are still ahead of any open-weights model on the hardest tasks. If your workload depends on the marginal capability, hosted is right.
  • Bursty workloads with low average usage. Hardware that sits idle isn’t paying back. Hosted’s pay-per-use economics fit this shape.
  • Multi-tenant SaaS at scale. The economies of scale on hosted inference still favor cloud for the production-SaaS shape.
  • Operational-simplicity-first shops. When the team’s binding constraint is “we don’t want to think about hardware,” hosted is what they’re paying for. Legitimate buy.

That’s the correct default for those workloads. The on-prem case is for everything else.

When on-prem is the right call

Five workload categories where the on-prem answer is structurally better than the hosted one:

Privacy-bound workloads. Anything where the data classification (sensitive customer data, employee data, medical records, legal-privileged communication, financial records) makes hosted a non-starter or a serious risk. On-prem isn’t optional here; the data calculus rules out hosted.

High-volume routine workloads. Document classification at scale, embedding generation, batch summarization. The cumulative cost on hosted compounds; the marginal cost on amortized hardware is essentially zero past break-even. Past a few hundred thousand requests per month per workload, on-prem economics start winning meaningfully.
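The break-even claim can be sketched with a back-of-the-envelope model. Every number below is an illustrative assumption, not a benchmark; plug in your own API pricing and hardware quotes.

```python
# Sketch: hosted vs. on-prem break-even for a high-volume routine workload.
# All cost figures are placeholder assumptions, not vendor quotes.

HOSTED_COST_PER_1K_REQUESTS = 5.00  # assumed blended API cost, USD
HARDWARE_CAPEX = 30_000.0           # assumed server + GPU purchase, USD
MONTHLY_OPEX = 800.0                # assumed power, hosting, ops time, USD
AMORTIZATION_MONTHS = 36            # assumed hardware lifetime

def monthly_cost_hosted(requests_per_month: float) -> float:
    """Hosted cost scales linearly with volume."""
    return requests_per_month / 1_000 * HOSTED_COST_PER_1K_REQUESTS

def monthly_cost_onprem(requests_per_month: float) -> float:
    """On-prem cost is roughly flat until the hardware saturates:
    amortized capital plus operating cost, independent of volume."""
    return HARDWARE_CAPEX / AMORTIZATION_MONTHS + MONTHLY_OPEX

# With these assumptions, break-even lands around 330k requests/month.
for volume in (100_000, 500_000, 2_000_000):
    hosted = monthly_cost_hosted(volume)
    onprem = monthly_cost_onprem(volume)
    winner = "on-prem" if onprem < hosted else "hosted"
    print(f"{volume:>9,} req/mo: hosted ${hosted:>8,.0f}  on-prem ${onprem:>8,.0f}  -> {winner}")
```

The flat line versus the linear line is the whole argument: the only question is which side of the crossing point your monthly volume sits on.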

Always-on personal-AI workloads. The assistant that watches your local files, your local activity, your local context. The data calculus argues for local; the latency calculus argues for local; the hosted equivalent is a worse experience and a worse privacy story.

Workloads where vendor leverage matters. When the vendor’s pricing changes, capability changes, or business model changes, your workload is exposed. On-prem reduces the leverage; the vendor lock-in cost is bounded.

Workloads in regulated environments. When the compliance regime requires specific controls (HIPAA, GLBA, sector-specific frameworks), on-prem makes the controls auditable in ways hosted often doesn’t. The audit story is materially easier.

These are real categories. The deployments visible in public reporting that moved workloads from hosted to on-prem in 2025 are mostly addressing one of these. The deployments that haven’t moved either don’t have these workloads or are accumulating costs they’re not measuring.

Why the inertia is hard to break

A few reasons the hosted default persists past where it should:

The setup cost is front-loaded. On-prem requires investment: hardware, networking, operational discipline. Hosted requires a credit card. The cost-per-call comparison favors hosted for the first month and favors on-prem for the next several years; teams default to the short-term picture.

Vendor relationships are sticky. The team that’s already integrated with vendor X has integration overhead they don’t want to redo. Switching to on-prem feels like rebuilding rather than building; the inertia is real even when the math favors moving.

The skill set is rare. Operating local-AI infrastructure requires platform-engineering skills the team often doesn’t have in-house. Hiring is hard; training takes time; the gap is real.

The accumulated cost is invisible. The hosted bill arrives in fragmented pieces: line items across multiple vendors, attributed to multiple cost centers, observed by FinOps but not by the engineering teams making the decisions. The total only becomes visible at end-of-year reviews, when the accumulated number is the conversation.

The risk is diffuse. No single hosted-AI dependency feels critical. The combined dependency is meaningful, and it shows up only when something goes wrong: vendor outage, pricing shift, capability change.

These all argue for the inertia. They don’t argue against the on-prem case being right; they argue for why the right case keeps getting deferred.

The realistic on-prem path

For a team with workloads that should be on-prem but are currently hosted, the path that actually works:

Pick one workload to migrate first. The highest-volume privacy-bound one is usually the right pick. Move it. Measure the result. Use the case study to justify the next migration.

Don’t try to replicate the hosted-vendor surface 1:1. The on-prem version doesn’t need every feature of the hosted version. It needs the features your workload actually uses. Scope creep in the migration is the most common failure mode.

Plan for the operational discipline. The cluster-of-one piece covers the operational practices that make on-prem sustainable. The teams that skip this step end up abandoning the on-prem investment within a year.

Keep hosted in the loop for the workloads that need it. On-prem isn’t all-or-nothing. The same team can credibly run on-prem for the privacy-bound and high-volume routine cases, and hosted for the frontier-capability and bursty cases. The hybrid is the practical answer.
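The hybrid pattern above reduces to a per-workload routing decision, which can be made explicit in code. A minimal sketch follows; the workload names, endpoints, and the 500k-requests threshold are all assumptions for illustration, not prescriptions.

```python
# Sketch: per-workload routing for a hybrid on-prem/hosted deployment.
# Endpoints, workload names, and the volume threshold are assumed values.
from dataclasses import dataclass

LOCAL_ENDPOINT = "http://inference.internal:8000/v1"   # assumed on-prem server
HOSTED_ENDPOINT = "https://api.example-vendor.com/v1"  # assumed hosted vendor

@dataclass
class Workload:
    name: str
    privacy_bound: bool        # data classification rules out hosted
    requests_per_month: int
    needs_frontier: bool       # depends on marginal frontier capability

def route(w: Workload) -> str:
    """Send privacy-bound and high-volume routine work on-prem; keep
    frontier-capability and bursty/low-volume work hosted."""
    if w.privacy_bound:
        return LOCAL_ENDPOINT        # non-negotiable: the data stays local
    if w.needs_frontier:
        return HOSTED_ENDPOINT       # paying for the marginal capability
    if w.requests_per_month >= 500_000:
        return LOCAL_ENDPOINT        # past break-even, amortized hardware wins
    return HOSTED_ENDPOINT           # bursty / low-volume: pay per use

workloads = [
    Workload("doc-classification", privacy_bound=False, requests_per_month=2_000_000, needs_frontier=False),
    Workload("records-summarizer", privacy_bound=True, requests_per_month=50_000, needs_frontier=False),
    Workload("research-assistant", privacy_bound=False, requests_per_month=10_000, needs_frontier=True),
]
for w in workloads:
    print(f"{w.name}: {route(w)}")
```

The point of writing the rule down is that the routing decision stops living in individual engineers’ heads and becomes a reviewable, per-workload policy.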

Build the migration discipline into the team’s regular work. Not a one-off project. An ongoing capability the team has. The teams that treat on-prem as “a thing we did once” lose the muscle; the teams that treat it as ongoing platform work compound the advantage.

What this means for vendor relationships

The on-prem move changes the vendor conversation in useful ways:

Pricing leverage. Vendors price more aggressively for customers with credible alternatives. A team that’s run a workload on-prem can negotiate hosted pricing differently than a team that’s never moved one.

Capability assessment. Running the workload on-prem teaches you what the workload actually needs from the model. The hosted-vendor pitch then gets evaluated against what you know rather than what they’re selling.

Roadmap visibility. When you depend on a vendor’s hosted offering, their roadmap is your roadmap. When you can run on-prem alternatives, you have optionality on roadmap surprises.

These aren’t dramatic. They’re the cumulative effect of having credible alternatives, which is what the on-prem investment buys you.

What I’d recommend

For teams with workloads that fit the on-prem profile:

  • Audit your current AI spend by workload category. Identify the privacy-bound and high-volume routine ones.
  • Pick one to migrate. Start with the highest-leverage candidate.
  • Plan for the operational investment. Hardware, monitoring, backup, on-call rotation. Same as any other production system.
  • Run hybrid. Don’t try to move everything. The right pattern is per-workload routing.
  • Measure both sides. The hosted bill on the workloads that stayed; the all-in cost (capital + operating + risk) on the on-prem workloads. Re-evaluate annually.
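The “measure both sides” step can be as simple as two functions run at the annual review. The cost figures below are placeholder assumptions; replace them with your actual billing data and asset records.

```python
# Sketch: the annual "measure both sides" comparison for one workload.
# All figures in the example call are assumed, not real bills.

def onprem_all_in_annual(capex: float, lifetime_years: float,
                         annual_opex: float, annual_risk_reserve: float) -> float:
    """All-in on-prem cost: amortized capital, plus operating cost, plus a
    reserve for hardware-failure and obsolescence risk."""
    return capex / lifetime_years + annual_opex + annual_risk_reserve

def hosted_annual(monthly_line_items: list[float]) -> float:
    """Hosted cost: sum the fragmented line items into one visible number."""
    return sum(monthly_line_items)

# Illustrative comparison for a single migrated workload (assumed numbers).
onprem = onprem_all_in_annual(capex=30_000, lifetime_years=3,
                              annual_opex=9_600, annual_risk_reserve=2_000)
hosted = hosted_annual([2_500.0] * 12)  # what the workload cost when hosted
print(f"on-prem all-in: ${onprem:,.0f}/yr  vs  hosted: ${hosted:,.0f}/yr")
```

The risk-reserve term matters: leaving it out is how on-prem advocates flatter their own numbers, and including it keeps the annual re-evaluation honest.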

The hosted default is correct for the workloads it’s correct for. For the workloads it isn’t, the on-prem case is structurally stronger and the inertia is the only reason most teams haven’t moved. Worth being deliberate about which side of the line each of your workloads sits on.

The teams that get this right have a better cost story, a better privacy story, a better vendor-leverage story, and a more durable AI infrastructure. The teams that don’t are left with a hosted bill that grows quietly and a vendor dependency that compounds.

Stop letting the AI vendors handle the data that doesn’t need to be theirs.