GPUaaS in late 2025: who's left and what they cost
The GPU-as-a-service market in early 2025 was crowded: every other week brought a new neocloud announcement, prices kept moving down, and quality ranged from "production-grade" to "we own a rack and rent it out." By late 2025 the field has consolidated. The survivors are fewer in number, more differentiated in offering, and more legible in pricing than they were nine months ago.
That shift matters for any shop that planned its GPU access against the early-year picture, so a survey of who's still around and what they actually cost is worth doing.
What changed through 2025
A few specific dynamics:
The cheap end consolidated. A raft of small "we have GPUs, rent them on the internet" providers exited or were acquired. The race to the bottom that characterized early 2025 left a small number of survivors with viable economics rather than a long tail of marginal operators.
The mid-tier got more differentiated. Mid-tier providers (Lambda, RunPod, CoreWeave, Vast.ai, the smaller-but-real names) carved out specific positions. Lambda leaned on operational maturity. CoreWeave on enterprise integration. RunPod on developer experience. Vast.ai on the bottom of the market. Each one is now better understood as "the X for Y use case" rather than as interchangeable.
The hyperscalers held position. AWS, Azure, GCP didn't lose ground to the neoclouds the way some predicted. They kept the workloads where their broader integration mattered; they ceded the workloads where it didn't. Net share movement was real but modest; it didn't reshape the market.
Consumer-tier GPU rentals matured. The "rent a 4090 by the hour" market got more reliable. Vast.ai and a few others made this a real category. Useful for hobbyist and small-shop work that doesn't justify the higher tiers.
Pricing got more transparent. The early-2025 patchwork of pricing models simplified. Per-hour pricing for on-demand, monthly commitments for reserved capacity, spot pricing as a real option. The "you have to call sales" pattern is mostly gone.
The market is more legible than it was. The survivors are more durable. The pricing has stabilized.
The current cost space
Approximate per-hour pricing for an H100 (which remains the workhorse for serious training and inference workloads), as of late 2025:
- Hyperscalers (AWS p5, Azure ND H100, GCP A3): $7-12/hour on-demand, $4-6/hour reserved. The premium reflects the integration story.
- CoreWeave: $4-6/hour on-demand for H100. Premium-tier neocloud with enterprise polish.
- Lambda Labs: $3-4/hour for H100 on-demand. Solid mid-tier choice.
- RunPod: $2.50-3.50/hour for H100 community pods. Good developer experience for small-team work.
- Vast.ai marketplace: $1.50-3/hour depending on availability and operator quality. Cheapest legitimate option; quality varies.
- The various smaller providers: $1.50-2.50/hour with operational variability that makes them harder to recommend for production workloads.
For B200s and the higher tiers, scale the numbers up roughly 1.5-2× across providers.
For 4090-class consumer hardware:
- Vast.ai marketplace: $0.30-0.70/hour
- RunPod: $0.40-0.80/hour
- Direct purchase amortized: ~$0.10/hour over a 3-year hardware lifecycle including power
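The direct-purchase figure is just amortization arithmetic. Here's a minimal sketch; the purchase price, power draw, and electricity rate are my assumptions, not numbers from any vendor (only the 3-year lifecycle comes from above):

```python
# Sketch: amortized cost per hour of owning a consumer GPU.
# All input numbers are illustrative assumptions, not vendor quotes.

def amortized_cost_per_hour(purchase_usd, lifetime_years, power_watts,
                            electricity_usd_per_kwh, utilization):
    """Hardware cost spread over powered-on hours, plus electricity."""
    powered_hours = lifetime_years * 365 * 24 * utilization
    hardware = purchase_usd / powered_hours
    power = (power_watts / 1000) * electricity_usd_per_kwh
    return hardware + power

# Assumed: $1,800 card, 3-year life, 350 W under load, $0.15/kWh.
for util in (0.25, 0.5, 1.0):
    cost = amortized_cost_per_hour(1800, 3, 350, 0.15, util)
    print(f"utilization {util:.0%}: ${cost:.2f}/hour")
```

Note how sensitive the answer is to utilization: the ~$0.10/hour figure only holds if the card is busy nearly all the time. At 25% utilization, renting on the marketplace tiers is the cheaper option.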
That's the menu. The right pick depends on workload and operational tolerance.
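To see how the hourly ranges translate into budgets, a quick sketch using midpoints of the quoted on-demand H100 ranges (the rates are my rough midpoints, illustrative only):

```python
# Sketch: monthly cost of a fixed H100 workload across the tiers above.
# Rates are rough midpoints of the quoted on-demand ranges, not quotes.

H100_RATES = {           # USD per GPU-hour, on-demand
    "hyperscaler": 9.50,
    "CoreWeave": 5.00,
    "Lambda": 3.50,
    "RunPod": 3.00,
    "Vast.ai": 2.25,
}

def monthly_cost(gpu_hours, rate_per_hour):
    """Total spend for a given number of GPU-hours in a month."""
    return gpu_hours * rate_per_hour

for name, rate in H100_RATES.items():
    print(f"{name:>11}: ${monthly_cost(500, rate):>8,.2f} for 500 GPU-hours")
```

The spread is roughly 4× top to bottom, which is why the workload-shape question below matters more than any single provider's sticker price.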
Where each provider actually wins
The "who do I pick" question is workload-shape specific:
Production training runs at small-to-medium scale. Lambda or CoreWeave for the operational maturity. The price premium over Vast.ai is justified by reliability and support.
One-off experimentation. RunPod or Vast.ai for the speed of getting started and the cost. Spin up, run, tear down. Not for production.
Workloads tied to broader cloud integration. Hyperscalers; pay the premium because the integration story matters. Bedrock plus AWS GPUs is a coherent story; Bedrock plus RunPod GPUs is a mess.
Hobby work and learning. Vast.ai consumer-tier or local hardware. Cost is the binding constraint; operational quality matters less.
Multi-month committed workloads. Reserved capacity at any of the providers. The discount versus on-demand is meaningful (30-50%) and the commitment is workable for predictable workloads.
Bursty and unpredictable workloads. Spot pricing on hyperscalers (cheap when available, drops you when capacity is needed) or on-demand neocloud (more expensive than spot, more predictable).
These are the patterns. Most teams use 2-3 providers covering different workload shapes; the single-provider strategy is increasingly rare.
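The reserved-versus-on-demand call above reduces to a utilization break-even: a reserved commitment bills every hour whether you use the GPU or not, so it wins once actual usage exceeds the ratio of the two rates. A minimal sketch with assumed rates:

```python
# Sketch: at what utilization does a reserved commitment beat on-demand?
# Rates are illustrative, loosely based on the H100 ranges above.

def breakeven_utilization(on_demand_rate, reserved_rate):
    """Fraction of hours you must actually use the GPU for a
    pay-regardless reserved rate to cost less than on-demand.
    Reserved cost = reserved_rate * T; on-demand = on_demand_rate * u * T,
    so reserved wins when u > reserved_rate / on_demand_rate."""
    return reserved_rate / on_demand_rate

# Assumed: $3.50/hr on-demand vs $2.10/hr reserved (a 40% discount).
util = breakeven_utilization(3.50, 2.10)
print(f"break-even utilization: {util:.0%}")
```

With a 30-50% discount, the break-even sits at 50-70% utilization, which is why "predictable workloads" is the operative qualifier for reserved capacity.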
What the consolidation means for buyers
A few practical implications:
Less time on vendor selection. With fewer credible providers, the evaluation matrix is smaller. The decisions are easier and the stakes are lower because the survivors are mostly differentiated by shape rather than by quality.
More predictable economics. The pricing volatility that characterized early 2025 is mostly gone. Budgeting against current pricing is more reliable than it was.
Easier multi-provider strategy. With the survivors being more differentiated, picking 2-3 to use for different workload types is more natural than it was when "they all do the same thing" was the perception.
Vendor leverage is more nuanced. When the market had 50 marginal providers, switching was easy and leverage was abundant. With fewer differentiated survivors, switching is more deliberate but the providers care more about retention.
The hyperscaler-vs-neocloud question is sharper. Pick hyperscaler when integration matters, neocloud when it doesn't. The middle ground (use hyperscaler GPUs because they're simpler) is harder to defend on cost alone now.
What to watch in 2026
Three dynamics to track:
The Blackwell rollout effect. B200s shipping in volume changes the per-FLOP pricing dynamics. The providers that capture the right mix of B200 inventory have an advantage; the ones still mostly on H100s slip on price-per-performance.
Inference-specialized providers. Some providers are specializing in inference rather than training. The economics of inference workloads are different: different optimization targets, different commitment patterns. Watch for providers focused specifically on this.
Consumer-tier price compression. The 4090 / 5090 / consumer-class markets keep getting cheaper. The threshold at which "rent a consumer GPU" beats "buy and amortize" continues to move; the threshold at which "use hosted API" beats "rent a consumer GPU" similarly moves.
Geographic specialization. Providers focused on specific regions (EU data residency, Asian markets, US-only with US-only data flows). The regulatory pressure is growing; the providers that lean into this have a durable position.
What I'd recommend
For teams making GPU procurement decisions in late 2025:
- Pick 2-3 providers, not 1. Different workloads, different providers. The single-provider strategy worked when the market was less differentiated; the multi-provider strategy works better now.
- Use hyperscaler when integration matters; neocloud when it doesn't. The decision is workload-shape, not "neoclouds are always cheaper."
- Don't optimize for the cheapest hour. Operational reliability matters; the cheapest provider is often false economy at production scale.
- Treat spot pricing as a real option for the bursty workloads but don't depend on it for production-critical paths.
- Re-evaluate annually. The market is still moving fast enough that yesterday's optimal mix isn't today's.
The GPUaaS market in late 2025 is more grown-up than it was nine months ago. Fewer providers, better differentiation, more predictable economics. The decisions are easier; the trade-offs are more legible. Worth being deliberate about the picks because the multi-year commitment they represent is more durable than the spot-rental framing suggests.