The state of GPU access for the rest of us
The headlines say GPUs are unobtainable. The headlines are about the hyperscalers. For everyone else, the picture in early 2025 is more interesting and more usable than it gets credit for.
Whether GPUs are available right now depends entirely on which question you're asking. If the question is "can I sign a multi-year reservation for ten thousand H200s in a US-East region by Q3," the answer is mostly no, and the Stargate-style infrastructure announcements are aimed at fixing that. If the question is "can I get an H100 by the hour, today, on a usable cloud, at a price that lets me actually run a project," the answer is comfortably yes, and that answer has been comfortably yes for about a year now in a way the public conversation undersells.
Worth walking through what actually exists for small teams and individuals at the start of March, because the hyperscaler-tightness story has shaped people's mental models in ways that no longer match the available options.
The four tiers actually available
The market has settled into four meaningfully different shapes of GPU access, each with its own price points, lead times, and contract terms. Worth being explicit about which is which.
Hyperscaler reserved capacity
AWS, Azure, GCP. Multi-year contracts, dedicated capacity, regional availability negotiated, account-team relationship required. The H100/H200 generation in this tier is largely backlogged through the announcement period of the Stargate-class commitments; the available capacity goes to the customers with the largest existing footprint and the largest forward commitments. If you're not at that scale, the on-demand instance types you can spin up here exist, but at prices nobody actually pays without a significant negotiated discount.
This tier is what the "GPUs are unobtainable" headlines are about. It is not where most useful AI work outside the largest customers actually happens.
Hyperscaler on-demand and short-reservation
Same providers, but the lower-commitment tiers: on-demand instances, one-year reservations, savings plans. Pricing is high relative to alternatives (especially Azure and AWS for the H100 generation), but the integration with the rest of your stack is clean if you're already a major customer of the platform. Useful for spiky workloads where the hyperscaler integration is worth the premium.
Neocloud rentals
CoreWeave, Lambda, Together, Modal, RunPod, Crusoe, Hyperstack, Vast.ai, Tensorwave, and a long tail of smaller providers. This is the tier that quietly grew up over 2023–24 and is now the default for most individual and small-team AI workloads. Hourly pricing on H100s is in the $2–$3 range depending on provider; A100 pricing is under $2/hour; consumer-card spot pricing on RunPod and Vast.ai is under $1/hour for many configurations.
The quality bar varies: some neocloud providers are operationally mature; others are essentially "we own a rack and rent it on the internet." For real production workloads you want to test the operational maturity before depending on it. For experimentation, training, fine-tuning, and batch inference, the cheap end of the market is usable in a way the hyperscaler tier isn't.
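To make those price points concrete, here's a back-of-envelope sketch of what a single fine-tuning job costs at each rate. The job size and the specific rates are illustrative assumptions consistent with the ranges above, not quotes from any provider.

```python
# Back-of-envelope cost of one hypothetical fine-tuning job at the rates
# quoted above. Job size and rates are illustrative assumptions, not quotes.

JOB_GPU_HOURS = 8 * 12  # assumed: an 8-GPU node held for 12 hours

hourly_rates = {
    "neocloud H100":      2.50,  # mid-range of the $2-$3 band
    "neocloud A100":      1.80,  # "under $2/hour"
    "consumer-card spot": 0.80,  # "under $1/hour" on RunPod/Vast.ai-style spot
}

for tier, rate in hourly_rates.items():
    print(f"{tier}: ${JOB_GPU_HOURS * rate:,.2f} for {JOB_GPU_HOURS} GPU-hours")
```

At these assumed rates, the same 96-GPU-hour job runs about $240 on neocloud H100s and under $80 on consumer-card spot, which is the scale of spend that makes weekend-project fine-tunes plausible at all.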
Consumer hardware you own
Self-hosted at home or in a closet: a rig with a 4090 or 5090, or a Mac Studio with enough unified memory. The economics tip toward owning when you use the GPU more than a few hundred hours a month and your workload fits in consumer memory. The home-self-hosting picture for early 2025 is its own piece, but it belongs on this map: for some workloads, the most cost-effective and lowest-friction GPU access is the one already running in your office.
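The "few hundred hours a month" threshold falls out of simple amortization arithmetic. A minimal sketch, with every number an assumption chosen for illustration (rig cost, lifespan, power draw, and rental rate will all vary):

```python
# Rough rent-vs-own break-even. All numbers are assumptions for illustration:
# a ~$3,000 4090-class rig amortized over two years, ~450 W under load at
# ~$0.13/kWh, versus renting a comparable consumer card at ~$0.80/hour spot.

RIG_COST = 3000.0           # assumed up-front hardware cost, USD
AMORTIZATION_MONTHS = 24    # assumed useful life before an upgrade
POWER_COST_PER_HOUR = 0.06  # assumed: 0.45 kW * $0.13/kWh, rounded
RENTAL_RATE = 0.80          # assumed consumer-card spot rate, USD/hour

def monthly_cost_owned(hours: float) -> float:
    """Amortized hardware plus electricity for `hours` of GPU use per month."""
    return RIG_COST / AMORTIZATION_MONTHS + hours * POWER_COST_PER_HOUR

def monthly_cost_rented(hours: float) -> float:
    """Straight hourly rental for the same usage."""
    return hours * RENTAL_RATE

for hours in (50, 170, 400, 720):  # 720 h/month = running 24/7
    own, rent = monthly_cost_owned(hours), monthly_cost_rented(hours)
    print(f"{hours:>3} h/mo: own ${own:6.2f} vs rent ${rent:6.2f}")
```

With these assumptions the crossover lands a bit under 200 hours a month; pricier hardware or cheaper rentals push it toward the few-hundred-hour mark. And at 720 hours a month, renting costs more than three times owning, which is the arithmetic behind the always-on heuristic below.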
What to use which tier for
A working heuristic for matching workload to tier:
- Production inference at scale: hyperscaler on-demand or neocloud, depending on integration needs and price sensitivity. If you're already on AWS for the rest of your stack and the integration friction is real, the AWS premium can be worth it. If you're doing a clean greenfield AI workload, neocloud usually wins on price.
- Training runs and large fine-tunes: neocloud, almost always. The training-job pattern of "spin up a big rig for a few hours or a few days, then release it" is exactly what the neocloud rental model is good at. Reserve a node for the duration; don't try to build the same job on hyperscaler on-demand pricing.
- Experimentation, hobby work, occasional batch jobs: cheap neocloud (RunPod, Vast.ai) or the hardware on your desk. The under-$1/hour consumer-card market is real and works fine for prototypes.
- Always-on personal-AI workloads: own the hardware. Hourly economics for a 24/7 workload don't work even at neocloud prices once you're past a few months (the rent-vs-own sketch above makes this concrete).
The thing to avoid is treating hyperscaler on-demand as the default option for AI work because it's the default for everything else you do. The price-per-hour delta between neocloud and hyperscaler is large enough (sometimes 2–3×) that it changes which experiments are economically viable.
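As a quick illustration of how that delta compounds: a fixed experiment budget buys very different amounts of compute at the two tiers. The rates below are hypothetical stand-ins consistent with the ranges discussed above, not real quotes.

```python
# What a fixed experiment budget buys across a 2-3x hourly delta. The rates
# are hypothetical stand-ins consistent with the ranges above, not quotes.

BUDGET = 500.0  # assumed monthly experimentation budget, USD

RATES = {
    "neocloud H100":    2.50,  # mid-range of the $2-$3 band
    "hyperscaler H100": 6.50,  # assumed on-demand rate, ~2.6x the neocloud one
}

for tier, rate in RATES.items():
    print(f"{tier}: {BUDGET / rate:.0f} H100-hours/month on a ${BUDGET:.0f} budget")
```

Roughly 200 hours versus roughly 77: the cheaper tier funds about 2.6× as many runs, which is often the difference between an experiment being routine and being a line item someone has to approve.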
What's likely to shift over the next two quarters
A few specific things worth tracking:
- Blackwell-generation availability is going to land first at the hyperscalers and the largest neoclouds, with allocation pressure on the smaller ones for at least a quarter. If your workload genuinely needs B100/B200 specifically, expect to wait or pay a premium.
- Spot pricing volatility at the consumer-card neoclouds will likely increase as more workloads chase the cheap end. RunPod and Vast.ai are already seeing tighter spot availability for the popular configurations during weekday business hours.
- Hyperscaler price drops are coming on the older generations (A100, V100) as the pressure on H100/H200 capacity gets relieved by the next generation shipping. If your workload runs fine on A100s, the hyperscaler economics for that generation specifically will improve.
- Sovereign-cloud pressure will likely make European-region rentals more available and more constrained at the same time: more hosts, more residency rules, narrower options for any specific use case.
The big-picture story is that the GPU market for the rest of us is healthier than the headlines suggest. The headlines are mostly about the procurement experience of customers committing $50M+/year to compute. For everyone else, the question of "where do I get an H100 for the next six hours" has had a workable answer for over a year, and the menu of workable answers continues to grow.
If you're not getting AI work done because you can't get GPUs, the bottleneck is almost certainly somewhere else.