Why a $5M training run rattled a trillion dollars in market cap

The DeepSeek-R1 number is small. The market reaction wasn't. The interesting bit isn't the dollar figure; it's which assumption it broke.

[Image: a high-end GPU chip resting on a stack of US hundred-dollar bills under a dramatic spotlight]

The headline number comes from the DeepSeek-V3 technical report, which puts the cost of the final V3 training run at roughly $5.6M; V3 is the base model R1's reasoning was layered on top of. That number has been argued about all week. Some of the arguments are reasonable, most are motivated, and none of them really explain why the market reacted the way it did.

The interesting question isn't whether the $5M figure is precisely right. The interesting question is what assumption a number in that ballpark breaks, and why breaking that assumption knocked roughly $1T off NVIDIA, Broadcom, TSMC, and the rest of the AI infrastructure trade in a single Monday session.

The number, properly bounded

The $5.6M is the GPU-time cost for the final V3 training run, computed at an assumed rental rate of $2 per H800 GPU-hour. It does not include the salaries, the failed experiments, the inference-time compute that produced the synthetic data the model was distilled from, the DeepSeek-V2 lineage that V3 builds on, or any of the capex DeepSeek's parent fund had already paid for. A more honest "fully-loaded" number, including the compute used to generate training data, the prior-model lineage, and the research overhead, is probably an order of magnitude higher. Maybe two.
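The arithmetic behind the headline figure is trivially checkable, for what it's worth. The ~2.788M H800 GPU-hours and the $2/hour rental assumption are the V3 technical report's own inputs; the multiplication is the entire calculation.

```python
# Back-of-envelope check of the headline figure. Both inputs come from the
# DeepSeek-V3 technical report; nothing else goes into the number.
h800_gpu_hours = 2.788e6          # total GPU-hours for the final V3 run
rental_rate_usd_per_hour = 2.0    # assumed H800 rental rate
cost = h800_gpu_hours * rental_rate_usd_per_hour
print(f"${cost / 1e6:.3f}M")      # -> $5.576M
```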

The pushback that "the real cost was $500M when you count everything" is not wrong. But it's not the right comparison either. The right comparison is to the equivalent fully-loaded number for GPT-4, Claude 3 Opus, or Gemini Ultra. Those numbers are not public. The estimates that have leaked or been guessed at put them in the high single-digit billions for the largest labs across the program lifetime, with single training runs in the hundreds of millions.

The naive comparison, $5.6M against single runs in the hundreds of millions, is roughly two orders of magnitude. The fully-loaded comparison shrinks that gap, but even on the most charitable reading, V3/R1 trained on a budget that's a fraction of what the frontier labs are spending on equivalent capability.

The assumption that broke

For about two and a half years, the implicit market consensus was that frontier-model training is a capital-intensity business. The argument went like this: training a frontier model costs hundreds of millions. Inference at any reasonable scale costs hundreds of millions more per year. To play in this market you need access to tens of thousands of H100s, which means a relationship with NVIDIA, which means you're either a hyperscaler or you're funded by one.

That argument is the basis for the AI infrastructure trade. NVIDIA's earnings multiples, the Stargate-style "we'll spend our way to the frontier" announcements, the OpenAI–Microsoft and Anthropic–AWS coupling, all of it is downstream of the assumption that frontier capability requires frontier capex.

DeepSeek didn't disprove the inference half. Inference still scales linearly with usage, and usage is the thing that's actually growing. NVIDIA is still selling the picks and shovels for the gold rush that's actually happening. That's why the stock half-recovered by Friday: once the reasoned takes started landing, the people doing real diligence noticed that the inference part of the thesis was still intact.
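The shape of the two cost curves is worth sketching, because it's the whole reason the inference half survived. Every number below is a hypothetical placeholder, not an estimate for any real lab; the only point is that training is a one-time cost while inference cost grows linearly with tokens served.

```python
# Toy cost model: training as a fixed cost, inference as a cost that scales
# linearly with usage. All parameters are hypothetical placeholders.
TRAINING_RUN_USD = 100e6       # hypothetical one-time frontier training run
SERVING_USD_PER_MTOK = 5.0     # hypothetical cost to serve 1M tokens

for daily_billion_tokens in (1, 10, 100):
    mtok_per_year = daily_billion_tokens * 1e9 / 1e6 * 365
    annual_inference = mtok_per_year * SERVING_USD_PER_MTOK
    print(f"{daily_billion_tokens:>3}B tok/day: "
          f"${annual_inference / 1e6:,.0f}M/yr inference "
          f"vs ${TRAINING_RUN_USD / 1e6:,.0f}M one-time training")
```

At high enough usage, the recurring inference bill dwarfs any one-off training run, which is exactly the part of the picks-and-shovels thesis DeepSeek left untouched.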

But DeepSeek did break the training half. Not by proving you can train a frontier model for $5M; that's the strawman. By proving you can train a competitive frontier reasoning model on roughly an order of magnitude less compute than the frontier labs assumed was structurally necessary. The training half of the moat narrative has now been disproven in public, with open weights and a technical report you can check yourself.

What disproof actually looks like

[Chart: training cost over time. The assumed industry consensus rises exponentially through 2024; the observed DeepSeek-V3/R1 datapoint sits an order of magnitude below the projected 2025 frontier cost.]

The slope of "training cost over time" was assumed to be exponential: every doubling of capability needed another doubling of compute, and frontier capability was therefore the kind of thing you could only afford if you owned the factory. The DeepSeek datapoint suggests the slope is much shallower than that, at least for the reasoning-model layer. A new training recipe, group-relative policy optimization (GRPO) applied on top of the V3 base, plus careful synthetic-data construction and distillation, bought a roughly 10× efficiency improvement over the assumed-frontier approach.
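The core of GRPO is small enough to sketch. Below is a minimal, illustrative version of the advantage computation described in the DeepSeekMath and R1 papers: sample a group of completions per prompt, then normalize each completion's reward against its own group, which removes the need for a learned value-function baseline. The full method wraps this in a clipped PPO-style policy-gradient objective with a KL penalty; that machinery is omitted here.

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages for G completions sampled from one prompt.

    Each completion's advantage is its reward normalized against the other
    completions in the same group, so no separate value network is needed.
    """
    baseline = group_rewards.mean()
    scale = group_rewards.std() + eps  # eps guards the all-rewards-equal case
    return (group_rewards - baseline) / scale

# Toy example: 4 completions for one prompt, scored by a rule-based reward
# (e.g. 1.0 if the final answer checks out, 0.0 otherwise).
print(grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0])))
# -> [ 1. -1. -1.  1.]
```

Dropping the value network is where much of the efficiency comes from: the baseline is computed from samples you were generating anyway, instead of from a second model the size of the policy.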

That doesn't mean training is now cheap. It means the frontier of training cost is not where the market thought it was. And in a capital-allocation market, where the assumed frontier sets the price, that matters.

What the $1T move was actually pricing

The Monday selloff was the market repricing the assumption that training was a capital-intensity moat. Some of that repricing was rational. Some was the same kind of overshoot that happens whenever a long-held narrative breaks publicly. By Friday, after the analyses had landed and people had noticed that inference is still the actual workload, prices had partially recovered.

What didn't recover is the assumption itself. Going forward, every conversation about AI capex now has a counterexample sitting on the table. "But you need $500M to train a frontier reasoning model" no longer ends the conversation. It opens it.

That's the real cost of the $5M number: not the dollar figure, but the rhetorical position it removed from play. The week-one reactions mostly missed this in the heat of the price action. The thing that broke wasn't NVIDIA's business. The thing that broke was a belief about what the floor of frontier capability was going to cost. A falling floor usually makes a market bigger, not smaller, but it does redistribute who captures the value.