DeepSeek's V3 follow-up and what 30¢-per-million-tokens means

DeepSeek dropped a V3 update with sharper benchmarks and a price floor that makes the rest of the workhorse-tier market look expensive. The cost per million is interesting; the trajectory is the actual story.

A modern computer chip resting on a small handful of US coins on a dark wooden table

DeepSeek dropped a V3 update this week with sharper benchmarks and a price floor that makes the rest of the workhorse-tier market look expensive: $0.27 per million input tokens, $1.10 per million output. That's roughly an order of magnitude cheaper than what a workhorse-tier closed-frontier model cost six months ago, and it lands the same week GPT-4.1 priced down to $2/$8. The price compression in this category is no longer a debate.

The 30¢-per-million number is the headline. The trajectory underneath it is the actual story.

Workhorse-tier inference cost trajectory from late 2023 through April 2025 showing the closed-frontier line descending slowly while the open-weights line dropped dramatically with DeepSeek-R1 in January and again with V3-0324 in April.

What changed in the model

V3-0324 is an incremental refresh of the DeepSeek-V3 base: the same 671B-total / 37B-active mixture-of-experts architecture, retrained on a refreshed corpus with better instruction-tuning data and a few attention-mechanism tweaks. The top-line benchmark numbers are noticeably better than the original V3's, particularly on coding and tool-use evaluations. R1's reasoning capability sits on top of this base, so the refresh quietly improves the reasoning model too.

The license stays MIT. The weights are on Hugging Face. DeepSeek's hosted API takes the new pricing immediately. The third-party hosts (Together, Fireworks, Hyperbolic, etc.) take a few days to roll out the refreshed weights at competitive prices.

What's noteworthy isn't any individual capability number. It's that DeepSeek has now demonstrated they can ship incremental releases at this price point, on this cadence, with this benchmark trajectory. That's a different signal than "they got lucky with one model." The release cadence is itself the case for taking the pricing trajectory seriously.

What 30¢-per-million actually means in the market

A few specific things shift when the workhorse-tier price floor drops to this range:

Backend-heavy applications get dramatically cheaper. Anything that fans out a lot of model calls per user request (RAG with reranking, multi-step pipelines, batch enrichment) has its inference bill drop by roughly the price ratio. For an application that was spending $1,000/month on inference six months ago at GPT-4o pricing, the same workload moved to DeepSeek-V3 hosting comes in at roughly $100. That changes which features are worth building.
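The $1,000-to-$100 arithmetic generalizes to any fan-out workload. A minimal sketch, using the per-million prices from the text and an assumed traffic profile (the request count and token counts are illustrative, not figures from the article; the $2.50/$10 baseline is an assumption for GPT-4o-era pricing):

```python
# Monthly inference bill at two price points. Prices are per million tokens:
# $2.50/$10 is an assumed GPT-4o-era baseline; $0.27/$1.10 is DeepSeek-V3.

def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price):
    """Dollar cost for a month of traffic, prices quoted per 1M tokens."""
    total_in = requests * in_tokens / 1_000_000   # input tokens, in millions
    total_out = requests * out_tokens / 1_000_000  # output tokens, in millions
    return total_in * in_price + total_out * out_price

# Assumed workload: 500k requests/month, 2,000 input + 300 output tokens each.
old = monthly_cost(500_000, 2_000, 300, 2.50, 10.00)  # closed-frontier baseline
new = monthly_cost(500_000, 2_000, 300, 0.27, 1.10)   # DeepSeek-V3 pricing

print(f"old: ${old:,.0f}/mo  new: ${new:,.0f}/mo  ratio: {old/new:.1f}x")
# → old: $4,000/mo  new: $435/mo  ratio: 9.2x
```

The ratio lands near 9x for this profile; the exact multiple shifts with the input/output mix, since the two tiers discount input and output tokens differently.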

The "how cheap can it get" conversation has a real datapoint. Six months ago the bear case for AI startup economics included a worry that inference costs would stay too high to support consumer-facing freemium products. That worry was always partially answerable with "wait." It's now answerable with "wait less." Consumer-tier products that integrate AI at meaningful scale have a price floor that's compatible with ad-supported business models, which they didn't a year ago.

The closed-frontier shops are facing a price ceiling, not a price floor. GPT-4.1's $2/$8 pricing isn't competitive with $0.27/$1.10 on its merits; it's competitive on integration, latency, support, and the ecosystem of tooling that surrounds it. That's defensible at some price premium. The premium just got bigger relative to the alternative, and the integration benefits have to do more work to justify the spend.

Migration cost becomes more interesting than per-token cost for many workloads. When the per-token price was the dominant cost, switching vendors paid back fast. Now the switching cost (re-prompting, re-validating, re-tooling agentic workflows for a new model's quirks) can dominate the per-token savings for non-trivial workloads. The sticky integrations become stickier, but for new builds the default foundation-model question is open again.
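The switching-cost trade-off above reduces to a payback-period calculation. A sketch under stated assumptions: the migration effort is costed in engineer-weeks at an assumed loaded rate, and none of the dollar figures below come from the article:

```python
# Assumed: one engineer-week costs $5,000 fully loaded (illustrative).
ENG_WEEK_COST = 5_000

def payback_months(migration_weeks, old_monthly, new_monthly):
    """Months until per-token savings cover the one-time migration cost."""
    monthly_savings = old_monthly - new_monthly
    if monthly_savings <= 0:
        return float("inf")  # cheaper vendor isn't actually cheaper: never pays back
    return migration_weeks * ENG_WEEK_COST / monthly_savings

# Large inference bill: a four-week migration pays back in about a month.
fast = payback_months(migration_weeks=4, old_monthly=20_000, new_monthly=2_000)
# Small inference bill: the same migration takes nearly two years to pay back.
slow = payback_months(migration_weeks=4, old_monthly=1_000, new_monthly=100)

print(f"large workload: {fast:.1f} months, small workload: {slow:.1f} months")
# → large workload: 1.1 months, small workload: 22.2 months
```

Same price ratio in both cases; the absolute size of the bill is what decides whether the migration is worth doing, which is the point of the paragraph.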

What the trajectory looks like

The thing the chart above is trying to capture: this isn't a one-time DeepSeek shock. It's the second this year, and probably not the last. R1 in January knocked the floor down by an order of magnitude on reasoning. V3-0324 in April refreshed the workhorse tier to keep pace with the price compression that R1 triggered across the rest of the market. The next release on this cadence, if DeepSeek holds it, is three to four months out and will push the floor down further.

What that does to the planning math: any business model whose unit economics depend on inference costs being a meaningful percentage of revenue is going to need to either move with the price compression (which means the revenue side has to expand to absorb the cost reduction) or commit to integration-and-quality differentiation (which means accepting that the open-weights tier is going to keep eating the bottom of your market).
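The planning math can be made concrete with a toy gross-margin model. All of the numbers below are illustrative assumptions, not figures from the article; the point is only the shape of the curve as the inference line shrinks:

```python
# Toy sketch: gross margin for a product whose COGS includes an inference
# bill, as the price floor drops. Revenue and other-COGS figures are assumed.

def gross_margin(revenue, inference_cost, other_cogs):
    """Gross margin as a fraction of revenue."""
    return (revenue - inference_cost - other_cogs) / revenue

revenue, other_cogs = 100_000, 20_000      # assumed monthly figures
for inference in (30_000, 10_000, 3_000):  # inference bill at successive floors
    margin = gross_margin(revenue, inference, other_cogs)
    print(f"inference ${inference:>6,}: margin {margin:.0%}")
# → inference $30,000: margin 50%
# → inference $10,000: margin 70%
# → inference $ 3,000: margin 77%
```

The gains taper: once inference is a small share of COGS, further price compression stops moving the margin, which is when the differentiation question in the paragraph above takes over.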

The training-cost shock from January was the more dramatic event because it broke a widely-held narrative about moats. The inference-cost compression V3-0324 represents is less dramatic but probably bigger long-term: training costs are paid once per model, inference costs are paid once per request, and request volume is the thing that keeps growing.

Where it doesn't apply

A few caveats worth being explicit about:

  • At very low volume, the price doesn't matter. If your application makes thousands of model calls a month, not millions, the per-call pricing is rounding error compared to the integration overhead. Pick the model with the best integration story for your stack and don't sweat the pricing arbitrage.
  • The reasoning premium is still real. R1 at $0.55/$2.19 is cheaper than o1 by roughly the same ratio as V3 vs GPT-4o, but reasoning models cost more than non-reasoning ones across all vendors. The cost advantage compounds at scale; it doesn't eliminate the tier difference.
  • Hosted DeepSeek has the censorship and residency caveats that any China-based model service comes with. For workloads where that matters (some EU regulators, some US enterprise procurement), the on-premise or third-party-host paths exist and carry their own price premium.
  • Reliability and SLA at the cheap-host tier vary. The $0.27 price isn't the price of frontier-vendor reliability; it's the price of the model itself. If your application can tolerate occasional rate limiting or has a fallback, the savings are real. If it needs four-nines availability, you're paying for a different product.
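The last caveat implies a concrete pattern: prefer the cheap host, retry briefly, and fall through to a pricier, higher-SLA vendor only on failure. A minimal sketch; `call_cheap_host` and `call_frontier_vendor` are hypothetical stand-ins, not real client APIs:

```python
import time

def call_with_fallback(prompt, cheap, frontier, retries=2, backoff_s=0.5):
    """Try the cheap host with brief exponential backoff, then fall back."""
    for attempt in range(retries):
        try:
            return cheap(prompt)           # e.g. DeepSeek via a third-party host
        except Exception:                  # rate limit, timeout, 5xx...
            time.sleep(backoff_s * (2 ** attempt))
    return frontier(prompt)                # pay the premium only when needed

# Toy demonstration with stand-in callables (hypothetical, for illustration):
flaky_errors = iter([RuntimeError, RuntimeError])
def call_cheap_host(prompt):
    raise next(flaky_errors)()             # simulate two rate-limit failures
def call_frontier_vendor(prompt):
    return f"frontier answer to: {prompt}"

print(call_with_fallback("hello", call_cheap_host, call_frontier_vendor,
                         backoff_s=0.01))
# → frontier answer to: hello
```

This is the "savings are real if you can tolerate occasional rate limiting" trade made explicit: the fallback caps worst-case latency and availability at the frontier vendor's level while the average request still clears at the cheap-host price.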

The 30¢ number is the marketing headline. The thing actually worth tracking is the slope of the line, and the slope is steeper than most people's twelve-month plans assumed.