DeepSeek R2: open frontier, no asterisks
DeepSeek shipped R2 with open weights, MIT-licensed, frontier-competitive on the benchmarks that matter, and at a price floor that puts more downward pressure on closed-frontier pricing than anything since R1 in January. The asterisks are gone.
DeepSeek shipped R2 yesterday. MIT license, fully permissive, weights on Hugging Face within hours of the announcement. Benchmarks competitive with the closed-frontier reasoning tier on the categories that matter: math, multi-step logic, code reasoning. Hosted pricing through the major neoclouds settled overnight at roughly $0.50 / $2.20 per million tokens for the reasoning-on configuration, which is workhorse-tier money for premium-tier capability.
The closed-frontier pricing pressure that started with R1 in January and accelerated with V3-0324 in March just moved again. R2 is the first open-weights release I'd put on the leaderboard with no asterisks.
What R2 actually is
The technical shape, in plain terms:
- Mixture-of-experts, similar architectural footprint to V3 but with the reasoning-tier post-training that R1 introduced. Total parameters in the same 600B-class range; active parameters per token in the 30-40B range, which keeps inference fast.
- Reasoning-on by default, with the same <think> token convention as R1. Reasoning tokens are visible in the API; you can budget them or suppress them.
- Context window 128K, same as V3.
- MIT license, no restrictions on commercial use, no carveouts. The same fully-permissive shape DeepSeek has been shipping since R1.
- Available immediately on the major hosted-inference providers (Fireworks, Together, OpenRouter, the usual suspects) at the prices above. Self-hosted on the inference stacks that already supported V3, with minor changes.
That's the basics. The capability story is the more interesting part.
What it can actually do
A day of testing on real workloads, not synthetic benchmarks:
Hard reasoning tasks. Closer to Claude Opus 4 than I'd have predicted a quarter ago. Multi-step math problems, formal-logic puzzles, the kind of structured-thinking work where the closed-frontier reasoning tier owned the territory. R2 keeps up. Not always: on the hardest cases Opus 4 still wins, but on the moderate-hard cases R2 is competitive enough that the price-per-capability calculation favors it heavily.
Code reasoning. Comparable to GPT-5 / Sonnet 4 on the multi-file refactor cases I tested. Slightly behind on the very largest contexts where Sonnet 4's coherence-at-length advantage shows. Slightly ahead on the cases where you want explicit step-by-step reasoning visible.
Long-form analytical writing. Fine. Not best-in-class but not embarrassingly off the workhorse tier either. The reasoning-on output has a structured-thinking quality that some readers will like and some will find verbose.
Tool-use sequencing. Mature enough for production agentic loops. Not as polished as the closed-frontier shops on the edge cases; competent enough on the common patterns.
The honest summary: R2 isn't dominant on any one axis. It's competitive across most axes at a fraction of the closed-frontier price. That's the value proposition that compounds.
What this does to the model menu
The mid-2025 leaderboard I posted last month needs an update already. R2 changes two cells:
The premium-reasoning tier gains a credible open-weights competitor. Opus 4 and the o-series still own the very hardest reasoning. R2 sits below them on the absolute capability axis and well below them on price. For most reasoning workloads where the marginal capability difference doesn't pay back, R2 becomes the right routing choice.
The open-weights tier moves up. R2 plus V3-0324 plus the Llama 4 line plus the Qwen variants is now a tier where the workloads that fit are increasingly the workhorse-tier workloads, not just the cheap-batch ones. The cases for using closed-frontier workhorse-tier models (Sonnet 4, GPT-5, Gemini 2.5 Pro) narrow: they're still the right pick for stack-fit reasons, less so for capability-only reasons.
The closed-frontier shops will continue to ship. The premium tier they own will continue to be valuable. The workhorse tier they've been pricing at $2-3/$10-15 is going to face more pricing pressure than the prior R1 → V3 cycle generated, because R2 hits the workhorse-tier capability bar from below while sitting at one-quarter to one-third the price.
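To put numbers on that pressure, here's the arithmetic at the low end of the closed workhorse range ($2 in / $10 out) against R2's hosted pricing; the example workload size is hypothetical.

```python
def monthly_cost(in_tokens_m: float, out_tokens_m: float,
                 in_price: float, out_price: float) -> float:
    """Cost in dollars for a month of usage, token counts in millions,
    prices in dollars per million tokens."""
    return in_tokens_m * in_price + out_tokens_m * out_price

# Hypothetical workload: 500M input tokens, 100M output tokens per month.
r2_cost = monthly_cost(500, 100, 0.50, 2.20)    # 250 + 220  = 470
closed_cost = monthly_cost(500, 100, 2.00, 10.00)  # 1000 + 1000 = 2000

ratio = r2_cost / closed_cost  # ~0.235, i.e. roughly one-quarter
```

That's the "one-quarter to one-third" figure at the favorable end of the closed range; against the $3/$15 end the gap is wider still.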
What changes in routing
Specific routing decisions I'd revisit as of this week:
Hard-reasoning batch work. Workloads I'd been routing to Opus 4 for the planner step in agentic loops can credibly route to R2 for cases where the marginal capability matters less. The Opus-tier escalation budget gets smaller; R2 covers more of the cases.
Cost-sensitive workhorse work. Workloads that were on Sonnet 4 or GPT-5 for moderate-complexity reasoning can route to R2 with a meaningful cost reduction. The capability gap is narrow enough that the cost savings outweigh the capability difference for many cases.
Local / self-hosted reasoning workloads. R2's active expert set fits in 64 GB unified memory at 4-bit, with the same caveat as V3: the total weights are far larger than the active-parameter footprint, so a single-box setup needs expert offloading or a multi-box shard. For shops running serious local setups, R2 joins the local-routable list, which extends the privacy-bound use cases meaningfully.
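Back-of-envelope, using the parameter counts quoted earlier (600B-class total, 30-40B active) and ignoring KV cache and activation memory:

```python
# 4-bit quantization stores each parameter in half a byte.
BYTES_PER_PARAM_4BIT = 0.5

total_params = 600e9    # 600B-class total (MoE), per the spec above
active_params = 37e9    # ~30-40B active per token; 37B is an assumption

total_weights_gb = total_params * BYTES_PER_PARAM_4BIT / 1e9    # 300.0 GB
active_weights_gb = active_params * BYTES_PER_PARAM_4BIT / 1e9  # 18.5 GB
```

The gap between those two numbers is the whole local-hosting story for MoE models: per-token compute touches ~19 GB of weights, but all 300 GB have to live somewhere, which is where offloading and sharding come in.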
Closed-frontier premium tier for the hardest cases. Stays. Opus 4 and the o-series still earn their place on the workloads where the absolute capability ceiling matters and the cost is rounding error. The escalation criteria get sharper because R2 covers more of the previously-Opus territory.
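The routing decisions above reduce to a default-down, escalate-up pattern. A minimal sketch, with hypothetical model IDs (none of these strings are provider-confirmed):

```python
# Default to R2, escalate to the premium tier only when the capability
# ceiling matters, drop to the cheap tier for batch / low-stakes work.
ROUTES = {
    "hard": "anthropic/claude-opus-4",    # absolute capability ceiling
    "moderate": "deepseek/deepseek-r2",   # default reasoning workhorse
    "cheap": "deepseek/deepseek-v3",      # batch / low-stakes
}

def route(difficulty: str) -> str:
    """Pick a model ID by task-difficulty label, defaulting to R2."""
    return ROUTES.get(difficulty, ROUTES["moderate"])
```

The point of the pattern is the default: before R2, "moderate" routed to a closed workhorse and the escalation budget was the big line item; now the default is open-weights and escalation is the exception.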
The pricing-pressure narrative compounds
This is the third meaningful price-floor move from DeepSeek in nine months. R1 set the floor at the reasoning-distilled-32B level. V3-0324 set the floor at the workhorse-MoE level. R2 sets the floor at the reasoning-MoE level. Each move pulled the closed-frontier shops downward on price; each move expanded the workloads where the open-weights answer is the right one.
The closed-frontier shops can't easily match this on their workhorse tiers without margin compression they probably can't afford. Their playbook over the next two quarters is to widen the premium-tier gap (Opus 4.5? Opus 5? An o5-mini that's meaningfully better than the workhorse?) and let the workhorse tier face the pricing pressure. Whether that works depends on how fast the open-weights premium tier catches up, which, based on the R1 → R2 cycle, is faster than the closed-frontier shops can shrug off.
What I'd watch over the next two quarters
Two specific things:
Whether the closed-frontier shops respond with a workhorse-tier price cut. The pressure is real. A response would be visible in pricing pages within 30-60 days; if it doesn't come, the workhorse-tier market continues to fragment with R2-class competitors taking share.
Whether DeepSeek or one of the other open-weights shops ships a premium-tier-capable open release. R2 is workhorse-with-reasoning. The next milestone is open-weights premium-tier, capability that competes with Opus 4 head-to-head on the hardest tasks. The R1 → R2 trajectory suggests this is months, not years, away.
The "open frontier" framing was speculative when R1 launched. It was pointed when V3-0324 dropped. With R2 it's a fact about the current state. The asterisks are gone. The pricing pressure is real. The routing decisions are different starting this week. Worth being deliberate about which routes need to change and which don't.