GPT-4.1: the model Microsoft Copilot really runs on

GPT-4.1 quietly shipped this week and it's the most interesting OpenAI release of the year. Not because it's the most capable model, but because it's the model the business actually runs on.

GPT-4.1 shipped quietly on Monday. API-only, three sizes (4.1, 4.1 mini, 4.1 nano), a 1M-token context window, and pricing that tells the entire story: $2/$8 per million tokens (input/output) for the flagship, $0.40/$1.60 for mini, $0.10/$0.40 for nano. Compare that to GPT-4.5's $75/$150, released by the same company in late February, roughly six weeks earlier, and the gap is dramatic enough that the obvious question is what these two models are actually for.
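To make the gap concrete, here's a back-of-envelope sketch using the list prices above. The request shape (10k input tokens, 1k output tokens) is a hypothetical I've picked for illustration; the per-token rates are the published ones:

```python
# Per-million-token list prices (input, output) in USD, as quoted above.
PRICES = {
    "gpt-4.5":      (75.00, 150.00),
    "gpt-4.1":      (2.00,   8.00),
    "gpt-4.1-mini": (0.40,   1.60),
    "gpt-4.1-nano": (0.10,   0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at list prices."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A hypothetical request: 10k tokens in, 1k tokens out.
for model in PRICES:
    print(f"{model:14s} ${request_cost(model, 10_000, 1_000):.4f}")
# gpt-4.5 comes out at $0.90 per request vs $0.028 for gpt-4.1:
# roughly a 32x multiplier on this request shape.
```

The multiplier shifts with the input/output mix, but at any realistic ratio the two models are priced for different jobs.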

The honest answer, which OpenAI's product positioning is dancing around: 4.5 is the headline model and 4.1 is the workhorse. The split between those two roles (the prestige tier and the actual tier) is what makes 4.1 the more important release.

What's actually different about 4.1

Beyond the price, three things stand out:

It's API-only. Not in ChatGPT, not in the consumer-facing tier. OpenAI is explicitly aiming this model at developers building applications, with the implication that the consumer-facing surface keeps a different (more expensive, more polished) model behind the scenes. The product split is real and is being communicated by where the model is allowed to show up.

It's noticeably better at coding than 4o was. OpenAI's own benchmarks claim something like a 25-point jump on coding evaluations, and the early hands-on reports match that. For application code generation, the output that lands on a given prompt feels closer to Claude 3.5 Sonnet's capability tier than 4o's ever did. Whether that holds across more workloads is a separate question, but the coding lift is real.

1M-token context, with caveats. The "long context that's actually usable" story has been creeping forward across vendors. 4.1 claims a 1M-token effective context (which would put it among the largest of the closed-frontier models), though as with Gemini 2.0 Flash's 1M context, the practical experience with very long inputs depends heavily on what you're asking the model to do. Lookup-style retrieval at 800k holds up; deep multi-step reasoning across the full window degrades the way it does for every model that claims very long context.

What the price actually says

The most substantive thing about the release isn't the model itself. It's what OpenAI is willing to charge for it.

GPT-4o currently runs $2.50/$10. GPT-4.1 launches at $2/$8, slightly cheaper than 4o, with better capability. That's a real price drop on the workhorse-tier model, and it lands just after Llama 4 shipped its open-weights MoE family at competitive capability levels. Read together, the pressure on the workhorse-tier closed-frontier price is now coming from two directions: from the open-weights models that are competitive at the workhorse level (DeepSeek V3, Llama 4 Scout/Maverick), and from cross-vendor competition (Claude 3.5 and 3.7 Sonnet at $3/$15, Gemini 2.0 Flash at $0.10/$0.40 for the cheap tier).
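One way to compare across vendors is a blended price per million tokens. The 3:1 input:output mix below is my assumption, not a published figure; the per-token rates are the ones listed above:

```python
# Workhorse-tier list prices, USD per million tokens (input, output), as quoted above.
VENDOR_PRICES = {
    "gpt-4.1":           (2.00,  8.00),
    "gpt-4o":            (2.50, 10.00),
    "claude-3.7-sonnet": (3.00, 15.00),
    "gemini-2.0-flash":  (0.10,  0.40),
}

def blended_price(in_price: float, out_price: float,
                  input_share: float = 0.75) -> float:
    """Effective $/1M tokens assuming a fixed input:output mix (3:1 by default)."""
    return input_share * in_price + (1 - input_share) * out_price

for name, (inp, outp) in VENDOR_PRICES.items():
    print(f"{name:18s} ${blended_price(inp, outp):.3f} per 1M tokens (blended)")
```

On this mix, 4.1 blends to $3.50 per million, undercutting both 4o and Claude Sonnet, while Gemini Flash sits an order of magnitude below all three, which is the two-directional pressure described above.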

OpenAI's response with 4.1 is essentially "we'll match the price compression on the workhorse tier and protect the prestige tier (4.5, the reasoning models) at premium pricing." That's a defensible business strategy and it's the cleanest read of why these two models exist alongside each other in the same catalog at radically different price points.

The Microsoft Copilot signal

The most-covered piece of the release wasn't on OpenAI's launch slides. It was the reporting from a few outlets that Microsoft Copilot had quietly switched its underlying model to GPT-4.1 in the days before the official announcement, and that the model OpenAI has been calling 4o-2025-04 in some recent docs may have always been a 4.1 variant under a different name. That's the kind of detail that doesn't show up on the product page but that matters for understanding the actual deployment story.

If accurate, it tells you that the model behind one of the largest consumer AI deployments is the cheaper-faster workhorse, not the prestige tier. That's a useful sanity check on what the field is actually using when it's spending its own money on inference at scale rather than positioning for press coverage. The prestige tier sells the brand; the workhorse tier carries the volume.

Where this leaves the menu

A reasonable updated read on the closed-frontier menu as of mid-April:

  • GPT-4.5, premium positioning, low volume, justifiable for narrow high-stakes workloads where inference cost is rounding error.
  • GPT-4.1, workhorse tier, real volume, the model anything cost-sensitive and OpenAI-aligned should now default to.
  • GPT-4.1 mini / nano, the bottom of the OpenAI catalog, competing directly with Gemini Flash and the cheap open-weights options for the long tail of inference work.
  • OpenAI o-series, the reasoning premium tier, optimized for multi-step problems where the marginal capability is worth the price delta.
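That menu reads naturally as a routing decision. A toy sketch, where the tier names follow the list above and the decision criteria are my paraphrase of it, not any official routing policy:

```python
def pick_model(high_stakes: bool, needs_multistep_reasoning: bool,
               cost_sensitive: bool) -> str:
    """Toy tier router paraphrasing the menu above; illustrative only."""
    if needs_multistep_reasoning:
        return "o-series"        # reasoning premium: pay for the capability delta
    if high_stakes and not cost_sensitive:
        return "gpt-4.5"         # prestige tier: inference cost is rounding error
    if cost_sensitive:
        return "gpt-4.1-mini"    # bottom of the catalog (or nano) for the long tail
    return "gpt-4.1"             # the workhorse default

print(pick_model(high_stakes=False, needs_multistep_reasoning=False,
                 cost_sensitive=False))  # → gpt-4.1
```

The point of the sketch is how small the 4.5 branch is: only workloads that are both high-stakes and cost-insensitive ever reach it.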

The frontier model menu I sketched at the end of February needs updating. The workhorse-tier options are now genuinely competitive across vendors at roughly comparable price points, and the architectural decision an integrator faces is less about which model is best (mostly a wash within the workhorse tier) and more about which integration story matches the rest of the stack. That's the same kind of decision cloud-vendor selection became a decade ago: the products converged, and the contract terms started to matter more than the technical specs.

The bigger takeaway is what the release implies about pricing trajectory through the rest of the year. If OpenAI is cutting workhorse-tier prices in response to open-weights pressure in April, the same dynamic almost certainly continues. By Q3 the workhorse-tier price floor is probably another step lower, and the bundled-into-product cost of running a real AI feature is going to be substantially less than the back-of-envelope math from six months ago suggests. That's not the same as the bear case being right; it's just the reality of what happens when there are five plausible vendors and price competition has finally arrived.