Three things AWS Bedrock still gets wrong

I wrote in March about Bedrock through a small-shop lens; the upshot was that it's a defensible choice for shops already on AWS, less so for greenfield workloads where the AWS perimeter doesn't matter. Six weeks later, three of those rough edges are still rough, and they're worth being specific about because two of the three are likely to stay that way for the foreseeable future.

This is the "I've spent more time with it" follow-up. Bedrock has gotten meaningfully better in the last quarter. The places it hasn't are mostly the same places.

1. Model latency is structurally worse than going direct

Going direct to Anthropic's API for Claude 3.7, you typically see first-token latency in the 400–700ms range for a moderate prompt. The same prompt to the same model through Bedrock in the same region is consistently 200–400ms slower to the first token, and the time-to-completion difference across a full request is in the same range. On a 400–700ms baseline, that extra 200–400ms puts Bedrock somewhere between 30% and 60% slower for typical workloads.

The cause isn't mysterious. AWS routes the request through their own request-handling layer (auth, IAM, throttling, CloudTrail), then forwards to the model provider's hosted endpoint. There's a network hop's worth of cost on every request. For batch workloads where you're not optimizing for latency, this doesn't matter. For interactive workloads (chat UIs, IDE completions, anything where the user is watching the spinner), the additional 200–400ms is real and visible.
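
If you want to verify the gap yourself, it's cheap to script. A minimal sketch, assuming the `anthropic` and `boto3` SDKs with credentials in the environment; the Bedrock model ID is an assumption (some regions require an inference-profile ID, e.g. a `us.` prefix), so check your region's catalog:

```python
import json
import time

import boto3
from anthropic import Anthropic

PROMPT = "Summarize the tradeoffs of managed model gateways in three sentences."

def first_token_direct() -> float:
    """Seconds to the first streamed text chunk via Anthropic's API."""
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    start = time.perf_counter()
    with client.messages.stream(
        model="claude-3-7-sonnet-latest",
        max_tokens=256,
        messages=[{"role": "user", "content": PROMPT}],
    ) as stream:
        for _ in stream.text_stream:
            return time.perf_counter() - start  # stop at the first text chunk
    return float("nan")

def first_token_bedrock(region: str = "us-east-1") -> float:
    """Seconds to the first streamed chunk via Bedrock in a given region."""
    client = boto3.client("bedrock-runtime", region_name=region)
    start = time.perf_counter()
    resp = client.invoke_model_with_response_stream(
        modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",  # assumed ID
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": PROMPT}],
        }),
    )
    for event in resp["body"]:
        if "chunk" in event:
            return time.perf_counter() - start  # first event off the stream
    return float("nan")

if __name__ == "__main__":
    print(f"direct:  {first_token_direct() * 1000:.0f} ms")
    print(f"bedrock: {first_token_bedrock() * 1000:.0f} ms")
```

One run is noise; average a few dozen requests per backend before believing any number, but the shape of the gap shows up quickly.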

This is structural. AWS's whole value proposition is the request-handling layer. Removing it would defeat the point. The latency tax is the price of the integration story, and it's a price that doesn't go to zero.

The mitigation is workload selection: batch and async work belongs on Bedrock; latency-critical interactive work should go direct to the model vendor, with the AWS perimeter handled at a different layer. That split is awkward, but it's the honest answer.
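
In code, the split is just a routing decision at the call site. A sketch, with the two backends passed in as callables so nothing here is tied to a specific SDK:

```python
from typing import Callable

LLMCall = Callable[[str], str]  # prompt in, completion out

def route_llm_call(
    prompt: str,
    *,
    interactive: bool,
    call_direct: LLMCall,   # vendor API: lower first-token latency
    call_bedrock: LLMCall,  # Bedrock: IAM, CloudTrail, billing integration
) -> str:
    """Latency-critical traffic goes direct; latency-tolerant work stays on Bedrock."""
    return call_direct(prompt) if interactive else call_bedrock(prompt)

# Usage with stub backends:
reply = route_llm_call(
    "hello",
    interactive=True,
    call_direct=lambda p: f"direct: {p}",
    call_bedrock=lambda p: f"bedrock: {p}",
)
```

The point isn't the ten lines; it's that the decision lives in one place, so moving a workload between backends is a one-line change rather than a migration.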

2. Regional model availability is still spotty and unevenly documented

The headline coverage on Bedrock is "Anthropic's models are available." The actual coverage is "in some regions, on some accounts, with some delay between when the model vendor releases a new version and when AWS makes it available."

Specific examples from the last quarter:

  • Claude 3.7 Sonnet landed on Bedrock about three weeks after it landed on Anthropic's direct API. By the time it was on Bedrock, OpenAI was already shipping competitive models elsewhere.
  • Claude 3.7 with extended thinking is available on Bedrock but not in every region the base model is available in. The us-east-1/us-west-2 coverage is solid; the eu-west and ap-northeast coverage is partial.
  • Llama 4 Scout landed on Bedrock within a few days, but Maverick took longer and Behemoth (when it ships) probably won't be on Bedrock for some time given its size.

The result is that "we standardized on Bedrock for all our model access" tends to mean "we're a few weeks behind whatever the model vendors have released." For shops where staying on the latest model matters less than the integration story, this is a fine trade. For shops where the latest capability is competitive advantage, the lag is a real cost.

The documentation problem makes it worse: model availability per region is published, but it's published at a granularity that misses important details (which model variants, which feature flags, which throughput tiers). The honest workflow is "try to provision it; deal with the failure mode if the region/model combination isn't available." That's not how production procurement should work, but it is how it currently works for Bedrock.
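
That probe-first workflow is at least scriptable. A sketch against boto3's control-plane client, assuming your credentials can reach each region; note that a model appearing in the listing still doesn't guarantee your account can invoke it:

```python
import boto3
from botocore.exceptions import ClientError

REGIONS = ["us-east-1", "us-west-2", "eu-west-1", "ap-northeast-1"]

def anthropic_models_by_region() -> dict[str, list[str]]:
    """Ask each region's Bedrock catalog directly rather than trusting the docs."""
    catalog: dict[str, list[str]] = {}
    for region in REGIONS:
        client = boto3.client("bedrock", region_name=region)  # control plane
        try:
            resp = client.list_foundation_models(byProvider="Anthropic")
            catalog[region] = [m["modelId"] for m in resp["modelSummaries"]]
        except ClientError as err:
            catalog[region] = [f"error: {err.response['Error']['Code']}"]
    return catalog

if __name__ == "__main__":
    for region, models in anthropic_models_by_region().items():
        print(region)
        for model_id in models:
            print(f"  {model_id}")
```

Even this only surfaces model IDs; feature flags and throughput tiers still have to be discovered by trying them.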

This one might improve. AWS has been investing in faster model on-boarding, and the gap from Anthropic-direct to Bedrock has narrowed for the most-recent releases. It's not gone, and the regional unevenness probably stays.

3. The Agents abstraction is the wrong layer

I covered this briefly in the earlier piece; six weeks of additional context have made it more obvious that Bedrock Agents is the wrong shape of product for where the field is going.

The Bedrock Agents pitch is that you describe an agent declaratively (name, instructions, tool descriptions, knowledge base bindings) and AWS orchestrates the model loop, tool calls, and result handling. That's a coherent abstraction for a specific class of workflow. It's the wrong abstraction for the workflows that 2025 has actually settled on.

The problem: the field has converged on MCP as the protocol for tool exposure to AI clients. Every major IDE supports MCP. Every major model API supports MCP-style tool use. The pattern is "your tools live in MCP servers; your AI client surfaces them to the model; the orchestration is at the client layer."

Bedrock Agents is a parallel orchestration model that doesn't natively speak MCP. To wire an MCP server into a Bedrock Agent, you write an adapter. To wire a Bedrock Agent's tools into an external AI client (Claude Code, Cursor, etc.), you write a different adapter. The abstraction is creating its own vendor-lock-in problem at exactly the moment the rest of the field is standardizing on a protocol that would have prevented the lock-in.

The structural fix would be: Bedrock Agents becomes a thin Bedrock-side wrapper around MCP. Tools you build for one work for the other; orchestration patterns you learn for one transfer. AWS hasn't signaled they're going that direction. The Bedrock Agents product has been getting features added, not absorbed into a more open architecture.

For shops building AI workflows in 2025, the cleaner pattern is: tools as MCP servers, orchestration in your AI client, Bedrock as a model provider. That uses Bedrock for what it's actually good at (the model call surface, the IAM/billing integration, the residency story) and skips the parts that are the wrong shape.
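
Here's the tools-as-MCP-servers half of that pattern as a minimal sketch, using the official `mcp` Python SDK's FastMCP helper; the tool body is a placeholder:

```python
# pip install "mcp[cli]"
from mcp.server.fastmcp import FastMCP

# One MCP server per tool domain; any MCP-speaking client
# (Claude Code, Cursor, your own orchestrator) can mount it.
mcp = FastMCP("ticket-tools")

@mcp.tool()
def lookup_ticket(ticket_id: str) -> str:
    """Fetch a support ticket by ID (placeholder; a real server would hit your ticketing API)."""
    return f"ticket {ticket_id}: status=open, priority=p2"

if __name__ == "__main__":
    # stdio transport: the client launches this process and speaks MCP over stdin/stdout
    mcp.run(transport="stdio")
```

The Bedrock half then shrinks to the model-call surface (bedrock-runtime's Converse or InvokeModel APIs), and the loop that stitches tool results back into the conversation lives in your client, not in a managed agent.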

What's worth being clear about

Bedrock is a serious product with real value for the shops it's built for. The criticisms above don't change that. They're the things to know if you're trying to plan around Bedrock for the next twelve months and want to know which rough edges are likely to get filed down and which are structural.

The latency tax is structural. Plan around it.

The regional/model lag is improving but not gone. Plan around it.

The Agents abstraction is at the wrong layer. Plan around it: build your agent workflows on the open protocol the rest of the field is using, and treat Bedrock as a model surface, not as an agent platform.

The shape of "Bedrock as part of a larger AI architecture" works in 2025. The shape of "Bedrock as the AI architecture" mostly doesn't, and the gap is in places that aren't going to close in the near future. The Stargate-style infrastructure conversation is partly about this same dynamic at a much larger scale: the question of whether the integration platform or the model provider owns the value chain. AWS's bet with Bedrock is the integration platform. The market hasn't fully agreed.