Google I/O 2025: Gemini 2.5 Pro and the agent push
Google's I/O was Gemini 2.5 Pro's clean debut and the agent story finally being told with a straight face. The model is real. The agent story has more work to do than the keynote suggested.
Google I/O wrapped two days after Microsoft Build, and the symmetry was hard to miss. Both keynotes were the agents-everywhere pitch. Both shipped meaningful platform updates. Both have the same governance gap. The Google version is Gemini 2.5 Pro hitting GA after months of "experimental" labeling, plus Project Mariner getting a more ambitious framing, plus AI Mode in Search going wider, plus deeper Workspace integration.
Worth being concrete about what shipped, how it stacks against the alternatives, and what it actually changes for shops that need to make architecture decisions in the next quarter.
Gemini 2.5 Pro to GA
The headline is Gemini 2.5 Pro graduating from "experimental" to GA, with pricing that lands at $1.25/$10 per million tokens (input/output). That's competitive with the workhorse tier, cheaper than Claude 3.7 Sonnet, more expensive than GPT-4.1, and the model itself benchmarks well on reasoning and on long-context tasks. The claimed 1M-token effective context is real for retrieval-style workloads; deep multi-step reasoning across the full window degrades the same way it does on every model that claims this kind of context length, but the claim has more substance behind it than most.
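At the stated $1.25/$10 per-million-token pricing, the per-request economics are easy to sanity-check. A minimal sketch; the workload sizes below are illustrative, not from the keynote:

```python
# Cost sanity-check for Gemini 2.5 Pro's stated GA pricing:
# $1.25 per 1M input tokens, $10 per 1M output tokens.

PRICE_IN_PER_M = 1.25   # USD per million input tokens (stated price)
PRICE_OUT_PER_M = 10.0  # USD per million output tokens (stated price)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single call at list price."""
    return (input_tokens / 1_000_000) * PRICE_IN_PER_M + \
           (output_tokens / 1_000_000) * PRICE_OUT_PER_M

# A long-context retrieval call: 200k tokens in, 1k tokens out.
long_ctx = request_cost(200_000, 1_000)      # 0.25 + 0.01 = $0.26

# A chatty agent loop: 50 calls of 4k in / 500 out.
agent_loop = 50 * request_cost(4_000, 500)   # 50 * 0.01 = $0.50

print(f"long-context call: ${long_ctx:.2f}")
print(f"agent loop:        ${agent_loop:.2f}")
```

The asymmetry matters: a 200k-token retrieval call is dominated by input cost, so the long-context story is also a cheap-input story.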
Three things that matter beyond the benchmark numbers:
The reasoning quality is genuinely strong. On tasks where I've directly compared 2.5 Pro against Claude 3.7 with extended thinking on, the answers are closer than the benchmark deltas would suggest. For coding and analytical workloads it's a credible top-tier choice now, not just a price-competitive one.
The tool-use behavior is mature. The function-calling surface has been refined enough that agentic loops work cleanly. This was the weakest part of the Gemini story six months ago; it isn't anymore.
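The "agentic loops work cleanly" claim is about the basic function-calling cycle: model requests a tool, runtime executes it, result goes back into the context, repeat until the model produces a final answer. A schematic sketch of that loop; the `fake_model` and message shapes here are stand-ins, not the actual Gemini SDK types:

```python
# Schematic function-calling agent loop. `fake_model` is a scripted
# stand-in for a real chat-model call; real SDK types differ, so treat
# this as the shape of the loop, not an API reference.
import json

def get_weather(city: str) -> str:
    # Stubbed tool; a real tool would call an external API.
    return json.dumps({"city": city, "temp_c": 21})

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Scripted model: first turn requests a tool, second turn answers."""
    if messages[-1]["role"] == "tool":
        data = json.loads(messages[-1]["content"])
        return {"content": f"It's {data['temp_c']}C in {data['city']}."}
    return {"tool_call": {"name": "get_weather", "args": {"city": "Zurich"}}}

def run_agent(user_msg: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]               # final answer, loop ends
        result = TOOLS[call["name"]](**call["args"])  # execute requested tool
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not converge within max_turns")

print(run_agent("What's the weather in Zurich?"))
```

"Mature tool use" means the model reliably produces the `tool_call` half of this cycle with well-formed names and arguments; the loop itself is trivial when that holds and miserable when it doesn't.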
Deep Workspace integration is the differentiator. Google's bet is that the model that lives natively inside Docs, Sheets, Gmail, and Drive (with first-class access to user content) beats the model that has to reach the same content through APIs. That bet hasn't fully paid off yet because the user-experience polish on the integrations is still uneven, but the structural advantage is real.
Project Mariner and the browser-agent push
Mariner is Google's browser-controlling agent. The I/O update repositioned it as a more general computer-use surface, with the browser as the primary foundation but with clear ambitions to extend into other applications. The pitch is "agents that can do tasks in the browser the way a person would": fill forms, navigate sites, complete purchases, follow workflows.
The category is contested. OpenAI Operator, Anthropic Computer Use, and now Mariner are three browser-agent products with overlapping ambitions and meaningfully different shapes. Mariner's specific bet is that Google's existing accounts/identity/payments infrastructure (Google account, Google Pay, Chrome's autofill state) gives it a better "the agent is acting on the user's behalf" story than the alternatives.
That's structurally true. Whether it materializes as a category-leading product depends on execution and on consumer adoption. Both are open questions. The shape Google is iterating toward is reasonable; the timeline for it being durably useful for non-trivial tasks is probably longer than the keynote framing suggested.
AI Mode in Search
The most consequential change for the actual web, depending on how you measure: AI Mode in Search rolled out broadly with a more aggressive answer-generation surface. The traditional ten-blue-links experience now sits below an AI-generated answer for a meaningful fraction of queries, with the source links presented as citations rather than as primary results.
This has been moving in this direction for a while; the I/O update accelerates it. The shape that's emerging is: most informational queries get an answer-and-citations response from Search; only the more navigational or transactional queries return the traditional results. That's a real change for the web ecosystem: sites built around organic search traffic see their click-through rates drop because the answer is on the SERP itself.
I'm not going to relitigate the publisher-vs-search-engine economics here. The relevant point for AI architecture: this is a useful demonstration of what the agent-default consumer interface looks like, deployed at the scale of Google Search. The traffic it's redirecting is a pretty clean preview of the same dynamic that's going to play out across other consumer surfaces over the next year.
How this stacks against Microsoft's pitch
Microsoft Build the previous week was the enterprise agents pitch. Google I/O is the consumer agents pitch with an enterprise-Workspace footnote. The two are running parallel plays into different parts of the same market.
Where Google has the advantage:
- Consumer reach. Search, Chrome, Android, Workspace all have direct consumer presence in a way Microsoft's stack doesn't.
- Identity and payments. Google's existing user-identity infrastructure is closer to what an agentic consumer experience needs than Microsoft's.
- Model capability. Gemini 2.5 Pro is competitive with Claude 3.7 and GPT-4.1 in a way the prior Gemini line wasn't.
Where Microsoft has the advantage:
- Enterprise distribution. Microsoft 365 is in essentially every enterprise; Workspace's enterprise share is meaningful but smaller.
- The Copilot brand, for the enterprise persona. Copilot is the AI surface they already pay for.
- MCP integration. Microsoft has been more aggressive about adopting MCP as the cross-vendor protocol; Google's tool-use integration is mostly Gemini-shaped.
Neither has the governance answer. Both are pitching agents-everywhere with the same structural gaps in audit, policy, and revocation that I wrote about in the Build context. The gap shows up the same way at Google scale.
The model menu, updated
The frontier model menu I sketched at the end of February has shifted enough since then that an update is overdue. The picture as of late May:
- GPT-4.5, still premium positioning, low volume. Less compelling now that Claude 4 is rumored to land within days.
- GPT-4.1, the workhorse, still genuinely competitive on price and capability. Microsoft's effective default.
- Claude 3.7 Sonnet, mature hybrid-reasoning model, still strong on coding, on the cusp of being superseded by Claude 4.
- Gemini 2.5 Pro, credible top-tier choice now. Best long-context story. Strongest if you live in Workspace.
- Open-weights tier. Llama 4 Scout/Maverick, DeepSeek V3-0324, Qwen variants. Genuine commodity-tier alternative for cost-sensitive workloads.
- OpenAI o-series (o3, o4-mini), reasoning premium tier, optimized for hard problems where the marginal capability is worth the cost.
The decision shape continues to be "match workload to model strength," not "pick the best." The picks just got better on average across the board.
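"Match workload to model strength" is, in practice, a routing table. A sketch of what that looks like; the model IDs mirror the menu above but are illustrative, and the routing rules themselves are deployment decisions, not recommendations:

```python
# Workload-to-model router in the spirit of "match workload to model
# strength". Model names are illustrative, not exact API identifiers.

ROUTES = {
    "long_context":   "gemini-2.5-pro",     # best long-context story
    "coding":         "claude-3.7-sonnet",  # still strong on coding
    "hard_reasoning": "o3",                 # reasoning premium tier
    "cost_sensitive": "llama-4-maverick",   # open-weights commodity tier
}
DEFAULT = "gpt-4.1"  # the workhorse fallback

def pick_model(workload: str) -> str:
    """Route a tagged workload to a model; unknown tags hit the workhorse."""
    return ROUTES.get(workload, DEFAULT)

print(pick_model("long_context"))   # gemini-2.5-pro
print(pick_model("chat"))           # falls through to gpt-4.1
```

The point of the table shape: when the menu shifts (as it just did), you update one mapping rather than re-deciding "the best model" everywhere.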
What's worth tracking from the I/O announcements
Two things to actually watch over the coming months:
Whether AI Mode in Search keeps accelerating. The bet Google is making, that the consumer search experience converges toward "AI answers with citations rather than links", has implications well beyond the SERP. If it holds, the publisher economics conversation gets uglier; if Google pulls back, the SERP stays mostly traditional. The next quarter or two will settle this.
Whether Mariner becomes the default consumer browser-agent surface. It's not the only product in the category; it has structural advantages in identity and payments; the execution will determine whether it lands. The Operator vs Computer Use vs Mariner three-way competition is likely the most-watched browser-agent race of the year.
The I/O announcements don't change the architecture decisions for the next quarter. Gemini 2.5 Pro joining the workhorse-tier alternatives makes that decision marginally easier, but it's a marginal addition to a market that already had credible options. The longer story Google is telling is more interesting: a vertically integrated AI experience that runs from the model through the OS through the productivity apps through Search itself. Whether that strategic position translates into category leadership is the next year's question. The I/O keynote made the case that Google thinks it can.