AI in the news: week of February 22, 2026
Gemini 3.1 Pro and Sonnet 4.6 land inside a week, Alibaba slips Qwen3.5 out hours before Lunar New Year, the EU stretches its AI Act timeline, and Baker McKenzie cuts 600-1,000 attorneys citing AI. The cadence is compressed; white-collar is now in scope.
What this week actually changed: the frontier-model release cadence compressed to near-monthly, the platform layer became the actual product (the model is the demo), and white-collar professional services entered the AI-cited displacement frame.
A heavy frontier-model week. Gemini 3.1 Pro and Sonnet 4.6 landed inside seven days of each other, Alibaba slipped Qwen3.5 out hours before the Lunar New Year break, and the labor numbers for February kept climbing. MWC Barcelona is the following week (March 2 to 5), so the device-and-network announcements that usually anchor late February are still ahead. This week is squarely about the model layer.
Two frontier releases inside a week
On February 19, Google released Gemini 3.1 Pro in preview across the Gemini API, Vertex AI, the Gemini app, and NotebookLM. The model posted leading scores on 13 of the 16 benchmarks Google chose to highlight, with the headline number being 77.1% on ARC-AGI-2, more than double Gemini 3 Pro's score on the same evaluation. Public preview in GitHub Copilot shipped the same day.
The benchmark gain is real, and the ARC-AGI-2 jump is the one worth flagging. ARC-AGI-2 is the harder of the two ARC suites, and a doubling on the harder track at the same parameter scale is the kind of result that should make you pause and ask whether something architectural changed or whether the eval has been over-fit. I don't have a confident answer yet. The model card is light on the ablations that would settle it.
What I'll watch over the next month: how Gemini 3.1 Pro performs on out-of-distribution agentic workflows that aren't in the benchmark set, and whether the deltas hold once developers run their own internal evals. Benchmark wins on launch day are a noisy signal. What people building with the model say four weeks in is the signal that matters. The pricing-and-availability part is more interesting than the benchmark part for the work I actually do: the same weights running across consumer and enterprise endpoints, Gemini Enterprise plus the consumer app, all priced as the platform with the model as the demo.
Two days earlier, on February 17, Anthropic released Claude Sonnet 4.6, pulling near-Opus performance into the Sonnet tier. The release notes call out improved coding and document comprehension, materially better computer use (browser navigation, form filling, software operation), and the 1M-token context window introduced with Opus 4.6, now at Sonnet pricing. Twelve days after Opus 4.6. Two model releases inside two weeks from one lab is faster than Anthropic ran in 2025, and it tracks the broader frontier pattern of shorter ship cycles to keep up with China-side acceleration.
For the work I actually do, Sonnet-tier 1M context is the meaningful change. Million-token context at Sonnet pricing makes whole-codebase analysis a default workflow rather than a careful budget call. The computer-use improvements are interesting, but I keep most of my agent work outside browser-driven flows so they affect me less. Coders running Claude Code will feel the upgrade most. The thing I'd push back on is the same thing I pushed back on with Sonnet 4.5 in October: the assumption that hosted Claude is the only architecture worth considering. The Sonnet capability is genuinely useful, and the API price-per-token is reasonable. But the on-prem case doesn't go away because the hosted models got better. The data that shouldn't leave your network still shouldn't leave your network. Hosted-model improvements are an argument for using hosted models on the workloads where the data tradeoff is acceptable, not for collapsing the whole architecture into one vendor.
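To make "whole-codebase analysis as a default workflow" concrete, here's a back-of-envelope sketch for checking whether a source tree plausibly fits in a 1M-token window. The ~4-characters-per-token ratio is a rough heuristic I'm assuming, not a real tokenizer; the function names and the 100K-token reserve are my own illustrative choices.

```python
# Back-of-envelope check: does a codebase fit in a 1M-token context window?
# Assumes ~4 characters per token, a common rough heuristic; a provider's
# actual tokenizer or token-counting endpoint will give different numbers.

from pathlib import Path

CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4  # rough heuristic, not a tokenizer


def estimate_tokens(root: str, suffixes=(".py", ".md", ".ts")) -> int:
    """Walk a source tree and estimate total tokens for matching files."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(root: str, reserve: int = 100_000) -> bool:
    """Leave headroom ('reserve') for the prompt and the model's response."""
    return estimate_tokens(root) <= CONTEXT_WINDOW - reserve
```

The interesting shift is that at Sonnet pricing this check mostly comes back true for mid-sized repos, which is what turns the careful budget call into a default.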
Alibaba slips Qwen3.5 out before the holiday
On February 16, Alibaba released Qwen3.5 hours before the Lunar New Year holiday started. The timing was deliberate: anchor the China AI conversation on Qwen before everyone went quiet for the week. The model is a native vision-language architecture with 397B total parameters (17B active per pass), 1M context, 201 languages supported, and 60% lower per-token pricing than Qwen2.5. Early tests show functional 3D game generation, browser and website synthesis, and medical imagery analysis.
The China-side cadence keeps tightening. ByteDance shipped Doubao 2.0 the same weekend, and DeepSeek's V4 landed earlier in the month. The pattern through Q1 is unmistakable: the open-weights and near-open-weights work out of China is releasing on a faster clock than the closed US frontier labs, with capability deltas that have narrowed to the point where the conversation is about workflow fit rather than benchmark gap.
The 60% price drop on Qwen3.5 is the part I want to flag. Token costs continuing to fall is the dominant economic story of this AI cycle, and the pricing pressure is coming as much from the China-side open-weights stack as from competition between hosted US labs. The cost curve makes the frugal-AI strategy more, not less, viable. Cheaper inference at the open-weights tier is exactly what shifts the build-vs-buy math for organizations that don't have hyperscaler budgets.
What I'm watching: whether Qwen3.5 weights or a derivative ship in a form that's runnable on the kind of hardware most teams actually have. The 17B-active mixture-of-experts shape is the part of the architecture that determines whether this is a model anyone outside hyperscaler-tier infrastructure can actually run.
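The reason the 397B-total/17B-active split matters for runnability is worth spelling out with arithmetic. All 397B parameters have to live somewhere (VRAM, RAM, or disk offload); the 17B-active figure governs per-token compute, not weight storage. This sketch is illustrative math on the reported parameter counts, not measured requirements for any actual Qwen3.5 release.

```python
# Rough memory math for a mixture-of-experts checkpoint with Qwen3.5's
# reported shape: 397B total parameters, 17B active per forward pass.
# Total parameters drive weight storage; active parameters drive the
# compute (and bandwidth) touched per token. KV cache is ignored here.

GB = 1024**3

TOTAL = 397e9   # total parameters (reported)
ACTIVE = 17e9   # active parameters per token (reported)


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights at a given precision."""
    return params * bytes_per_param / GB


for label, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{label}: {weight_memory_gb(TOTAL, bpp):,.0f} GB total weights, "
          f"{weight_memory_gb(ACTIVE, bpp):,.0f} GB touched per token")
```

Even at 4-bit quantization the full checkpoint is in the ~185 GB range, which is why a distilled or pruned derivative, not the flagship, is the artifact most teams could actually run.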
The EU stretches the AI Act timeline
The EU's November 2025 Digital Omnibus proposal kept advancing through the legislative process this week. The proposal would delay application of certain high-risk AI requirements by up to 16 months and adjust some governance and exemption rules. The full AI Act is still scheduled to become applicable on August 2, 2026, with the high-risk slice contingent on the Omnibus outcome.
I think the underlying instinct in the Omnibus is right. The AI Act timeline as originally drafted assumed a regulatory infrastructure (standards, conformity-assessment bodies, notified-body capacity) that just doesn't exist yet at the scale the Act requires. Pushing the high-risk effective date until the conformity infrastructure is real is more honest than letting the deadline arrive without the machinery to enforce it. Honest delay beats theatrical compliance.
The criticism I'd take seriously is the one about regulatory drift. Every push of the timeline is a window for lobbying to weaken the obligations, and the GPAI obligations that already went live in August 2025 are doing the heavy lifting on frontier-model transparency. The risk is the high-risk slice gets stretched indefinitely while the GPAI rules carry the weight they weren't designed for. For the governance work itself, the Omnibus matters less than the underlying calendar discipline. Organizations building toward August 2026 should keep building toward August 2026. The Omnibus may move the high-risk date; it doesn't change the shape of what compliant looks like.
Baker McKenzie joins the AI-cited cuts list
Two new shapes this week: Baker McKenzie cut 600-1,000 attorneys and staff in a "shift towards AI," and Workday quietly cut under 1,000 (Programs.com). Block (3rd week running) and Salesforce (continuing) round out a February Challenger total of 4,680 AI-cited cuts. White-collar professional services entering the displacement frame is the new bit. See the longer piece for where I land.
A few smaller items worth flagging
- DeepSeek V4 continues to test competitively against Sonnet and GPT-4o on coding benchmarks, with cost-per-token sitting well below the closed-frontier labs. The China-side pressure on US pricing is real and compounding.
- Cohere Tiny Aya (3.35B, CC-BY-NC) shipped in February supporting 70+ languages and explicitly designed for laptop and edge deployment. Edge-runnable multilingual models are exactly the foundation the on-prem case needs.
- Mistral Large 3 and Small 4 moved to Apache 2.0, a meaningful licensing shift from Mistral's earlier restrictive terms. Open weights from a frontier-tier European lab matter because the controllable foundation is what makes encoding-a-person and on-prem deployment work.
- Chinese AI models pushing pro-China views: Axios reported last week on bias patterns in DeepSeek and Alibaba models. Worth tracking as the open-weights-from-China tier becomes more deployable in Western contexts.
What to watch next week
The platform layer is the actual product. Google is shipping Gemini 3.1 Pro across consumer, developer, and enterprise endpoints with the same weights and the platform pitch as the lead. Anthropic is doing the same with the Claude SDK and Claude Code. The model is the foundation; the platform is the lock-in. The principled-user response is to keep the agent stack runnable against local and open-weights models so the platform commitment stays optional.
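What "keep the agent stack runnable against local and open-weights models" looks like in practice is mostly plumbing discipline: route every model call through one endpoint config so swapping a hosted frontier model for a local server is a config change, not a rewrite. Most local serving stacks expose OpenAI-compatible APIs, which is what makes this cheap. The URLs, model names, and routing policy below are illustrative placeholders of mine, not any provider's actual values.

```python
# Sketch of a portable agent stack: one endpoint registry, one routing
# policy. Swapping the hosted frontier model for a local open-weights
# server becomes a dictionary entry, keeping the platform commitment
# optional. All names and URLs here are hypothetical examples.

from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    base_url: str
    model: str
    api_key_env: str  # name of the env var holding the key, if any


ENDPOINTS = {
    "hosted": Endpoint("https://api.example-lab.com/v1", "frontier-model", "LAB_API_KEY"),
    "local": Endpoint("http://localhost:8000/v1", "open-weights-model", "NONE"),
}


def pick_endpoint(data_sensitivity: str) -> Endpoint:
    """Route restricted workloads to the local endpoint by policy;
    everything else can use the hosted model."""
    return ENDPOINTS["local" if data_sensitivity == "restricted" else "hosted"]
```

The routing policy is also where the hosted-vs-on-prem argument from the Sonnet section lives in code: the data that shouldn't leave the network never gets a hosted endpoint to leave through.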
The labor curve is steepening and white-collar is now in scope. Baker McKenzie's "shift towards AI" framing for a 600-1,000-person legal cut is the shape change. Plan for the realistic view; hope for the slower one.
Next Sunday: MWC Barcelona day-one announcements (the show opens March 2), the device-side AI story, and whatever DeepSeek ships if it lands in the post-Lunar-New-Year window.
Sources
- Gemini 3.1 Pro: a smarter model for your most complex tasks. Google Blog (Feb 19, 2026)
- Gemini 3.1 Pro is now in public preview in GitHub Copilot. GitHub Changelog
- Claude Sonnet 4.6 released on February 17, 2026. Philip Conrod
- Alibaba unveils major AI model upgrade ahead of DeepSeek release. Bloomberg (Feb 16, 2026)
- These are China's new AI models released ahead of the Lunar New Year. Euronews (Feb 17, 2026)
- Latest AI regulations update. Credo AI (Feb 2026)
- Challenger Report: March cuts rise 25% from February, AI leads reasons. Challenger, Gray & Christmas
- List of companies announcing AI-driven layoffs. Programs.com
- All the tech giants announcing sweeping layoffs in 2026. Newsweek
- Chinese AI models push pro-China views. Axios (Feb 13, 2026)