2025 wrap-up: what actually changed in AI
End-of-year recap. The pattern across 2025 was the one most years follow: a lot of marketing-layer hype over a smaller set of substantive shifts. It's worth being explicit about what actually changed, and what didn't, before the 2026-predictions piece tries to extrapolate from the noise. The items below are the things I'd point at as actually different from where we started the year, separating the durable from the loud.
What actually changed
The open-weights workhorse tier closed the gap. January started with closed-frontier shops owning the workhorse tier. December ends with credible open-weights options at every tier, including the high end of workhorse. DeepSeek's V3-0324 in March, R1 distillations all year, Llama 4 in April, and V3.1 in August each moved the line. The cumulative effect is real.
Multi-vendor pricing pressure compressed prices. The hosted workhorse-tier price band moved down meaningfully: open-weights pressure plus inter-vendor competition produced a 30-50% effective price reduction on most workloads. The closed-frontier shops responded with capability moves (Claude 4, GPT-5) more than with price moves, but the workhorse-tier price floor still moved.
MCP became real infrastructure. Started the year as an Anthropic-led standard; ended the year as the cross-vendor foundation for tool integration in serious AI deployments. The routing-layer category formed (MetaMCP and others). MCP-only architectures emerged as a viable production pattern.
The personal-AI foundation matured. Apple Silicon plus open weights crossed the threshold where principled personal AI is a practical reality for practitioners. The M4 Max Mac Studio shipping in March was the punctuation; the cumulative tooling effect through the year is the substance.
Agent design patterns settled. Planner-executor, tool-scoped subagents, retry-with-reflection, human-in-the-loop checkpoints, bounded autonomy. These patterns moved from emerging to established. The agentic-IDE category specifically matured around plan-mode discipline.
The training-without-hyperscaler story arrived. Distributed fine-tuning on neoclouds plus Apple-Silicon training plus mature LoRA tooling moved meaningful AI training out of hyperscaler-only territory.
Distillation became a routine production pattern. Having a frontier-tier teacher train a workhorse-tier student moved from research to production, with significant cost reductions on bounded high-volume workloads as a result.
The agent governance gap stayed open. Microsoft Build pitched a fix; Google I/O pitched a fix; the practitioner conversation about the missing governance layer stayed loud. The gap is the same shape at year-end as at year-start.
SOC 2 / audit frameworks started taking AI seriously. First year where every audit cycle included AI questions. The frameworks are still calibrating; the discipline is starting to form.
That's the substantive list. Each item is a real shift; the cumulative picture is meaningfully different from January.
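To put numbers on the pricing item above, here is a minimal Python sketch of what a 30-50% effective reduction does to a monthly bill. The 30-50% band is the figure from that item; the $10,000 baseline and the function names are hypothetical, chosen only for illustration.

```python
def reduced_cost(baseline: float, reduction: float) -> float:
    """Return the cost after a fractional price reduction (0.30 means 30%)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline * (1.0 - reduction)


def cost_band(baseline: float, low: float = 0.30, high: float = 0.50) -> tuple[float, float]:
    """Return (lowest_cost, highest_cost) across a reduction band."""
    # The larger reduction gives the lower bill, so `high` maps to the first element.
    return reduced_cost(baseline, high), reduced_cost(baseline, low)


# Hypothetical workload billed at $10,000/month in January.
lowest, highest = cost_band(10_000.0)
print(f"Year-end bill: ${lowest:,.0f} to ${highest:,.0f} per month")
```

The point of writing it out is that the band is wide: the same workload lands anywhere between half and 70% of its January cost, which matters when budgeting against it.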
What didn't change as much as the marketing suggested
Frontier-tier capability moved less than the keynote framings implied. Claude 4 and GPT-5 were real upgrades. They were upgrades, not step-changes. The headline framings overstate the per-release deltas; the actual capability frontier moves at a steady-but-not-dramatic pace.
Agentic-everything didn't actually arrive. The agent demos at Build and I/O made promises the production reality doesn't deliver on. Agents work for bounded use cases; the open-ended "agents do anything" demos are still mostly demos.
The browser-agent category narrowed. Operator, Computer Use, Mariner all shipped. The use cases that durably worked were narrower than the demos. The category exists; it's smaller than the marketing implied.
Apple Intelligence's headline features mostly didn't ship. WWDC 2024 promised them; at year-end most are still slipping. The foundation is right; the product reality is partial.
Multi-modal frontier didn't move dramatically. Image generation got better; video generation got somewhat better; cross-modal reasoning at the frontier improved less than expected. Mostly incremental.
Mass adoption of local-first AI didn't happen. The principled-user community grew; the broader population mostly stayed on hosted. The bridge product didn't materialize, and the timeline ran longer than the optimistic framings suggested.
These are the gaps between marketing and substance. Each one is worth being explicit about because the 2026-predictions piece needs to weight them appropriately.
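On the agents point specifically: the bounded use cases that do work tend to lean on the patterns listed earlier, in particular human-in-the-loop checkpoints, retry-with-reflection, and bounded autonomy. Below is a minimal Python sketch of how those three might compose. Everything in it is an assumption for illustration: the plan is handed in rather than produced by a planner, `execute` is a plain callable standing in for a model or tool call, and `run_plan`, `StepResult`, and `RunLog` are hypothetical names, not any framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StepResult:
    ok: bool
    output: str


@dataclass
class RunLog:
    completed: list[str] = field(default_factory=list)
    stopped_reason: Optional[str] = None


def run_plan(
    plan: list[str],
    execute: Callable[[str, Optional[str]], StepResult],
    approve: Callable[[str], bool],
    max_retries: int = 2,
    max_total_calls: int = 10,
) -> RunLog:
    """Run plan steps in order under a call budget, retrying failures with feedback."""
    log = RunLog()
    calls = 0
    for step in plan:
        # Human-in-the-loop checkpoint: a person (or policy) gates each step.
        if not approve(step):
            log.stopped_reason = f"rejected at checkpoint: {step}"
            return log
        feedback: Optional[str] = None
        for _attempt in range(max_retries + 1):
            # Bounded autonomy: a hard ceiling on total executor calls.
            if calls >= max_total_calls:
                log.stopped_reason = "call budget exhausted"
                return log
            calls += 1
            result = execute(step, feedback)
            if result.ok:
                log.completed.append(step)
                break
            # Retry-with-reflection: pass the failure into the next attempt.
            feedback = f"previous attempt failed: {result.output}"
        else:
            # All retries for this step failed; stop rather than push on.
            log.stopped_reason = f"gave up on step: {step}"
            return log
    return log


def fake_execute(step: str, feedback: Optional[str]) -> StepResult:
    # Stand-in executor: the "flaky" step succeeds only once it receives feedback.
    if step == "flaky" and feedback is None:
        return StepResult(ok=False, output="timeout")
    return StepResult(ok=True, output="done")


log = run_plan(["fetch", "flaky", "report"], fake_execute, approve=lambda s: True)
print(log.completed, log.stopped_reason)
```

The design choice worth noticing is that every exit path is explicit: the run either finishes, is rejected at a checkpoint, exhausts its budget, or gives up on a step. The open-ended demos mostly lack that property.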
What surprised me
A few things that didn't fit my prior models:
The pace of MCP adoption. I expected this to be slower; it ran faster. The standard-coordination problem solved more cleanly than I'd predicted.
The maturity of distillation as a production pattern. I underweighted this badly in March. The cost savings from teacher-student deployments have been outsized for the workloads that fit.
The price-pressure trajectory. Predicted to compress; did compress; faster than the optimistic side of my range.
The MLX training story. Went from "experimental" to "production-usable for fine-tunes" through the year. I predicted it would still be research-grade at year-end; I was wrong by a meaningful margin.
How much the agent governance conversation stayed unresolved. Predicted some narrowing of the gap; it didn't really happen. The practitioner conversation matured; the platform-level fixes mostly didn't.
What I'm carrying into 2026
The framing for the 2026-predictions piece needs to account for:
Open-source momentum compounds faster than my prior models suggest. I'll weight this more heavily.
The foundation-vs-product gap is the dominant pattern. Every category in personal AI / consumer AI is exhibiting it. The foundation matures faster than the products built on it.
Pricing pressure is a constant. The trajectory is intact. Plan against continued compression rather than stabilization.
Governance-and-discipline lags capability. Every quarter the capability ships before the framework to use it well does. The pattern persists; building the discipline early is the working strategy.
Multi-vendor reality is the architectural default. No single vendor wins. The lock-in mitigation work pays back; the single-vendor commitment increasingly looks like a worse architectural choice than the multi-vendor one.
These five takes will inform the 2026-predictions piece. Each is a recalibration from my March-2025 priors based on 2025 evidence.
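For the multi-vendor take, the lock-in mitigation work mostly comes down to keeping vendor calls behind one narrow interface so that a second vendor can stand in for the first. A minimal Python sketch, with loud caveats: the providers here are plain functions, not real SDK clients, and `complete_with_fallback`, `vendor_a`, and `vendor_b` are hypothetical names invented for the example.

```python
from typing import Callable, Sequence

# A provider is modeled as a (name, callable) pair. Real vendor clients would be
# wrapped to fit this same prompt-in, text-out signature (an assumption here).
Provider = tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the list has errored."""


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, completion) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any vendor failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def vendor_a(prompt: str) -> str:
    # Simulated outage on the primary vendor.
    raise TimeoutError("simulated outage")


def vendor_b(prompt: str) -> str:
    # Simulated healthy secondary vendor.
    return f"echo: {prompt}"


name, text = complete_with_fallback(
    "hello", [("vendor-a", vendor_a), ("vendor-b", vendor_b)]
)
print(name, text)
```

A real deployment would also need per-vendor prompt and output normalization, which this sketch deliberately leaves out; the interface boundary is the part that pays back.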
The honest summary
2025 was a year of substantive foundation progress and uneven product delivery. The open-source side moved faster than expected; the consumer-product side moved slower. The pricing compressed; the capability frontier extended modestly; the patterns for working with AI matured.
The keynote conversation was loud. The substantive shifts were real but quieter. The teams that listened to the substantive shifts and ignored the keynote noise are in a better position for 2026 than the teams that got caught chasing each new vendor announcement.
The 2026-predictions piece comes next. It's worth grounding it in this recap rather than in the marketing-layer narrative of where the year was supposed to go. The actual year took a different shape from the predicted one; the predictions for next year should learn from the gap.
A real year. Substantive enough. The next year is where the 2025 shifts compound or revert. Worth watching closely.