Cost-modeling AI workloads with FinOps eyes
The per-token price is the easy line. Everything else (the retries, the context overhead, the agentic tool calls, the egress, the GPU reservation underneath the API) is where the actual bill comes from.
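The gap between sticker price and actual bill can be sketched directly. A toy per-request cost model, where every price and rate below is an assumed placeholder rather than real vendor pricing, and the function name and parameters are hypothetical:

```python
# Hedged sketch: a toy per-request cost model for an LLM workload.
# All prices and rates are ASSUMED placeholders, not real vendor pricing.

def effective_cost_per_request(
    input_tokens: int,
    output_tokens: int,
    price_in_per_m: float,             # $ per 1M input tokens (assumed)
    price_out_per_m: float,            # $ per 1M output tokens (assumed)
    context_overhead_tokens: int = 0,  # system prompt, RAG chunks, tool schemas
    retry_rate: float = 0.0,           # fraction of requests retried once
    tool_calls: int = 0,               # extra model round-trips per request
    tokens_per_tool_call: int = 0,     # context re-sent on each round-trip
) -> float:
    """Cost of one logical request, including the parts the sticker price hides."""
    in_tok = input_tokens + context_overhead_tokens
    base = in_tok * price_in_per_m / 1e6 + output_tokens * price_out_per_m / 1e6
    # Each agentic tool call is another paid round-trip with its own context.
    tools = tool_calls * tokens_per_tool_call * price_in_per_m / 1e6
    # A retry re-pays the whole request.
    return (base + tools) * (1 + retry_rate)

# "Sticker" math: one prompt, one completion, nothing else.
naive = effective_cost_per_request(1_000, 500, 3.0, 15.0)

# Same request once context overhead, retries, and tool calls are counted.
loaded = effective_cost_per_request(
    1_000, 500, 3.0, 15.0,
    context_overhead_tokens=4_000, retry_rate=0.1,
    tool_calls=3, tokens_per_tool_call=2_000,
)
print(f"sticker: ${naive:.4f}  actual: ${loaded:.4f}  ratio: {loaded / naive:.1f}x")
```

Under these made-up numbers the loaded request costs several times the naive per-token estimate, which is the whole point: the multiplier lives in the terms the price page never mentions.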
The case for running AI locally is louder than the math justifies for most workloads. Worth being explicit about which workloads it actually wins, and which ones the cloud still owns.
The job of crafting clever prompts to coax better answers from a frontier model is mostly over. The job of designing how prompts compose into systems is just beginning.
Bedrock has gotten meaningfully better in the last six months. The places it hasn't are still the same places. Worth being explicit about which gaps are likely to close and which look structural.
Three open-source vector stores cover most of the self-hosted RAG surface area in 2025. Worth being concrete about which one fits which workload, because the trade-offs matter and the docs won't tell you.
Six months ago MCP was Anthropic's protocol nobody had implemented. Now it's a category every major vendor ships against. The thing nobody is asking is what that does to the protocol itself.
DeepSeek dropped a V3 update with sharper benchmarks and a price floor that makes the rest of the workhorse-tier market look expensive. The cost per million is interesting; the trajectory is the actual story.
GPT-4.1 quietly shipped this week and it's the most interesting OpenAI release of the year. Not because it's the most capable model, but because it's the model the business actually runs on.
The legal frameworks around AI training data are still arguing about the wrong layer. The interesting ownership question isn't whether your text was scraped. It's whether the captured shape of how you reason is yours.
Llama 4 is the first time Meta has shipped a mixture-of-experts open-weights model at frontier scale. The release is more interesting for what it implies about Meta's strategy than for its benchmark wins.
Bedrock is built for the enterprise integration story. Most of the AWS-doc treatment assumes that audience. Worth working through what the small-shop or solo-developer version of using it actually looks like.
The framing has become a cliché. The actual mechanics (onboarding, scope of work, performance review, termination) change in interesting ways once you take the metaphor literally.