Context drift is the new tech debt

For two decades, tech debt was the unfunded liability of engineering teams. Context drift, the slow erosion of what your AI tools think you're doing, is the equivalent for AI-augmented teams. Same shape, different substrate.


Tech debt was the framing that organized engineering team conversations about quality for two decades. The reason it worked (beyond the loan metaphor being intuitive) is that tech debt has a few properties that make it actionable: it accumulates if you don't tend to it, it compounds in ways that aren't always visible, and the cost of paying it down is meaningfully lower if you do it routinely than if you defer it for years.

There's a new thing accumulating in AI-augmented engineering teams that has the same shape, the same dynamics, and the same need for routine attention. It doesn't have a settled name yet. The closest fit is context drift: the slow erosion of what the AI tools think you're doing, what they have access to, and what's still relevant to your current work. It's worth being explicit about, because the teams that don't notice it accumulating end up paying the same kind of disproportionate cost that teams paid for ignored tech debt.

What context drift actually is

Context drift accumulates across multiple layers of an AI-augmented engineering setup:

Stale system prompts. The system prompt for your IDE agent was written six months ago against a codebase that's now meaningfully different. The agent's behavior is calibrated to a version of the codebase that no longer exists. Suggestions miss the mark in subtle ways.

Outdated indexed corpus. The retrieval index against your codebase / docs / notes was built three months ago. Half the files have moved or changed; new files have been added that aren't indexed; the index returns answers grounded in a stale view of the world.

Accumulated conversation context. A long-running conversation with the assistant has accumulated decisions, preferences, and constraints that no longer reflect current intent. The assistant continues to apply the old constraints to new work.

Drifted tool surfaces. The MCP servers the agent connects to have been updated; the argument shapes have changed; the agent is calling them with the old shapes and getting subtly wrong results.

Outdated patterns in the prompt library. The team's prompt library (the carefully tuned prompts for specific workflows) was written against the model that was current six months ago. The new model is meaningfully different in behavior; the old prompts don't quite work the way they used to.

Stale evaluation suites. The evaluation suite that confirms the AI is doing what it should was built against the old prompt library and the old model. It still passes, but what it measures has drifted from what you actually care about.

Each of these is small. Together, they accumulate into a setup that gradually gets less useful while looking the same on the surface.

Why this is structurally tech-debt-shaped

The properties that made tech debt the right framing apply here too:

Accumulates without active attention. Nobody schedules "drift the context." It happens because the world changes around the AI setup faster than the AI setup updates itself. The default trajectory is more drift, not less.

Compounds in ways that aren't visible. A small drift in one layer is fine. Drifts in multiple layers compound: the stale system prompt asks for output the indexed corpus can no longer provide, the conversation context references decisions the prompt library doesn't reflect, and the evaluation suite catches none of it.

The cost of paying it down compounds with the deferral. Catching up on six months of drift is much harder than handling it on a monthly cadence. The "we'll do a big AI-stack refresh next quarter" mindset is the same mindset that produced the "we'll fix the tech debt next quarter" failure mode.

The cost is paid by everyone, not by the deferrer. The team uses the AI tools and gets degraded results; the user who notices the degradation often isn't the one who could have prevented the drift. Same political dynamic as classical tech debt.

The metric that exposes it is qualitative. "The AI used to be more useful" is the leading signal. By the time it's quantitative ("our acceptance rate dropped 15% over six months"), the drift has accumulated significantly.

The framing is precise enough to be useful. Drift-as-debt is the right mental model.

What paying down context drift looks like

The discipline that prevents drift accumulation is similar in shape to good tech-debt management:

Routine review cadence. Quarterly review of the AI stack's configuration: what system prompts are deployed, what indexes are current, what tools are connected, what conversations have accumulated cruft. The review doesn't need to be exhaustive; it needs to spot patterns.

Explicit refresh of the indexed corpus. Re-index on a schedule rather than waiting for someone to notice the index is stale. Monthly is reasonable for most teams. Continuous is better for teams that care.

System prompt versioning. Treat system prompts as code. Version them. Test them. When you update them, do it deliberately rather than by accident. The teams that keep this clean don't have the "the prompt has been edited 47 times by 12 people, nobody knows why each change was made" failure mode.
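A minimal sketch of what "treat system prompts as code" can mean in practice. The class and field names here are illustrative, not a real library; the one non-negotiable idea is that every change carries a version number and a recorded rationale, which is exactly what prevents the "edited 47 times, nobody knows why" failure mode:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class PromptVersion:
    version: int
    text: str
    changed_on: date
    rationale: str  # why this change was made; required, never blank

@dataclass
class VersionedPrompt:
    name: str
    history: list = field(default_factory=list)

    def update(self, text: str, rationale: str) -> PromptVersion:
        # Refuse silent edits: a change without a rationale is how
        # drift sneaks in unnoticed.
        if not rationale.strip():
            raise ValueError("every prompt change needs a recorded rationale")
        v = PromptVersion(len(self.history) + 1, text, date.today(), rationale)
        self.history.append(v)
        return v

    @property
    def current(self) -> PromptVersion:
        return self.history[-1]

prompt = VersionedPrompt("ide-agent")
prompt.update("You are a coding agent for the payments service...",
              rationale="initial version against the pre-refactor codebase")
prompt.update("You are a coding agent for the payments service (event-sourced)...",
              rationale="codebase moved to event sourcing in Q3")
```

In a real setup this would live in the same repo as the code, behind the same review process, so a prompt change shows up in `git log` like any other change.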

Conversation lifecycle management. The memory-hygiene discipline extends to drift management: periodic conversation resets, explicit memory commits, persona reaffirmation. The teams doing this on personal-AI workflows extend the same patterns to team-AI workflows.

Evaluation suite refresh. When the model changes, when the prompt library changes, when the underlying corpus changes, the evaluation suite needs to be re-validated. Not constantly; whenever any of the inputs change meaningfully.
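One way to make "whenever any of the inputs change meaningfully" mechanical rather than a judgment call: fingerprint the inputs the eval suite was last validated against, and flag re-validation when the fingerprint changes. A hedged sketch with illustrative identifiers:

```python
import hashlib

def stack_fingerprint(model_id: str, prompt_library_hash: str,
                      corpus_hash: str) -> str:
    """Combine the eval suite's inputs into one stable fingerprint."""
    h = hashlib.sha256()
    for part in (model_id, prompt_library_hash, corpus_hash):
        h.update(part.encode())
        h.update(b"\x00")  # separator so field boundaries can't collide
    return h.hexdigest()

def needs_revalidation(last_validated_fp: str, current_fp: str) -> bool:
    return last_validated_fp != current_fp

# Example: the model changed; prompts and index did not.
validated = stack_fingerprint("model-2024-06", "prompts-a1b2", "index-c3d4")
current = stack_fingerprint("model-2024-12", "prompts-a1b2", "index-c3d4")
```

Store the validated fingerprint alongside the eval results; a CI job that compares it against the current one turns "we should probably re-validate" into a failing check.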

Tool-surface change tracking. When an MCP server changes its argument shape, the agents calling it should know. Either the change is backward-compatible (and the agents work without modification) or it isn't (and the calling agents need updates). Tracking this requires the middleware layer to be aware of tool-surface versions.
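The compatible-or-breaking distinction above can be checked mechanically by diffing a tool's current parameter schema against the snapshot the agents were last reviewed against. This is a sketch under simplifying assumptions (flat name-to-type schemas, all reviewed params treated as required); real MCP tool schemas are JSON Schema and need a richer diff:

```python
def diff_tool_schema(reviewed: dict, current: dict) -> dict:
    """Classify schema changes: removed/retyped params break old callers,
    newly added params are assumed optional and compatible."""
    breaking, compatible = [], []
    for name, typ in reviewed.items():
        if name not in current:
            breaking.append(f"param removed: {name}")
        elif current[name] != typ:
            breaking.append(f"param retyped: {name} {typ} -> {current[name]}")
    for name in current:
        if name not in reviewed:
            compatible.append(f"param added: {name}")
    return {"breaking": breaking, "compatible": compatible}

# Illustrative example: one param retyped, one added.
reviewed = {"path": "string", "recursive": "boolean"}
current = {"path": "string", "recursive": "string", "max_depth": "integer"}
report = diff_tool_schema(reviewed, current)
```

An empty `breaking` list means the agents keep working without modification; anything else means the calling agents need updates before the drift turns into subtly wrong results.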

Drift-debt budgeting. Allocate engineering time per quarter to paying down accumulated drift. Same mental model as the 10-20% time some teams allocate to general tech debt. Without explicit allocation, the time is never available.


These aren't exotic practices. They're the natural extension of normal engineering discipline to the AI-augmented case.

What teams that ignore drift look like

Patterns from teams that don't manage drift well, after a year of AI-augmented work:

The AI gets less useful, slowly. Acceptance rates decline. Suggestions get worse. The team's enthusiasm for the AI tools fades. Nobody can point at why.

The "we should switch models" reflex. When the AI feels worse, the instinct is to blame the model and switch. Sometimes that's the right answer; often the issue is drift in the surrounding configuration. Switching models without addressing drift makes the new model feel worse for the same reasons.

The evaluation suite becomes wallpaper. The eval suite still passes. Nothing on the dashboard suggests a problem. The team has stopped looking at it because it's not informative anymore, and they don't realize that's because the suite no longer measures what they care about.

The shadow-AI rebound. Engineers who used to use the team's AI surface drift back to using their personal ChatGPT / Claude account because the team surface is no longer better. The team's curated context, which was the whole point, gets bypassed.

The "AI was overhyped" conclusion. The team that used to be a positive case study for AI in engineering becomes a skeptic after the drift accumulates. The lesson they take is "AI doesn't really help" rather than "we let the configuration drift." The first is wrong, and it's the one that gets retold.

These patterns are predictable. They're also preventable with the discipline I sketched above.

The DORA-shaped corollary

The DORA-in-the-AI-era piece noted that the better-performing teams pair AI tools with operational discipline. Context-drift management is part of that discipline. The teams that build a routine for it are the teams that sustain the AI-augmented productivity gains; the teams that don't, lose those gains over a year.

The companion metric for drift specifically: the time-since-refresh on each layer of the stack. System prompt last edited 4 months ago. Index last rebuilt 6 months ago. Eval suite last validated 8 months ago. Tool surfaces last reviewed never. The dashboard that shows these numbers is the dashboard that surfaces drift before it becomes painful.
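The dashboard described above reduces to one query: days since each layer was refreshed, compared against a per-layer budget. A sketch with illustrative layer names and budgets (the budgets are assumptions, not recommendations):

```python
from datetime import date

# Per-layer staleness budgets in days; tune to your own cadence.
BUDGET_DAYS = {"system_prompt": 90, "index": 30,
               "eval_suite": 90, "tool_surfaces": 90}

def staleness_report(last_refreshed: dict, today: date) -> dict:
    """Flag each layer as ok, overdue, or never reviewed."""
    report = {}
    for layer, budget in BUDGET_DAYS.items():
        refreshed = last_refreshed.get(layer)
        if refreshed is None:
            report[layer] = "never reviewed"  # the worst kind of drift
        else:
            age = (today - refreshed).days
            report[layer] = f"{age}d ({'OVERDUE' if age > budget else 'ok'})"
    return report

# The scenario from the text: prompt 4 months old, index 6, evals 8,
# tool surfaces never reviewed.
today = date(2025, 1, 15)
report = staleness_report(
    {"system_prompt": date(2024, 9, 15),
     "index": date(2024, 7, 15),
     "eval_suite": date(2024, 5, 15)},
    today,
)
```

The timestamps themselves are cheap to collect (git log for prompts, indexer metadata for the corpus); the value is in putting them on one screen so the drift is visible before it's painful.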

The honest summary

Context drift is real, accumulating, and underaddressed. It has the same shape as tech debt: it accumulates by default, compounds invisibly, and costs more to fix the longer you wait. The teams that name it and address it routinely come out ahead; the teams that don't slowly lose the AI productivity gains they were tracking earlier.

The discipline isn't exotic. The naming matters because "context drift" gives teams a thing to talk about, plan against, allocate time toward. Without the name, the conversation is "the AI feels worse", which produces no actionable response.

Tech debt was a useful concept for two decades. Context drift is the equivalent for the AI-augmented era. Worth treating it with the same seriousness, the same routine attention, the same explicit allocation of engineering time. The teams that do compound their advantage; the teams that don't compound their disadvantage.

Same dynamic; different foundation.