Article 5: Track everything

Article five of my personal coding constitution: every action that mutates state must leave evidence. The trail isn't for the auditors. It's the thing that earns me the right to keep going.

Article 5: Track everything

The fifth article in my personal coding constitution is a rule about evidence. Every action that mutates state (mine, an agent's, a script's) must leave a trail. Not because somebody's going to audit it. Because without the trail. I can't act tomorrow.

Track everything.

Not "track the important stuff." Not "track the destructive operations." Everything. Reads, writes, retries, branches taken, branches not taken, the moment I changed my mind, the moment the agent changed its mind. If it mutated state (in a file, in a system, in my own model of what's true) there's a record of it.

This is article five and it's the one that took me the longest to write down, because for years I treated the tracking discipline as an operational concern. The how is operational. I've written about the how elsewhere, what gets logged, where it goes, how it's queried. This article is about the why, which is constitutional.

The rule isn't about audit-readiness

Let me get this out of the way first, because I had it backwards for a long time.

The honest version of why I started tracking everything was that I thought I was building for the auditors. SOC 2, ISO, the inevitable customer questionnaire. The trail was for the people who would one day ask me to prove I knew what my systems did. That's a real reason and it's not a wrong reason. It's just not the reason that earns the discipline its keep day to day.

The auditor shows up once a year. The day-to-day reason the trail matters is much more selfish, and much more constant: without the trail. I can't act tomorrow.

That's the whole rule, said at the level it actually operates: loss of the trail is loss of the right to act later. Not because somebody else takes the right away. Because I lose the ability to use it responsibly.

What the trail buys me, in concrete terms

If I made a change yesterday and don't know what I changed. I can't safely make a change today. That's the rule, and said baldly it sounds dramatic. Lived in. It's just the obvious result of three things that are always true.

The first is that I forget. Not occasionally. Constantly. The change I made on Tuesday is gone from my head by Thursday. The agent's change from this morning is gone from its context window by tonight. Memory is not the foundation the trail can live on.

The second is that downstream things break. Not always. Often enough that the failure mode has to be planned for. When the downstream thing breaks, the question that has to be answered immediately is "what changed recently in the upstream system?" If the answer requires reconstruction (pinging people, reading scrollback, guessing from the diff) the incident is twice as long and half as well-resolved.

The third is that the cost of an action is not the cost of doing it. The cost of an action is the cost of undoing it, or explaining it, or building on top of it. All three of those costs go to infinity when the trail isn't there.

So the trail isn't a tax I pay. The trail is the thing that lets me act fast tomorrow because I knew what I did yesterday. Without it, every action carries the weight of every previous unrecorded action, and after a few months of that weight, I either stop acting (paralysis) or act recklessly (because the cost is invisible). Neither is the mode I want to operate in.

The three things the trail must answer

When I'm trying to decide whether something counts as "tracked" for the purposes of this rule, I check it against three questions. If the trail can answer all three for any state mutation in the last ninety days, the discipline is doing its job.

What did I do? The literal action: the file modified, the API called, the record updated, the deploy that shipped. Not the intent, not the explanation. The thing that actually happened, captured at the level the system can replay if it has to.

Why did I do it? The intent that drove the action. This is where article one (no issue, no code) pays its rent. The action references the issue. The issue holds the intent. The trail of the action points at the trail of the why, and the two together form a record that survives both my forgetting and the agent's context window closing.

What can I undo, and what can't I? The destructiveness inventory. Reversible mutations and irreversible ones get treated differently, the trail has to be honest about which is which. A file edit is reversible by reverting the commit. A DROP TABLE is not. A deploy to a stateless service is reversible by rolling back. A migration that mutated user data is not. The trail that doesn't tell these apart is misleading in the worst way: it implies I can roll back when I can't.

If those three questions can be answered, the trail is real. If any of them can't, the trail is decoration.

Why the rule covers reads, not just writes

The most counterintuitive part of "track everything" is that it includes things that didn't change state. Reads. Searches. Queries that returned nothing. The agent's first attempt at understanding the file before it edited the file.

I include reads because the read is part of the model that produced the write. When I'm trying to figure out why an action happened, the question isn't only "what was the action?", it's "what did the actor know when they took the action?" The reads are the thing that produced the knowledge. Without them, the action is uninterpretable.

This matters operationally with agents in a way it doesn't quite with humans. When an agent edits a file, the prior reads are the agent's evidence base. If the agent read the wrong file, or read a stale file, or read only part of the file, the edit is going to be wrong in a way that's invisible from the diff. The diff looks reasonable. The reasoning behind the diff was broken. The only way to surface the broken reasoning is to have the reads in the trail alongside the writes.

The cost of logging reads is small. The information value is large. The rule pays for itself.

What the rule prevents

The rule prevents a specific failure mode I've lived through enough times to take seriously. The mode goes like this.

Something breaks. I (or the agent) start digging. The digging involves reading several systems, comparing them, forming a hypothesis. The hypothesis suggests a fix, the fix is applied, and it works. The incident is resolved. Two weeks later, a related thing breaks. I dig in again, and discover that the fix from two weeks ago was wrong in a subtle way. The fix had a side effect I didn't notice. To untangle the side effect, I need to know what the fix actually did, and what state the system was in when the fix was applied.

If the trail is there, this is a forty-minute job. Pull the logs, read the actions, and read the reads that informed the actions. Form a new model, apply a corrected fix, and move on.

If the trail isn't there, this is a multi-day job, and the multi-day version has a high probability of producing another wrong fix that creates the next incident. The hole gets deeper. The system becomes less knowable. Eventually I (or whoever inherits the system) stop trusting it, and the practical result is that nobody is willing to make changes anymore. The system calcifies because the trail is missing. That's the actual long-run cost of skipping the discipline. Not "the auditor will be sad." The platform stops being something humans are willing to touch.

Where the rule pairs with the others

Article one (no issue, no code) is the spine. The issue is where the intent lives. The trail is where the actions live. They're two halves of the same record, intent on one side, what actually happened on the other, both attached to the same artifact through references.

Article four (three strikes, then re-plan) leans on this rule directly. When the rule fires, the re-plan starts with "write down what you actually know." Writing down what you know requires that the trail of what you tried is intact. Without the trail, you don't know what you actually tried, you know what you remember trying, which is not the same thing and is the source of half the bad re-plans I've ever done.

Article five also makes the future articles possible. The articles I'm still working on, separation of intent from what gets shipped, scope discipline, the rule about destructive operations, all assume a trail exists. None of them are enforceable in retrospect on an untracked system. The trail is the precondition for almost every other discipline I want to apply.

The honest objection

The honest objection to "track everything" is that some of the trail is never read. I log a year of reads and writes and the vast majority of those records sit in a store nobody ever queries. Why bother capturing what nobody will look at?

Because I don't know in advance which records will be the ones I need. The query I'll run six months from now isn't a query I can predict today. The shape of the next incident isn't the shape of the last one. The trail's value is option value. I'm paying a small, predictable cost now in exchange for the optionality of being able to answer questions I haven't yet thought to ask.

Option value is hard to feel in the moment because most of the options expire unexercised. The few that I do exercise pay back the entire cost of the discipline, several times over, in a single use. I've stopped trying to predict which records will matter. The rule is the rule. Track everything.

What the rule is really about

The rule looks like it's about logging. It's actually about earning the right to keep going. Every action I take today increases the surface area of the system. The trail is what lets me keep that surface area knowable. Without the trail, the surface area grows and my understanding of it shrinks, and at some crossover point I lose the ability to act on the system safely. The rule prevents the crossover.

The other articles tell me how to act. This one tells me what I owe the system in exchange for the right to act on it. The trail is the price of the next action. Pay it now, every time, and the next action stays cheap. Skip it, and the next action gets a little more expensive, and the action after that more expensive still, and eventually the cost is high enough that nobody acts at all.

So: track everything. Reads, writes, retries, the moment the model changed, the moment the agent gave up. The trail is the spine of the right to act later. Lose it and I lose the system. Keep it and the system keeps me honest.

, Sid