Hallucinated files: a debugging chronicle

Two hours of chasing a bug that wasn't where the agent said it was, in a file the agent confidently described but that didn't exist. A close reading of one of the more useful failures of the year.

A short debugging chronicle, because the failure mode it taught me is one of those quiet ones I want to remember plainly. Tuesday evening, working in the IDE-agent surface, doing a routine refactor on a project I know well. The agent confidently referred to a file path. I trusted the reference. Two hours later I'd finished the refactor and was confused about why one of the tests was failing in a way that didn't match my mental model. The answer: the file the agent had been talking about for the last two hours did not exist.

Worth pulling apart how that happened, because the pattern isn't unusual and the workflow change that prevents it is small.

The confident-wrong loop:

  • Ask: "fix the bug in user.py"
  • The agent reads the imports looking for user.py and finds nothing
  • It hallucinates a path, confidently inferring ./src/user.py
  • It writes a patch to a file that doesn't exist
  • It reports success: "Done. Patched user.py." (a verification step would catch it here)
The bug isn't agents making things up. The bug is shipping what came back without checking.
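A minimal sketch of what that verification step could look like, in Python. Nothing here is from real tooling; missing_references and the src/user.py path are illustrative:

```python
# Hypothetical pre-acceptance check: before trusting a reported patch,
# confirm every file the agent claims to have touched exists on disk.
from pathlib import Path

def missing_references(repo_root: str, referenced: list[str]) -> list[str]:
    """Return the referenced paths that do NOT exist under repo_root."""
    root = Path(repo_root)
    return [p for p in referenced if not (root / p).is_file()]

# Catching the loop above at the "reports success" step:
missing = missing_references(".", ["src/user.py"])
if missing:
    print(f"Agent referenced files that don't exist: {missing}")
```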

The chronicle

Setup: refactoring a small Python module that handles a particular kind of data validation. The module's been stable for months. I'd asked the agent to consolidate a few helper functions into a cleaner shape. Plan-mode review went fine. The plan referenced the helper file by path. I approved.

The agent worked through the changes. As it worked, it kept referring to a validators/legacy_compat.py file. It described what was in it. It described the relationships between functions in it. It made changes that depended on it being structured the way it was describing. Each individual statement was internally consistent.

At one point I noticed the agent was about to "import from validators.legacy_compat" in a new file it was creating. I told it to skip the import; the new code didn't need it. The agent agreed and skipped the import.

A test failed. The failure pattern said something was being called with the wrong arguments. The agent dug into the failing code, traced the call chain, said the issue was in validators/legacy_compat.py line 47, and proposed a fix.

I went to look at line 47 to confirm.

There was no validators/legacy_compat.py. There was a validators/compat.py (similar name, different content). The "legacy_compat" file the agent had been confidently describing for the last two hours had never existed in the repo.

What actually happened

A few things compounded.

The agent had pattern-matched from similar codebases it had seen during training, where a legacy_compat.py is a common shape for handling backward compatibility. When it scanned the repo and found validators/compat.py, it inferred (wrongly) that the standard name was legacy_compat.py and went on discussing the file as if it had that name.

The actual content of validators/compat.py was loaded into the agent's context at one point. The discussion of "legacy_compat.py" was the agent generating plausible-sounding text about a file that didn't exist, drawing partly on the actual content of compat.py (which it had seen) and partly on patterns from elsewhere.

I never asked the agent to confirm the file existed. The agent never volunteered that it was unsure. The conversation flowed forward on the unstated assumption that we were both talking about the same thing. We weren't.

The test that failed was failing for an unrelated real reason (a type-coercion edge case I'd accidentally introduced earlier in the refactor). The agent's diagnosis pointed at the hallucinated file, which led me to chase a fix in a file that didn't exist, which spiraled.

The two-hour chase

Once I realized the file wasn't real, the actual debugging took fifteen minutes. The two hours before that were:

  • Reading the agent's analysis of the hallucinated file's contents and trying to map it to what I thought was in the repo.
  • Asking the agent to "show me the relevant lines" and getting back plausible-but-fake content (text that read like it could be in such a file).
  • Editing what I thought were imports based on the hallucinated structure.
  • Running the tests, watching them fail, asking the agent why, getting more analysis based on the hallucinated file.

Each individual step felt productive. Add them up and it was two hours of work in a parallel reality.

What was different about this failure

The runaway-tool-call piece catalogs the common failure modes: over-eager scope, plausible-but-wrong refactor, convincing-explanation-of-broken-code. This was a different shape. It was hallucinated existence rather than hallucinated content. The agent didn't get a fact wrong; it got the existence of an entire file wrong, and built consistent-feeling reasoning on top of the wrong existence.

The reason this is worth remembering: hallucinated existence is harder to catch than hallucinated content. When the model makes up a function name, you find out the moment you try to use it. When the model makes up an entire file and discusses it consistently for hours, you only find out when you happen to look at the disk. The discrepancy can run a long time before the reality check.

The workflow change

Two specific changes since then.

File existence verification at the start of any session. When a session involves talking about specific files, the first thing I have the agent do is run ls on the directory and report what's actually there. The agent reads its own ls output, anchors its mental model to what's on disk, and the hallucinated-file class of error becomes much harder to commit.
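For concreteness, a sketch of the anchor step in Python; the exact listing shown is illustrative:

```python
# Session-start inventory: the names come from the filesystem, not from
# the model's recall of what a validators/ directory usually contains.
from pathlib import Path

def inventory(directory: str) -> list[str]:
    """Literal, sorted listing of a directory's entries."""
    return sorted(p.name for p in Path(directory).iterdir())

print(inventory("validators"))
# e.g. ['__init__.py', 'compat.py']  (no legacy_compat.py anywhere)
```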

"Show me the literal content" rather than "show me the relevant lines." When I want to see what's in a file, I ask for the literal content via a tool call rather than the agent's recall or paraphrase. The agent's paraphrase can be of a hallucinated file; the literal content is forced to come from disk.

These are small. They cost a few extra seconds at the start of a session and a few extra tokens per query. They prevent the failure mode that took two hours.

The broader lesson

Trust calibration in IDE-agent work isn't just about whether the agent's plans are right. It's about whether the agent's references are real. The plans get reviewed in plan mode; the references usually don't. The agent referring to "validators/legacy_compat.py" passed by me a dozen times without my checking, because it sounded right and I was busy.

The deeper version of the runaway-tool-call lesson is that the most expensive failure modes are the ones that look like normal work. The dramatic ones (the deleted-production variety) at least announce themselves. The hallucinated-existence ones don't. They burn hours quietly until something breaks the spell.

The discipline that prevents this is small and the cost of skipping it is large. Worth writing down because I'll forget the lesson otherwise. The two hours from Tuesday were a tax on not doing the thing I now know to do.