Embeddings need personas too

A semantic search is too helpful by default. It will find the personal note that shares a name with the work doc. Scope the index.

Sid Smith

24 Apr 2026 • 7 min read

There's a thing semantic search does that nobody warns you about. It is too helpful.

Let me describe what bit me. A while back I asked my assistant to "find the notes about Daniel." I had a work doc with a Daniel in it, a vendor we'd been talking to. I also had a personal note from a year earlier with a different Daniel, a school friend's kid, with a sentence in it I'd written down at the time and then never thought about again. The semantic-search index didn't care which Daniel was which. Both notes had "Daniel" and both notes talked about the same kind of follow-up. The retrieval pulled both. The personal note dropped into a context where it had no business being.

Cross-persona retrieval is a policy decision, not a default.

Nothing technically went wrong. The math did what math does. Two documents about a person named Daniel are, by the dumb cosine measure of similarity, similar. The index returned them in order of similarity. The system did exactly what I asked.

The leak wasn't the data. The leak was the scope.

The thing the vector store doesn't know

A quick hand-extension before we go further: when an AI does semantic search, it doesn't grep for the word "Daniel" the way you'd hit Cmd-F. It turns every document into a list of numbers, an embedding (essentially turning the meaning of words into numbers a computer can compare). Then when you ask a question, it turns the question into numbers too, and finds the documents whose numbers are closest. That's how the AI "knows" that "follow up with Daniel" and "circle back with Dan" mean basically the same thing even though the words are different. It's the retrieval half of a technique called RAG (retrieval-augmented generation), if you want to look it up later. Powerful. Genuinely magic when it works.
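To make "closest numbers" concrete, here is a toy sketch in Python. The three-number vectors are made up by hand for illustration; real embeddings come out of a model and run to hundreds or thousands of dimensions. The cosine formula itself is the standard one.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made stand-ins for embeddings (hypothetical values, not model output).
docs = {
    "work: follow up with Daniel (vendor)": [0.9, 0.8, 0.1],
    "personal: circle back with Dan about the party": [0.8, 0.9, 0.2],
    "personal: grocery list": [0.1, 0.0, 0.9],
}
query = [0.9, 0.9, 0.1]  # stand-in for "find the notes about Daniel"

# Rank every document by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
for name in ranked:
    print(f"{cosine(query, docs[name]):.3f}  {name}")
```

Both Daniel notes land near the top and the grocery list lands at the bottom. Nothing in that ranking knows or cares which note is work and which is personal.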

The thing the vector store doesn't know (by itself, with no help) is which Daniel is which, or which world the question came from. It doesn't know that the question came from me-at-work versus me-on-a-Sunday. It doesn't know that one Daniel is bound up in NDAs and the other Daniel is somebody I'm friendly with at a barbecue. It just knows: similar numbers, similar numbers, here are your matches.

So if you build the way most teams build (one big shared embedding index, one big shared collection), you have built a system whose default behavior is to blur the worlds together. You have built a system that will helpfully find your personal note when you're asking a work question.

This is the moment people start adding metadata filters. "Just put a tag on it." "Just filter by source." That kind of patch is fine until it isn't. Tags get missed. New ingestion paths forget to set the tag. A new tool reads the index without applying the filter. The filter is a request the index honors when it remembers to. It's not a wall.

The persona is the wall.

What persona-scoped retrieval actually means

Here's how I think about it.

Every persona (Personal, Work, Family, the blogging one, the side-business one) gets its own index. Not one index with a column saying which persona this row belongs to. Its own. The Personal persona's retrieval reaches into the Personal index. The Work persona's retrieval reaches into the Work index. Asking the Work persona about Daniel doesn't pull from the Personal index because the Personal index is not in the set of indexes the Work persona is allowed to read from. The vector store doesn't have to be clever about it. It just doesn't have the option.
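A minimal sketch of that shape, in Python. Everything here is hypothetical: the `Index` class stands in for a real vector index (it does substring matching to keep the example short), and the names `INDEXES`, `READABLE`, and `retrieve` are mine, not any particular library's.

```python
from dataclasses import dataclass, field

@dataclass
class Index:
    """A stand-in for one vector index; here it just holds note text."""
    name: str
    notes: list = field(default_factory=list)

    def search(self, term):
        # Real retrieval would compare embeddings; substring match keeps this short.
        return [n for n in self.notes if term.lower() in n.lower()]

# One index per persona, not one shared index with a persona column.
INDEXES = {
    "personal": Index("personal", ["Daniel's school play is in June"]),
    "work": Index("work", ["Daniel (vendor) wants a follow-up call"]),
}

# Policy: which indexes each persona may read from (example values).
READABLE = {
    "personal": {"personal"},
    "work": {"work"},
}

def retrieve(persona, term):
    """Search only the indexes the persona is allowed to read from."""
    hits = []
    for index_name in sorted(READABLE.get(persona, set())):
        hits.extend(INDEXES[index_name].search(term))
    return hits

print(retrieve("work", "Daniel"))
```

The Work persona's call never touches the Personal index. There is no filter inside `search` to get wrong, because the Personal index is simply not in the list being looped over.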

That's what I mean when I say cross-persona retrieval is a policy decision, not a side effect. You decide, in advance, that the Work persona is allowed to read from indexes A and B but not C. If you ever want a retrieval that spans worlds (say, your assistant persona genuinely needs to look at both work and personal to do its job for a specific window), that is a deliberate, scoped, audited decision, not the default behavior of a too-helpful search bar.

In practice the same shape repeats across all the audiences I write for, even though it shows up differently.

Personal. My Personal persona has its own little index. Notes app, journal, photos, the random stuff I jot down at midnight. My Work persona has a totally separate one. When I ask the Personal persona "what did I say about Daniel," it can't reach into the work index. The work Daniel is invisible to it. The journal Daniel is the only Daniel that exists from where the Personal persona sits. That's not a bug. That's the whole reason I drew the line.

Small business. If you run a side business (let's say you do consulting on the weekends), the business persona has its own client notes, its own meeting transcripts, its own playbooks. When that persona answers "what did we agree with the customer," it doesn't accidentally surface the time you used the same client's last name in a personal Google Doc three years ago. The client only exists in the business persona's index. Same person, same name, different worlds.

Enterprise. This is where it stops being a quality-of-life problem and starts being an audit problem. You have a customer-success team. You have a deal-desk team. You have an HR team. If they all share one vector index because "everybody's allowed to search all the docs," then your HR memos about a salesperson are one cosine-distance hop away from any retrieval that mentions that salesperson's name. You did not mean to wire it that way. You wired it that way by default. The persona-scoped pattern says: HR's retrievals reach into HR's index. Sales's retrievals reach into sales's. Cross-index reads exist, but they are written down as policy somewhere, and they show up in a log somewhere, and somebody had to sign off.

The motion is the same at all three scales. The size of the blast radius is what changes.

Why this isn't fixed by "just use a filter"

I want to be fair to the tag-and-filter approach for a paragraph, because it is the obvious thing to reach for, and it does work in toy demos.

The reason I don't trust it in production is that filters live in application code, and application code drifts. A new feature gets shipped. An engineer wires up a new retrieval path. They forget the filter on that path, or they pass the wrong tag, or they're running a script with admin credentials that skips the tag layer entirely. The vector store doesn't push back. It returns whatever's nearest. Six months later somebody runs a query and gets a result they shouldn't have, and you find out about it when somebody is upset.
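Here is roughly what that drift looks like, as a toy Python sketch. The shared index, the row layout, and both function names are invented for illustration; the point is only that the second path runs without error.

```python
# One shared index; every row carries a persona tag (hypothetical layout).
SHARED_INDEX = [
    {"persona": "work", "text": "Daniel (vendor) wants a follow-up call"},
    {"persona": "personal", "text": "Daniel's school play is in June"},
]

def search(term, persona=None):
    """Return matching rows; the persona filter applies only if the caller passes it."""
    rows = [r for r in SHARED_INDEX if term.lower() in r["text"].lower()]
    if persona is not None:
        rows = [r for r in rows if r["persona"] == persona]
    return [r["text"] for r in rows]

def original_path(term):
    # The first retrieval path, written carefully, with the filter.
    return search(term, persona="work")

def new_feature_path(term):
    # Shipped later. Nobody remembered the filter, and nothing complained.
    return search(term)

print(original_path("Daniel"))
print(new_feature_path("Daniel"))
```

The first call returns only the vendor note. The second returns both notes, with no warning and no exception, which is exactly why nobody notices until somebody is upset.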

When a cross-persona read happens, it's written down, scoped, and audited.

Compare that to the world where the Personal index physically isn't accessible from the Work persona's context. There's no filter to forget. There's no flag to skip. The retrieval path doesn't have the Personal index in its set of reachable stores, because the persona it's running under isn't allowed to reach there. It's not a check the code performs every time. It's the shape of the system.

This is the same argument I'd make about row-level scoping in the database (which I already wrote about, see Personas all the way down to the database) and about memory (which I wrote about the week before, see Memory isolation is the whole point). The point keeps being the point: scoping in shared spaces is something you do at the boundary, not something you remind yourself to do at the call site.

Want the deeper version of how the index actually gets keyed and federated under the hood? I went into the MCP side of this in MCP but persona-aware: same shape, applied to tools instead of vectors. The two pieces fit together: the tool layer knows which persona is asking, and the index layer knows which personas are allowed to read from it.

The "deliberate cross-persona read" case

There is one honest exception I want to call out, because it's the question I always get when I describe this.

You sometimes do want a retrieval that crosses worlds. The right example: an assistant persona that's authorized to act on your behalf for some defined purpose, and to do that job it needs to be able to read from both your Personal calendar and your Work calendar, because the whole task is "schedule a dentist appointment that doesn't conflict with anything."

That's fine. The pattern handles it. The point is that the assistant persona has been given read scope into both indexes, in writing, with a window of time, and the audit trail names which indexes it touched and when. The assistant doesn't get to reach across worlds because it felt like it. It got an explicit grant for an explicit purpose, and you'd see it in the log.
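One way to picture "in writing, with a window of time, and an audit trail" is the sketch below. It's Python, and all of it is assumption: the `Grant` fields, the `read_index` function, and the in-memory `AUDIT_LOG` are illustrative names, and a real system would keep the log somewhere append-only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Grant:
    """An explicit, time-boxed permission for one persona to read one index."""
    persona: str
    index: str
    purpose: str
    not_before: datetime
    not_after: datetime

    def covers(self, persona, index, when):
        return (self.persona == persona and self.index == index
                and self.not_before <= when <= self.not_after)

AUDIT_LOG = []  # stand-in for durable, append-only audit storage

def read_index(persona, index, grants, when):
    """Allow the read only under a live grant, and log the attempt either way."""
    grant = next((g for g in grants if g.covers(persona, index, when)), None)
    AUDIT_LOG.append({
        "persona": persona, "index": index, "when": when.isoformat(),
        "allowed": grant is not None,
        "purpose": grant.purpose if grant else None,
    })
    if grant is None:
        raise PermissionError(f"{persona} has no live grant for {index}")
    return f"reading {index} for: {grant.purpose}"

# The assistant gets a one-hour grant on both calendars for one stated purpose.
start = datetime(2026, 4, 24, 9, 0, tzinfo=timezone.utc)
grants = [
    Grant("assistant", cal, "schedule dentist appointment",
          start, start + timedelta(hours=1))
    for cal in ("personal-calendar", "work-calendar")
]
```

Inside the window, `read_index("assistant", "work-calendar", grants, start + timedelta(minutes=5))` succeeds and leaves a log entry naming the index and the purpose. Two hours later the same call raises `PermissionError`, and the denied attempt is in the log too.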

This is the same shape I'm going to chase next week, actually, when I write about autonomous agents: the agent doesn't get to drift outside the persona it's running as, even if it'd be technically convenient. See Autonomous agents inherit the persona when it's up.

What I'd ask if you were building this

If you're building this in your own setup (at any scale), here's the question I'd put first: when a retrieval happens, can you point at which index it ran against, and can you say why that index and not another? If the answer is "uh, we have one big index, and we filter," you have a thing that will work right up until somebody forgets to filter. If the answer is "the persona has a list of indexes it's allowed to read from, and the retrieval picks from that list," you have a thing that will keep working when you stop watching it.
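If it helps, that question can be turned into something a system answers for itself. A small Python sketch, with a made-up policy table and made-up index names:

```python
# Hypothetical policy table: persona -> indexes it may read.
READABLE = {"work": ["work-docs", "work-email"], "personal": ["journal"]}
ALL_INDEXES = ["journal", "work-docs", "work-email"]

def explain_retrieval(persona):
    """Answer 'which indexes, and why' for a retrieval made as this persona."""
    allowed = READABLE.get(persona, [])
    return {
        "persona": persona,
        "searched": [i for i in ALL_INDEXES if i in allowed],
        "not_searched": {
            i: f"not in read list for persona '{persona}'"
            for i in ALL_INDEXES if i not in allowed
        },
    }

report = explain_retrieval("work")
print(report["searched"])
print(report["not_searched"])
```

If you can produce a report like this for any retrieval, you can answer the question. If all you can say is "the whole index, minus whatever the filter removed," you can't.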

The whole reason I keep writing about personas is that I want a system I can stop watching. Not stop auditing. I'm always going to audit. I mean: I want a system that, when I look away, doesn't quietly start mixing the worlds together because the math thought it was being helpful.

The Daniel in the journal stays in the journal. The Daniel in the vendor file stays in the vendor file. The vector store doesn't get to decide they're the same person just because their names match. The persona decides who's allowed to look at what. That's the whole pitch.
