Personas, all the way down to the database

Persona isolation that stops at the application layer isn't isolation. The DB has to know the persona too: row-level filters, audit logs, the whole thing.

Sid Smith

22 Apr 2026 • 9 min read

I once spent a very long afternoon, years ago, watching what looked like a perfectly secure web app cheerfully serve another customer's data to the wrong customer. Every layer of the system did exactly what it was supposed to do. The auth check passed, the UI loaded the right page, the API endpoint did the lookup, and the DBA's nightly report showed nothing unusual. And somewhere in the middle, a WHERE customer_id = ? clause had been dropped during a refactor, and the query had quietly started returning rows from everyone.

That afternoon is, for me, the foundational lesson about isolation. Isolation that stops at one layer doesn't isolate. Isolation has to be true at every layer it touches, or it's not isolation. It's a story everyone tells themselves until the day a bug somewhere reveals that one of the layers wasn't carrying its weight.

The persona is a property of the row, enforced inside the DB.

I keep that scar handy now, twenty-odd years later, as I think about personas. We've spent the last two posts in this series talking about the layers above the data tier: the tool layer and the memory layer. Both have to be persona-aware. But if I stopped there, I'd be doing exactly the thing that bit me on that afternoon: trusting the upper layers to enforce a property that the lower layers don't know about. So this post is about taking the persona idea down one more level. All the way to the database. The DB has to know which persona is asking too.

Why the application layer can't carry this alone

The case for "we'll handle it in the app" is always the same and always wrong in the same way.

It goes like this: the API has the persona context. Every query the API issues filters by persona. The user's session is bound to a persona. As long as nobody bypasses the API, the data stays scoped. Done.

The problem is that the words "as long as nobody bypasses the API" are doing a lot of work in that sentence. Let me list a few of the ways that promise breaks, just from things I've personally witnessed:

A developer writes a one-off admin script to do a data migration. The script uses the same DB credentials but does not include the application's persona filter. A week later, the admin script becomes a recurring cron job. Two years later, nobody remembers why.
A new endpoint gets added by a different team. The original API has a middleware that injects the persona filter into every query. The new endpoint uses a different ORM helper that doesn't go through that middleware. The filter is silently absent.
An analyst connects a BI tool directly to the database for a quick report. The analyst is not malicious; they're answering a question. The BI tool has no concept of persona context, and the rows it sees aren't filtered by one.
An exception path in the application layer falls back to a "wide" query, SELECT * FROM events WHERE user_id = ?, when the persona filter can't be resolved. The fallback was added during an incident, never removed.

Every one of these scenarios is mundane. None of them require malice. All of them result in the persona boundary, which is real at the API layer, being completely meaningless at the database layer. The DB is the thing actually holding the data. If the DB doesn't know about the persona, the persona doesn't matter to the data.
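To make the admin-script case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The events table, its columns, and both function names are hypothetical stand-ins, not taken from any real system. The point is only that the two callers share one connection and one set of credentials, and nothing in the database distinguishes the one that remembered the filter from the one that didn't.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory events table holding rows from two personas."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, persona_id TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO events (persona_id, title) VALUES (?, ?)",
        [
            ("work", "Quarterly review"),
            ("family", "School pickup"),
            ("family", "Dentist"),
        ],
    )
    return conn


def api_list_events(conn: sqlite3.Connection, persona_id: str) -> list[str]:
    """The well-behaved path: the application remembers to add the filter."""
    rows = conn.execute(
        "SELECT title FROM events WHERE persona_id = ? ORDER BY id", (persona_id,)
    )
    return [title for (title,) in rows]


def admin_script_list_events(conn: sqlite3.Connection) -> list[str]:
    """A one-off script on the same credentials: nothing forces a filter."""
    rows = conn.execute("SELECT title FROM events ORDER BY id")
    return [title for (title,) in rows]


conn = make_db()
scoped = api_list_events(conn, "family")      # only the Family rows
unscoped = admin_script_list_events(conn)     # every persona's rows
```

Both calls succeed, and the database has no way to tell that the second one crossed a boundary.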

The fix is the same shape as the rest of this series. The DB has to know.

What "the DB knows" actually looks like

Without going into a specific implementation, the shape is: every table where rows belong to a persona carries a persona identifier, and the data tier enforces (not asks, not relies on) that queries only see rows matching the current persona context.

In SQL-flavored databases, the modern way to do this is row-level security (yes, the term you're thinking of). The database is told: "for this table, every query returns only rows where the persona ID matches the session's current persona ID." The session carries the persona. The policy is enforced inside the DB, not above it. A developer who writes a SELECT * FROM family_calendar from the same session gets only their family's rows, not because the developer remembered to add a WHERE clause, but because the database itself refused to give them anything else.
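As one illustration of what that can look like, the sketch below generates the statements for PostgreSQL, which is an assumption on my part since the post deliberately stays implementation-neutral. The persona_id column, the policy name persona_isolation, and the custom setting app.persona_id are all hypothetical names chosen for the example.

```python
def persona_rls_statements(table: str, setting: str = "app.persona_id") -> list[str]:
    """Return PostgreSQL statements that scope `table` to the session's persona.

    Assumes `table` has a text column named persona_id and that the
    application sets the custom setting `setting` for each session.
    `table` is interpolated directly, so pass trusted identifiers only.
    """
    return [
        # Turn row-level security on for this table.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # Table owners skip policies unless enforcement is forced.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        # USING limits which rows are visible; WITH CHECK limits what is written.
        f"CREATE POLICY persona_isolation ON {table} "
        f"USING (persona_id = current_setting('{setting}')) "
        f"WITH CHECK (persona_id = current_setting('{setting}'));",
    ]


def set_session_persona_sql(setting: str = "app.persona_id") -> str:
    """Parameterized statement an app could run right after authenticating.

    The %s placeholder assumes a driver that uses that parameter style.
    """
    return f"SELECT set_config('{setting}', %s, false);"


statements = persona_rls_statements("family_calendar")
```

One caveat worth knowing if you go this route: in PostgreSQL, superusers and roles created with BYPASSRLS are not subject to these policies, so the credentials the application and any scripts use matter as much as the policy itself.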

You can do the equivalent thing in document stores, in vector indexes, in time-series databases. The names differ. The idea is the same: the persona is a property of the query, set by the session, enforced by the storage engine, present on every row, checked on every read.
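Stripped of any particular engine, the idea can be sketched as a toy in-memory document store. Session and PersonaScopedStore are hypothetical names invented for this example, not the API of any real database. What matters is that there is no read method that works without a session, so there is no unscoped read to reach for by accident.

```python
from typing import Any, Callable


class Session:
    """Carries the persona for every call made on its behalf."""

    def __init__(self, user_id: str, persona_id: str) -> None:
        self.user_id = user_id
        self.persona_id = persona_id


class PersonaScopedStore:
    """Toy document store: every document is kept alongside its persona."""

    def __init__(self) -> None:
        self._docs: list[tuple[str, dict[str, Any]]] = []

    def insert(self, session: Session, doc: dict[str, Any]) -> None:
        # The persona comes from the session, never from the caller's payload.
        self._docs.append((session.persona_id, dict(doc)))

    def find(
        self,
        session: Session,
        match: Callable[[dict[str, Any]], bool] = lambda doc: True,
    ) -> list[dict[str, Any]]:
        # The persona check runs on every read, before the caller's predicate.
        return [
            dict(doc)
            for persona_id, doc in self._docs
            if persona_id == session.persona_id and match(doc)
        ]


store = PersonaScopedStore()
work = Session("user-x", "work")
family = Session("user-x", "family")
store.insert(work, {"title": "Roadmap"})
store.insert(family, {"title": "Recital"})
family_docs = store.find(family)
```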

There's a second piece, and it's just as important. The audit log at the data tier also has to record which persona was active. So when something goes sideways and you're reconstructing what happened, the row that says "this event was read at 9:42:13" also says "by the Work persona of user X." If you don't have that, you can stare at the access logs for an hour and still not know which side of the user's life a query came from. With it, the reconstruction is mechanical.
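A minimal sketch of an audit record that carries the persona might look like the following. The field names and the who_read helper are assumptions made for illustration; a real data tier would write these entries itself rather than rely on a Python class.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    at: datetime
    user_id: str
    persona_id: str   # which side of the user's life the access came from
    action: str       # "read" or "write"
    table: str
    row_id: int


class AuditLog:
    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def record(
        self,
        user_id: str,
        persona_id: str,
        action: str,
        table: str,
        row_id: int,
        at: Optional[datetime] = None,
    ) -> AuditEntry:
        # Refuse entries that do not name the active persona.
        if not persona_id:
            raise ValueError("audit entries must name the active persona")
        entry = AuditEntry(
            at or datetime.now(timezone.utc), user_id, persona_id, action, table, row_id
        )
        self._entries.append(entry)
        return entry

    def who_read(self, table: str, row_id: int) -> list[tuple[str, str]]:
        """Return (user_id, persona_id) for every recorded read of one row."""
        return [
            (e.user_id, e.persona_id)
            for e in self._entries
            if e.action == "read" and e.table == table and e.row_id == row_id
        ]


log = AuditLog()
log.record("user-x", "work", "read", "events", 7,
           at=datetime(2026, 4, 22, 9, 42, 13, tzinfo=timezone.utc))
log.record("user-x", "family", "write", "events", 7)
readers = log.who_read("events", 7)
```

With the persona on the entry, answering "which persona read this row?" is a filter rather than an investigation.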

The third piece, which sometimes gets forgotten: writes carry the persona too. The row written by the Family persona is marked at write-time as belonging to the Family persona. Not stamped after the fact. Not inferred from the URL. Set at insert. That way the same row, read back later, can't be mistaken for belonging somewhere else.
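One way to sketch "set at insert" is with constraints the database itself enforces. The example below uses SQLite through Python's sqlite3 module purely as a stand-in; the notes table and the trigger that blocks later changes to persona_id are assumptions added for illustration, not something the post prescribes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE notes (
    id         INTEGER PRIMARY KEY,
    persona_id TEXT NOT NULL,   -- no row can exist without a persona
    body       TEXT NOT NULL
);

-- Once written, the persona on a row cannot be changed in place.
CREATE TRIGGER notes_persona_immutable
BEFORE UPDATE OF persona_id ON notes
BEGIN
    SELECT RAISE(ABORT, 'persona_id is set at insert and cannot be changed');
END;
"""


def open_notes_db() -> sqlite3.Connection:
    """Create an in-memory database with the persona-stamped notes table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def write_note(conn: sqlite3.Connection, persona_id: str, body: str) -> int:
    """Insert a note with its persona supplied at write time; return its id."""
    cur = conn.execute(
        "INSERT INTO notes (persona_id, body) VALUES (?, ?)", (persona_id, body)
    )
    return cur.lastrowid


notes_conn = open_notes_db()
note_id = write_note(notes_conn, "family", "Book the campsite")
```

An insert that omits persona_id fails the NOT NULL constraint, and an UPDATE that tries to re-tag the row is aborted by the trigger.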

Why the DBA's story has to match the user's story

There's a phrase that I've come to find clarifying: the DBA's story has to match the user's story. What I mean is this: if the user, sitting in their Personal persona, would say "this AI doesn't know about my work life," then the DBA, sitting in front of the database, has to be able to say the exact same thing about what queries from that session can return. The two stories aren't allowed to disagree.

When they agree, the system is honest. When they disagree, the system is lying, and you find out which one is true on the day a bug surfaces.

Practically, this means I want a DBA, looking at the DB, to be able to point at any row and answer two questions: Which persona does this row belong to? and Which personas can read this row? If those questions take more than ten seconds to answer, the data tier isn't carrying the property the rest of the system relies on.

It also means I want a developer, writing a new query, to feel a small bump of resistance when they try to write a query that would span personas. Not "you can't." But "you have to ask out loud." Cross-persona queries exist, for analytics, for some kinds of household-scope reporting, for legitimate enterprise reasons, but they should look different in the code, be named differently in review, and leave a different kind of trace in the log. The same shape as cross-persona memory access from the last post: rare, deliberate, audited, never the default.

I made the case for that "rare, deliberate, audited" pattern in Memory isolation is the whole point. The DB layer is the same idea translated down one floor.
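Here is one hedged sketch of what "ask out loud" could look like in code. CrossPersonaGrant, grant_cross_persona, and query_across_personas are hypothetical names invented for this example. The cross-persona path has its own function name, cannot be called without a stated reason, and writes its own kind of log line.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class CrossPersonaGrant:
    requested_by: str
    personas: frozenset
    reason: str


def grant_cross_persona(
    requested_by: str, personas: Iterable[str], reason: str
) -> CrossPersonaGrant:
    """Build a grant, refusing requests that are not explicit about scope and reason."""
    scope = frozenset(personas)
    if len(scope) < 2:
        raise ValueError("a cross-persona grant must name at least two personas")
    if not reason.strip():
        raise ValueError("a cross-persona grant needs a stated reason")
    return CrossPersonaGrant(requested_by, scope, reason.strip())


def query_across_personas(
    rows: list[dict[str, Any]], grant: CrossPersonaGrant, audit_trail: list[str]
) -> list[dict[str, Any]]:
    """Return rows from the granted personas and leave a distinct trace."""
    audit_trail.append(
        f"CROSS_PERSONA by={grant.requested_by} "
        f"personas={sorted(grant.personas)} reason={grant.reason!r}"
    )
    return [row for row in rows if row["persona_id"] in grant.personas]


rows = [
    {"persona_id": "work", "n": 1},
    {"persona_id": "family", "n": 2},
    {"persona_id": "personal", "n": 3},
]
trail: list[str] = []
grant = grant_cross_persona("analyst-1", ["work", "family"], "household report")
result = query_across_personas(rows, grant, trail)
```

In review, a call to query_across_personas is easy to spot and easy to question, which is the whole purpose of giving it a different name.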

Three audiences, same plumbing

The shape of this scales cleanly, which is one of the things I find most satisfying about it. Let me thread the three audiences again.

For my personal setup, I want my AI's local DB (the thing storing my notes, my long-term memory items, my preferences) to have row-level scoping by persona built into it from the start. Not as a feature flag. Not as something the application layer adds on top. As a property of every relevant table. When I'm in the Personal persona, the rows the system can see are tagged Personal. When I'm in Family, the rows are tagged Family. If I ever decide to write a one-off script to back up "everything," that script has to make a deliberate, persona-spanning call, and the call should leave a trace I can find a year later. Even at home. Even just for me. The discipline doesn't get easier at scale; you build it small first.

If you run a small business, the data-tier story is the difference between "client data is logically separate" and "client data is actually separate." Logically separate is what most small-business stacks do today: a column called tenant_id or account_id, a hope that the application layer filters by it correctly, and a quiet prayer. Actually separate means the database refuses to hand over rows that don't match the active persona context, regardless of whether the application remembered to filter. Same primitive (persona_id on the row) but enforced by the DB rather than trusted to the code. The day you onboard your fifth client, your sixth, your tenth, you'll be grateful you set this up while it was just two clients and a spreadsheet of contacts.

In an enterprise, this is the layer where SOC 2, GDPR, HIPAA, and every other framework you have to satisfy actually live. Auditors don't grade the screenshots of your application UI. They grade the controls at the system that holds the data. If your data-tier story is "we filter in the app," the question back is going to be "what stops a query from bypassing the app?", and if the answer is "nothing structural, we just don't do that," the conversation is going to get longer. Per-row persona scoping with database-level enforcement turns that conversation into a one-sentence answer. The control is at the data tier. It's tested. It's logged. Move on.

The parts that will bite you

I want to be honest about the cost. This is not free.

Migrations get more interesting. Adding a persona_id column to a table that didn't have one before, on a system that's been running for a year, is a real piece of work. Backfilling that column correctly for rows that predate the concept is harder still. The right move is to do this early, before the data volume hurts. The wrong move is to add it later and try to backfill from "context clues": the user IDs, the timestamps, the workspace tag. That kind of backfill is what produces the data set you don't trust.
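If you do end up adding the column late, one cautious approach (my assumption, not something the post prescribes) is to backfill only from an explicit record of ownership and to leave everything else unresolved rather than guessed. The sketch below uses SQLite via Python's sqlite3 module; the notes and note_owners tables are hypothetical.

```python
import sqlite3


def legacy_db() -> sqlite3.Connection:
    """A pre-persona table plus a hypothetical explicit ownership record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE note_owners (note_id INTEGER PRIMARY KEY, persona_id TEXT NOT NULL);
        INSERT INTO notes (id, body) VALUES (1, 'Standup agenda'), (2, 'Packing list'), (3, '???');
        INSERT INTO note_owners (note_id, persona_id) VALUES (1, 'work'), (2, 'family');
        """
    )
    return conn


def add_persona_column(conn: sqlite3.Connection) -> None:
    # Nullable on purpose: NULL means "not yet known", never "guessed".
    conn.execute("ALTER TABLE notes ADD COLUMN persona_id TEXT")


def backfill_from_ownership(conn: sqlite3.Connection) -> int:
    """Backfill only rows with an explicit owner; return the unresolved count."""
    conn.execute(
        """
        UPDATE notes
        SET persona_id = (
            SELECT o.persona_id FROM note_owners AS o WHERE o.note_id = notes.id
        )
        WHERE persona_id IS NULL
          AND EXISTS (SELECT 1 FROM note_owners AS o WHERE o.note_id = notes.id)
        """
    )
    (unresolved,) = conn.execute(
        "SELECT COUNT(*) FROM notes WHERE persona_id IS NULL"
    ).fetchone()
    return unresolved


migration_conn = legacy_db()
add_persona_column(migration_conn)
unresolved_rows = backfill_from_ownership(migration_conn)
```

The unresolved count is the number to watch: those rows need a human decision, and nothing should be able to read them until they get one.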

Vector indexes have to play along. Embeddings need persona scope too (that's the subject of a later post in this series) but the database-level scoping has to be set up to support what the vector layer wants to do. If your retrieval pipeline pulls vectors and then joins to a metadata table, both sides have to honor the persona context. I'll get into that in more depth in a couple of weeks.

Cross-persona analytics is a real need that deserves a real pattern. Counting events across all of a user's personas, computing a household-scope view, rolling up cross-team metrics in an enterprise, all of these are legitimate. The wrong answer is "we just bypass row-level security for analytics." The right answer is a defined, audited, separately-credentialed path that does cross-persona work as cross-persona work. Different code path, different log line, different access controls. Same data, named correctly for what it is.

Performance is real but mostly fine. Adding a persona_id filter to every query sounds expensive. In practice, with a proper index, it's almost always free. The query planner is good at this. Don't let "perf concerns" be the reason you skip the discipline. Measure first; ninety-five times out of a hundred, the index does its job and nothing else needs to change.
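A quick first check, before any timing, is to ask the planner what it intends to do. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in for whatever engine you actually run; the events table and the index name idx_events_persona are hypothetical, and the plan text will differ on other databases.

```python
import sqlite3

QUERY = "SELECT title FROM events WHERE persona_id = ?"


def build_events_db(with_index: bool) -> sqlite3.Connection:
    """Create an empty events table, optionally with an index on persona_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY, persona_id TEXT NOT NULL, title TEXT)"
    )
    if with_index:
        conn.execute("CREATE INDEX idx_events_persona ON events (persona_id)")
    return conn


def query_plan(conn: sqlite3.Connection) -> list[str]:
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("family",)).fetchall()
    return [row[-1] for row in rows]


indexed_plan = query_plan(build_events_db(with_index=True))
unindexed_plan = query_plan(build_events_db(with_index=False))
```

With the index in place, the plan names idx_events_persona; without it, the engine falls back to scanning the table. The plan only tells you the index is being used, so real measurements on your own data are still the deciding evidence.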

Same primitive, every layer

The thread through this whole series is that the persona is the same idea at every layer it touches. Same name. Same identifier. Same scoping rule. Same audit story. The tool layer asks "which persona?" before it exposes a tool. The memory layer asks "which persona?" before it returns a retrieval. The database asks "which persona?" before it returns a row. None of these are separate inventions. They're the same primitive, restated in the local language of each layer.

That's the whole point. If your persona isolation stops at the application layer, a bug at the SQL layer cancels everything above it. If it stops at the SQL layer, a bug at the vector layer does the same. The fix is to make the property true at every layer, so that no single layer can betray the whole thing alone.

If you're running this in any real shape right now (at home, in a small shop, in a real company), the question I'd ask first is: what does the data tier know about personas today? If the answer is "nothing, we filter in the app," that's the thing to fix. Not because the app is wrong. Because the app being the only thing holding the line is the failure mode I started this post with.

The next piece in this series goes after the index. Specifically, the vector index, the thing that does the semantic search behind so much of what these systems do. The same shape applies. Embeddings need personas too. But that's for next week.