When HIPAA shapes the system before you write a line of code
Most teams treat HIPAA as a feature you bolt on at the end. The teams who've actually shipped under it know it shapes the data model, the audit log, the auth model, and the deletion semantics long before the first commit.
If you're building inside a regulated environment, you're not a team of one anymore. The regulator is on the team. The auditor is on the team. The compliance officer is on the team. They don't sit in your standup, but they hold a veto on every architectural decision you make, and they exercise it months after the decision was made, usually by asking a question your design has no good answer to.
Most teams I've watched try to retrofit compliance discover this the same way. The system gets built. It works. Somebody writes a HIPAA-readiness checklist. The team starts walking through it. Halfway down the list they realize the checklist isn't a checklist. It's a catalog of architectural decisions that should have been made at the beginning, several of which are now expensive or impossible to retrofit.
The teams who've actually shipped systems under HIPAA, or SOX, or GDPR, or PCI, or any of the regulatory regimes with non-trivial data-handling implications, know that compliance shapes the system before you draw a single arrow on the diagram. This piece is about what changes when "compliance-applicable" is a line in your requirements vs. when it isn't.
I'm using HIPAA as the running example because it's the regime I've worked with most directly, and because its shape is representative. The mechanics shift across regulations; the structural argument doesn't.
The four things that change before you write code
When a system is HIPAA-applicable, four parts of the architecture get re-shaped before any code is written. Not "after the MVP." Not "at the security review." Before. These are the parts where retrofit costs an order of magnitude more than building right the first time, and where the auditor's question lands hardest.
The data model
In a non-regulated system, the data model is shaped by the business logic. You store what you need to serve the use case. The schema's job is to make the application correct and fast.
In a HIPAA-applicable system, the data model is also shaped by classification. Every field has to know whether it contains PHI (protected health information), and every join has to know whether it's combining PHI with non-PHI in a way that creates new PHI by combination. The schema's job grows: still serving the application, but also serving the auditor's question "show me every field in your system that contains PHI, and prove that the access controls match the classification."
What changes architecturally:
- Classification metadata is first-class. Not a code comment. A schema-level annotation, a column tag, a discoverable property of every table and every field. The annotation feeds the audit; the audit cannot be done by hand at any meaningful scale.
- PHI segregation is a design decision. PHI goes in a separate database, a separate keyspace, or at minimum a separate set of tables behind a different access path. The reason isn't paranoia: it's so that the access control story for the PHI subset can differ from the rest of the system, and so that an audit can scope its evidence to that subset cleanly.
- De-identification is a path, not a column. When you serve analytics or feed an AI model, you can't just SELECT FROM the PHI tables and call it analytics. The pipeline has to pass through a de-identification step that's reviewable, testable, and itself audited. That step is part of the data model decision, not a downstream addition.
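As a concrete sketch of the first bullet: classification as a discoverable property of the schema, not a code comment. The table names, fields, and `Classification` enum here are invented for illustration, and a real system would hang the tags on the schema itself (catalog metadata, a schema registry), but the shape is the same: the auditor's question becomes a query.

```python
from dataclasses import dataclass
from enum import Enum

class Classification(Enum):
    PHI = "phi"
    NON_PHI = "non_phi"

@dataclass(frozen=True)
class Column:
    name: str
    classification: Classification

# Hypothetical table registry: every column carries its classification
# as a machine-readable property.
TABLES = {
    "patients": [
        Column("patient_id", Classification.PHI),
        Column("name", Classification.PHI),
        Column("zip3", Classification.NON_PHI),
    ],
    "appointments": [
        Column("appointment_id", Classification.NON_PHI),
        Column("patient_id", Classification.PHI),  # the FK carries PHI too
    ],
}

def phi_columns() -> list[tuple[str, str]]:
    """Answer the auditor's question: every field that contains PHI."""
    return [
        (table, col.name)
        for table, cols in TABLES.items()
        for col in cols
        if col.classification is Classification.PHI
    ]
```

With the registry in place, the same metadata can drive the access-policy check and the de-identification pipeline, which is the point: one classification, many consumers.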
A non-regulated team draws the data model in an afternoon. A regulated team spends days on it, because the cost of getting the classification wrong is the cost of redoing every downstream consumer when the auditor finds the misclassification.
The audit log
In a non-regulated system, an audit log is a nice-to-have. You add structured logging because you'll want it for debugging, and the security team appreciates it.
In a HIPAA-applicable system, the audit log is mandatory, immutable, retained for years, scoped to specific event classes (every access to PHI, by whom, when, for what purpose), and queryable by the auditor on demand. It's not a logging concern. It's a system requirement on par with "the application has to serve requests."
What changes architecturally:
- Audit events are part of the API contract. Every PHI-touching operation emits an audit event with a defined schema. The schema is locked; you can't add fields to it casually because downstream queries depend on the shape. Audit emission is not optional, and it's not best-effort: a failed audit emit should block the operation it was supposed to log.
- The log is in a separate trust boundary. The application can write to the audit log; it can't modify or delete entries. The auditor reads from the log; the engineer doesn't. The ops team has its own access path with its own logging. The whole layout assumes that anyone who can edit the log can fake the audit, and routes around that.
- Retention is in the foundation. You can't depend on a log retention policy that lives in the application code. The retention lives in the storage system itself: write-once buckets, immutable databases, append-only structures with hardware support.
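A minimal sketch of the emit-or-fail contract, with hypothetical names throughout. The event schema is a frozen dataclass (the locked contract), the sink is append-only from the application's point of view, and a failed emit raises instead of degrading silently:

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AuditEvent:
    # Locked schema: these fields are a contract with downstream queries.
    event_id: str
    actor: str
    action: str
    resource: str
    purpose: str
    timestamp: float

class AuditEmitError(RuntimeError):
    pass

def emit_audit(sink, event: AuditEvent) -> None:
    """Append-only write; the application has no update or delete path."""
    try:
        sink.append(json.dumps(asdict(event)))
    except Exception as exc:
        # Not best-effort: a failed emit must fail the operation.
        raise AuditEmitError("audit emit failed") from exc

def read_phi(sink, actor: str, patient_id: str, purpose: str) -> dict:
    event = AuditEvent(str(uuid.uuid4()), actor, "read",
                       f"patient/{patient_id}", purpose, time.time())
    emit_audit(sink, event)  # blocks the read if emission fails
    return {"patient_id": patient_id}  # placeholder for the real fetch
```

In production the sink would be a write-once store behind its own trust boundary; a Python list stands in here only to show the control flow.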
The retrofit cost of the audit log is brutal because every code path that touches PHI has to be revisited to add audit emissions, and the schema for those emissions has to be designed in advance. Teams who ship without it spend more time backfilling than they would have spent building right.
The auth model
In a non-regulated system, auth is "do you have a session token, and what role are you in." Coarse-grained, often.
In a HIPAA-applicable system, auth is much closer to "do you have a session token, what role are you in, what attributes do you have, what is the resource you're trying to access, what is the relationship between you and that resource, and what is the purpose of the access." The auth check is a function of (subject, action, resource, context, purpose), not a function of (user, role).
What changes architecturally:
- The auth model is attribute-based, not role-based. Roles aren't expressive enough. You need attributes: "this user is a clinician at facility X, with a treating relationship to patient Y, accessing for purpose Z." The attribute set is the input to the policy.
- The policy is data, not code. The policy is a separately versioned, separately reviewable artifact. OPA, Cedar, ALFA, whatever the engine, the policy lives where the compliance officer can read it, the auditor can verify it, and the developer can change it through a reviewed PR rather than by editing application code.
- Purpose-of-use is captured. The auth call carries the purpose. "Treatment," "payment," "operations," "research": the regulation cares which one applied, and the audit log records it.
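The shape of that decision function, inlined for illustration. In a real deployment the policy would live in an external engine such as OPA or Cedar; the attribute names here (`role`, `treating`) are assumptions, not a standard vocabulary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessRequest:
    subject: dict   # attributes: role, facility, treating relationships
    action: str
    resource: dict  # e.g. {"type": "record", "patient_id": ...}
    purpose: str    # treatment / payment / operations / research

ALLOWED_PURPOSES = {"treatment", "payment", "operations"}

def is_permitted(req: AccessRequest) -> bool:
    """Auth as a function of (subject, action, resource, purpose),
    not (user, role). Inlined here; externalized as policy-as-data
    in a real system."""
    if req.purpose not in ALLOWED_PURPOSES:
        return False
    if req.subject.get("role") != "clinician":
        return False
    # Treating relationship: the subject must be linked to this patient.
    return req.resource["patient_id"] in req.subject.get("treating", set())
```

Note that the purpose check comes first: an otherwise-authorized clinician is still denied if the declared purpose isn't one the policy allows, and the denial itself is an auditable event.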
The retrofit story here is the worst of the four. Migrating an application from role-based auth to attribute-based-with-purpose is a months-long project that touches every protected endpoint. Building it that way from day one is a decision you make on the diagram, and at that point it costs a few extra hours.
The deletion semantics
In a non-regulated system, deletion is "the row goes away." Simple.
In a regulated system, particularly under GDPR for personal data, but also under HIPAA's right-of-amendment provisions and the various state health-privacy laws, deletion is an operation with constraints. Some data must be deleted on request. Some data must be retained for years even if the subject asks otherwise (regulatory retention requirements override the deletion request). Some deletions cascade in ways the application has to handle correctly. Backup-included deletion is a thing the regulator may ask about.
What changes architecturally:
- Deletion is a workflow, not an operation. It's a multi-step process: the request is recorded, eligibility is evaluated against retention requirements, the deletion is staged, the cascade is computed, the verification runs, and the audit log of the deletion (yes, you log the deletion) is emitted.
- Backups are in scope. The deletion has to extend to backups, or you have to be able to demonstrate why the backups are out of scope. Either path is a design decision, not a runtime decision.
- De-identification is sometimes the right answer. Some data can't be deleted (regulatory retention) but can be de-identified: the personal identifiers come out, the underlying records stay. The data model has to support this without losing referential integrity.
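A sketch of deletion-as-workflow under an assumed retention table. `RETENTION_YEARS`, the record kinds, and the store shape are all invented; the point is the branch: retention requirements override the deletion request, and de-identification is the fallback when deletion isn't allowed.

```python
from dataclasses import dataclass, field
from enum import Enum

class Outcome(Enum):
    DELETED = "deleted"
    DEIDENTIFIED = "deidentified"

@dataclass
class DeletionRequest:
    record_id: str
    subject: str
    log: list = field(default_factory=list)  # the deletion is itself logged

# Assumed regulatory retention policy, in years, per record kind.
RETENTION_YEARS = {"billing": 7, "notes": 0}

def process_deletion(req: DeletionRequest, record_kind: str,
                     age_years: int, store: dict) -> Outcome:
    req.log.append(f"request recorded for {req.record_id}")
    must_retain = age_years < RETENTION_YEARS.get(record_kind, 0)
    if must_retain:
        # Retention overrides the request: strip identifiers, keep the record.
        store[req.record_id].pop("name", None)
        req.log.append("retention applies: de-identified instead")
        return Outcome.DEIDENTIFIED
    del store[req.record_id]
    assert req.record_id not in store  # verification step
    req.log.append("deleted and verified")
    return Outcome.DELETED
```

A real workflow would also stage the cascade and extend to backups; this only shows the eligibility gate and the two outcomes it produces.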
Pair this rule with Article 3: double check, never delete. The Article 3 discipline is the shape of the deletion gate; the regulation tells you which deletions are even allowed. Both apply the same caution toward destructive operations, from different directions.
What "compliance-applicable" means as a line in your requirements
The phrase I keep using is "compliance-applicable as a requirement line." It means that during the requirements phase (before architecture, before design, before code) somebody on the team writes down which regulatory regimes apply to this system. Not as a footnote. As a load-bearing line that the architect is expected to read first.
When that line is present, the four areas above get treated as load-bearing constraints. The data model conversation includes classification. The audit log conversation includes immutability and retention. The auth conversation includes attributes and purpose. The deletion conversation includes regulatory retention and the backup question.
When the line is absent, those conversations don't happen, and the team discovers them three months later when the security team or legal asks the question that the design has no answer to. The cost differential is large. I've watched teams spend a quarter retrofitting an audit log that a compliance-aware design would have shipped on day one with a fraction of the effort.
The DaC angle
The compliance regime is a source of business decisions that need to be projected everywhere. "This data is PHI." "This system is in scope for SOC 2." "These users are subject to GDPR's right of access." Those are decisions in the Decisions as Code sense: they get made once, by the right people, and they need to project onto the data model, the auth layer, the audit log, the deletion workflow, the backup configuration, and the operational tooling.
The teams that handle this well treat the compliance status of every dataset, every user, and every operation as part of the standards layer. The classification of a table isn't a column comment. It's a property in the standards repo, projected onto the schema, the access policy, the audit emitter, and the deletion handler. Change the classification once, every consumer picks it up. The approach that works for "what's the standard sizing for a production database" works just as well for "what's the data classification for the patient-records table."
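A toy version of that projection, with an invented standards dict. The classification is declared once; each consumer derives its own view from the same entry, so changing the declaration changes every projection:

```python
# Standards layer: one declared classification, many projections.
# Table name and properties are hypothetical.
STANDARDS = {
    "patient_records": {"classification": "PHI", "retention_years": 7},
}

def schema_tag(table: str) -> str:
    """Projection onto the schema (e.g. a generated column comment)."""
    return f"-- classification: {STANDARDS[table]['classification']}"

def access_policy(table: str) -> dict:
    """Projection onto the auth layer: PHI requires purpose and audit."""
    is_phi = STANDARDS[table]["classification"] == "PHI"
    return {"require_purpose": is_phi, "audit": is_phi}

def deletion_rule(table: str) -> dict:
    """Projection onto the deletion handler."""
    return {"min_retention_years": STANDARDS[table]["retention_years"]}
```

In practice the standards layer would be a repo with review and versioning rather than a dict, but the dependency direction is the same: consumers read the decision; they never restate it.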
The mindset shift
The shift the teams I've watched succeed under regulation make is from thinking of compliance as a check (something you pass at the end) to thinking of it as a constraint (something that's an input to every design decision). The check mindset produces systems that pass the audit but feel like a fight every time they change. The constraint mindset produces systems where the audit is mostly a report on what the system already does, because the system was built to satisfy the constraint from the first arrow on the diagram.
If you're starting a system that's compliance-applicable, the cheapest thing you can do is name the regime in the requirements doc, identify the four areas above (data model, audit log, auth, deletion) as constrained, and have the architecture conversation with the constraints in the room from minute one. The hours that costs at the start save quarters later. The teams who ship under regulation know this. The teams who don't, learn it the hard way, usually by the time the auditor's first question lands.
You're not a team of one anymore. Build like the regulator is on the team, because they are.
- Sid