DNA, lineage, and provenance: the genetic metaphor for AI artifacts
Manifests are snapshots; lineage is a graph. The genetic metaphor (descent, inheritance, mutation) gives an AI pipeline the vocabulary to track what every artifact inherited from where, and the design discipline to treat provenance as a first-class artifact instead of metadata.
The first time somebody asked me, in earnest, "where did this model come from?" I had a good answer for about thirty seconds. Checkpoint 4 of the second fine-tune of the third base release. Q2 corpus. The eval suite we'd been running since March. Then came the follow-ups: which version of Q2, which prompts shaped the SFT pass, which downstream embeddings inherited from this checkpoint. I was stitching answers together from four systems and a Slack thread. The confidence drained out of the conversation in the way that tells you the architecture has a hole in it.
The hole isn't unusual. Every AI pipeline I've worked on has had some version of it. The vocabulary almost everyone reaches for is the manifest: a JSON file alongside the artifact saying "this model was trained on this dataset, with this code, at this time." Manifests are useful. They are not enough. A manifest is a snapshot. What you actually need is a graph.
I keep reaching for the genetic metaphor here. Not because biology has anything technical to teach a pipeline engineer, but because biology already invented the right vocabulary for "what did this thing inherit from where, and what shares its lineage." Descent. Inheritance. Mutation. The genome is the snapshot; the family tree is the graph; the population is the system you're actually trying to reason about. The genome of an individual organism is useless without the lineage: you can sequence it perfectly and still not know whether a trait is novel, inherited, or convergent.
This is the fourth metaphor in the series, complementing atoms-and-molecules (composition), the periodic table (layout), and cosmology (containment). Where those three are about how things relate at a moment, this one is about how things relate across time.
What the manifest framing hides
A manifest tells you what an artifact is made of right now: dataset hash, code commit, hyperparameters, base model. A static parts list. For build artifacts where inputs are small and the lifecycle is short, that's enough.
AI artifacts don't behave like build artifacts. Three reasons.
First, the inputs aren't small. A training corpus is itself a derived artifact (scraped, filtered, deduplicated, labeled, augmented) with a lineage of its own that crosses pipelines and probably team boundaries. The "dataset" in the manifest is one node in a tree of datasets. The manifest captures only the leaf.
Second, the lifecycle is long. A checkpoint gets fine-tuned, distilled, quantized, served, and re-trained on logs that include its own outputs. Each operation produces a new artifact whose manifest references the prior one, but the chain isn't navigable from any single manifest. To answer "is this production model downstream of the corpus we now know was poisoned?" you traverse backwards through every link, and a manifest doesn't know it's a link.
Third, artifacts share foundations. The same base model spawns dozens of fine-tunes, hundreds of adapters, thousands of prompts. Manifests describe each in isolation; they don't describe the population. "Which prompts in production are affected by the change to this base model" is a question you'll eventually have to answer, and the manifest is silent because the question spans artifacts and a manifest describes only one.
The graph is the thing. The manifest is a node in the graph. Treating the manifest as the unit of provenance is the analog of doing biology by sequencing one organism perfectly and ignoring the family tree.
What the genetic metaphor demands
The metaphor earns its keep by forcing four properties into the design.
Every artifact has parents. Not "inputs": parents. Inputs are what the build consumed; parents are what the artifact descends from. They overlap, but parents include things the build didn't directly consume yet the artifact still inherited from: the prompt template that shaped the SFT data three steps back, the eval set whose failures drove the curriculum, the base model whose tokenizer is now baked in. Every parent edge is typed: trained-on, distilled-from, quantized-from, prompted-by, evaluated-against. The edge type tells you what was inherited.
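A minimal sketch of what a typed parent edge could look like, assuming artifacts are referred to by string identifiers. The names here (`EdgeType`, `ParentEdge`, `parents_of`, and every artifact ID) are hypothetical, chosen for illustration; only the edge types come from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EdgeType(Enum):
    # Edge types named in the text; a real pipeline would likely add more.
    TRAINED_ON = "trained-on"
    DISTILLED_FROM = "distilled-from"
    QUANTIZED_FROM = "quantized-from"
    PROMPTED_BY = "prompted-by"
    EVALUATED_AGAINST = "evaluated-against"


@dataclass(frozen=True)
class ParentEdge:
    child: str      # hypothetical artifact identifier
    parent: str     # hypothetical artifact identifier
    type: EdgeType  # what kind of inheritance this edge records


# Hypothetical artifacts, for illustration only.
edges = [
    ParentEdge("sft-ckpt-4", "corpus-q2", EdgeType.TRAINED_ON),
    ParentEdge("sft-ckpt-4", "prompt-template-7", EdgeType.PROMPTED_BY),
    ParentEdge("sft-ckpt-4", "eval-suite-march", EdgeType.EVALUATED_AGAINST),
]


def parents_of(child: str, edge_type: Optional[EdgeType] = None) -> List[str]:
    """Return the parents of `child`, optionally filtered by edge type."""
    return [
        e.parent
        for e in edges
        if e.child == child and (edge_type is None or e.type == edge_type)
    ]
```

With edges typed this way, "what data shaped this checkpoint" and "what was it evaluated against" become two filters over the same structure rather than two different files.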
Lineage is queryable in both directions. Walk up to ancestors ("what shaped this?") or down to descendants ("what does this shape?"). Both queries are first-class. Most provenance systems get the upward query right and ignore the downward one because it's expensive and nobody asks until it's too late. But the downward query is the one you need when you discover a problem upstream and need to know the blast radius.
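One way to keep both directions cheap is to index every edge twice at write time. The sketch below assumes an in-memory graph with string identifiers; `LineageGraph` and the artifact names are hypothetical, and a real store would sit on a database rather than on dictionaries.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph that indexes each edge in both directions."""

    def __init__(self):
        self._parents = defaultdict(set)   # child -> its direct parents
        self._children = defaultdict(set)  # parent -> its direct children

    def add_edge(self, child, parent):
        self._parents[child].add(parent)
        self._children[parent].add(child)

    def _walk(self, start, neighbors):
        # Breadth-first walk; `seen` also guards against revisiting nodes.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def ancestors(self, node):
        """Upward query: what shaped this?"""
        return self._walk(node, self._parents)

    def descendants(self, node):
        """Downward query: what does this shape?"""
        return self._walk(node, self._children)


# Hypothetical artifacts, for illustration only.
g = LineageGraph()
g.add_edge("sft-ckpt-4", "corpus-q2")
g.add_edge("sft-ckpt-4-int4", "sft-ckpt-4")
g.add_edge("embeddings-v1", "sft-ckpt-4")
```

Here `g.descendants("corpus-q2")` is the blast-radius query, and it costs the same as the upward one because the reverse index was paid for when the edge was written.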
Mutations are explicit. When an artifact is derived from a parent with some change, the change is recorded as a typed mutation. Genetic mutations come in flavors (point, insertion, deletion, duplication, recombination); model mutations have analogous ones (continued training, parameter pruning, layer freezing, adapter merge, RLHF pass). "Same model with one extra epoch" is one kind of edge; "same model quantized to 4-bit" is another. Both produce a child, but they relate to the parent in different ways, and the edge type tells you which.
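As a rough illustration, a mutation can be recorded as a kind plus the parameters that distinguish it. `MutationKind`, `Derivation`, `describe`, and the checkpoint names are all hypothetical; the kinds mirror the examples in the paragraph above, with quantization added because the text treats it as its own edge.

```python
from dataclasses import dataclass, field
from enum import Enum


class MutationKind(Enum):
    CONTINUED_TRAINING = "continued-training"
    PARAMETER_PRUNING = "parameter-pruning"
    LAYER_FREEZING = "layer-freezing"
    ADAPTER_MERGE = "adapter-merge"
    RLHF_PASS = "rlhf-pass"
    QUANTIZATION = "quantization"


@dataclass
class Derivation:
    parent: str
    child: str
    kind: MutationKind
    params: dict = field(default_factory=dict)  # what exactly changed


def describe(d: Derivation) -> str:
    """Render a derivation as a one-line, human-readable change record."""
    detail = ", ".join(f"{k}={v}" for k, v in sorted(d.params.items()))
    return f"{d.parent} -> {d.child}: {d.kind.value} ({detail})"


# Two children of the same (hypothetical) parent, related to it differently.
extra_epoch = Derivation("ckpt-4", "ckpt-5", MutationKind.CONTINUED_TRAINING, {"epochs": 1})
quantized = Derivation("ckpt-4", "ckpt-4-int4", MutationKind.QUANTIZATION, {"bits": 4})
```

Both records point at the same parent, but a reader of the graph can tell from `kind` alone which child kept its weights' precision and which one kept its training state.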
Lineage is a first-class artifact, not metadata. This is the load-bearing one. Most pipelines treat provenance as a sidecar: a JSON file next to the artifact, indexed weakly if at all. The genetic framing inverts that. The lineage graph is the artifact you most care about; individual nodes are how it's instantiated. The graph has its own schema, storage, access patterns, SLOs. You version it, audit it, query it. The artifacts are projections of nodes; the graph is the system.
When those four properties are present, you have a genealogy, not a parts list. The questions that used to take an afternoon and a Slack thread take a query.
What a lineage-aware AI pipeline looks like
Concretely. The pipeline has a provenance store as a primary subsystem, not an observability afterthought. Every meaningful creation event (corpus build, fine-tune launch, eval run, serving deployment, prompt commit) emits a node with typed edges to its parents. Schemas for nodes and edges are enforced at ingestion, the way a type system is enforced at compile time.
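Enforcement at ingestion might look something like the sketch below. It assumes a small fixed vocabulary of node kinds and edge types and an in-memory store; `ProvenanceStore`, `SchemaError`, and the allowed-value sets are hypothetical stand-ins for whatever schema a team actually agrees on.

```python
# Hypothetical vocabularies; a real schema would be versioned and reviewed.
ALLOWED_NODE_KINDS = {"corpus", "fine-tune", "eval-run", "deployment", "prompt"}
ALLOWED_EDGE_TYPES = {
    "trained-on", "distilled-from", "quantized-from",
    "prompted-by", "evaluated-against",
}


class SchemaError(ValueError):
    """Raised when a creation event does not match the graph schema."""


class ProvenanceStore:
    def __init__(self):
        self.nodes = {}  # node_id -> kind
        self.edges = []  # (child_id, parent_id, edge_type)

    def ingest(self, node_id, kind, parents=()):
        """Validate a creation event fully, then write it; never half-write."""
        if kind not in ALLOWED_NODE_KINDS:
            raise SchemaError(f"unknown node kind: {kind!r}")
        for parent_id, edge_type in parents:
            if edge_type not in ALLOWED_EDGE_TYPES:
                raise SchemaError(f"unknown edge type: {edge_type!r}")
            if parent_id not in self.nodes:
                raise SchemaError(f"unknown parent: {parent_id!r}")
        # All checks passed, so the node and its edges are recorded together.
        self.nodes[node_id] = kind
        self.edges.extend((node_id, p, t) for p, t in parents)
```

Rejecting an event with a dangling parent or an unrecognized edge type at write time is what keeps the graph trustworthy later; a tool that emits a malformed event fails loudly in its own run instead of silently corrupting lineage.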
Pipeline tools (trainer, eval harness, deployment controller, prompt registry) all write into the same graph, not into separate manifest files reconciled later. The graph is the source of truth; artifacts carry a stable identifier pointing at their node. Most teams skip this because making N tools agree on a graph schema is more political than technical. It's worth eating the cost. The alternative is N parallel "lineage" stories that disagree at every reconciliation.
The graph is content-addressed where it can be. Hashable artifacts (datasets, model weights, prompt templates, eval suites) carry the hash in their identity. Two nodes with the same hash are the same node, because the identity rule says so. This is the atomic-molecular discipline applied at the node level. Atoms are immutable, typed, small, stably identified. Lineage edges compose them. The graph is the population.
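A small sketch of that identity rule, using SHA-256 from Python's standard `hashlib`. The `kind:sha256:digest` identifier format and the `register` helper are assumptions for illustration, not a prescribed scheme.

```python
import hashlib

nodes = {}  # node_id -> metadata; stands in for the provenance store


def node_id(kind: str, payload: bytes) -> str:
    """Derive a node's identity from its kind and the bytes of its content."""
    digest = hashlib.sha256(payload).hexdigest()
    return f"{kind}:sha256:{digest}"


def register(kind: str, payload: bytes) -> str:
    """Register an artifact; identical content maps to the same node."""
    nid = node_id(kind, payload)
    # setdefault leaves an existing node untouched, so re-registering is a no-op.
    nodes.setdefault(nid, {"kind": kind, "size": len(payload)})
    return nid
```

Registering the same prompt template twice returns the same identifier and leaves one node in the store, which is the deduplication the identity rule is meant to buy. Large artifacts such as model weights would be hashed in chunks rather than loaded into memory whole.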
Queries are part of the developer surface. "Everything downstream of this dataset version" is an API call, not a forensic exercise. "Every production prompt depending on a base model older than ninety days" is a dashboard, not an audit project. When queries are easy, the team starts asking them prophylactically instead of in postmortems.
The graph is also where policy hooks anchor. The decisions-as-code pattern that governs deployments governs lineage: "no production model may descend from a corpus that hasn't passed the PII filter at version >=3" is enforced against the graph at promotion time. The check walks lineage upward; if any ancestor fails, promotion is blocked. The graph makes the policy enforceable; the manifest framing makes the same policy a wish.
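The PII example can be sketched as an upward walk that collects violating ancestors. Everything named here (the `nodes` and `parents` tables, the `pii_filter_version` field, `violations`, `can_promote`) is hypothetical; the only thing taken from the text is the rule itself.

```python
# Hypothetical graph: node metadata plus a child -> parents table.
nodes = {
    "corpus-raw": {"kind": "corpus", "pii_filter_version": 2},
    "corpus-clean": {"kind": "corpus", "pii_filter_version": 3},
    "model-a": {"kind": "model"},
    "model-b": {"kind": "model"},
}
parents = {
    "model-a": ["corpus-clean"],
    "model-b": ["model-a", "corpus-raw"],
}


def violations(node_id, min_version=3):
    """Return every ancestor corpus whose PII filter version is too old."""
    bad, seen, stack = [], set(), [node_id]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent in seen:
                continue
            seen.add(parent)
            meta = nodes[parent]
            # A corpus with no recorded filter version is treated as unfiltered.
            if meta["kind"] == "corpus" and meta.get("pii_filter_version", 0) < min_version:
                bad.append(parent)
            stack.append(parent)
    return sorted(bad)


def can_promote(node_id):
    """Promotion gate: blocked if any ancestor violates the policy."""
    return not violations(node_id)
```

Returning the list of offending ancestors, rather than a bare yes or no, means a blocked promotion comes with its own explanation.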
Treating provenance as a first-class artifact
The design discipline that follows is the part I want to underline. It has the same shape as in the other metaphor pieces: the framing changes the work.
If provenance is metadata, it gets the budget of metadata: a few hundred bytes next to the artifact, an index nobody owns, a schema that drifts because no one's job depends on it. When something goes wrong upstream, you spend a week reconstructing what should have been a query. The cost is invisible until you need it, then catastrophic.
If provenance is a first-class artifact, it gets the budget of one. Schema review. SLOs. Versioning. Backups. An owning team. Tests that fail when the graph isn't ingested correctly. The cost is visible up front and cheaper than the alternative because the alternative compounds. That discipline is what separates a pipeline that answers the audit question in a meeting from one that schedules a sprint to find out.
The cultural piece is harder than the technical. Engineers like to ship the model and treat lineage as exhaust. Reframing it so lineage is the product and the model is one node, so the standard answer to "what changed?" is a graph diff, not a release note, takes deliberate work. The metaphor helps because it makes the framing self-justifying. Nobody seriously argues that the genome of one cell tells you what's wrong with the organism.
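For what it's worth, a graph diff does not have to be elaborate. A minimal sketch, assuming each release's lineage can be exported as a set of `(child, parent, edge_type)` tuples; the function name and the release contents are hypothetical.

```python
def graph_diff(old_edges, new_edges):
    """Compare two lineage snapshots given as (child, parent, edge_type) tuples."""
    old, new = set(old_edges), set(new_edges)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
    }


# Hypothetical snapshots of the same production model across two releases.
release_1 = [
    ("prod-model", "corpus-v1", "trained-on"),
    ("prod-model", "eval-suite-march", "evaluated-against"),
]
release_2 = [
    ("prod-model", "corpus-v2", "trained-on"),
    ("prod-model", "eval-suite-march", "evaluated-against"),
]

diff = graph_diff(release_1, release_2)
```

The result says what a release note would have to be trusted to say: the corpus edge changed and the eval edge did not.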
Where the metaphor has limits
Genetics gives you single-parent inheritance for asexual reproduction and dual-parent for sexual; AI artifacts can have N parents, and the metaphor needs a stretch. Adapter merges, ensembles, and RAG retrievals at inference all have many parents whose contributions are weighted, sometimes opaquely. The fix is to keep the inheritance structure but admit weighted, multi-parent edges. Hybridization works as a mental model; it's just more common in pipelines than in nature.
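A sketch of what a weighted, multi-parent edge set might record, with `None` standing for a contribution that is opaque. `WeightedParent`, `normalized_weights`, and the artifact names and coefficients are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WeightedParent:
    parent: str
    weight: Optional[float]  # None means the contribution is unknown or opaque


# Hypothetical adapter merge with known (unnormalized) coefficients.
merged_adapter = [WeightedParent("adapter-a", 3.0), WeightedParent("adapter-b", 1.0)]

# Hypothetical RAG answer: the parents are known, their shares are not.
rag_answer = [WeightedParent("base-model", None), WeightedParent("index-snapshot", None)]


def normalized_weights(parents: List[WeightedParent]) -> Optional[Dict[str, float]]:
    """Return each parent's share of the total, or None if any weight is opaque."""
    if any(p.weight is None for p in parents):
        return None
    total = sum(p.weight for p in parents)
    if total == 0:
        raise ValueError("weights sum to zero; shares are undefined")
    return {p.parent: p.weight / total for p in parents}
```

Recording opacity explicitly is the design choice that matters here: an unknown weight is still an edge, so the parent shows up in ancestor and descendant queries even when nobody can say how much it contributed.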
The other limit: biology has natural selection telling you which lineages matter. Pipelines don't. You have to choose deliberately what's worth admitting: emit a node for every prompt evaluation and the graph melts; emit only for promoted artifacts and you lose debug resolution. The granularity choice is unavoidable and the metaphor doesn't make it easier. But once it's set, the rest carries through cleanly.
The discipline, not the helix
The metaphor isn't load-bearing on its own. The discipline is: treat lineage as a first-class object, not an annotation. Name parents. Type edges. Make mutations explicit. Make the graph queryable in both directions. Anchor policies, audits, and debugging sessions in the graph rather than per-artifact files.
Call it lineage, provenance, ancestry, genealogy: whichever word doesn't already mean something else in your codebase. The point is that you have a graph, the graph is owned, and the graph is the answer to the questions that matter. Manifests stay useful as the on-disk projection of a node. They stop carrying weight they were never built to carry.
I keep coming back to the genetic framing because the audit question (where did this come from) is structurally a lineage question, and biology has had the right vocabulary for a hundred and fifty years. Borrow it. Skip the nucleotides. Treat descent as a thing you design for, not reconstruct after the fact.
- Sid