Style is not knowledge, and why the difference matters for AI
Capturing how someone writes is not the same thing as capturing what they know. Both are interesting capabilities. Conflating the two is going to cause problems.
There's a category mistake that's starting to show up in the early conversations about style imitation in language models, and it's worth pulling apart while the question is still tractable. The mistake is treating "the model can write in someone's voice" as if it's the same accomplishment as "the model knows what they know." It's not, and the gap between the two is the entire interesting problem.
The GPT-4 first-impressions piece from last week flagged that style imitation is now a real capability rather than a parlor trick. This piece is the follow-on: that capability is going to get conflated with a different one, and the conflation is going to cause specific kinds of trouble.
What the model is actually doing when it imitates a voice
Modern large language models are, at their core, very good pattern matchers over the statistical structure of language. When you paste in a writer's published work and ask for output in their style, the model is conditioning its next-token predictions on the cadence patterns, lexical choices, sentence-length distributions, and structural moves present in the sample. With enough sample data, the output is recognizable as belonging to the same broad voice family as the source.
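To make that concrete, here is a minimal, hypothetical sketch of the kind of surface features involved. The model never computes anything like this explicitly (the features are implicit in its next-token predictions), and the function and feature set below are my own illustration rather than anything the model exposes, but measuring them directly is a reasonable way to see what "style" means here.

```python
import re
from collections import Counter
from statistics import mean, stdev

def style_fingerprint(text: str, top_n: int = 15) -> dict:
    """Crude stylometric summary: sentence-length distribution plus the most
    frequent words, which together sketch cadence and lexicon."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "mean_sentence_length": mean(lengths),
        "sentence_length_spread": stdev(lengths) if len(lengths) > 1 else 0.0,
        "frequent_words": Counter(words).most_common(top_n),
    }
```

Two texts with similar fingerprints read as the same voice family even when they are about entirely different subjects, which is roughly the effect the imitation produces.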
What the model is not doing is reasoning about the world the way the source writer reasons about it. It's not retrieving the writer's knowledge of their domain. It's not modeling the writer's specific opinions on questions the writer has not addressed in writing. It's producing text that sounds like the writer, conditioned on the model's general-purpose understanding of whatever topic is being discussed.
The distinction shows up clearly when you push past the obvious. Ask the model to write a Hemingway-style description of a rainy afternoon and the output will be recognizably Hemingway-shaped. Ask the model to write Hemingway's view on a specific contemporary event Hemingway never addressed, and the output will be Hemingway-flavored prose presenting a position the model invented. The cadence is real. The position is hallucinated.
Same shape of problem applies to a domain expert. A model fine-tuned on a database administrator's blog will produce DBA-flavored prose. Whether that prose contains that DBA's actual operational judgment about a novel problem is a separate question, and the answer is mostly no.
Why this matters
A few practical consequences fall out of taking the distinction seriously.
For the person whose voice is being imitated. If a model can write in your voice but not from your knowledge, the imitation is closer to a stylized impersonation than a working substitute. That's still a real concern (impersonation is a real harm), but it's a different concern from "this model can replace the work of the original expert." Conflating the two leads to either complacency ("the model can't really do my job") or panic ("the model can really do my job"), and neither matches the actual situation.
For the person consuming the output. If a model produces output in someone's voice on a topic that person hasn't published on, the consumer is being shown a confident-sounding answer that has none of the underlying judgment. The reader's natural calibration ("this writer usually knows what they're talking about, so I'll trust this") leads them to over-weight an output that's actually just plausible-sounding prose with the original author's pattern signature. That's a category of misinformation that didn't really exist before the capability did.
For anyone trying to build a useful product on top of this. A product that captures someone's voice but not their knowledge is a stylized chatbot. A product that captures someone's knowledge but not their voice is a domain-specific question-answering system. A product that captures both is a much harder engineering problem and a much more valuable one. The two get marketed as the same thing right now. They're not.
The harder problem: capturing knowledge
Capturing voice is mostly solved at the demonstration level. You can sit down today, with reasonable hardware and an afternoon of work, and produce a model that imitates the cadence of any writer with a substantial public corpus. The technique stack is: a base model, a sample corpus of a few thousand words, either fine-tuning or in-context conditioning, and some prompt scaffolding. None of it is research-grade work.
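For the in-context-conditioning path, the scaffolding is not much more than careful prompt assembly. Here's a minimal sketch, assuming a generic `complete(prompt)` call as a stand-in for whatever base model you happen to be using; the prompt wording and the character budget are illustrative, not a recipe.

```python
STYLE_PROMPT = """Below are writing samples by a single author.

{samples}

Write a new piece in the same voice on the following topic. Match the
author's cadence, vocabulary, and structural habits, but do not attribute
to them any opinion that does not appear in the samples.

Topic: {topic}
"""

def build_style_prompt(sample_corpus: list[str], topic: str, max_chars: int = 8000) -> str:
    """Pack as many samples as fit under a crude character budget, then add the task."""
    packed, used = [], 0
    for sample in sample_corpus:
        if used + len(sample) > max_chars:
            break
        packed.append(sample)
        used += len(sample)
    return STYLE_PROMPT.format(samples="\n\n---\n\n".join(packed), topic=topic)

# Usage, with `complete` standing in for whatever model call you have:
# response = complete(build_style_prompt(corpus, "a rainy afternoon"))
```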
Capturing knowledge (the operational judgment a domain expert applies to novel problems) is fundamentally harder, and it's the part where the current state of the art falls down.
Three reasons it's harder:
The corpus is incomplete. A writer's published work is a tiny fraction of their actual reasoning. Most of what an expert knows is in their head, in their unsent emails, in their conversations, in the decisions they made and didn't write down. The published corpus is the iceberg tip. Training a model on the tip teaches it the surface; the underlying judgment isn't there to be learned.
Reasoning isn't well-captured by next-token prediction. Even if the corpus were complete, the way a model represents knowledge is fundamentally pattern-completion over text. An expert's actual reasoning involves a working model of their domain, causal relationships, edge cases, "this looks like X but is actually Y" pattern recognition built on years of seeing problems. Some of that lives in language. A lot of it doesn't.
Novel situations require generalization the model can't do. The most valuable thing an expert does is apply their judgment to problems that don't quite match anything in their experience. That requires something like analogical reasoning, recognizing the deep structure of a new problem and mapping it to a familiar shape. Current models can do this in narrow ways and break down in broader ones.
What a usable knowledge-transfer system would need
If you wanted to actually capture an expert's knowledge (not their voice, but the operational judgment), what would the system need to look like? A few things, none of which exist as an integrated product:
A complete-ish corpus. Beyond published work, you'd need either deliberate elicitation (think: structured interviews) or instrumented capture (the expert's actual decisions, with context). The published corpus alone is insufficient.
Retrieval grounded in source material. The model shouldn't be making up answers; it should be finding the relevant precedent in the corpus and reasoning from it. Vector search plus a generation model on top is the obvious primitive, and it's where a lot of the early enterprise tooling is converging (there's a rough sketch of this after the list).
A framework for the model to acknowledge uncertainty. "This is exactly like a case I've seen. Here's the answer" needs to be distinguishable from "this is sort of like a case I've seen. Here's a guess." That's a calibration problem most current systems don't even try to solve.
An evaluation methodology. How do you know whether the system actually captured the expert's reasoning? You need held-out questions where you have the expert's real answer and can compare. That works in narrow domains and is hard in broad ones.
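To make the retrieval and uncertainty pieces concrete, here's a minimal sketch under loud assumptions: `embed` below is a crude hashed bag-of-words stand-in for a real embedding model, and the similarity thresholds are invented numbers that would need tuning per domain. The point is the shape (find the closest precedent, and label how close it is), not the specifics.

```python
import re
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Crude stand-in for a real embedding model: hashed bag-of-words, unit-normalized.
    In practice you would swap in an actual embedding model here."""
    v = np.zeros(dim)
    for word in re.findall(r"[a-z']+", text.lower()):
        v[hash(word) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def answer_from_corpus(question: str, corpus: list[str],
                       strong: float = 0.85, weak: float = 0.65) -> dict:
    """Find the closest precedent in the expert's corpus and label how close it is,
    so a precedent-backed answer stays distinguishable from a guess."""
    q = embed(question)
    sims = np.stack([embed(chunk) for chunk in corpus]) @ q  # cosine similarity on unit vectors
    best = int(np.argmax(sims))
    score = float(sims[best])
    if score >= strong:
        label = "close precedent in the corpus"
    elif score >= weak:
        label = "loosely related precedent; treat the answer as a guess"
    else:
        label = "no real precedent; decline to answer in the expert's name"
    return {"precedent": corpus[best], "similarity": score, "confidence": label}
```

The generation model would then be prompted with the retrieved precedent rather than asked to answer cold, and the confidence label is what keeps "here's the answer" distinguishable from "here's a guess."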
None of this is impossible. None of it is shipped. The gap between "model can write in someone's voice" and "model can apply someone's knowledge" is the entire interesting research and engineering frontier in this area for the next several years.
Why I keep harping on this
I think the conflation is going to cause practical problems, soon, in two directions.
Direction one: the writers and experts whose voices are being imitated are going to assume the imitation captures their knowledge, panic, and either over-react legally or under-react strategically. The actual threat is impersonation; the assumed threat is replacement; the responses to the two are different.
Direction two: people building products on top of language models are going to advertise "captures expert X's knowledge" while delivering "writes in expert X's voice on questions the model is making up answers to." Some of those products will be in domains where the difference matters a lot: medical, legal, financial, technical specialties where wrong answers presented confidently are a real harm.
Keeping the distinction sharp is mostly a discipline of being precise about which capability you're discussing. The model can do the first thing. The second thing is mostly an open problem. The two get marketed as one thing because the marketing is easier that way, and the consequences of the conflation will be borne mostly by people who weren't in the room when the marketing was decided.
Style is not knowledge. Both are interesting. Conflating them is not.