GPT-4 first impressions: and a thought about who owns what they teach it
GPT-4 dropped a week ago. The capability jump is real. The ownership question it surfaces, about the corpus you use to teach it, is sharper than the marketing suggests.
GPT-4 dropped on March 14th, the same day Anthropic launched Claude. After a week of running both through real work, the GPT-4 capability jump is the bigger story, and not for the reasons most of the launch coverage focused on. The exam-score charts and the "passes the bar" headlines got the attention. The thing actually worth talking about is more boring and more consequential: it's competent enough at certain kinds of tasks that the question of what you teach it starts to matter in a new way.
Where the jump is real
A week of running GPT-4 against the same prompts I've been running through GPT-3.5 since November surfaces a pretty clear pattern. The jump is concentrated in three areas:
Multi-step reasoning over a long context. Take a non-trivial problem: debugging a system with three interacting services, planning a multi-week migration, walking through a legal document with structured exceptions. GPT-3.5 starts strong and loses the thread by step four or five. GPT-4 holds the thread to the end. The difference isn't subtle.
Following nested instructions. "Write me X, but make sure Y, and avoid Z, except in case W." GPT-3.5 ignores about a third of the constraints. GPT-4 honors most of them most of the time. For anything where the prompt is itself doing real work (content generation with structured requirements, code generation against a spec) that's a category change in usability.
Style imitation with a substantial sample. Paste in 3,000–5,000 words of someone's writing alongside a prompt to write something new, and the output is recognizably in the source's voice. Not perfectly. Not always. But it's no longer a parlor trick; it's something you'd consider using.
That last one is what I want to spend most of this post on, because it's the capability with the longest second-order tail. The interface shift from a few weeks ago gets the headlines; this is the capability that quietly resets a lot of the questions about authorship and ownership that the field has been able to ignore.
Where it's hype
The "passes the bar exam" framing is doing a lot of work in the press coverage. It's true and it's also misleading. GPT-4 passes the multiple-choice section of the bar at a high percentile because that section is (by design) a memorization test where most of the answer space lives in the training data. GPT-4 scores a 5 on the AP Biology free-response section because graders are lenient and the model writes confident-sounding prose. None of this means GPT-4 reasons about novel legal cases or novel biological problems the way a competent practitioner does.
The honest version of the capability story is: it's substantially better at the kinds of things GPT-3.5 was already pretty good at, and modestly better at the kinds of things GPT-3.5 was bad at. The gap closure is mostly on tasks that look like the training distribution. The hard reasoning problems that GPT-3.5 couldn't touch are still mostly out of reach. There's been progress, but it's not the qualitative leap the marketing suggests.
The "no longer a parlor trick" point about style imitation, on the other hand, is the part I think gets understated.
What style transfer actually buys
Try this experiment: take ten thousand words of a writer with a distinctive voice (a technical blogger, a columnist, a working professional who writes a lot) and paste it into GPT-4 along with a question that writer hasn't answered. Ask for an answer in their style.
The output will be rougher than the source. It will sometimes get facts wrong (the model doesn't know what the writer would actually say, only what someone writing in that voice might say). But the cadence is recognizable. The recurring phrasings show up. The way that person tends to open and close arguments shows up. The kinds of asides they make show up. If you handed the output to someone familiar with the source, they'd say "this is close but it's not quite right."
That's not nothing. That's the leading edge of something. Two more capability cycles and the output stops being identifiable as not-the-source for a lot of readers.
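The experiment is easy to run yourself. Here's a minimal sketch using the openai Python package as it exists in March 2023 (the pre-1.0 `ChatCompletion` interface); the prompt wording, the `corpus.txt` filename, and the temperature setting are all illustrative choices, not a tuned recipe:

```python
# Sketch of the style-imitation experiment: hand GPT-4 a large writing
# sample plus a question the writer hasn't answered, and ask for an answer
# in their voice. Requires OPENAI_API_KEY in the environment to actually
# call the API; the message-building part runs without it.
import os


def build_style_messages(corpus: str, question: str) -> list:
    """Assemble a chat request: a writing sample plus a new question to
    answer in that sample's voice."""
    system = (
        "You will be given a large sample of one writer's published work. "
        "Answer the user's question in that writer's voice: match their "
        "cadence, recurring phrasings, and how they open and close arguments."
    )
    user = (
        f"Writing sample (~{len(corpus.split())} words):\n\n{corpus}\n\n"
        f"Question to answer in this writer's voice:\n{question}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    import openai  # pre-1.0 openai package, current as of March 2023

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=build_style_messages(
            open("corpus.txt").read(),
            "What do you make of the GPT-4 launch?",
        ),
        temperature=0.7,  # illustrative; lower values track the voice more rigidly
    )
    print(response["choices"][0]["message"]["content"])
```

The interesting variable to play with is corpus size: in my runs, the voice match degrades noticeably below a few thousand words, which is consistent with the 3,000–5,000 word floor mentioned above.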
The question that follows from that capability is the one I want to think about: who owns the resulting voice?
The ownership question, sharper than before
Before this capability existed, the answer was straightforward. A writer's voice belonged to them in a vague, qualitative sense. You could imitate it as tribute or parody; fair use covered the obvious cases. You couldn't really clone it. The cost and skill required to imitate someone well were high enough that very few people did it.
Now the cost is dropping. Anyone with API access and ten thousand words of someone's published writing can produce passable imitations on demand. Two months ago, this was theoretical. Now it's a Thursday afternoon.
The legal frameworks aren't ready for this. Copyright covers the words a writer wrote. It does not cover how that writer writes. The trademark regime covers names and identifiable marks. It does not cover the cadence of a paragraph. The right of publicity (in the US, varies by state) covers likeness and identity in commercial contexts. It does not, as currently interpreted, cover written voice.
So the practical situation as of March 2023 is: if a writer has published enough material on the open internet, anyone can train or prompt a model that approximates their voice, and there's no clean legal mechanism for the writer to object, let alone profit. The model providers have terms of service that gesture at "don't impersonate real people," but those terms are unenforceable at scale and rarely enforced even at small scale.
I don't have a settled view on what should be done about this. I have a strong intuition that something should be done about it, because the current situation makes a writer's most distinctive professional asset (the way they think on the page) into a free input for anyone with API access. That feels structurally wrong even if I can't yet articulate the right framework.
The shape of the next problem
The thing that strikes me, looking at GPT-4 in week one of being available, is how quickly the capability pace is outrunning the legal and economic frameworks around it. Style imitation works now, in a way it didn't six months ago. There's no mechanism for the people whose styles are being imitated to participate in the value, license the use, or even be informed it's happening. The technology is racing past the institutions that would normally mediate this kind of question, and the open-weights ecosystem that landed last week with the LLaMA release is going to make the gap wider, not narrower, because anyone with a corpus and a weekend can now run their own version.
Maybe the right answer is some kind of opt-out mechanism baked into model training. Maybe it's a licensing system for voice as an artifact distinct from the underlying writing. Maybe it's a market structure I can't see yet because it doesn't exist anywhere I can point to. The thing I'm sure of is that "do nothing and let the capability outrun the framework" is the path of least resistance, and that path probably ends somewhere ugly.
The capability is real. The frameworks are not. Worth thinking hard about which gap closes first, and on whose terms.