Six months thinking about personal AI: what I keep coming back to

A retrospective on a thread of posts about personal AI ownership: what's clarified, what's still murky, and what I'd revise about the earlier framing now that more of the substrate exists.


Six months ago the LLaMA weights leaked, GPT-4 launched, and a thread of posts started here exploring what happens when the foundation for "your own AI on your own hardware" reaches the wider world. The posts have been circling the same core ideas: knowledge as a portable artifact, the distinction between style and knowledge, the supply side and the artifact side of a personal-AI economy. With Llama 2 out, with the open-weights ecosystem stabilized, with the integration patterns clarifying, this is a useful checkpoint to revisit what's held up, what hasn't, and where the framing was wrong.

Some of the earlier posts read as more confident than they should have. Some of them underestimated constraints that are now obvious. Going back through them feels less like a victory lap and more like marking the spots where the foundation moved fast enough to make some of the earlier hedging unnecessary, and the spots where the gaps I named are still wide open.

What's clarified

Several of the questions that were genuinely open in March or April have at least partial answers now.

Local inference is real. In March, "running a competent language model on your own hardware" was a leaked-weights, weekend-tinker proposition. With Llama 2, it's an official, commercially licensed, well-tooled situation. The capability ceiling is one tier behind the frontier (which is fine for a lot of use cases) and the trajectory is closing the gap, not widening it. The question "can I run a useful model locally" is settled. Yes, you can. The follow-on questions are about what to do with that capability.
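
For a sense of how low the bar has gotten, here's a minimal sketch of local inference with a quantized Llama 2 chat checkpoint through the llama-cpp-python bindings. The model path is a placeholder for whatever quantized build you've downloaded, and the settings are illustrative, not a recommendation.

```python
# Minimal local-inference sketch. Assumes llama-cpp-python is installed and a
# quantized Llama 2 chat checkpoint is on disk; the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,  # Llama 2's native context length
)

out = llm(
    "In two sentences, why are adapter-style fine-tunes cheap to distribute?",
    max_tokens=128,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

The quantized build is what makes this fit on a single consumer machine, which is the whole point.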

Style transfer works at the demo level. The GPT-4 first-impressions post was tentative about whether style imitation was a parlor trick or a real capability. Six months in, the answer is: it's a real capability, with caveats. You can get recognizable style imitation from any of the major models with a sufficient corpus. The output is not indistinguishable from the source (anyone who knows the source well can usually tell) but it's good enough for many use cases. The legal and ethical framework around it is still completely absent, which was the actual point of that post.

Adapters are the right abstraction. The encoding-a-person post argued that LoRA-style adapters were probably the right path for personal-AI customization, because they're cheap, composable, and portable. Six months in, the open-weights community has converged hard on this. Most of the interesting Llama-derived models in circulation are LoRA fine-tunes or one of the variants (QLoRA especially, given its hardware-friendliness). The "many adapters, swappable on top of a base model" picture has won.
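
For what "many adapters, swappable on top of a base model" looks like in code, here's a minimal sketch using transformers and peft. The adapter repos are hypothetical placeholders, and pulling the base model like this assumes you've accepted the Llama 2 license on the Hub.

```python
# Sketch of the one-base-model, many-adapters pattern with peft.
# The adapter IDs below are hypothetical placeholders, not real repos.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", device_map="auto"
)

# Wrap the base model with one LoRA adapter...
model = PeftModel.from_pretrained(base, "you/writing-voice-lora", adapter_name="voice")

# ...then load another and switch between them without reloading the base.
model.load_adapter("you/project-notes-lora", adapter_name="notes")
model.set_adapter("notes")  # project knowledge active
model.set_adapter("voice")  # writing voice active
```

An adapter is a small fraction of the base model's size, which is what makes carrying a stack of them around cheap.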

Context windows are getting bigger faster than expected. The Claude 2 post celebrated the 100K context window. Anthropic has hinted at larger sizes. OpenAI has shipped 32K GPT-4 to broader audiences. The trajectory on context size is steeper than I'd have predicted in March. This is pulling some of the architectural questions in unexpected directions; see below.

What I'd revise

Some things from the earlier posts I want to walk back, partially or fully.

The "personal AI artifact" framing was too narrow. The thought experiment in the Knowledge-as-a-Service post imagined a discrete file (an adapter, a personality, a "you in a box") that could be passed around, licensed, deployed. That framing assumes the personal AI is a standalone artifact. What's actually shaping up looks more like a layer, the user's data, voice, preferences sitting alongside whatever frontier model is in use, retrieved and conditioned on at inference time. The artifact-vs-layer distinction matters because the economic models are different (artifacts are licensed, layers are subscribed to), the privacy properties are different (artifacts can be local, layers tend to be cloud), and the technical stack is different.

I don't think the artifact framing was wrong, exactly. I think it was one of two viable shapes, and the layer shape is currently winning the architecture race. Worth saying so plainly.
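
To make the layer shape concrete, here's a minimal sketch of the conditioning step: a hypothetical local store of the user's notes, a deliberately crude keyword retriever, and a prompt assembled at inference time for whatever model happens to sit behind the API. Everything here (the store, the scoring, the model boundary) is a placeholder assumption.

```python
# Sketch of the "layer" shape: the user's data is retrieved and prepended at
# inference time rather than baked into weights. A real system would use
# embeddings and a proper store; word overlap is just a stand-in.
notes = {
    "preferences": "Prefers short answers. Hates bullet points in email.",
    "project": "Currently migrating the blog off a static-site generator.",
    "voice": "Dry, first-person, hedges claims explicitly.",
}

def retrieve(query: str, store: dict, k: int = 2) -> list:
    """Rank stored notes by naive word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(store.values(), key=lambda v: -len(q & set(v.lower().split())))
    return ranked[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query, notes))
    return f"Context about the user:\n{context}\n\nUser request: {query}"

# The assembled prompt goes to whichever model is in use, local or frontier;
# the layer itself is model-agnostic.
print(build_prompt("Draft a short email about the blog migration."))
```

The point of the sketch is the shape, not the retrieval quality: the layer travels with the user, and the model underneath is interchangeable.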

The marketplace post over-indexed on existing analogs. The training-data marketplace post leaned heavily on the music industry analog (ASCAP, BMI, the collecting-society model). Six months in, I think that analog is misleading. Music licensing works in a world where the product is identifiable, repeatable, and discrete (a song plays, an event happens, a payment is owed). Training is amorphous, accumulative, and statistical. A song you wrote either was or wasn't played at a venue. A document you wrote may have been in the training set to some degree, with some influence on some outputs, in a way that's hard to attribute even with perfect logging. The accounting model has to be different. I don't know what it should look like, but ASCAP-for-text isn't quite right.

The "style is not knowledge" distinction was right and incomplete. That post argued style and knowledge are different things and that the conflation of the two would cause problems. Both halves of that have held up. What I missed at the time is that there's a third category, the operational habits of a person, distinct from both their voice and their declared knowledge. How they sequence work, what they do first when faced with a hard problem, how they decide what's important. Some of that lives in writing; a lot of it doesn't. Encoding that is harder than either style or knowledge, and might be the actual frontier of what makes a personal-AI system valuable. I don't have a clean framing for it yet.

What's still open

Most of the gaps I named six months ago are still gaps. The foundation moved; the institutional layer hasn't.

There's still no licensing framework for adapters or training contributions. A writer with a substantial corpus has no clean way to opt in or out of being trained on, no mechanism to license their voice or knowledge for use in personal-AI artifacts, no path to be paid for either. The technical infrastructure to track and pay for training contributions doesn't exist. The legal framework to define what's being licensed doesn't exist. Both gaps are unchanged from March.

Evaluation is still unsolved. The "does this system actually capture how the person reasons" question remains the bottleneck for everything else in the encoding-a-person stack. Some research has moved on it. Nothing production-ready exists. Until evaluation is solved, every claim about capturing reasoning is essentially marketing.

The agent operational foundation is still missing. The post on agents from June argued that the operational layer (auth, audit, blast radius, trust calibration) would lag the capability layer and we'd see ugly failures because of it. Three months later, the capability has continued to improve, the operational layer has improved less, and the first cluster of "AI did something embarrassing on my behalf" stories is starting to land. Pattern is on track.

The supply-side conversation has barely started. The marketplace post asked who's organizing on behalf of writers and contributors. Answer is still: nobody, structurally. The Authors Guild has made noises. The Writers Guild strike has gestured at AI rights. Some publishers (NYT, in particular) are quietly making moves. None of this is a coordinated supply-side response yet. It might take another year or two.

What I keep coming back to

After six months of writing in this thread, the through-line that keeps surfacing is this: the technical layer of personal AI is moving fast enough that the real work is going to be in the institutions: the licensing, the trust frameworks, the economic mechanics. The model can be encoded. The voice can be transferred. The knowledge can be retrieved. The question of who owns what, gets paid for what, and trusts what is the question that will determine whether any of this lands in a useful, equitable form, or whether it lands in the same shape every other digital-rights question has landed in, which is "the platforms keep most of the value and the contributors get the equivalent of a thank-you note."

I don't know how to fix that. I'm not even sure who's supposed to be working on it. The model providers don't have an incentive. The writers and contributors don't have organized representation. The regulators don't have the technical depth. The places where these conversations would normally happen (standards bodies, academic working groups, civil society organizations) are just starting to engage.

The next six months are going to be more about institutions than about models. That's a less glamorous frontier than the technical one, but it's where the consequential decisions actually are. The fact that we're asking the institutional question at all is partly because the technical layer settled down enough to leave room. That's progress. The thread will probably keep tugging on these institutional questions through the rest of the year and into next.

For the next post, I want to write down what kind of AI tool I actually want: the gap between the trajectory of what's getting built and the thing that would actually be useful. That's a different kind of thinking-out-loud. Closer to a wishlist than a forecast. Probably more productive than another round of "here's what's missing."

Six months in, the foundation is real. The framework is mostly absent. Worth a checkpoint.