Knowledge as a service: a thought experiment

If you could capture not just what someone knows but how they reason, and run that on local hardware, what would the artifact actually look like?

There's a structural weirdness in how knowledge work gets paid for. A consultant or contractor walks into a company, encodes some piece of operational expertise, walks out. The company has a thing they didn't have before. The consultant has a paycheck. The next company that needs the same thing pays the same consultant again from scratch (or pays someone else who develops the same expertise from scratch) or goes without.

The thing being transferred (call it operational expertise, institutional knowledge, or "the way someone solves a vRA problem") is portable in principle but not in practice. Until very recently, the only viable container for that kind of knowledge was a human being.

Four months into the ChatGPT era, I think the question worth asking is whether the container just changed.

The foundation that arrived in the last four months

Three pieces showed up almost simultaneously in late 2022 and early 2023. Each is a precondition for treating expertise as a portable artifact rather than a person-shaped thing.

Conversational access to knowledge

Before ChatGPT, knowledge access was search, and search was a list of links you triaged. Conversational AI removes the triage step: you ask, you get the thing. That removal is the category change.

Whether the thing you get is correct is a separate question. What matters for this thought experiment is that the expectation shifted overnight. Reading a document and synthesizing an answer used to be human work. It is now, for a wide range of tasks, model work.

Open weights running locally

The LLaMA weights leaked on March 3rd. Within two weeks, Stanford had a fine-tuned version called Alpaca running on a laptop. Whatever you think about the leak, the practical effect, as I walked through at the time, is that "running a competent language model on your own machine" moved from "research lab project" to "weekend tinker job."

This matters because the inference layer no longer has to live at OpenAI. If you want to license a piece of expertise to a small business, that business doesn't have to send its queries (and its data) to a hyperscaler. The model can run on its own hardware.

Models good enough to capture style

GPT-4 came out March 14th. Setting aside the marketing, what's actually new is that you can paste a substantial body of someone's work (their writing, their code, their decisions) and the model can answer in their style. Not perfectly. Not always. But often enough that the question stops being theoretical. (As covered in the style-is-not-knowledge piece a couple weeks back, capturing voice is not the same as capturing reasoning, but voice is the easier half, and it's the half that's now real.)

A demonstration: paste a few thousand words of someone's published writing alongside a question they haven't answered, and ask GPT-4 to answer in their voice. The output is rougher than the source, but the shape of how that person thinks is recognizable. That's the part that's new.
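
A minimal sketch of that demonstration, using the OpenAI chat API as it exists today. The corpus file, the question, and the prompt wording are all placeholders:

```python
# Assumes OPENAI_API_KEY is set in the environment.
import openai

with open("author_corpus.txt") as f:
    corpus = f.read()  # a few thousand words of the author's published writing

question = "How would you approach migrating an aging vRA estate?"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": (
            "You are answering as the author of the writing samples below. "
            "Match their voice, structure, and way of reasoning.\n\n" + corpus
        )},
        {"role": "user", "content": question},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```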

The hypothetical artifact

Given all three (conversational interface, local inference, style-capable models), what would the artifact actually look like?

A small additive layer on top of a base model. Trained on the body of work of one person. Encoding not just what they know but how they reason about problems in their domain. Distributable as a file. Runnable on commodity hardware. Queryable through the same interface you'd use for any chat assistant.

In implementation terms, roughly (a loading sketch follows the list):

  • A base model (LLaMA-7B at the moment, probably better candidates by year-end)
  • A LoRA-style adapter trained on the author's corpus
  • A retrieval index over the original source material for grounding
  • A small system prompt that frames the persona
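
What loading that artifact could look like, assuming the adapter was trained with the Hugging Face peft library. The checkpoint name, adapter path, and persona prompt are placeholders, and the retrieval index is omitted for brevity:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "decapoda-research/llama-7b-hf"  # placeholder base checkpoint
ADAPTER = "./author-adapter"            # the distributable artifact: a LoRA directory

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER)  # the small additive layer

PERSONA = "You answer as <author>, reasoning the way they do in their published work.\n\n"

def ask(question: str) -> str:
    inputs = tokenizer(PERSONA + f"Q: {question}\nA:", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
```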

There are no commercial offerings of this. Some research projects gesture at it. The infrastructure to build it exists in pieces. The infrastructure to distribute and license it doesn't exist at all.

What the framework requires

For the artifact to be real, several things need to be true that aren't yet.

Fine-tuning of frontier models needs to be publicly accessible. OpenAI lets you fine-tune the older GPT-3 base models (davinci, curie, babbage, ada), but not GPT-3.5 or GPT-4. The closed-frontier shops have signaled they'll get there but haven't.

Adapter formats need to be portable. Today, a fine-tuned model is locked to the platform that trained it. A LoRA you trained on Llama can't be applied to GPT. A LoRA you trained inside one vendor's tooling stays inside that vendor's tooling. There's no equivalent of an MP3 (a standardized portable container) for adapter weights.
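
For concreteness, here is roughly what the artifact amounts to under peft-style LoRA training (names illustrative): a directory small enough to email, whose weight keys are tied to one base model's layer names. That tie is the lock-in.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)

# ... fine-tuning on the author's corpus elided ...

model.save_pretrained("./author-adapter")
# Writes adapter_config.json + adapter_model.bin: a few megabytes, portable
# as a file, but only applicable to the exact base architecture it names.
```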

Licensing terms need to exist. The terms of service for the major model providers and the open-weights ecosystem aren't written for "this artifact represents one person's expertise and they own the licensing rights." Until that legal infrastructure exists, the economic model can't.

Inference economics need to support the licensing model. Frontier inference at GPT-4 prices ($0.03 per 1K input tokens, $0.06 per 1K output) makes "license a personal AI for $5/month" hard to pencil. Open-weights inference on local hardware gets the cost down at the cost of capability. The Pareto frontier here will move fast, but it isn't where it needs to be yet.
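
To make "hard to pencil" concrete, a back-of-envelope sketch; the per-user volumes are invented for illustration:

```python
# GPT-4 API pricing, March 2023.
PRICE_IN, PRICE_OUT = 0.03, 0.06       # $ per 1K tokens

queries = 20 * 30                      # assumption: 20 queries/day for a month
tokens_in, tokens_out = 2000, 500      # assumption: context + answer per query

cost = queries * (tokens_in / 1000 * PRICE_IN + tokens_out / 1000 * PRICE_OUT)
print(f"${cost:.0f}/month")            # -> $54/month, against a $5/month price point
```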

A testable version of the experiment

A reasonable experiment to start running, even with the gaps (a harness sketch follows the list):

  1. Pick a body of work: a writer with a clear voice, or a domain expert with a published archive.
  2. Fine-tune a base model on the corpus, optimizing for style and reasoning patterns rather than fact recall.
  3. Build an evaluation harness: hold out a few questions from the corpus, ask the fine-tuned model to answer in voice, and compare the output to the actual author's response.
  4. Vary the size of the corpus, the size of the base model, the technique used for adaptation, and observe where the answers stop sounding like the author and start sounding like generic LLM output.
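
A sketch of that harness for steps 3 and 4, with a crude stand-in metric. The embedding model and helper names are assumptions, and embedding similarity is a weak proxy for "sounds like the author," which is exactly the evaluation gap discussed below:

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def style_similarity(generated: str, reference: str) -> float:
    # Cosine similarity of sentence embeddings: a weak proxy metric.
    a, b = embedder.encode([generated, reference], convert_to_tensor=True)
    return util.cos_sim(a, b).item()

def evaluate(answer_fn, held_out):
    """answer_fn: question -> the fine-tuned model's in-voice answer.
    held_out: (question, actual_author_answer) pairs excluded from training."""
    scores = [style_similarity(answer_fn(q), ref) for q, ref in held_out]
    return sum(scores) / len(scores)

# Step 4: re-run evaluate() while varying corpus size, base-model size, and
# adaptation technique, watching for the score to drop toward generic output.
```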

The technical components for that experiment are mostly available. The hard part isn't the engineering. It's the evaluation. There's no good objective metric for "does this reason like the author would." That's a research problem worth working on.

The open questions

Three things this thought experiment doesn't answer:

Who'd buy the artifact? The market for "expert X's domain knowledge" is people who don't want to develop that expertise themselves. Those people mostly buy outcomes, not learning artifacts. Whether there's a real market for a queryable expertise object (versus contracting with the actual expert) is unclear.

What does ownership look like? If you train an adapter on your published writing and someone else uses it, who owns the resulting model? The author, the platform that hosted the training, the licensee? The legal frameworks for this don't exist. The ones that exist for adjacent cases (copyright on training data, EULAs on derivative models) don't map cleanly.

What's the half-life of an expertise artifact? vRA documentation from 2014 is mostly useless in 2023. If domain expertise is encoded in a model and the underlying technology drifts, when does the artifact stop being valuable? Is there a refresh model? Who pays for it?

The thought experiment is interesting because the technical preconditions just arrived. The commercial and legal preconditions are nowhere close. Worth watching which gap closes first.