Llama 2 changes the math on owning your model
Meta released Llama 2 last week with a permissive commercial license. Five months after the leak made local inference possible, the legal layer just caught up to the technical one.
Meta released Llama 2 on July 18th, with weights for the 7B, 13B, and 70B variants and a license that permits commercial use up to 700 million monthly active users. That last clause is the headline. The capability bump matters; the licensing change matters more.
Five months ago the situation was: a research-only license, leaked weights circulating on torrents, an open community that had to operate in a legal gray zone if it wanted to do anything beyond pure research. Today the situation is: an official, sanctioned, commercial-use-permitted release from one of the largest model labs on Earth, with the explicit goal of building an open ecosystem on top of it. Same shape of model, completely different posture for anyone who wants to actually deploy it.
What's in the release
Three model sizes: 7B, 13B, and 70B parameters. (The mid-size successor to Llama 1's 33B is notably absent: the paper mentions a 34B variant that was trained but held back pending more red-teaming.) Each comes in two flavors: a base model (Llama-2-7B, -13B, -70B) and an instruction-tuned chat variant (Llama-2-7B-Chat, etc.).
The base models are pre-trained on roughly 2 trillion tokens, about 40% more than Llama 1. Context length is 4K, double Llama 1's 2K. The chat variants have been instruction-tuned and RLHF'd by Meta directly, the first time a major open-weights release has shipped with a working chat experience out of the box.
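One practical detail for anyone poking at the chat variants directly: they were trained on Meta's specific [INST] / <<SYS>> prompt template, and output quality degrades if you deviate from it. A minimal single-turn builder, with illustrative strings:

```python
# Minimal sketch of the single-turn prompt format the Llama 2 chat
# variants were trained on (per Meta's reference code). The system and
# user strings here are illustrative.
def llama2_chat_prompt(system: str, user: str) -> str:
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

print(llama2_chat_prompt(
    "You are a concise assistant.",
    "Explain the 4K context window in one sentence.",
))
```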
Capability-wise, Meta's own benchmarks put Llama-2-70B-Chat in the same neighborhood as GPT-3.5 on most evaluations, better on some, worse on others. Independent benchmarks since the release have largely agreed. It's not GPT-4. It's competitive with what most production systems were running on last year. For a lot of use cases, that's enough.
Why the license matters more than the capability
The commercial-use clause is the part that changes the calculus. To walk through what it actually allows:
You can use Llama 2 in a commercial product. You can fine-tune it on your own data and ship the result. You can charge for access to a service built on top of it. You can host it on your own infrastructure or your customer's infrastructure. The 700M MAU clause only bites if your products already had more than 700 million monthly active users at release, in which case you have to request a separate license from Meta; that gate affects roughly seven companies on Earth. For everyone else, the license is effectively unrestricted.
This was not true in February. In February, anyone building on the leaked weights was in a posture of "the license forbids this, but the lawyers haven't come for me yet." That posture is fine for hobbyists, untenable for serious deployments. Llama 2 cleans up the legal layer entirely. The open-weights ecosystem just got a foundation it can build on without looking over its shoulder.
The strategic logic for Meta is reasonably transparent. They aren't going to win the closed frontier-model race against OpenAI and Anthropic: they're behind, the gap is widening, and the value proposition of competing there is unclear. They can, however, commoditize the layer below the frontier, deny their competitors a moat at that layer, and become the default open foundation everyone else builds on. That's the play. Whether it works depends on the next two or three releases.
What changes for someone considering "build vs. rent"
Take the practical question: you have a use case where you'd like AI capabilities, and you're trying to decide between calling an API and running a model yourself. Six months ago the calculation looked like:
- API: pay per token, fast time to value, your data goes to the provider, capability is best-in-class.
- Self-host: technical setup, capital cost for hardware, your data stays put, capability is meaningfully behind.
The "capability is meaningfully behind" part was the gating factor for most serious use cases. Self-hosting an open model from a year ago (GPT-J, GPT-NeoX, BLOOM) meant something well below GPT-3 quality, and not useful for much. With Llama 2 in the picture, the self-host option is now in the rough neighborhood of GPT-3.5, which covers a substantial fraction of real-world tasks.
So the new math is closer to:
- API: pay per token, your data goes to the provider, top-tier capability if you want it.
- Self-host: setup cost, hardware cost, data stays put, capability is one tier behind frontier but production-usable for many tasks.
For low-volume use, the API still wins on convenience. For high-volume use, the per-token cost compounds and self-hosting starts to look attractive even with the capability gap. For data-sensitive use, self-hosting is now a real option in a way it wasn't.
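To make the compounding concrete, the break-even arithmetic fits in a few lines. Every number below is an assumption for illustration, not a quoted price:

```python
# Back-of-envelope break-even for rent-vs-own. All figures are
# illustrative assumptions, not quoted prices.
api_cost_per_1k_tokens = 0.002   # roughly GPT-3.5-class API pricing, USD
gpu_server_monthly = 2_500       # assumed cost of a box serving Llama 2
ops_monthly = 1_500              # assumed engineering/maintenance overhead

selfhost_monthly = gpu_server_monthly + ops_monthly
break_even = selfhost_monthly / api_cost_per_1k_tokens * 1_000  # tokens/month

print(f"Self-host fixed cost: ${selfhost_monthly:,}/month")
print(f"Break-even volume:    {break_even:,.0f} tokens/month")
# ~2 billion tokens/month under these assumptions: below that, the API
# wins on cost alone; above it, the fixed-cost self-host line wins.
```

That's dollars only; the capability gap still has to be priced in separately.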
The interesting middle ground is hybrid: route the easy queries to a self-hosted Llama 2 and the hard ones to a frontier API. This is the pattern that's going to start showing up in serious production systems. The infrastructure for the routing decision ("is this query easy or hard?") is a research problem in its own right; early answers exist, but none are reliable yet.
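The shape of the pattern is simple even if the decision function isn't. A minimal sketch, where call_local and call_frontier are hypothetical client stubs and the length cutoff is a deliberately crude stand-in for that open research problem:

```python
# Hybrid routing sketch. call_local / call_frontier are hypothetical
# stubs (e.g. a self-hosted Llama-2-13B-Chat endpoint and a frontier
# API); looks_easy is a placeholder for a real difficulty classifier.
def call_local(prompt: str) -> str:
    raise NotImplementedError  # POST to your Llama 2 inference server

def call_frontier(prompt: str) -> str:
    raise NotImplementedError  # call the frontier provider's API

def looks_easy(prompt: str) -> bool:
    # Real routers use trained classifiers or confidence estimates;
    # a length cutoff is the simplest possible stand-in.
    return len(prompt.split()) < 200

def route(prompt: str) -> str:
    return call_local(prompt) if looks_easy(prompt) else call_frontier(prompt)
```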
What the tooling looks like in two months
The open community has been waiting for this. The first wave of fine-tunes on Llama 2 will hit within days. The first quantizations (4-bit, 3-bit, GGML) will land within a week. The first integrations into the popular inference layers (llama.cpp, text-generation-webui, vLLM) will land basically immediately. The first commercial products built on Llama-2-70B-Chat will ship within a month.
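Once those quantizations and integrations land, the local path is a few lines. A sketch through llama-cpp-python's completion API, assuming a hypothetical 4-bit GGML file of Llama-2-13B-Chat already on disk:

```python
# Local inference sketch via llama-cpp-python. The model filename is
# hypothetical; it assumes a 4-bit GGML quantization on local disk.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-13b-chat.ggmlv3.q4_0.bin", n_ctx=4096)
out = llm(
    "[INST] Summarize the Llama 2 license in one sentence. [/INST]",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```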
What follows from that is harder to predict, but the rough shape is:
- A cluster of fine-tuned variants for specific verticals: coding (Code Llama variants will probably show up from the community before Meta does an official one), instruction-following, role-playing, etc.
- A handful of small companies building "self-hosted ChatGPT for your enterprise" products, with Llama 2 as the engine.
- A maturation of the local-inference tooling: installer scripts, hardware-configuration guides, performance-tuning recipes.
- Continued capability improvements as fine-tuning techniques mature.
The thing that won't happen, probably, is the open ecosystem catching up to the closed frontier in raw capability. Meta has signaled they're a year or more behind on the frontier. That gap might close partially over the next year, but the closed labs are also moving. The open ecosystem competes on a different vector: cost, customizability, data privacy, ownership.
What it means for the personal-AI thread
The encoding-a-person thought experiment from a few months back assumed an open base model with a permissive license, fine-tunable on a personal corpus, runnable locally. As of last week, all three pieces exist as a coherent package for the first time. The base model is Llama-2-13B (or 7B, depending on hardware). The fine-tuning recipe is documented. The local inference stack works.
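Concretely, the documented path runs through the Hugging Face stack. A minimal sketch of the LoRA route, with illustrative hyperparameters rather than a tested recipe (the meta-llama repos are gated behind accepting the license):

```python
# LoRA fine-tuning sketch on a personal corpus via transformers + peft.
# Hyperparameters are illustrative, not tuned.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # gated: requires accepting Meta's license
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of all 7B weights,
# which is what makes a single consumer GPU plausible for this step.
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, the usual targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the total

# From here: the standard transformers Trainer loop over the personal
# corpus. The resulting adapter is tens of megabytes, not tens of GB.
```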
The technical preconditions for the artifact in that thought experiment are now real. The legal preconditions are now real. The economic and distribution preconditions still aren't: there's no marketplace, no licensing infrastructure for adapters, no way for the person whose corpus is being encoded to participate in any value flow downstream. The earlier post on a marketplace for training data covered the gap on the supply side. The same gap exists on the artifact side.
So the picture as of late July 2023 is: you could build a personal-AI system today, end to end, and have it work in a basic form. You could not commercialize it cleanly because the licensing infrastructure for the artifact doesn't exist. You could not pay the source person fairly because the rev-share mechanics don't exist. The technical work is now mostly there. The institutional work is still mostly absent.
That's a useful place to be. The interesting problems shift from "can we build it" to "what should the institutional infrastructure look like." Different conversation, different set of people, different time horizons. The fact that the technical layer is settled enough to even ask the question is the news.
The capability bump in Llama 2 is real. The license change is the bigger deal. The combination is the foundation for a layer of AI tooling that's been promised for two years and only now actually exists.