The 70/30 rule for prompt vs context
After tuning a lot of prompts in my own workflows and reading widely across the public corpus, the ratio that keeps holding up is roughly 30% prompt instructions to 70% retrieved context. Most people start with the opposite ratio. It's worth being plain about why, and about when to break the rule.
Here's a pattern I keep noticing in the prompts I write and the ones I read. The ratio between instruction tokens (the prompt, what you're telling the model to do, the persona, the constraints, the format) and context tokens (the data, retrieved documents, conversation history, prior turns the model needs) tends to land somewhere around 30% prompt and 70% context. It's not a strict ratio, but it's a consistent shape that emerges as prompts mature.
The interesting thing is that most teams start with the opposite ratio. The first version of any prompt is heavy on instructions, light on context. The mature version is the inverse. Let me walk through why, and when to break the rule.
What the ratio actually means
For a typical query that gets sent to an LLM in production:
- Prompt tokens (~30%) cover the instruction layer: who the model is, what it's supposed to do, what format the output should be in, what to avoid, what to escalate, what tools are available.
- Context tokens (~70%) cover the data layer: retrieved chunks from a knowledge base, the user's prior conversation history, structured data the model is supposed to reason over, examples of what good output looks like.
The user's actual query (the thing they typed) is usually a small fraction of either bucket. The bulk is the assembly that makes the model useful for the actual workload.
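To make the buckets concrete, here's a minimal sketch of where the tokens in an assembled request go. Everything in it is illustrative (the strings, the Acme name, the `rough_tokens` word-count stand-in for a real tokenizer); the point is just that the query is the small slice and the assembly is the bulk.

```python
def rough_tokens(text: str) -> int:
    """Crude estimate; use the model's actual tokenizer in practice."""
    return len(text.split())

# Instruction layer: fixed across queries.
instructions = (
    "You are a support agent for Acme. You handle billing questions.\n"
    "Answer only from the provided context; if it lacks the answer, say so.\n"
    'Return JSON: {"answer": str, "sources": list}.'
)

# Context layer: assembled per query.
context = "\n\n".join([
    "[chunk 12] Refunds on annual plans are available within 30 days ...",
    "[chunk 87] Refunds are processed within 5 business days ...",
    "[history] User bought an annual plan on 2024-03-01.",
])

query = "Can I still get a refund?"

total = sum(map(rough_tokens, (instructions, context, query)))
for name, part in [("instructions", instructions), ("context", context), ("query", query)]:
    print(f"{name:>12}: {rough_tokens(part) / total:.0%}")
```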
Why the first version is inverted
The default mental model when you start writing prompts is "the prompt is what you tell the model to do." That mental model produces an 80/20 split in favor of instructions. You write a long, careful prompt. You give it lots of guidance. You add a small amount of context, maybe a recent message or a single retrieved chunk. You ship it. It works on the easy cases.
Then it falls over on the hard cases. You add more instructions. The prompt grows. You add edge-case handling, format guards, anti-pattern warnings. The prompt gets to 1500 tokens of careful instruction. The model starts ignoring parts of it. You're optimizing the wrong dimension.
The thing that actually produces better output on the hard cases is more relevant context, not more instructions. Adding a retrieved chunk that contains the answer is worth more than adding three more sentences of instruction about how to find the answer. The 70/30 ratio is what emerges from teams revising in this direction over months.
Why context wins
Three reasons relevant context beats more instruction:
Models are better at "use this material" than "follow these rules." Modern frontier models are heavily trained on the pattern of "here's the source material, produce output grounded in it." They're less reliable at "follow these 47 rules across all queries." In practice, grounding in provided material holds up better than compliance with a long rule list.
Context is workload-specific; instructions are workload-general. The instructions you write apply to every query. Context is loaded per query and matches the actual task. The signal-to-noise of context is higher because it's relevant to the specific case; the signal-to-noise of instructions degrades as you add edge-case handling that doesn't apply most of the time.
Context fails visibly; instruction-following fails silently. When the retrieved context doesn't contain the answer, the model usually says so or hedges. When the model ignores an instruction, it just ignores it and produces output that looks fine. The visible failure mode is easier to debug; the silent one piles up quiet wrongness.
What the 30% of instruction needs to do
The instructions in the mature prompt are doing specific work, not general guidance:
- Role and scope. "You are a customer-support agent for X. You handle Y. You don't handle Z."
- Output format. "Return JSON with these fields. If you can't, return an error in this shape."
- Tool surface. "You have these tools available. Call them with these argument shapes."
- Failure handling. "If the context doesn't contain the answer, say so. Don't speculate."
- Persona constraints. "Tone: direct. Length: under 200 words. Don't apologize."
These are the things instructions do well. They're general enough to apply to every query and specific enough that the model can follow them reliably. Each instruction earns its place by being the kind of constraint that doesn't change query-to-query.
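Spelled out, that instruction layer can be surprisingly short. A hedged sketch covering the five jobs above; the wording, the Acme name, and the tool names are purely illustrative, not a recommended prompt:

```python
# Illustrative only: a lean instruction layer, one block per job.
SUPPORT_INSTRUCTIONS = """\
You are a customer-support agent for Acme. You handle billing and account
questions. You do not handle legal or security questions; escalate those.

Output: JSON with fields "answer" (string) and "sources" (list of chunk ids).
If you cannot produce that shape, return {"error": "<reason>"} instead.

Tools available: lookup_order(order_id), open_ticket(summary).
Call them only with those argument shapes.

If the provided context does not contain the answer, say so. Do not speculate.

Tone: direct. Length: under 200 words. Do not apologize.
"""
```

Nothing in that block changes query-to-query, which is also what makes it cacheable.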
What the 70% of context needs to do
The context in the mature prompt is doing the workload-specific work:
- Retrieved knowledge that the model needs to answer the specific query. A few well-chosen chunks beat many marginal ones.
- Relevant conversation history scoped to what the agent needs to know about the prior turns. Not the whole conversation; the parts that matter for this turn.
- Examples of good output in the specific shape you want, few-shot for the patterns the model needs to imitate.
- Structured data the model needs to reason over: the contract being analyzed, the data table being queried, the issue being triaged.
The 70% of context is what makes the model's output match the specific situation rather than producing a generic-shaped response.
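In code, the per-call assembly might look like the sketch below: fill a token budget with relevance-ordered chunks first, then scoped history and few-shot examples. The helper names, the budget number, and the greedy skip strategy are assumptions, not a prescribed design.

```python
def rough_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer

def assemble_context(chunks: list[str], history: list[str],
                     examples: list[str], budget: int = 2800) -> str:
    """Fill the context budget greedily; chunks are assumed sorted by relevance."""
    parts: list[str] = []
    used = 0
    for piece in chunks + history + examples:
        cost = rough_tokens(piece)
        if used + cost > budget:
            continue  # skip pieces that would blow the budget; smaller ones may still fit
        parts.append(piece)
        used += cost
    return "\n\n".join(parts)

ctx = assemble_context(
    chunks=["[chunk 12] Refunds on annual plans ...", "[chunk 87] Processing times ..."],
    history=["User bought an annual plan on 2024-03-01."],
    examples=['Q: example question  A: {"answer": "...", "sources": ["chunk 12"]}'],
)
```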
When to break the 70/30 rule
Three cases where the ratio shifts on purpose.
Pure-reasoning tasks with no external data. Math problems, logic puzzles, anything where the model is reasoning from first principles. Context is small or zero; instruction can be larger. 50/50 or even prompt-heavy is fine.
Highly constrained format generation. When the output has to follow an extremely strict schema (legal documents, structured XML, code in a specific style), the instruction layer needs to grow to spell out the constraints fully. 50/50 is reasonable.
Initial prototyping. When you're figuring out what the workload needs, instruction-heavy is fine because you're working on instructions. The 70/30 is what the prompt converges to after the workload is understood, not where it starts.
For everything else, the ratio holds.
How to tell if you're inverted
A few signals that suggest your prompt is over-instructed and under-contextualized:
- You keep adding rules. Every edge case becomes a new sentence of instruction. The prompt is over 1500 tokens and growing.
- The model ignores parts of the prompt. It doesn't follow the instruction about format, or the instruction about scope, or the instruction about not speculating. The instruction is technically there; the model is treating it as advisory.
- Output quality is inconsistent across similar queries. The model handles some queries well and similar ones poorly, and the difference doesn't correlate with anything visible in the input. That usually means the model isn't getting the workload-specific context that would help.
- You've added few-shot examples and they help a lot. This is a signal that the model needs more example-based grounding than rule-based grounding for your workload.
If two or more of these are true, the prompt is probably inverted. Adding context (better retrieval, more relevant chunks, better conversation-history scoping) usually fixes more than adding more rules.
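One way to check the ratio directly, assuming you log the assembled instruction and context strings per call (the log shape below is made up for the example):

```python
def rough_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer

def instruction_share(calls: list[dict]) -> float:
    """Average fraction of prompt tokens spent on instructions across calls."""
    shares = []
    for call in calls:
        instr = rough_tokens(call["instructions"])
        ctx = rough_tokens(call["context"])
        if instr + ctx:
            shares.append(instr / (instr + ctx))
    return sum(shares) / len(shares)

calls = [
    {"instructions": "rule " * 1500, "context": "chunk " * 400},
    {"instructions": "rule " * 1500, "context": "chunk " * 600},
]
share = instruction_share(calls)
print(f"instruction share: {share:.0%}")  # ~75% here
if share > 0.5:
    print("likely inverted: add context (or better retrieval) before adding rules")
```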
The cost dimension
The 70/30 ratio also matters for cost modeling. Instruction tokens and context tokens are both billed at the input rate, but they behave differently: the instruction layer repeats verbatim on every call, which makes it cacheable, while the context layer varies per call, so it rarely benefits from caching. With prompt caching in place, the instruction layer becomes cheap across queries (cached input is typically billed at a steep discount); the context layer is the variable cost. The 70/30 ratio is therefore mostly a map of where the money actually goes, and the answer is "context, on every call, growing with retrieval depth."
The teams that do conversation-level cost tracking see this directly: the context-heavy prompts dominate the bill, and the lever is "retrieve fewer / better chunks" not "shorten the instruction prompt."
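A back-of-envelope version of that arithmetic, with placeholder prices (the rates below are assumptions for illustration, not any provider's actual pricing):

```python
UNCACHED_RATE = 3.00 / 1_000_000  # $ per input token (placeholder price)
CACHED_RATE = 0.30 / 1_000_000    # $ per cached input token (assumed 90% discount)

instruction_tokens = 1_500  # fixed per call, cacheable
context_tokens = 3_500      # varies per call, billed uncached

instr_cost = instruction_tokens * CACHED_RATE
ctx_cost = context_tokens * UNCACHED_RATE
print(f"instructions: ${instr_cost:.6f}/call")  # $0.000450
print(f"context:      ${ctx_cost:.6f}/call")    # $0.010500
# Context dominates even with the cache discount: the lever is retrieval depth.
```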
The pattern across the discipline
The 70/30 ratio is one of the patterns the prompt-architecting framing points at: the work of designing how prompts compose into systems. Teams that treat prompts as careful documents to be written once converge on instruction-heavy. Teams that treat prompts as systems to be assembled per call converge on context-heavy. The latter is the production-grade pattern.
It's not a magic number. It's a converged shape. Most mature prompts I read in public sources end up somewhere between 60/40 and 80/20 in favor of context. 70/30 is a useful starting hypothesis to challenge; the actual ratio for your workload is something you'll find through measurement, not from a thinkpiece.
The takeaway worth remembering: when the prompt isn't doing what you want, the instinct is to add more instruction. The better move is usually to add more (or better) context. Inverted prompt-vs-context is the most common shape of production-prompt failure I've seen, and it's the one that's easiest to fix once you see it.