Context Engineering for PMs: The Skill After Prompts | PM Toolkit

In early 2026, researchers at ETH Zurich measured what happens when an LLM writes its own context file. Across eight benchmark settings, the auto-generated files lowered task success in five of them, raised inference costs by 20 to 23 percent, and added 2.45 to 3.92 extra reasoning steps per task¹². Files curated by humans did better for every one of the four agents tested.

The study covers coding agents, but the finding maps directly onto the skill PMs are now being asked to learn. Prompt engineering covered the per-request layer: structure one ask well, get one good answer. This guide covers the layer above it: what the model already sees before you type anything.

From prompts to context

Bharani Subramaniam of Thoughtworks puts the definition in one line: "Context engineering is curating what the model sees so that you get a better result."³

In 2025 you improved outputs by rewording the request. In 2026 most of the gain sits in the standing context: the files the model reads at session start, the workspace memory that persists across chats, the documents retrieval pulls in, the instructions that load only when a task needs them. A well-worded prompt on top of stale standing context still produces a confident answer built on wrong facts.

Four layers, from most ephemeral to most durable:

Layer	What loads	When it loads	Who curates it	Failure mode
Prompt	The request itself	Every message	You, per request	Vague ask, generic output
Workspace memory	Product facts, personas, formats (Claude Projects, ChatGPT Projects)	Every chat in that project	The PM	One stale doc poisons every answer
Always-on agent file	AGENTS.md / CLAUDE.md conventions	Session start	The whole team, via PRs	Bloat raises cost, lowers success
Skills and retrieval	Extra instructions, docs, scripts	When relevant to the task	Whoever owns that workflow	Never triggers, or triggers on the wrong task

Martin Fowler's site describes the bottom two layers for coding agents: guidance files like CLAUDE.md load at session start and carry the conventions that apply to the whole project, while skills point to extra resources, instructions, and scripts that load only when relevant³. The two layers PMs actually touch are the personal workspace and the team's agent file. The rest of this guide works through both.

Curation beats generation

The ETH Zurich numbers deserve a closer look, because the temptation they warn against ships as a feature. Most agent tools offer an init command that scans your repo and writes the context file for you in about thirty seconds. The study tested exactly those LLM-generated files and found they reduced task success in five of eight settings, with drops ranging from 0.5 percent on SWE-bench Lite to 2 percent on AGENTbench, while making every task cost more to run¹². Human-curated files outperformed the generated ones for all four agents tested, gaining roughly 4 percentage points on AGENTbench, though even the good files carried token overhead¹.

The Augment Code write-up of the study lands on one rule: write only what agents cannot discover on their own. Auto-generated files mostly restate what the agent already reads in the codebase and docs, so they add tokens without adding information. Architectural overviews and general bloat raise cost without raising success¹.

Most of the work is subtraction. A context file earns its tokens when it carries decisions and definitions that exist only in someone's head, and it wastes them when it repeats what is already on disk. Deciding what matters and what to leave out is judgment work, which is why this lands on the PM's desk and not only the engineer's.

Running your tool's init command and committing the output unedited is the documented anti-pattern. Use the generated file as a scaffold if you want, then delete every line the agent could find by reading the repo itself.
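One rough way to start that deletion pass is to flag lines in the generated file that already appear, nearly verbatim, somewhere else in the repo. The sketch below is an illustration, not a tool from the study or the Augment Code write-up: the function names, the normalization rules, and the 20-character minimum are all assumptions, and exact-match comparison will miss paraphrased duplicates.

```python
def normalize(line: str) -> str:
    """Lowercase, trim list/heading markers at the edges, collapse whitespace."""
    return " ".join(line.lower().strip(" \t#-*>`").split())


def discoverable_lines(context_text: str, repo_texts: list[str], min_len: int = 20) -> list[str]:
    """Return context-file lines whose normalized text also appears in a repo document.

    min_len (an assumed threshold) skips short lines such as headings,
    which match by coincidence too often to be useful.
    """
    seen = {normalize(line) for text in repo_texts for line in text.splitlines()}
    flagged = []
    for line in context_text.splitlines():
        norm = normalize(line)
        if len(norm) >= min_len and norm in seen:
            flagged.append(line)
    return flagged
```

In practice you would pass in the text of the generated file plus the README and docs, read with something like `Path("README.md").read_text(encoding="utf-8")`. Whatever the function flags is a candidate for deletion; whatever survives still needs a human judgment call.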

Your PM workspace

The same logic applies outside the repo. Productside's 2026 workflow guide describes the pattern as persistent AI workspaces: a Claude Project or ChatGPT Project that retains your product domain, research, personas, constraints, and preferred formats, so you stop re-explaining your world in every chat⁴.

Apply the ETH filter to what goes in. Only what the model cannot discover, and what you cannot afford to have wrong:

A product one-pager. What the product is, who pays, the current bet. About 300 words, not the 40-slide strategy deck.
Metric definitions with formulas. Your activation definition, your churn denominator, your fiscal calendar. "Activation = completed first sync within 7 days of signup" settles the argument before the model invents its own version.
Two or three personas, each with a verbatim quote from real research.
Named non-goals. "We do not serve enterprise. We do not build native mobile in 2026." Models pad scope by default; a non-goals list is the cheapest scope control you can write.
One or two voice samples. A doc you wrote that sounds like you, so drafts start closer to done.
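To show why a formula settles more than a label does, here is the activation definition from the list above written as a minimal Python sketch. The function names, the inclusive 7-day boundary, and the choice of all signups as the denominator are assumptions made for illustration; your own definition should state each of those explicitly.

```python
from datetime import datetime, timedelta
from typing import Optional

ACTIVATION_WINDOW = timedelta(days=7)  # "within 7 days of signup"


def is_activated(signup_at: datetime, first_sync_at: Optional[datetime]) -> bool:
    """True if the first sync completed within 7 days of signup (boundary inclusive, by assumption)."""
    if first_sync_at is None:
        return False
    return signup_at <= first_sync_at <= signup_at + ACTIVATION_WINDOW


def activation_rate(cohort: list[tuple[datetime, Optional[datetime]]]) -> float:
    """Activated users divided by all signups in the cohort; 0.0 for an empty cohort."""
    if not cohort:
        return 0.0
    return sum(is_activated(signup, sync) for signup, sync in cohort) / len(cohort)
```

Writing the definition this precisely surfaces the questions a model would otherwise answer for you: does day 7 count, and who is in the denominator.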

What stays out: the strategy doc from two quarters ago, the dump of 60 raw transcripts, anything you would not personally vouch for today. A workspace with wrong context is worse than an empty one, because the errors come back wearing your own terminology.

If you work in Claude Cowork or any file-based setup, the folder-level _Context.md pattern in "Getting the most out of Claude Cowork" is the same idea at smaller scale: tell the model which files are source of truth and which to skip.

Your team's agent context files

The second surface lives in the repo. AGENTS.md is an open format for guiding coding agents, used by over 60,000 open-source projects. Its maintainers call it a README for agents: a dedicated, predictable place for the build steps, tests, and conventions that would clutter a README written for people⁵. The format came out of collaboration across OpenAI Codex, Amp, Google Jules, Cursor, and Factory, and is now stewarded by the Agentic AI Foundation under the Linux Foundation⁵. As of 2026 it is read by Claude Code, OpenAI Codex CLI, Cursor, Aider, Devin, GitHub Copilot, Gemini CLI, Windsurf, Amazon Q, and others⁶.

Engineers own most of this file: build commands, test instructions, code style. The PM contributes the part agents cannot infer from code: domain rules, shared vocabulary, and the reason behind a constraint. An agent reading your codebase can learn how the trial logic works; it cannot learn that the 30-day variant exists because sales-assisted deals need procurement time. A PM-owned section can be under 10 lines:

## Product rules (PM-owned, ask before changing)
- "Workspace" = a billing entity. "Project" = a container inside one. Never swap these.
- Trials: 14 days self-serve, 30 days sales-assisted.
- Free plan never sees seat pricing. Legal constraint, not a UX choice.
- Dates display in the user's locale. Exports use ISO 8601.

BuildBetter's guide adds the operating practices: keep the file under 500 lines, because the whole thing loads into the context window and tokens are the budget; treat it as code, meaning the PR that changes a convention also updates the file; and keep the division of labor clean, with the README staying narrative and AGENTS.md staying imperative⁶.
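The line budget is easy to check mechanically. The sketch below assumes you read the file yourself and pass in its text; the function name is hypothetical, and "under 500" is interpreted strictly as fewer than 500 lines.

```python
MAX_LINES = 500  # budget quoted from the BuildBetter guide


def check_line_budget(text: str, max_lines: int = MAX_LINES) -> tuple[int, bool]:
    """Return (line_count, within_budget) for the contents of an agent file."""
    count = len(text.splitlines())
    return count, count < max_lines
```

A team could run this in CI against the contents of AGENTS.md and fail the build when the second value is False. Line count is only a proxy for tokens, so a file that passes can still be bloated.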

If you prototype with agents yourself, you have already brushed against these files. "From PRD to prototype" covers that workflow.

AGENTS.md and CLAUDE.md do the same job. AGENTS.md is the vendor-neutral format; CLAUDE.md is Claude Code's native guidance file, loaded at session start³. Many repos carry both. The format matters less than the curation, and the ETH result applies to either.

Maintenance: context rots

A context file is correct on the day you write it. Pricing changes, a persona retires, the non-goal becomes this year's bet, and the file keeps asserting the old world with full confidence. Stale context fails quietly, which makes it worse than missing context.

The fix borrows from how engineers handle docs: couple updates to changes. When pricing changes, the workspace one-pager changes the same week. When a repo convention changes, AGENTS.md changes in the same PR⁶. The update is part of the change, not a follow-up task that never happens.
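The PR-coupling rule can be nudged along by a small check on the list of files a PR changes. This is a sketch under stated assumptions: the patterns in CONVENTION_PATTERNS are hypothetical examples of paths whose changes tend to imply a convention change, and each team would need to choose its own. A check like this can only warn, since it cannot tell whether a given change actually altered a convention.

```python
from fnmatch import fnmatchcase

# Hypothetical examples; replace with the paths that carry conventions in your repo.
CONVENTION_PATTERNS = ["package.json", "pyproject.toml", ".eslintrc*", "Makefile", "docs/conventions/*"]
AGENT_FILES = {"AGENTS.md", "CLAUDE.md"}


def needs_agent_file_update(changed_paths: list[str]) -> bool:
    """True when a convention-bearing path changed but no agent file did."""
    touched_convention = any(
        fnmatchcase(path, pattern)
        for path in changed_paths
        for pattern in CONVENTION_PATTERNS
    )
    touched_agent_file = any(path in AGENT_FILES for path in changed_paths)
    return touched_convention and not touched_agent_file
```

The list of changed paths could come from `git diff --name-only` against the base branch. The same idea works for the workspace side if the one-pager lives in a tracked folder, though most PM workspaces will rely on habit rather than tooling.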

We have no data on the right review cadence, so here is the intuition. A 10-minute monthly skim of your workspace files catches most rot, and the PR-coupling rule handles the repo side without needing a calendar at all. Add one symptom-driven trigger: when the model starts referencing a killed feature or an outdated metric definition, that is the context file talking. Fix the source, not the chat.

A quick rot test: open a fresh chat in your workspace and ask "what do you know about this product?" Read the answer as an audit. Whatever it gets wrong, your context file gets wrong.
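If you keep a short list of retired names and definitions, the audit answer can also be scanned mechanically. The sketch below is illustrative only: the function name is made up, the retired terms in the usage example are invented, and plain phrase matching will miss paraphrases, so it supplements reading the answer and does not replace it.

```python
import re


def find_stale_terms(answer: str, retired: dict[str, str]) -> dict[str, str]:
    """Return each retired term found in a model answer, mapped to its explanatory note.

    Matching is case-insensitive and on whole words or phrases.
    """
    hits = {}
    for term, note in retired.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, answer, flags=re.IGNORECASE):
            hits[term] = note
    return hits
```

For example, with the invented entry `{"Team plan": "renamed to Workspace plan"}`, an answer that mentions "the team plan" returns that entry, which tells you a context file still carries the old name.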

FAQ

What is context engineering? Curating what the model sees so you get a better result, in Thoughtworks' Bharani Subramaniam's definition³. It covers the standing layers around any single prompt: workspace memory, project files like AGENTS.md, retrieved documents, and instructions that load on demand.

Does context engineering replace prompt engineering? No. The prompt remains the per-request layer, and the 2026 rules for it still hold: ground factual claims, skip boilerplate for reasoning models. Context engineering decides what surrounds the request. A good prompt inside bad context produces fluent answers built on wrong facts.

Should I auto-generate my AGENTS.md? Not as the finished product. In ETH Zurich's benchmarks, LLM-generated context files reduced success in five of eight settings and raised inference costs 20 to 23 percent, while human-curated files beat them for all four agents tested¹². Generate a scaffold if it helps, then cut everything the agent could discover by reading the repo.

What is the difference between AGENTS.md and CLAUDE.md? AGENTS.md is the vendor-neutral format read by Claude Code, Codex CLI, Cursor, Copilot, and most other agents⁶. CLAUDE.md is Claude Code's own guidance file, loaded at session start³. Same job, different scope of tooling. Many repos keep both in sync.

How long should a context file be? Under 500 lines for AGENTS.md, since the file loads into the context window on every session⁶. The ETH study argues for shorter still: every line that repeats discoverable information costs tokens without buying success¹. For a PM workspace, a 300-word one-pager plus metric definitions beats a document dump.

Footnotes