Context Engineering for PMs: The Skill After Prompt Engineering

Prompt engineering tunes one request. Context engineering curates everything the model sees: workspaces, memory, agent files. What to put in, what to cut, and who owns it.

By Prateek Jain
10 min read · Intermediate

In early 2026, researchers at ETH Zurich measured what happens when an LLM writes its own context file. Across eight benchmark settings, the auto-generated files lowered task success in five of them, raised inference costs by 20 to 23 percent, and added 2.45 to 3.92 extra reasoning steps per task [1][2]. Files curated by humans did better for every one of the four agents tested.

The study covers coding agents, but the finding maps directly onto the skill PMs are now being asked to learn. Prompt engineering covered the per-request layer: structure one ask well, get one good answer. This guide covers the layer above it: what the model already sees before you type anything.

From prompts to context

Bharani Subramaniam of Thoughtworks puts the definition in one line: "Context engineering is curating what the model sees so that you get a better result." [3]

In 2025 you improved outputs by rewording the request. In 2026 most of the gain sits in the standing context: the files the model reads at session start, the workspace memory that persists across chats, the documents retrieval pulls in, the instructions that load only when a task needs them. A well-worded prompt on top of stale standing context still produces a confident answer built on wrong facts.

Four layers, from most ephemeral to most durable:

| Layer | What loads | When it loads | Who curates it | Failure mode |
| --- | --- | --- | --- | --- |
| Prompt | The request itself | Every message | You, per request | Vague ask, generic output |
| Workspace memory | Product facts, personas, formats (Claude Projects, ChatGPT Projects) | Every chat in that project | The PM | One stale doc poisons every answer |
| Always-on agent file | AGENTS.md / CLAUDE.md conventions | Session start | The whole team, via PRs | Bloat raises cost, lowers success |
| Skills and retrieval | Extra instructions, docs, scripts | When relevant to the task | Whoever owns that workflow | Never triggers, or triggers on the wrong task |

Martin Fowler's site describes the bottom two layers for coding agents: guidance files like CLAUDE.md load at session start and carry the conventions that apply to the whole project, while skills point to extra resources, instructions, and scripts that load only when relevant [3]. The two layers PMs actually touch are the personal workspace and the team's agent file. The rest of this guide works through both.
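
A minimal sketch of how the four layers might combine into what the model sees. The `Layer` class, `assemble_context`, and the keyword triggers are hypothetical names for illustration; real tools decide when a skill or a retrieved document is relevant in their own ways, and this only mirrors the loading order in the table.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    text: str
    always_on: bool                  # True if it loads without a trigger
    triggers: tuple[str, ...] = ()   # hypothetical keywords for on-demand layers


def assemble_context(layers: list[Layer], prompt: str) -> str:
    """Return standing layers, any triggered on-demand layers, then the prompt."""
    lowered = prompt.lower()
    parts = []
    for layer in layers:
        if layer.always_on or any(t in lowered for t in layer.triggers):
            parts.append(f"[{layer.name}]\n{layer.text}")
    parts.append(f"[prompt]\n{prompt}")
    return "\n\n".join(parts)


layers = [
    Layer("workspace memory", "Activation = first sync within 7 days of signup.", True),
    Layer("agent file", "Exports use ISO 8601.", True),
    Layer("pricing skill", "Free plan never sees seat pricing.", False, ("pricing",)),
]

# The pricing skill stays out because the prompt never mentions pricing.
context = assemble_context(layers, "Draft release notes for the export fix.")
print(context)
```

The point of the sketch is that the prompt is the last and smallest part of the input. Everything above it was decided before you typed.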

Curation beats generation

The ETH Zurich numbers deserve a closer look, because the temptation they warn against ships as a feature. Most agent tools offer an init command that scans your repo and writes the context file for you in about thirty seconds. The study tested exactly those LLM-generated files and found they reduced task success in five of eight settings, with drops ranging from 0.5 percent on SWE-bench Lite to 2 percent on AGENTbench, while making every task cost more to run [1][2]. Human-curated files outperformed the generated ones for all four agents tested, gaining roughly 4 percentage points on AGENTbench, though even the good files carried token overhead [1].

The Augment Code write-up of the study lands on one rule: write only what agents cannot discover on their own. Auto-generated files mostly restate what the agent already reads in the codebase and docs, so they add tokens without adding information. Architectural overviews and general bloat raise cost without raising success [1].

Most of the work is subtraction. A context file earns its tokens when it carries decisions and definitions that exist only in someone's head, and it wastes them when it repeats what is already on disk. Deciding what matters and what to leave out is judgment work, which is why this lands on the PM's desk and not only the engineer's.
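
One rough way to start the subtraction is to flag context-file lines that already appear in material the agent can read on its own. This is a sketch under loud assumptions: `redundant_lines` is a hypothetical helper, exact substring matching is a crude stand-in for "discoverable", and the 15-character floor is an arbitrary choice to skip trivial matches.

```python
def redundant_lines(context_file: str, discoverable: str, min_chars: int = 15) -> list[str]:
    """Return context-file lines whose text already appears in discoverable sources."""
    # Normalize case and whitespace so line wrapping does not hide a match.
    haystack = " ".join(discoverable.lower().split())
    flagged = []
    for line in context_file.splitlines():
        # Drop list and heading markers before comparing.
        needle = " ".join(line.lower().strip("-# \t").split())
        if len(needle) >= min_chars and needle in haystack:
            flagged.append(line)
    return flagged


readme = "Run tests with `make test`. The service is written in Go."
agents_md = (
    "- Run tests with `make test`.\n"
    "- Trials: 14 days self-serve, 30 days sales-assisted."
)

flagged = redundant_lines(agents_md, readme)
print(flagged)
```

Here the test command is flagged because the README already states it, while the trial rule survives because it exists nowhere else. A flagged line is a candidate for deletion, not a verdict; the judgment call stays with a person.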

Your PM workspace

The same logic applies outside the repo. Productside's 2026 workflow guide describes the pattern as persistent AI workspaces: a Claude Project or ChatGPT Project that retains your product domain, research, personas, constraints, and preferred formats, so you stop re-explaining your world in every chat [4].

Apply the ETH filter to what goes in. Only what the model cannot discover, and what you cannot afford to have wrong:

  • A product one-pager. What the product is, who pays, the current bet. About 300 words, not the 40-slide strategy deck.
  • Metric definitions with formulas. Your activation definition, your churn denominator, your fiscal calendar. "Activation = completed first sync within 7 days of signup" settles the argument before the model invents its own version.
  • Two or three personas, each with a verbatim quote from real research.
  • Named non-goals. "We do not serve enterprise. We do not build native mobile in 2026." Models pad scope by default; a non-goals list is the cheapest scope control you can write.
  • One or two voice samples. A doc you wrote that sounds like you, so drafts start closer to done.
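
To see why a formula settles the argument, here is the activation definition from the list written out as a small function. It is only a sketch: `is_activated` is a hypothetical name, and treating the 7-day boundary as inclusive is an assumption the one-line definition does not settle, which is exactly the kind of detail worth writing down.

```python
from datetime import datetime, timedelta
from typing import Optional

ACTIVATION_WINDOW = timedelta(days=7)


def is_activated(signup_at: datetime, first_sync_at: Optional[datetime]) -> bool:
    """Activation = completed first sync within 7 days of signup."""
    if first_sync_at is None:
        return False  # never synced
    # Assumption: the boundary is inclusive, so a sync exactly 7 days later counts.
    return signup_at <= first_sync_at <= signup_at + ACTIVATION_WINDOW


signup = datetime(2026, 1, 1, 9, 0)
print(is_activated(signup, datetime(2026, 1, 5, 9, 0)))
print(is_activated(signup, datetime(2026, 1, 9, 9, 0)))
print(is_activated(signup, None))
```

A sync on day 4 counts, a sync on day 8 does not, and a user who never synced is not activated. The workspace file needs the sentence, not the code, but the sentence should be precise enough that someone could write the code from it.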

What stays out: the strategy doc from two quarters ago, the dump of 60 raw transcripts, anything you would not personally vouch for today. A workspace with wrong context is worse than an empty one, because the errors come back wearing your own terminology.

Your team's agent context files

The second surface lives in the repo. AGENTS.md is an open format for guiding coding agents, used by over 60,000 open-source projects. Its maintainers call it a README for agents: a dedicated, predictable place for the build steps, tests, and conventions that would clutter a README written for people [5]. The format came out of collaboration across OpenAI Codex, Amp, Google Jules, Cursor, and Factory, and is now stewarded by the Agentic AI Foundation under the Linux Foundation [5]. As of 2026 it is read by Claude Code, OpenAI Codex CLI, Cursor, Aider, Devin, GitHub Copilot, Gemini CLI, Windsurf, Amazon Q, and others [6].

Engineers own most of this file: build commands, test instructions, code style. The PM contributes the part agents cannot infer from code: domain rules, shared vocabulary, and the reason behind a constraint. An agent reading your codebase can learn how the trial logic works; it cannot learn that the 30-day variant exists because sales-assisted deals need procurement time. A PM-owned section can be 10 lines:

## Product rules (PM-owned, ask before changing)
- "Workspace" = a billing entity. "Project" = a container inside one. Never swap these.
- Trials: 14 days self-serve, 30 days sales-assisted.
- Free plan never sees seat pricing. Legal constraint, not a UX choice.
- Dates display in the user's locale. Exports use ISO 8601.

BuildBetter's guide adds the operating practices: keep the file under 500 lines, because the whole thing loads into the context window and tokens are the budget; treat it as code, meaning the PR that changes a convention also updates the file; and keep the division of labor clean, with the README staying narrative and AGENTS.md staying imperative [6].
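
The line ceiling is easy to check mechanically. The sketch below assumes you pass in the file's text yourself; `line_budget` is a hypothetical helper, and counting lines is only a proxy for the token cost the guide is really worried about.

```python
MAX_LINES = 500  # ceiling quoted from the guide cited above


def line_budget(text: str, max_lines: int = MAX_LINES) -> dict:
    """Report how much of the line budget a context file has used."""
    used = len(text.splitlines())
    return {"lines": used, "remaining": max_lines - used, "ok": used < max_lines}


# Stand-in for a real file: 480 one-line rules.
sample = "\n".join(f"- rule {i}" for i in range(1, 481))
report = line_budget(sample)
print(report)
```

In a repo you might read the text with `pathlib.Path("AGENTS.md").read_text()` and fail a CI step when `ok` is false, so the budget is enforced in the same place as the tests.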

If you prototype with agents yourself, you have already brushed against these files. "From PRD to prototype" covers that workflow.

Maintenance: context rots

A context file is correct on the day you write it. Pricing changes, a persona retires, the non-goal becomes this year's bet, and the file keeps asserting the old world with full confidence. Stale context fails quietly, which makes it worse than missing context.

The fix borrows from how engineers handle docs: couple updates to changes. When pricing changes, the workspace one-pager changes the same week. When a repo convention changes, AGENTS.md changes in the same PR [6]. The update is part of the change, not a follow-up task that never happens.
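
The PR-coupling rule can be sketched as a check over the list of files a pull request touches. Everything named here is an assumption for illustration: `needs_context_update` is a hypothetical helper, the `billing/` and `pricing/` prefixes stand in for whatever paths encode your conventions, and the file list would come from something like `git diff --name-only` against the base branch.

```python
def needs_context_update(
    changed_files: list[str],
    watched_prefixes: tuple[str, ...],
    context_file: str = "AGENTS.md",
) -> bool:
    """True when a PR touches convention-bearing paths but not the context file."""
    touches_watched = any(path.startswith(watched_prefixes) for path in changed_files)
    return touches_watched and context_file not in changed_files


WATCHED = ("billing/", "pricing/")  # hypothetical paths that encode product rules

print(needs_context_update(["billing/trial.py"], WATCHED))
print(needs_context_update(["billing/trial.py", "AGENTS.md"], WATCHED))
print(needs_context_update(["docs/changelog.md"], WATCHED))
```

The first call flags the PR, the second passes because the context file moved with the change, and the third is ignored because no watched path was touched. A check like this is a nudge for the reviewer rather than proof the file is correct.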

We have no data on the right review cadence, so here is the intuition. A 10-minute monthly skim of your workspace files catches most rot, and the PR-coupling rule handles the repo side without needing a calendar at all. Add one symptom-driven trigger: when the model starts referencing a killed feature or an outdated metric definition, that is the context file talking. Fix the source, not the chat.
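
The symptom-driven trigger can be made a little more systematic by keeping a short list of retired terms and scanning model output for them. This is a sketch, not a recommended tool: `stale_references` is a hypothetical helper, and the retired terms and notes are invented examples.

```python
def stale_references(answer: str, retired_terms: dict[str, str]) -> list[str]:
    """Return a note for each retired term that shows up in a model answer."""
    lowered = answer.lower()
    return [
        f"{term}: {note}"
        for term, note in retired_terms.items()
        if term.lower() in lowered
    ]


# Hypothetical retired terms, each with the fix to make at the source.
retired = {
    "Team Inbox": "killed feature; remove it from the one-pager",
    "30-day activation": "old metric window; update the metric definitions",
}

hits = stale_references("Highlight Team Inbox in the onboarding email.", retired)
print(hits)
```

Each hit points back at a file to fix, which keeps the correction at the source and out of the chat.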

FAQ

What is context engineering? Curating what the model sees so you get a better result, in Thoughtworks' Bharani Subramaniam's definition [3]. It covers the standing layers around any single prompt: workspace memory, project files like AGENTS.md, retrieved documents, and instructions that load on demand.

Does context engineering replace prompt engineering? No. The prompt remains the per-request layer, and the 2026 rules for it still hold: ground factual claims, skip boilerplate for reasoning models. Context engineering decides what surrounds the request. A good prompt inside bad context produces fluent answers built on wrong facts.

Should I auto-generate my AGENTS.md? Not as the finished product. In ETH Zurich's benchmarks, LLM-generated context files reduced success in five of eight settings and raised inference costs 20 to 23 percent, while human-curated files beat them for all four agents tested [1][2]. Generate a scaffold if it helps, then cut everything the agent could discover by reading the repo.

What is the difference between AGENTS.md and CLAUDE.md? AGENTS.md is the vendor-neutral format read by Claude Code, Codex CLI, Cursor, Copilot, and most other agents [6]. CLAUDE.md is Claude Code's own guidance file, loaded at session start [3]. Same job, different scope of tooling. Many repos keep both in sync.

How long should a context file be? Under 500 lines for AGENTS.md, since the file loads into the context window on every session [6]. The ETH study argues for shorter still: every line that repeats discoverable information costs tokens without buying success [1]. For a PM workspace, a 300-word one-pager plus metric definitions beats a document dump.

Sources

  1. How to Build Your AGENTS.md (2026), Augment Code

  2. ETH Zurich study on agent context files, arXiv:2602.11988

  3. Context Engineering for Coding Agents, martinfowler.com

  4. The AI Product Management Workflows 2026, Productside

  5. AGENTS.md, agents.md

  6. AGENTS.md Complete Guide for Engineering Teams in 2026, BuildBetter