Prompt Engineering for PMs: A 2026 Guide | PM Toolkit

What Changed Between 2025 and 2026

Prompt engineering was a craft in 2025. In 2026, it's mostly hygiene.

Three reasons:

1. Models follow intent better. Claude 4.7, GPT-5.5, and Gemini 3.1 Pro all parse vague requests into reasonable answers. The "be a helpful assistant who" preamble stopped helping a year ago.

2. Reasoning models do their own thinking. When you use Claude with extended thinking, GPT-5 in thinking mode, or Gemini 3.1 Pro Deep Think, the model runs its own chain-of-thought. The "let's think step by step" trick became redundant.

3. Tool use and structured outputs got better. Most product work needs the model to call a tool or return JSON. Those features matter more than prompt phrasing.

What still matters: clear context, grounding, and knowing when to skip the framework.

The CRISP Framework

When the stakes are high or the model is non-reasoning, CRISP still earns its keep.

- Context: who you are, what the product is.
- Role: who the model should play.
- Instructions: what you want done.
- Specifics: format, length, constraints.
- Path: how to handle edge cases.

A bad prompt:

Help me with this PRD

A CRISP prompt:

Context: PM at a B2B SaaS in early-stage growth.
Role: Senior PM coach with 10 years of experience.
Instructions: Review the PRD below. Find three weaknesses.
Specifics: One sentence per weakness. Cite the section.
Path: If a section is missing entirely, say so.

[PASTE PRD]

The bad prompt produces generic feedback. The CRISP prompt produces actionable critique. Time difference: 30 seconds. Output difference: large.
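If you reuse CRISP prompts, it can help to assemble them from named fields rather than retyping the labels. The sketch below is one way to do that. The `CrispPrompt` class and its `render` method are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class CrispPrompt:
    """The five CRISP fields; the material to work on is passed to render()."""
    context: str
    role: str
    instructions: str
    specifics: str
    path: str

    def render(self, payload: str) -> str:
        # One labeled line per field, in CRISP order, then a blank line and the payload.
        lines = [
            f"Context: {self.context}",
            f"Role: {self.role}",
            f"Instructions: {self.instructions}",
            f"Specifics: {self.specifics}",
            f"Path: {self.path}",
        ]
        return "\n".join(lines) + "\n\n" + payload


prd_review = CrispPrompt(
    context="PM at a B2B SaaS in early-stage growth.",
    role="Senior PM coach with 10 years of experience.",
    instructions="Review the PRD below. Find three weaknesses.",
    specifics="One sentence per weakness. Cite the section.",
    path="If a section is missing entirely, say so.",
)

print(prd_review.render("[PASTE PRD]"))
```

Keeping the fields separate makes it easy to change one of them (say, Specifics) without touching the rest.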

When to Skip CRISP

In 2026, CRISP is overkill in three cases.

1. Reasoning models on hard problems. If you're using Claude with extended thinking or GPT-5 thinking mode for analysis, just state the question. The model will plan its approach internally. CRISP boilerplate gets ignored.

2. Quick lookups. "Summarize this email" does not need a role and a path. Just ask.

3. Tool-heavy workflows. When the model is calling tools (web search, code execution, MCP servers), the tool definitions carry the contract. The natural-language prompt becomes a thin layer of intent.

A useful heuristic: if the prompt is for a one-shot task and you'll scan the answer in seconds, skip CRISP. If it's a high-stakes prompt you'll save and reuse, use CRISP.
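The heuristic can be written down as a small decision function. This is a sketch of one reading of it: the names `Advice` and `crisp_advice` are made up for illustration, and the "judgment call" branch is an assumption for the combinations the heuristic does not address.

```python
from enum import Enum


class Advice(Enum):
    USE_CRISP = "use CRISP"
    SKIP_CRISP = "skip CRISP"
    JUDGMENT_CALL = "judgment call"


def crisp_advice(will_reuse: bool, high_stakes: bool, quick_scan: bool) -> Advice:
    # High-stakes prompt that will be saved and reused: structure pays off.
    if high_stakes and will_reuse:
        return Advice.USE_CRISP
    # One-shot task whose answer is scanned in seconds (and not high-stakes): just ask.
    if not will_reuse and quick_scan and not high_stakes:
        return Advice.SKIP_CRISP
    # Assumption: everything in between is left to the person writing the prompt.
    return Advice.JUDGMENT_CALL


print(crisp_advice(will_reuse=False, high_stakes=False, quick_scan=True).value)
```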

The Hallucination-Aware Pattern

Reasoning models hallucinate more, not less. On OpenAI's PersonQA benchmark, o3 hallucinated on 33% of questions and o4-mini on 48%. Every reasoning model tested in 2026 crossed 10%¹.

This is the contrarian fact for prompt engineering: a smarter model does not mean a more truthful one.

Three lines that collapse hallucination risk:

- Cite verbatim quotes from the input for every claim.
- If you cannot find evidence, say "no clear evidence."
- Do not paraphrase or invent.

Add these to any prompt where the cost of being wrong is real. Research synthesis, customer feedback analysis, document Q&A, anything cited in a meeting.

On grounded summarization with these instructions, top models hit 0.7-1.5% hallucination rates¹. Without them, the same models hit 15-50%.

That's a three-line difference in your prompt.
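The grounding lines can be appended mechanically, and the verbatim-quote requirement can be spot-checked after the fact. The sketch below assumes the model wraps its citations in straight double quotes. `with_grounding` and `unsupported_quotes` are illustrative helpers, not a library API.

```python
import re

GROUNDING_LINES = (
    "Cite verbatim quotes from the input for every claim.",
    'If you cannot find evidence, say "no clear evidence."',
    "Do not paraphrase or invent.",
)


def with_grounding(prompt: str) -> str:
    """Append the three grounding lines to a prompt."""
    return prompt.rstrip() + "\n\n" + "\n".join(GROUNDING_LINES)


def _normalize(text: str) -> str:
    # Collapse whitespace and case so line wraps do not cause false alarms.
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_quotes(answer: str, source: str) -> list[str]:
    """Return double-quoted spans in the answer that do not appear in the source."""
    quotes = re.findall(r'"([^"]+)"', answer)
    haystack = _normalize(source)
    return [q for q in quotes if _normalize(q) not in haystack]


source = "The export button is slow. I love the new dashboard."
answer = 'Users report that "the export button is slow" and that "pricing is confusing".'
print(unsupported_quotes(answer, source))  # prints ['pricing is confusing']
```

A check like this only catches invented quotes. It does not tell you whether a real quote supports the claim attached to it, so it complements the eval rather than replacing it.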

Five Templates That Work

The templates I actually reuse, week after week.

1. PRD Review

Context: PM at [STAGE] [PRODUCT TYPE].
Role: Senior PM coach.
Instructions: Review the PRD below. Find:
- Three weaknesses with section reference
- Three missing pieces
- One question I should ask before shipping

[PASTE PRD]

2. Feedback Synthesis

Context: Analyzing customer feedback for [PRODUCT].
Instructions: Find the top 3 complaints, top 3 requests, and one non-obvious pattern.
Format: Bullet list with verbatim quotes and counts.
Grounding: Every claim must cite a verbatim quote from the input.
If you cannot find evidence, say so.

[PASTE FEEDBACK]

3. Competitive Teardown

Context: PM at [PRODUCT]. Competitor is [COMPETITOR URL].
Instructions: Read the competitor pages I'll link or paste. Identify:
- Three things they do better than us
- Three things we do better
- One opportunity their positioning leaves open

Cite the page or quote for each point.

[PASTE OR LINK]

4. Eval Design

Context: Need an eval for [AI FEATURE].
Instructions: Help me design a 50-example golden set.
Specifics:
- What dimensions should the examples cover?
- What edge cases should I include?
- What metric matches this task type?
- Show me 5 example entries in the format I should use.

5. Stakeholder Update

Context: [STAKEHOLDER] asks about [METRIC] in 30 minutes.
Data: [LAST 7-30 DAYS]

Prepare 5 talking points, one sentence each:
- Current value with trend
- Biggest issue
- What we're doing about it
- Expected impact of the fix
- One impressive insight

Confident tone. No hedging. Cite the data.
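The bracketed slots in these templates can be filled programmatically, which also catches the slot you forgot. A minimal sketch, assuming placeholders are always upper-case text in square brackets, as in the templates above. The `fill` helper and the sample values are hypothetical.

```python
import re

# Matches slots such as [PRODUCT], [PASTE FEEDBACK], or [LAST 7-30 DAYS].
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9 \-]*)\]")

FEEDBACK_SYNTHESIS = """Context: Analyzing customer feedback for [PRODUCT].
Instructions: Find the top 3 complaints, top 3 requests, and one non-obvious pattern.
Format: Bullet list with verbatim quotes and counts.
Grounding: Every claim must cite a verbatim quote from the input.
If you cannot find evidence, say so.

[PASTE FEEDBACK]"""


def fill(template: str, values: dict[str, str]) -> str:
    """Replace [PLACEHOLDER] slots; raise if any slot is left unfilled."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in values]
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    # Single pass over the template, so brackets inside pasted text are left alone.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


prompt = fill(FEEDBACK_SYNTHESIS, {
    "PRODUCT": "Acme CRM",  # made-up product name
    "PASTE FEEDBACK": "Export is slow. Please add dark mode.",
})
print(prompt)
```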

Tool Use: The Real 2026 Skill

The biggest gains in 2026 do not come from prompt phrasing. They come from connecting the model to real systems.

Three patterns to know:

Web search. Claude, ChatGPT, and Gemini all ship web search. For any question that depends on current info, enable it. The prompt stays simple.

MCP servers. The Model Context Protocol passed 97M monthly SDK downloads by March 2026². Connect Claude or ChatGPT to your databases, your internal APIs, your filesystem. Your prompt becomes "do this work" and the model figures out which tools to call.

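Structured outputs. Both the Anthropic and OpenAI SDKs let you specify a JSON schema for the response. With schema enforcement on, the response conforms to that schema, so the fragile extract-and-retry parsing goes away. You still need to handle refusals and responses cut off by the token limit. This is essential for any prompt that feeds a downstream system.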
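On the consuming side, a schema doubles as a contract you can check before the data moves downstream. The sketch below leaves out the SDK call itself, since parameter names differ by vendor and version, and shows only a schema plus a standard-library check. `TRIAGE_SCHEMA` and `parse_triage` are hypothetical and handle flat objects only. In practice a library such as `jsonschema` or `pydantic` would do this job.

```python
import json

# Hypothetical schema for a feedback-triage feature (not from any vendor's docs).
TRIAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["bug", "request", "praise"]},
        "summary": {"type": "string"},
        "quote": {"type": "string"},
    },
    "required": ["category", "summary", "quote"],
    "additionalProperties": False,
}

_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def parse_triage(raw: str) -> dict:
    """Parse a model response and check it against TRIAGE_SCHEMA."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on truncated output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    props = TRIAGE_SCHEMA["properties"]
    missing = [k for k in TRIAGE_SCHEMA["required"] if k not in data]
    extra = [k for k in data if k not in props]
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    for key, spec in props.items():
        if not isinstance(data[key], _TYPES[spec["type"]]):
            raise ValueError(f"{key}: expected {spec['type']}")
        if "enum" in spec and data[key] not in spec["enum"]:
            raise ValueError(f"{key}: {data[key]!r} not in {spec['enum']}")
    return data


row = parse_triage('{"category": "bug", "summary": "Export is slow", "quote": "export takes minutes"}')
print(row["category"])  # prints bug
```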

If your prompts are doing string manipulation that a tool could do, switch to a tool. The model is better at calling tools than at simulating them.
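As an illustration of moving work out of the prompt, here is a sketch of a tool definition with a local dispatcher for date arithmetic, a task better computed than simulated. The `input_schema` field name follows the Anthropic Messages API tool format. Other vendors name it differently (OpenAI uses `parameters`), so treat the dictionary shape as an assumption to adapt. `days_between`, `HANDLERS`, and `dispatch` are illustrative names, and the API round trip is omitted.

```python
from datetime import date

# Tool definition: name, description, and a JSON schema for the arguments.
DAYS_BETWEEN_TOOL = {
    "name": "days_between",
    "description": "Number of days from start date to end date (ISO 8601, YYYY-MM-DD).",
    "input_schema": {
        "type": "object",
        "properties": {
            "start": {"type": "string"},
            "end": {"type": "string"},
        },
        "required": ["start", "end"],
    },
}


def days_between(start: str, end: str) -> int:
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


HANDLERS = {"days_between": days_between}


def dispatch(tool_name: str, arguments: dict) -> str:
    """Run the tool the model asked for and return a string result for the next turn."""
    if tool_name not in HANDLERS:
        return f"error: unknown tool {tool_name!r}"
    try:
        return str(HANDLERS[tool_name](**arguments))
    except (TypeError, ValueError) as exc:
        # Bad or missing arguments go back to the model as text so it can retry.
        return f"error: {exc}"


print(dispatch("days_between", {"start": "2026-01-01", "end": "2026-03-01"}))  # prints 59
```

The prompt that goes with this can be as short as "How long did the beta run?", because the schema and description tell the model what the tool needs.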

Common Mistakes

1. Prompt theater. Long preambles like "you are a world-class expert who is extremely meticulous." Wastes tokens. Models ignore it.

2. Telling reasoning models to think step by step. Redundant. They already do.

3. No grounding. Skipping the verbatim-citation instruction on factual tasks. You pay for this in hallucinations later.

4. Too many instructions. A 12-rule prompt. Models follow the first 3 well; the rest get fuzzy. Cut to 3-5.

5. Iterating on the prompt without an eval. You change the prompt, the answer changes, you call it "better." Without an eval you do not know.
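The first four mistakes are visible in the prompt text itself, so a rough linter can flag them before a prompt goes into your library. This is a sketch with hand-picked patterns. The phrase lists and the `lint_prompt` name are assumptions, and the rule count only sees bulleted or numbered lines. The fifth mistake cannot be detected from the text at all.

```python
import re

THEATER = re.compile(r"world-class|extremely meticulous|you are an? expert", re.IGNORECASE)
STEP_BY_STEP = re.compile(r"think step[- ]by[- ]step", re.IGNORECASE)
GROUNDING = re.compile(r"verbatim|no clear evidence", re.IGNORECASE)
RULE_LINE = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+", re.MULTILINE)


def lint_prompt(prompt: str, factual: bool = False, reasoning_model: bool = False,
                max_rules: int = 5) -> list[str]:
    """Return warnings for the prompt-level mistakes listed above."""
    warnings = []
    if THEATER.search(prompt):
        warnings.append("prompt theater: drop the flattering preamble")
    if reasoning_model and STEP_BY_STEP.search(prompt):
        warnings.append("redundant: reasoning models already think step by step")
    if factual and not GROUNDING.search(prompt):
        warnings.append("no grounding: add the verbatim-citation lines")
    rules = len(RULE_LINE.findall(prompt))
    if rules > max_rules:
        warnings.append(f"too many instructions: {rules} rules, cut to {max_rules} or fewer")
    return warnings


bad = ("You are a world-class expert.\nLet's think step by step.\n"
       + "\n".join(f"- rule {i}" for i in range(1, 13)))
for warning in lint_prompt(bad, factual=True, reasoning_model=True):
    print(warning)
```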

How To Improve Your Prompts

Run the eval, not the vibe. Build a 20-example golden set for any prompt you'll reuse. Track accuracy on every prompt revision.
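A golden-set eval does not need infrastructure. The sketch below scores one prompt revision by exact match against expected labels. `GOLDEN_SET` is a made-up three-row example (a real one would hold about 20 hand-checked rows), and `fake_model` is a stand-in for a real API call so the code runs offline.

```python
from typing import Callable

# Hypothetical golden set for a feedback classifier.
GOLDEN_SET = [
    {"input": "The export button takes minutes to respond.", "expected": "bug"},
    {"input": "Please add dark mode.", "expected": "request"},
    {"input": "Love the new dashboard!", "expected": "praise"},
]


def accuracy(prompt: str, model: Callable[[str], str], golden: list[dict]) -> float:
    """Fraction of golden examples where the model output matches the expected label."""
    hits = 0
    for example in golden:
        output = model(prompt + "\n\n" + example["input"])
        hits += output.strip().lower() == example["expected"]
    return hits / len(golden)


def fake_model(full_prompt: str) -> str:
    # Stand-in for a real model call: keyword rules instead of an API request.
    text = full_prompt.lower()
    if "please add" in text:
        return "request"
    if "love" in text:
        return "praise"
    return "bug"


revisions = {
    "v1": "Classify the feedback.",
    "v2": "Classify the feedback as bug, request, or praise.",
}
for version, prompt in revisions.items():
    print(version, accuracy(prompt, fake_model, GOLDEN_SET))
```

Exact match suits classification-style prompts. For free-text outputs you would swap the comparison for a metric that fits the task.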

Save what works. Keep a personal prompt library in a Notion doc, an Obsidian vault, or our /prompts page. Tag by use case.
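If you would rather keep the library in a file next to your code, a single JSON file with tags is enough to start. A minimal sketch: the `PromptLibrary` class, its file layout, and the sample entries are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class PromptLibrary:
    """A tagged prompt store backed by a single JSON file."""

    def __init__(self, path: Path):
        self.path = path
        self.entries = json.loads(path.read_text()) if path.exists() else {}

    def save(self, name: str, prompt: str, tags: list[str]) -> None:
        # Deduplicate and sort tags so the file stays stable across saves.
        self.entries[name] = {"prompt": prompt, "tags": sorted(set(tags))}
        self.path.write_text(json.dumps(self.entries, indent=2))

    def by_tag(self, tag: str) -> list[str]:
        """Names of all prompts carrying the given tag, in alphabetical order."""
        return sorted(n for n, e in self.entries.items() if tag in e["tags"])


# Usage against a throwaway directory; point the path at a real file to keep the library.
with tempfile.TemporaryDirectory() as tmp:
    library = PromptLibrary(Path(tmp) / "prompts.json")
    library.save("prd-review", "Review the PRD below. Find three weaknesses.", ["prd", "review"])
    library.save("feedback-synthesis", "Find the top 3 complaints.", ["research"])
    print(library.by_tag("prd"))  # prints ['prd-review']
```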

Steal from production. When a vendor ships a new agent (Cursor, Claude Code, Devin), look at their system prompts (often public or leaked). Working production prompts beat blog post examples.

Strip, don't stack. When a prompt isn't working, default to removing instructions, not adding them. Confusion compounds.
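Stripping works best when it is measured. One way to do that is a leave-one-out pass: remove each instruction in turn and see how the eval score moves. This is a sketch, not a standard method. `ablate` is an illustrative name, and `score` stands for whatever eval function you already have (the toy one below just penalizes a theater phrase).

```python
from typing import Callable


def ablate(instructions: list[str], score: Callable[[str], float]) -> list[tuple[str, float]]:
    """Score the prompt with each instruction removed in turn.

    Returns (removed_instruction, score_change) pairs, largest gain first.
    A positive change means the prompt scored better without that line.
    """
    baseline = score("\n".join(instructions))
    results = []
    for i, removed in enumerate(instructions):
        trimmed = instructions[:i] + instructions[i + 1:]
        results.append((removed, score("\n".join(trimmed)) - baseline))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


def toy_score(prompt: str) -> float:
    # Stand-in for a golden-set eval: pretend the theater line hurts accuracy.
    return 0.5 if "world-class" in prompt else 1.0


rules = [
    "Find three weaknesses.",
    "You are a world-class expert.",
    "Cite the section.",
]
for line, change in ablate(rules, toy_score):
    print(f"{change:+.2f}  {line}")
```

Lines whose removal leaves the score unchanged are candidates to cut. Lines whose removal improves it should go first.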

Action Plan

This week: pick three prompts you reuse. Add CRISP structure where the stakes are high. Add grounding lines where facts matter.

This month: build a 20-example eval for your most-used prompt. Run it every time you change the prompt.

This quarter: move two of your manual workflows from "long prompt" to "agent with tools." The payoff is bigger.

Related

- Prompt Library for the full template catalog
- Building Agentic Products for tool use and orchestration
- AI for User Research Synthesis for grounded prompting
