
Hypothesis-Driven Development: Complete Guide for PMs | PM Toolkit

Many features that ship never hit their goal. A common reason is that nobody tested the assumption first.

The Problem

Your team builds a new feature. Three months of work. It launches and barely anyone uses it.

In the retrospective, you discover everyone had different ideas about what success looked like. No one knew how to measure if it worked.

This happens all the time. Many new products miss their goals: published estimates put failure rates at 30-49%, and as high as 70-80% for FMCG¹. The problem usually isn't bad ideas. It's building without testing assumptions first.

You wouldn't fly a plane without checking the instruments. Don't build features without testing your assumptions.

Quick Start Guide

New to hypothesis-driven development? Start here:

Write your idea as a testable hypothesis - Use the template below
Define what success looks like - Pick measurable goals
Run the smallest possible test - Don't build everything first
Measure the results - Did it work or not?
Learn and iterate - A failed test is still a finished test

That's the basics. The rest is detail.

The Solution: Test Before You Build

Instead of building on assumptions, test them first.

Hypothesis-driven development forces you to write down what you think will happen, define what counts as success, and check your work against real data instead of opinion.

The Four Steps

Write a hypothesis - A clear prediction about what will happen
Set success metrics - Numbers that prove if you're right
Design a small test - The simplest way to get an answer
Document what you learn - What happened and what's next

The Simple Hypothesis Template

Use this template for every feature idea:

We believe [which users]
Will [do what differently]
Because [why we think so]
We'll know this works when [what metric changes]

How to Fill It In

Which users?

Be specific about who
Good: "new users in their first week"
Bad: "our users" (too vague)

Do what differently?

Something you can measure
Good: "complete their profile"
Bad: "be happier" (can't measure)

Why we think so?

Based on research or data
Good: "interviews showed they're confused about what to share"
Bad: "seems like a good idea" (just a guess)

What metric changes?

Specific number and timeframe
Good: "profile completion goes from 40% to 60% in 2 weeks"
Bad: "more engagement" (too vague)
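If you keep hypotheses in a script or notebook rather than a doc, the four slots map naturally onto a small record. The Python sketch below is illustrative only: the `Hypothesis` class and its field names are hypothetical, and it is filled in with the "Good" answers above. Freezing the dataclass also means a stored hypothesis can't be quietly edited after the fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields can't be reassigned once written
class Hypothesis:
    """One hypothesis, split into the four slots of the template (illustrative)."""

    users: str     # Which users?
    behavior: str  # Do what differently?
    reason: str    # Why we think so?
    metric: str    # What metric changes?

    def __post_init__(self) -> None:
        # A blank slot means the hypothesis isn't testable yet.
        for name in ("users", "behavior", "reason", "metric"):
            if not getattr(self, name).strip():
                raise ValueError(f"'{name}' is empty")

    def render(self) -> str:
        # Fill the template sentence from the four slots.
        return (
            f"We believe {self.users} will {self.behavior} "
            f"because {self.reason}. "
            f"We'll know this works when {self.metric}."
        )


onboarding = Hypothesis(
    users="new users in their first week",
    behavior="complete their profile",
    reason="interviews showed they're confused about what to share",
    metric="profile completion goes from 40% to 60% in 2 weeks",
)
print(onboarding.render())
```

The empty-slot check is a cheap guard against hypotheses like "We believe our users will...", though it can't tell a vague answer from a specific one. That part is still up to you.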

A Simple Example

Feature idea: Add progress bar to onboarding

Hypothesis: "We believe new users signing up this week will complete onboarding more often because they'll see how close they are to finishing. We'll know this works when completion rate increases from 40% to 55%."

Real-World Examples

Airbnb: Professional Photos

Their hypothesis: "We believe hosts in New York will get more bookings if we provide professional photography because better photos increase trust."

The test: Offered free professional photography to some NYC hosts.

What happened: Listings with pro photos got 2.5x more bookings².

What this means for you: Quality visuals matter. Test improving your product images or screenshots.

Spotify: Personalized Playlists

Their hypothesis: "Users who haven't found new music recently will listen more if we create personalized playlists for them."

The test: Built a simple recommendation engine for a small group.

What happened: A large share of users listened to the whole playlist.

What this means for you: Personalization works. Test customizing content for different user groups.

Amazon: 1-Click Ordering

Their hypothesis: "Repeat customers will buy more if we make checkout faster."

The test: Added a "Buy with 1-Click" button for some repeat customers³.

What happened: Those customers checked out more often and bought more.

What this means for you: Removing even one step can dramatically improve results. Test simplifying your key user flows.

LinkedIn: Suggested Connections

Their hypothesis: "Users with few connections will add more if we suggest people they might know."

The test: Showed connection suggestions to users with fewer than 50 connections.

What happened: Connection requests increased significantly.

What this means for you: Help users get started. Test ways to guide new users through first steps.

Four Types of Hypotheses

1. Problem Hypothesis

What you're testing: Does this problem actually exist?

Example: "Freelance designers struggle to track their expenses"

How to test: Talk to users, send surveys, check support tickets

2. Solution Hypothesis

What you're testing: Will your solution fix the problem?

Example: "Automatic expense tracking saves designers 2 hours per week"

How to test: Build a prototype, do usability testing

3. Growth Hypothesis

What you're testing: Will this improve your metrics?

Example: "Adding share buttons increases referrals by 30%"

How to test: Run an A/B test, compare user groups
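For a growth hypothesis like the share-button example, "compare user groups" usually means comparing two conversion rates. Below is a minimal sketch of a standard two-proportion z-test using only Python's standard library. The function name and the referral counts in the usage line are made up for illustration, not real results.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion in group A (control) with group B (variant).

    Returns (relative_lift, two_sided_p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate: the best estimate of conversion if there were no real difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test
    relative_lift = (p_b - p_a) / p_a             # e.g. 0.30 means +30%
    return relative_lift, p_value


# Hypothetical numbers: 200 of 5,000 control users referred someone, vs 270 of 5,000 with share buttons.
lift, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=270, n_b=5000)
print(f"relative lift {lift:.0%}, p = {p:.4f}")
```

The lift here is a point estimate. A p-value below 0.05 (the usual 95% confidence level) says the difference is unlikely to be noise, not that the true lift is at least the 30% you predicted.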

4. Business Hypothesis

What you're testing: Will people pay for this?

Example: "20% of active users will pay $10/month for premium features"

How to test: Test pricing, survey willingness to pay

How to Define Success

Every hypothesis needs clear success metrics defined upfront.

Pick Your Metrics

Primary Metric - The main thing you're measuring

Must directly relate to your hypothesis
Example: "Onboarding completion rate"

Secondary Metrics - Other things that might improve

Support your main metric
Example: "Time to complete", "User satisfaction"

Guardrails - Things that shouldn't get worse

Protect against negative side effects
Example: "Don't increase support tickets"
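One way to keep these roles straight is to write the decision rule down before the test starts. The sketch below is an assumption about how you might encode it, not a standard: guardrails are checked first, the primary metric decides, and secondary metrics are deliberately left out of the verdict because they add context rather than flip the outcome. `Guardrail`, `verdict`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Guardrail:
    name: str
    baseline: float
    observed: float
    max_increase: float  # largest rise you'll tolerate, in the metric's own units


def verdict(primary_target: float, primary_observed: float,
            guardrails: list[Guardrail]) -> str:
    # A primary-metric win doesn't count if something you promised not to harm got worse.
    broken = [g.name for g in guardrails
              if g.observed - g.baseline > g.max_increase]
    if broken:
        return "guardrail breached: " + ", ".join(broken)
    if primary_observed >= primary_target:
        return "success"
    return "primary target missed"


# Hypothetical: onboarding completion target 55%, support tickets per 1,000 users as a guardrail.
tickets = Guardrail("support tickets", baseline=12.0, observed=12.4, max_increase=1.0)
print(verdict(primary_target=0.55, primary_observed=0.57, guardrails=[tickets]))
```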

Before You Start Testing

Decide these numbers first:

How many users you need - Use a sample size calculator
How confident you want to be - Usually 95%
What improvement matters - Don't celebrate tiny changes
How long to run the test - Usually 1-2 weeks minimum

Calculate Your Sample Size
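If you'd rather see what a sample size calculator does under the hood, here is a minimal sketch of the common normal-approximation formula for comparing two conversion rates in a two-sided A/B test. The 95% confidence default matches the guidance above; the 80% power default is an assumed, commonly used value that this guide doesn't specify. The function names are hypothetical, and the 7-day floor in `days_to_run` reflects the "1-2 weeks minimum" advice so weekday and weekend behavior both get counted.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline: float, target: float,
                          confidence: float = 0.95, power: float = 0.80) -> int:
    """Users needed in EACH group to detect a change from `baseline` to `target`."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)                       # about 0.84 at 80% power
    p_bar = (baseline + target) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(baseline * (1 - baseline) + target * (1 - target))
    ) ** 2
    return math.ceil(numerator / (target - baseline) ** 2)


def days_to_run(per_group: int, daily_users: int, groups: int = 2,
                minimum_days: int = 7) -> int:
    """Days needed to fill every group, never shorter than `minimum_days`."""
    return max(minimum_days, math.ceil(per_group * groups / daily_users))


# The onboarding example: completion from 40% to 55%.
n = sample_size_per_group(0.40, 0.55)
print(n, "users per group;", days_to_run(n, daily_users=20), "days at 20 new users a day")
```

Smaller improvements need far more users: the required sample grows roughly with 1 / (difference)², which is why "Don't celebrate tiny changes" and "How many users you need" go hand in hand.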


Quick Example

Feature idea: Add social sharing buttons to order confirmation page

Hypothesis: "We believe customers will share their purchases because they want to help friends find deals. We'll know this works when 5% of customers share."

Test details:

Need: 3,000 completed orders
Duration: 2 weeks
Result: Only 3% shared

What we learned: Just adding buttons isn't enough. People need an incentive to share. Next test: offer a discount for sharing.
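To check that "only 3% shared" really rules out the 5% target rather than being bad luck, you can put a confidence interval around the observed rate. 3% of 3,000 orders is 90 shares. The sketch below uses the simple normal-approximation (Wald) interval, which is reasonable at this sample size; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_interval(successes: int, trials: int, confidence: float = 0.95):
    """Normal-approximation (Wald) confidence interval for an observed rate."""
    rate = successes / trials
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(rate * (1 - rate) / trials)
    return rate - margin, rate + margin


def decide(low: float, high: float, target: float) -> str:
    if low >= target:
        return "hypothesis supported"
    if high < target:
        return "target not reached"
    return "inconclusive: collect more data"


low, high = proportion_interval(successes=90, trials=3000)  # 3% of 3,000 orders
print(f"95% CI for share rate: {low:.1%} to {high:.1%}")
print(decide(low, high, target=0.05))
```

Because the whole interval sits below 5%, the test gives a clear "no" on this version of the idea, which is exactly what justifies moving on to the discount test.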

Six Common Mistakes to Avoid

1. Writing Vague Hypotheses

The mistake: "This will increase engagement"

Why it's bad: You can't measure "engagement" clearly.

Do this instead: "Daily active users will comment 20% more within 30 days"

2. Only Seeing What You Want

The mistake: Ignoring data that proves you wrong.

Why it's bad: You'll build features that don't work.

Do this instead: Write down what failure looks like before you start.

3. Changing the Rules Mid-Game

The mistake: Moving success criteria after seeing results.

Why it's bad: You're fooling yourself.

Do this instead: Document your hypothesis somewhere you can't edit it.
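If your hypotheses live in an editable doc or wiki, one way to make silent edits detectable is to record a fingerprint of the exact wording before the test starts and compare it when you read the results. This is a minimal sketch using SHA-256 from Python's standard library; `fingerprint` is a hypothetical helper, and it only detects changes, it doesn't prevent them.

```python
import hashlib


def fingerprint(hypothesis_text: str) -> str:
    """SHA-256 of the hypothesis wording, ignoring differences in whitespace."""
    # Normalize spacing so reflowing a line doesn't change the result,
    # but changing any word or number does.
    normalized = " ".join(hypothesis_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


before = fingerprint("We'll know this works when 5% of customers share.")
after = fingerprint("We'll know this works when 3% of customers share.")
print("unchanged" if before == after else "success criteria were edited")
```

Post the fingerprint somewhere with a timestamp (a team channel, a ticket comment) at launch, and anyone can later confirm the criteria weren't moved.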

4. Testing Everything at Once

The mistake: Changing the design, copy, and button color together.

Why it's bad: You won't know what actually worked.

Do this instead: Test one thing at a time.

5. Hiding Failed Tests

The mistake: Treating failed hypotheses as failures.

Why it's bad: A buried test teaches no one anything.

Do this instead: Share what you learned: "We thought X, but learned Y, so now we'll try Z."

6. Building Everything First

The mistake: Developing the full feature before testing anything.

Why it's bad: You can waste months building the wrong thing.

Do this instead: Start with the smallest possible test.

How to Design Simple Tests

Start With the Smallest Test Possible

1. Fake Door Test - Add a button for a feature that doesn't exist yet. See how many people click it.

2. Manual First - Do it manually for a few users before building automation.

3. Wizard of Oz - Make it look automated, but do the work manually behind the scenes.

4. Simple Prototype - Build a clickable mockup before writing any code.

Match Test Size to Risk

Big risk + Not sure it'll work? → Run a tiny test (survey, fake door)

Small risk + Pretty confident? → Run a bigger test (A/B test with real feature)

Hard to undo later? (like pricing changes) → Test extra carefully

Easy to reverse? (like button colors) → Test quickly and move on

Speed Beats Perfection

Run 10 quick tests instead of 1 perfect test
Fail fast and learn quickly
Get directional answers first, refine later

AI Prompts for Hypothesis Testing

Use these with ChatGPT or Claude to help with your hypotheses:

Write a Hypothesis (Simple)

Turn this feature idea into a hypothesis:
[describe your feature]

Use this format:
"We believe [which users] will [do what] 
because [why]. We'll know this works when [metric changes]."

Write a Hypothesis (Detailed)

Convert this feature into a testable hypothesis:
[paste feature description]

Include:
- Specific user segment
- Measurable behavior change
- User need reasoning
- Clear success metrics
- Experiment design

Format: "We believe [users] will [behavior] because [reasoning]. 
We'll know when [metric]."

Define Success Metrics

What metrics should I track for this hypothesis:
[paste your hypothesis]

Suggest:
- 1 primary metric
- 2-3 secondary metrics  
- 2-3 things to watch that shouldn't get worse
- Expected timeframe

Design a Test

How should I test this hypothesis:
[paste your hypothesis]

Recommend:
- The smallest possible test
- How many users I need
- How long to run it
- What counts as success or failure

Building a Testing Culture

Document Everything

Keep a simple spreadsheet with:

What you hypothesized
What success looked like
What actually happened
What you learned
What you'll try next
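If the spreadsheet is a CSV file, appending each test as a row keeps the log easy to share and to scan for patterns later. A minimal sketch using Python's `csv` module; the column names, the `append_entry` helper, and the `experiments.csv` filename are all hypothetical, and the row reuses the share-button example.

```python
import csv
import io

# One column per thing to record; rename these to match your own sheet.
FIELDS = ["hypothesis", "success_criteria", "result", "learning", "next_test"]


def append_entry(stream, entry: dict) -> None:
    """Append one test to a CSV log, writing the header only if the log is empty."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)  # raises ValueError on unknown keys
    if stream.tell() == 0:
        writer.writeheader()
    writer.writerow(entry)


log = io.StringIO()  # stands in for open("experiments.csv", "a", newline="")
append_entry(log, {
    "hypothesis": "Customers will share purchases from the order confirmation page",
    "success_criteria": "5% of customers share",
    "result": "3% shared",
    "learning": "Buttons alone aren't enough; people need an incentive",
    "next_test": "Offer a discount for sharing",
})
print(log.getvalue())
```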

Weekly: Share what tests you ran and what you learned
Monthly: Look for patterns across all tests
Quarterly: Adjust strategy based on learnings

Celebrate Failed Tests

A failed test still answers the question. Write down what you learned and move on. Share it: "We thought X would work, but learned Y instead, so now we're trying Z."

Your Pre-Test Checklist

Complete before every test:

✓ Hypothesis written - Using the template
✓ Success defined - Clear metrics chosen
✓ Sample size calculated - Know how many users you need
✓ Test designed - Smallest possible version
✓ Duration set - Know when to check results
✓ Learning plan ready - Know what you'll do with results

Next Steps

Start testing your hypotheses today:

Calculate sample size with our sample size calculator
Analyze test results with our A/B test calculator
Track conversions with our conversion rate calculator
Monitor retention with our retention analytics

Speed of learning beats being right on the first try.

Advanced Techniques (Optional)

Once you've mastered the basics, explore these advanced methods:

Sequential Testing

Test your riskiest assumption first. Use what you learn to design the next test.

Hypothesis Trees

Break big hypotheses into smaller ones. Test each branch separately to understand what really matters.

Competitive Testing

Don't just copy what competitors do. Test WHY their features work for their users.

From Hypothesis to Roadmap

High confidence (greater than 80% sure): Build it
Medium confidence (50-80%): Test it more
Low confidence (less than 50%): Research first
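The three bands above amount to a simple decision rule. Writing it down, even as a few lines of code, forces the team to agree on the boundaries (is exactly 80% "build" or "test"?). A minimal sketch; `roadmap_action` is hypothetical, and it treats 50% and 80% as part of the middle band, matching "50-80%" above.

```python
def roadmap_action(confidence: float) -> str:
    """Map how sure you are (0.0 to 1.0) to the next roadmap step."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > 0.80:
        return "build it"
    if confidence >= 0.50:
        return "test it more"
    return "research first"


for c in (0.9, 0.8, 0.5, 0.3):
    print(f"{c:.0%}: {roadmap_action(c)}")
```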

Sources

1. Market Logic Software analysis shows 30-49% failure rates for new products, with FMCG reaching 70-80%.
2. Joe Gebbia talks and early Airbnb history document the NYC photography experiment.
3. Amazon's 1-Click patent exemplifies hypothesis-driven innovation focused on friction reduction.