What is statistical significance in A/B testing?

Statistical significance tells you how surprising your result would be if the variants were actually identical. At a 95% confidence level (a 5% significance level), a result counts as significant only if, were there truly no difference between variants, you would see a result this extreme less than 5% of the time. This helps ensure your business decisions are based on real effects, not statistical noise. PM Toolkit's sample size calculator helps you plan tests with the right significance levels.
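As a rough illustration of the idea above, here is a minimal sketch of a two-sided, two-proportion z-test using only Python's standard library. The function name and the conversion counts are hypothetical, and the normal approximation is an assumption that holds best with large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the best estimate of the shared rate if the variants are identical.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a result at least this extreme in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: 500 of 10,000 users convert on A, 580 of 10,000 on B.
p_value = two_proportion_p_value(500, 10_000, 580, 10_000)
print(f"p-value: {p_value:.4f}")
print("significant at the 5% level" if p_value < 0.05 else "not significant at the 5% level")
```

A p-value below 0.05 corresponds to the 95% threshold described above.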

How do I determine the minimum detectable effect (MDE)?

MDE is the smallest change you want to reliably detect. Consider: 1) Business impact - what improvement would justify implementation costs? 2) Historical data - what changes have you seen from similar tests? 3) Practical significance - a 0.1% improvement might be statistically detectable but not worth pursuing. Typically, aim for 5-20% relative improvement for meaningful business impact. PM Toolkit's sample size calculator helps you find the optimal MDE for your traffic.
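To sanity-check an MDE against your traffic, you can invert the usual two-proportion sample size approximation. The sketch below assumes a two-sided test, equal traffic per variant, and the normal approximation. The function name and the example figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_relative_lift(baseline: float, users_per_variant: int,
                             alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest relative lift the given traffic can reliably detect."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    # Rearranged from n = (z_alpha + z_beta)^2 * 2 * p * (1 - p) / MDE^2.
    absolute_mde = (z_alpha + z_beta) * sqrt(2 * baseline * (1 - baseline) / users_per_variant)
    return absolute_mde / baseline


# Hypothetical scenario: 4% baseline conversion and 25,000 users per variant.
lift = detectable_relative_lift(0.04, 25_000)
print(f"Smallest detectable relative lift: {lift:.1%}")
```

If the lift this returns is larger than any improvement you could realistically expect, the test is unlikely to be worth running at that traffic level.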

What's the difference between statistical power and significance level?

Significance level (alpha) is the probability of finding a difference when none exists (Type I error). Power (1-beta) is the probability of detecting a real difference when it exists. A low significance level limits false positives, while high power limits false negatives. Use a 5% significance level (95% confidence) and at least 80% power as a standard starting point.
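The sketch below estimates power for a given sample size, to show how the two settings interact. It assumes a two-sided test and the normal approximation, and it ignores the negligible chance of a significant result in the wrong direction. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(baseline: float, relative_lift: float,
                      users_per_variant: int, alpha: float = 0.05) -> float:
    """Approximate probability of detecting a true lift of the given size."""
    delta = baseline * relative_lift  # absolute difference to detect
    std_err = sqrt(2 * baseline * (1 - baseline) / users_per_variant)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Power is the chance the observed z-score clears the significance threshold.
    return NormalDist().cdf(delta / std_err - z_alpha)


# Hypothetical test: 5% baseline, 10% relative lift, two candidate sample sizes.
for n in (10_000, 30_000):
    print(f"{n:,} users per variant -> power {approximate_power(0.05, 0.10, n):.0%}")
```

With alpha held fixed, power rises only as the sample grows, so an undersized test mostly risks missing real effects.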

How long should I run my A/B test?

Run tests for at least one full business cycle (usually 1-2 weeks) to account for day-of-week effects. Even if you reach statistical significance early, keep the test running to its planned sample size: stopping as soon as a peek looks significant inflates the false positive rate. Consider weekly patterns, seasonal effects, and whether you capture different user cohorts. Our calculator provides duration estimates based on your traffic.
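A duration estimate can be sketched as simple arithmetic: total users needed divided by daily traffic, rounded up to whole weeks. This is not necessarily how the calculator computes it. The function name and traffic figures are hypothetical.

```python
from math import ceil


def estimate_duration_days(users_per_variant: int, variants: int, daily_visitors: int) -> int:
    """Days needed to fill every variant, rounded up to full weeks."""
    total_users = users_per_variant * variants
    raw_days = ceil(total_users / daily_visitors)
    # Round up to whole weeks so every day of the week is represented equally.
    return max(7, ceil(raw_days / 7) * 7)


# Hypothetical plan: 30,000 users per variant, 2 variants, 3,500 eligible visitors per day.
print(estimate_duration_days(30_000, 2, 3_500), "days")
```

Rounding to full weeks means a test that technically fills in 18 days is planned for 21, which keeps weekday and weekend traffic balanced.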

What if I don't have enough traffic for the ideal sample size?

If traffic is limited: 1) Increase your MDE threshold - focus on detecting larger improvements, 2) Extend test duration - but watch for seasonal effects, 3) Reduce confidence level to 90% for exploratory tests, 4) Focus on high-traffic pages or segments, 5) Consider sequential testing methods, 6) Use qualitative research to complement smaller samples.

What's the relationship between conversion rate and required sample size?

Lower baseline conversion rates require larger sample sizes to detect the same relative change. For example, improving from 2% to 2.2% (10% relative lift) needs more samples than improving from 20% to 22% (same 10% lift), because at a low baseline the absolute difference (0.2 points versus 2 points) is much smaller relative to the sampling noise. At a 2-5% baseline, detecting a realistic 10% relative lift typically needs roughly 30,000 to 80,000 users per variant at 95% confidence and 80% power. Only very large lifts (40% or more) are detectable with a few thousand users per variant. PM Toolkit's sample size calculator visualizes these trade-offs so you can plan tests on metrics with reasonable baseline rates.
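To see the baseline effect in numbers, here is a minimal sketch that writes the usual two-proportion approximation in terms of relative lift. It assumes a two-sided test at 95% confidence and 80% power. The function name and the baselines are hypothetical choices for illustration.

```python
from math import ceil
from statistics import NormalDist


def users_per_variant(baseline: float, relative_lift: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per variant to detect a relative lift at a given baseline."""
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # With MDE = baseline * relative_lift, the formula reduces to a (1 - p) / p factor,
    # so the requirement grows quickly as conversions become rarer.
    return ceil(z_total ** 2 * 2 * (1 - baseline) / (baseline * relative_lift ** 2))


for baseline in (0.02, 0.05, 0.20):
    n = users_per_variant(baseline, 0.10)
    print(f"baseline {baseline:.0%}: {n:,} users per variant for a 10% relative lift")
```

For the same relative lift, the 2% baseline needs roughly twelve times as many users as the 20% baseline.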

Sample Size Calculator

How many responses you need for statistically valid results — pair with A/B testing and research planning.

Updated May 2026

Inputs: total population (monthly active users, beta testers, customers; use 1,000,000+ for "infinite" populations), confidence level, and margin of error.

Outreach planning (optional): expected response rate (B2B 10–15%, B2C 5–10%, internal 30–40%) and cost per response.

Why this matters

Under-sampling gives you false confidence — results feel real but aren't statistically valid. Over-sampling wastes time and money. Use 95% confidence + ±5% margin for most decisions; tighten only when the stakes warrant it.

n = (Z² × p × (1−p)) ÷ E²

p = 0.5 is conservative (maximum variance). Finite population correction auto-applies when sample >5% of population.

Understanding Sample Size Determination for Research and Surveys

Sample size determination is a critical component of statistical research that ensures your survey results are representative, reliable, and actionable. Whether you're conducting user research, market studies, or product validation, calculating the right sample size helps you balance statistical accuracy with practical constraints like time and budget.

How to Calculate Sample Size: The Statistical Formula

The standard sample size formula is: n = (Z²p(1-p)) / E²

n = Required sample size
Z = Z-score for the chosen confidence level (1.645 for 90%, 1.96 for 95%, 2.576 for 99%)
p = Population proportion (use 0.5 for maximum variability)
E = Margin of error (as decimal: 5% = 0.05)

For finite populations under 100,000, apply the finite population correction: Adjusted n = n / (1 + ((n-1) / N)), where N is the population size.
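The formula and correction above can be sketched in a few lines of Python. This version is an illustration rather than the calculator's actual code: the function name is hypothetical, and it applies the finite population correction to every population size, which barely changes the result for very large populations.

```python
from math import ceil
from statistics import NormalDist


def survey_sample_size(population: int, confidence: float = 0.95,
                       margin: float = 0.05, proportion: float = 0.5) -> int:
    """Required responses: n = Z^2 * p * (1 - p) / E^2, then adjusted for population size."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Finite population correction: Adjusted n = n / (1 + (n - 1) / N)
    adjusted = n / (1 + (n - 1) / population)
    return ceil(adjusted)


# Hypothetical populations at 95% confidence and a ±5% margin.
for population in (500, 10_000, 1_000_000):
    print(f"population {population:,}: {survey_sample_size(population)} responses")
```

The correction matters most for small populations, where the required sample is a large share of everyone you could survey.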

Confidence Level vs Margin of Error: What's the Difference?

Confidence Level

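Confidence level represents how certain you want to be that the true population value falls within your margin of error. At 95% confidence, about 95 out of 100 repeated samples would produce a range that contains the true value. Think of it as your "trust score":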

90% Confidence: Quick pulse checks, exploratory research
95% Confidence: Standard for most business decisions (recommended)
99% Confidence: High-stakes decisions, regulatory requirements

Margin of Error

Margin of error is the "wiggle room" in your results. If 60% of surveyed users like a feature with a ±5% margin, the true percentage is likely between 55% and 65% (at your chosen confidence level). Typical margins of error:

±3%: Pricing decisions, critical product changes
±5%: Feature development, user experience research
±10%: Exploratory research, early-stage validation

Industry Response Rate Benchmarks

Response rates vary significantly by industry and survey method. Use these benchmarks to estimate your total outreach needs:

B2B Email Surveys: 10-15% response rate
B2C Consumer Surveys: 5-10% response rate
Internal Employee Surveys: 30-40% response rate
Phone Surveys: 15-20% response rate
In-app Surveys: 20-30% response rate
Panel/Paid Surveys: 50-70% response rate

When to Use This Sample Size Calculator

Product Management Research

Use for feature validation, user journey analysis, pricing research, and competitive analysis. Product managers typically need 95% confidence with ±5% margin for feature decisions.

UX Research and Usability Studies

Use for quantitative UX research, usability metrics, and conversion optimization. Note: Qualitative research (user interviews) typically requires much smaller samples (5-12 users per segment).

Market Research and Validation

Market sizing, demand validation, customer segmentation, and brand research. Consider using different confidence levels for different research stages.

Common Sample Size Mistakes and How to Avoid Them

Mistake 1: Ignoring Population Size

For populations under 100,000, use finite population correction to avoid oversized samples. For populations over 1 million, treat as infinite.

Mistake 2: Underestimating Non-Response

Always account for response rates. If you need 400 responses with a 20% response rate, you must contact 2,000 people.
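The outreach arithmetic is easy to script. The sketch below is a minimal illustration with a hypothetical function name and a made-up cost per response. It assumes cost is paid per completed response, not per person contacted.

```python
from math import ceil


def outreach_plan(required_responses: int, response_rate: float,
                  cost_per_response: float = 0.0) -> tuple[int, float]:
    """Return (people to contact, total cost) for a target number of responses."""
    # Round before taking the ceiling so floating-point noise cannot add a stray contact.
    contacts = ceil(round(required_responses / response_rate, 6))
    total_cost = required_responses * cost_per_response
    return contacts, total_cost


# The example from the text, plus a hypothetical cost of 2.50 per completed response.
contacts, cost = outreach_plan(400, 0.20, cost_per_response=2.50)
print(f"Contact {contacts:,} people, budget {cost:,.2f}")
```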

Mistake 3: One-Size-Fits-All Confidence

Use higher confidence (99%) for irreversible decisions and lower confidence (90%) for iterative testing.

Mistake 4: Subgroup Analysis Oversight

If analyzing subgroups, each subgroup needs adequate sample size. A 400-person sample split into 4 segments gives only 100 per segment.
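To see what the split costs in precision, this minimal sketch computes the margin of error for the full sample and for one segment. It assumes simple random sampling and the conservative p = 0.5. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sample_size: int, confidence: float = 0.95,
                    proportion: float = 0.5) -> float:
    """Margin of error for a proportion estimated from a simple random sample."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(proportion * (1 - proportion) / sample_size)


total_sample, segments = 400, 4
print(f"Full sample of {total_sample}: ±{margin_of_error(total_sample):.1%}")
print(f"Each segment of {total_sample // segments}: ±{margin_of_error(total_sample // segments):.1%}")
```

Cutting the sample to a quarter doubles the margin of error, so a study planned for ±5% overall gives only about ±10% within each segment.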

Cost-Effective Sample Size Strategies

Balance statistical rigor with budget constraints by considering staged research, mixed-method approaches, and leveraging existing customer panels. Remember that actionable insights from a smaller, well-designed study often outperform perfect statistics from poorly executed large studies.

What is A/B Test Sample Size?

Sample size is the number of users each variant of an A/B test needs before the result can be trusted. It depends on your baseline conversion rate, the minimum detectable effect, statistical power, and significance level. Underpowered tests produce false negatives and winners that vanish on relaunch.

Sample Size Formula

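n = (Z_alpha + Z_beta)^2 × 2 × p(1 - p) ÷ MDE^2

Here n is the number of users per variant, p is the baseline conversion rate, Z_alpha is the two-sided Z-score for the significance level (1.96 for 5%), Z_beta is the Z-score for power (0.84 for 80%), and MDE is the absolute difference to detect (a 10% relative lift on a 5% baseline is 0.005).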
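Here is a minimal sketch of this formula in Python, with the MDE taken as an absolute difference and Z_alpha as the two-sided value. The function name and the example inputs are hypothetical. Other tools may use slightly different variance terms and return somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist


def ab_sample_size(baseline: float, absolute_mde: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per variant: (Z_alpha + Z_beta)^2 * 2 * p * (1 - p) / MDE^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance_term = 2 * baseline * (1 - baseline)
    return ceil((z_alpha + z_beta) ** 2 * variance_term / absolute_mde ** 2)


# Hypothetical test: 5% baseline, detecting a move to 5.5% (0.005 absolute).
print(f"95% confidence: {ab_sample_size(0.05, 0.005):,} users per variant")
print(f"90% confidence: {ab_sample_size(0.05, 0.005, alpha=0.10):,} users per variant")
print(f"Doubling the MDE: {ab_sample_size(0.05, 0.010):,} users per variant")
```

Because the MDE is squared in the denominator, doubling it cuts the requirement to about a quarter, which is usually a bigger saving than relaxing the confidence level.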

Typical Requirement

Roughly 30,000-80,000 users per variant to detect a 10% relative lift at a 2-5% baseline (95% confidence, 80% power)

Common questions

What is Sample Size in A/B Testing?