Statistical vs Practical Significance

The single biggest mistake in A/B testing is shipping just because p < 0.05. Statistical significance asks whether the effect is real. Practical significance asks whether it is worth it.

Last updated: 2026-04-01

Overview

Statistical Significance

Is It Real?

A statistical claim that an observed difference would be unlikely to arise by chance alone if there were no true effect. Reported via a p-value or confidence interval. Standard threshold: p < 0.05.

Best as a guard against acting on noise. A statistically significant result tells you the data are hard to explain by chance alone. It does not tell you how large the effect is.

Practical Significance

Is It Worth It?

A business judgment about whether the size of the effect is worth shipping, given costs, risk, and opportunity cost. Often expressed as a minimum effect of interest (MEI), which in turn sets the minimum detectable effect (MDE) the test is sized for.

Best as a sanity check. A practically significant result tells you the effect is worth acting on, not just measurable.

Formula comparison

Statistical Significance

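z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))

Here p1 = x1/n1 and p2 = x2/n2 are the observed conversion rates, and p_pool = (x1 + x2) / (n1 + n2) is the pooled rate, where x is the number of conversions and n the number of users in each arm. Compare |z| against the critical value for your alpha (1.96 for a two-sided test at alpha = 0.05), or compare the resulting p-value against alpha directly. Default alpha = 0.05.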
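As an illustration, the formula above can be sketched in a few lines of Python using only the standard library. The function name and the conversion counts are hypothetical, chosen for the example, and the p-value is two-sided.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: variant converts 560 of 10,000, control 500 of 10,000
z, p = two_proportion_z_test(560, 10_000, 500, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts the observed relative lift is 12%, yet the p-value lands slightly above 0.05, which is the kind of result the under-powered case in the FAQ describes.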

Practical Significance

No formula. Set a minimum effect of interest (MEI) before the test starts.

The MEI drives the MDE, which drives the sample size. Detecting half the effect requires roughly four times the sample.
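To make the "half the effect, four times the sample" relationship concrete, here is a minimal sketch of a per-arm sample-size calculation. It assumes a two-sided test on conversion rates, equal arm sizes, and the common normal-approximation formula. The function name, the 5% baseline, and the 80% power default are assumptions for the example, not values from this article.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect a relative lift of relative_mde."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    # Required n grows with 1 / (p2 - p1)^2, hence the roughly 4x rule
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


n_10 = sample_size_per_arm(0.05, 0.10)  # MDE = 10% relative lift
n_05 = sample_size_per_arm(0.05, 0.05)  # MDE = 5% relative lift
print(n_10, n_05, round(n_05 / n_10, 2))
```

The ratio comes out close to, but not exactly, four because the variance term shifts slightly as the variant rate changes.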

Side-by-side comparison

Criteria	Statistical Significance	Practical Significance
Question answered	Is the effect real?	Is the effect worth it?
Type	Statistical claim	Business judgment
Reported as	p-value, confidence interval	Minimum effect of interest, MDE
With small samples	Critical. Noise is easy to mistake for signal	Less critical. The estimate is too noisy to compare against the MEI with confidence
With huge samples	Risky. Almost any difference becomes "significant"	Critical. Tiny effects pass p < 0.05 without mattering
Set when	Alpha before the test; p-value computed from the data afterwards	Before the test, agreed by the team
The trap	Shipping anything with p < 0.05	Setting the MEI so high that the test is under-powered for any realistic effect
Healthy practice	Always check both gates	Always check both gates
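The "check both gates" practice in the table can be sketched as a single function. This is one possible reading, not a prescribed method: the statistical gate is a pooled two-proportion z-test, and the practical gate compares the observed relative lift to the MEI. A stricter variant could require the lower confidence bound to clear the MEI. The function name and all counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_gate_check(x_c, n_c, x_v, n_v, mei_relative, alpha=0.05):
    """Evaluate the statistical gate and the practical gate separately."""
    p_c, p_v = x_c / n_c, x_v / n_v
    diff = p_v - p_c
    # Gate 1: pooled two-proportion z-test, two-sided
    p_pool = (x_c + x_v) / (n_c + n_v)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    # Gate 2: observed relative lift against the minimum effect of interest
    relative_lift = diff / p_c
    return {
        "relative_lift": relative_lift,
        "p_value": p_value,
        "statistically_significant": p_value < alpha,
        "practically_significant": relative_lift >= mei_relative,
    }


# Hypothetical huge-sample test: 5.00% vs 5.05% on 2,000,000 users per arm,
# with an MEI of a 3% relative lift agreed before the test
result = two_gate_check(100_000, 2_000_000, 101_000, 2_000_000, mei_relative=0.03)
print(result)
```

In this made-up case the 1% relative lift passes the statistical gate but fails the practical one, which is the huge-sample row of the table in miniature.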

When to use each

Choose Statistical Significance when

Sample size is small. Small samples are noisy and easy to misread
The cost of a false positive is high (a launch that breaks something)
Stakeholders will scrutinize the result
You're under pressure to ship and need to defend the decision

Choose Practical Significance when

Sample size is huge. Tiny differences become "significant" but not meaningful
The change has costs beyond the test (engineering, support, risk)
You're prioritizing a roadmap of experiments. Small wins eat capacity
The effect size is below your minimum effect of interest

Pros and cons

Statistical Significance

Pros

Quantitative gate against noise
Standard. Stakeholders recognize p-values and confidence intervals
Pairs naturally with sample-size planning

Cons

Easy to misuse at very large samples (everything becomes significant)
Often treated as a binary pass/fail when the p-value is a continuous measure of evidence
Doesn't speak to whether the effect matters

Practical Significance

Pros

Forces the business question before the test
Filters out small wins that aren't worth the cost of shipping
Makes the test plan honest about the MDE

Cons

"Worth it" depends on the team and the moment
Easy to forget. Most A/B testing tools don't surface it
Can be gamed by setting an MDE so large that the test is under-powered for any realistic effect

Frequently asked questions

What is the minimum detectable effect (MDE)?

The smallest true effect your test can reliably detect, given your sample size, alpha, and power. Set the MDE before you run the test. Detecting half the effect requires roughly four times the sample size, so the MDE is the lever that controls test cost.

Is p < 0.05 enough to ship?

No, not on its own. p < 0.05 only tells you that a difference this large would be unlikely if there were no true effect. It says nothing about whether the effect is large enough to matter. Always pair statistical significance with a check against your minimum effect of interest.

Can a result be practically significant but not statistically significant?

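Yes, and it's often a sign your test is under-powered. A 10% observed lift that you can't statistically confirm usually means the test didn't run long enough or didn't have enough traffic, so the estimate is too noisy to distinguish from zero. Run again with a larger sample, sized from your MDE, rather than shipping on the point estimate.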
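One way to check whether a test was under-powered is to estimate the power it had for the lift you care about. The sketch below uses a normal approximation for a two-sided test with equal arms and ignores the negligible chance of rejecting in the wrong direction. The function name, baseline, and sample sizes are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(baseline, relative_lift, n_per_arm, alpha=0.05):
    """Approximate power to detect a given relative lift with n_per_arm users per arm."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    # Standard error of the difference between the two observed rates
    se = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)


# Hypothetical: 5% baseline, true 10% relative lift
power_small = approx_power(0.05, 0.10, 5_000)   # low-traffic test
power_large = approx_power(0.05, 0.10, 31_231)  # test sized for 80% power
print(f"{power_small:.2f} vs {power_large:.2f}")
```

Under these assumptions the low-traffic test would miss a real 10% lift most of the time, so a non-significant result from it says little about whether the lift exists.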

What's the right MDE for my test?

The smallest lift that would change a real decision: ship/no-ship, build/don't build. As a rough guide, that's 3 to 5% for most growth tests, 1 to 2% for pricing tests, and sometimes 10%+ for experimental redesigns. The MDE should reflect the cost of acting, not the appetite of the team.

Why does sample size shrink with a larger MDE?

Because larger effects are easier to detect. Detecting a 10% lift requires far less sample than detecting a 1% lift. Raising the MDE is the most direct way to make a test feasible at your traffic level. The tradeoff: smaller real effects become invisible.
