What is A/B Testing?
A controlled experiment that exposes two user groups to different product variants to determine which version produces a statistically significant improvement in a target metric.
A/B Testing (also called split testing) is a controlled experiment in which users are randomly assigned to two or more variants of a product experience to determine which version produces a statistically significant improvement in a target metric. The control group (A) sees the existing experience while the treatment group (B) sees the proposed change. A/B testing removes guesswork from product decisions by measuring real user behaviour rather than relying on opinion.
Formula
Statistical significance: p-value < 0.05 (95% confidence level)
A p-value below 0.05 means that, if there were no real difference between the variants, a difference at least as large as the one observed would occur less than 5% of the time.
Lift = (Treatment Rate - Control Rate) / Control Rate × 100%
Example: Control conversion = 4%, Treatment conversion = 4.8%. Lift = (4.8 - 4) / 4 × 100% = 20% relative lift.
The pre-calculated sample size per variant must be reached before declaring a winner.
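The lift formula and the significance check can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes a two-sided, pooled two-proportion z-test (one common choice; the text above does not name a specific test), and the function names and the 10,000 users per variant are hypothetical values chosen to match the 4% vs 4.8% example.

```python
import math


def relative_lift(control_rate, treatment_rate):
    """Relative lift in percent: (treatment - control) / control x 100."""
    return (treatment_rate - control_rate) / control_rate * 100.0


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical sample: 10,000 users per variant, 4% vs 4.8% conversion.
control_conversions, control_users = 400, 10_000
treatment_conversions, treatment_users = 480, 10_000

lift = relative_lift(control_conversions / control_users,
                     treatment_conversions / treatment_users)
p = two_proportion_p_value(control_conversions, control_users,
                           treatment_conversions, treatment_users)

print(f"Relative lift: {lift:.1f}%")
print(f"p-value: {p:.4f} (significant at 0.05: {p < 0.05})")
```

With these assumed counts the p-value falls below 0.05. The same 20% lift measured on a much smaller sample would not necessarily be significant, which is why the sample size has to be fixed in advance.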
Industry Benchmarks
- 95% confidence (p < 0.05) is the standard threshold for declaring significance
- 80% statistical power is the standard for experiment design
- Typical A/B test duration: 2-4 weeks to reduce day-of-week effects
- Run tests for at least one full business cycle, even if significance is reached sooner
- Companies running 10+ experiments per month report 2-5x faster product improvement
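The 95% confidence and 80% power benchmarks above translate into a required sample size per variant. The sketch below uses the standard normal-approximation formula for comparing two proportions; the function name and the example inputs (4% baseline, 20% relative minimum detectable effect) are illustrative assumptions, and dedicated calculators may use slightly different formulas.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided test.

    baseline_rate: control conversion rate, e.g. 0.04 for 4%
    relative_mde:  minimum detectable effect as a relative lift, e.g. 0.20
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Hypothetical inputs: 4% baseline conversion, detect a 20% relative lift.
n_required = sample_size_per_variant(0.04, 0.20)
print(f"Users needed per variant: {n_required}")

# Halving the detectable effect roughly quadruples the required sample.
print(f"For a 10% relative lift: {sample_size_per_variant(0.04, 0.10)}")
```

Under these assumptions the first case needs on the order of ten thousand users per variant, which is one reason tests on low-traffic pages often run for several weeks.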
When to Use A/B Testing
- Testing a new landing page headline or CTA before committing to a full redesign
- Validating that a new onboarding flow improves trial-to-paid conversion before full rollout
- Measuring the impact of a pricing change on signup rate and revenue per visitor
- Comparing two algorithm variants (recommendation, ranking, feed) on engagement metrics
Common Mistakes
- Stopping tests early after seeing a significant result, which inflates false positive rates through the peeking problem
- Running A/B tests without pre-calculating required sample size, leading to underpowered experiments with inconclusive results
- Testing too many variables simultaneously in a single experiment, making it impossible to attribute the effect to a specific change
Best Practices
- Pre-register your hypothesis and primary metric before launching the test to prevent p-hacking and post-hoc rationalisation of results
- Segment results by user cohort after the test concludes; treatments that win overall sometimes lose for high-value user segments
- Establish a minimum detectable effect that is actually business-meaningful (e.g. 10% relative lift) rather than choosing whatever effect size lets the test finish quickly
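The peeking problem mentioned in the list above can be illustrated with a small simulation. The sketch below runs A/A tests, in which both arms share the same true conversion rate, so every "significant" result is a false positive. It compares stopping at the first interim look with p < 0.05 against checking only once at the planned sample size. All parameters (number of tests, users per arm, number of looks, the 5% rate, the seed) and the pooled z-test are assumptions chosen for illustration.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no conversions (or all conversions): no evidence either way
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_peeking(n_tests=500, n_per_arm=1000, looks=10, rate=0.05, seed=42):
    """Return (false positive rate with peeking, rate with one final check)."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    peeking_hits = 0
    final_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        significant_at_any_look = False
        p = 1.0
        for look in range(1, looks + 1):
            # Both arms draw from the same rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            n = look * step
            p = p_value(conv_a, n, conv_b, n)
            if p < 0.05:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        final_hits += p < 0.05  # p now holds the value from the final look
    return peeking_hits / n_tests, final_hits / n_tests


peek_rate, final_rate = simulate_peeking()
print(f"False positive rate when stopping at first significant look: {peek_rate:.1%}")
print(f"False positive rate when checking once at the end: {final_rate:.1%}")
```

The single final check should land near the nominal 5%, while stopping at the first significant look produces noticeably more false positives, because each additional look is another chance for noise to cross the threshold.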