A/B Testing
A randomized experiment comparing two or more versions of a page or feature to determine which performs better on a predefined metric.
What Is A/B Testing?
A/B testing (also called split testing) is a randomized, controlled experiment where visitors are randomly assigned to a control (A) or one or more variations (B, C, etc.). By randomly splitting traffic and measuring outcomes on a predefined metric, you can isolate the causal impact of a specific change rather than guessing from correlation in your analytics. It is the gold standard for evidence-based product, marketing, and UX decisions.
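In practice, the random assignment is usually deterministic hashing rather than a fresh coin flip per request, so a returning visitor always lands in the same bucket. A minimal sketch in Python (the experiment name "hero-headline" and the two-variant 50/50 split are illustrative assumptions):

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "variation")) -> str:
    """Deterministically bucket a user: the same user in the same
    experiment always gets the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Stable for one user across calls; roughly 50/50 across many users.
print(assign_variant("user-42", "hero-headline"))
```

Hashing on experiment name plus user ID (rather than user ID alone) also decorrelates bucketing across experiments, which matters when several tests run at once.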
Also Known As
- Marketing teams call it split testing, landing page testing, or conversion testing.
- Sales teams often refer to it as a pilot or champion/challenger test.
- Growth teams use "experiment," "growth test," or simply "test."
- Product teams say "feature experiment," "product experiment," or "controlled rollout."
- Engineering teams talk about "randomized rollouts," "controlled experiments," or "online experiments."
How It Works
Suppose your landing page converts at 4.0% and you suspect a new hero headline will do better. You define your primary metric (signup rate) and calculate the required sample size: say 25,000 visitors per variant to detect a 0.5 percentage point lift (about 12.5% relative) with 80% power at alpha=0.05. You split traffic 50/50, and after 14 days you have 50,000 visitors total: control converted at 4.0% (1,000 of 25,000) and the variation at 4.5% (1,125 of 25,000). A chi-squared test returns p ≈ 0.006, comfortably below 0.05. You ship the variation with confidence that the 0.5 percentage point lift is causal, not noise.
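The arithmetic in this example can be checked with the standard normal-approximation formulas. This is a self-contained sketch, not a substitute for a proper power calculator:

```python
from math import erf, sqrt

def sample_size_per_arm(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Normal-approximation sample size per variant for a two-proportion
    test (two-sided alpha=0.05 gives z_alpha=1.96; 80% power gives 0.8416)."""
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * var / (p1 - p2) ** 2

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided pooled z-test for equal proportions; its square is the
    1-degree-of-freedom chi-squared statistic."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = abs(x2 / n2 - x1 / n1) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # 2 * (1 - Phi(z))

print(round(sample_size_per_arm(0.040, 0.045)))          # ≈ 25,556 per variant
print(two_proportion_p_value(1000, 25000, 1125, 25000))  # ≈ 0.006
```

The ~25,556-per-variant figure is what "say 25,000" above rounds from; the p-value confirms the observed result clears the pre-registered alpha with room to spare.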
Best Practices
- Pre-register a single primary metric, sample size, and stopping rule before launch.
- Only run one test per surface area at a time unless you have orthogonal randomization.
- Run every test for at least one full business cycle (usually a week) to smooth day-of-week effects.
- QA both variants on multiple devices and browsers before traffic starts.
- Document the hypothesis, result, and learning even when the test loses — losses are 50% of your library.
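Pre-registration can be as lightweight as a frozen record written down before any traffic is served. A sketch with hypothetical field names, using the numbers from the example above:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)  # frozen: the plan cannot be edited after launch
class ExperimentPlan:
    """A pre-registered test plan, committed before traffic starts."""
    hypothesis: str
    primary_metric: str
    sample_size_per_variant: int
    min_runtime_days: int  # at least one full business cycle
    stopping_rule: str
    registered_on: date = field(default_factory=date.today)

plan = ExperimentPlan(
    hypothesis="New hero headline lifts signup rate by >= 0.5 pp",
    primary_metric="signup_rate",
    sample_size_per_variant=25_000,
    min_runtime_days=14,
    stopping_rule="Evaluate once, after both sample size and runtime are met",
)
```

Making the record immutable is the point: the analysis at the end is judged against exactly what was written at the start.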
Common Mistakes
- Peeking and stopping early the moment results look significant — this inflates false positive rates to 20–30%.
- Testing too many things at once in a single variant, making it impossible to attribute the result to any specific change.
- Ignoring practical significance — a 0.1% lift that reaches statistical significance on massive traffic isn't worth the engineering cost.
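The peeking problem is easy to demonstrate with an A/A simulation: both arms share the same true rate, so every "significant" result is a false positive by construction. This sketch uses illustrative traffic numbers and a normal approximation of daily conversion counts:

```python
import random

def simulate_aa_tests(n_sims=4000, days=14, visitors_per_day=1800,
                      p=0.04, seed=0):
    """Compare false positive rates when checking |z| > 1.96 every day
    (peeking) vs. once at the pre-registered end date."""
    rng = random.Random(seed)
    sd_day = (visitors_per_day * p * (1 - p)) ** 0.5  # sd of a day's count
    peeked = fixed = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0.0  # cumulative conversions per arm
        n = 0                  # cumulative visitors per arm
        hit = False
        for _ in range(days):
            conv_a += rng.gauss(visitors_per_day * p, sd_day)
            conv_b += rng.gauss(visitors_per_day * p, sd_day)
            n += visitors_per_day
            pool = (conv_a + conv_b) / (2 * n)
            se = (pool * (1 - pool) * 2 / n) ** 0.5
            z = abs(conv_a / n - conv_b / n) / se
            hit = hit or z > 1.96  # "ship it!" at the first significant peek
        peeked += hit
        fixed += z > 1.96          # decision from the final day only
    return peeked / n_sims, fixed / n_sims

peek_rate, fixed_rate = simulate_aa_tests()
print(peek_rate, fixed_rate)  # peeking inflates ~5% to roughly 20%
```

With 14 daily looks, the simulated false positive rate lands near the 20–30% range quoted above, while the single fixed-horizon look stays near the nominal 5%.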
Industry Context
- SaaS/B2B: Tests typically focus on trial signup, activation, and upgrade funnels. Low traffic on high-intent pages means tests often run 4–8 weeks.
- Ecommerce/DTC: High traffic allows rapid iteration on category pages, product detail pages (PDPs), and checkout. Revenue per visitor is often the true primary metric, not conversion rate alone.
- Lead gen: Form length, multi-step flows, and offer framing dominate the backlog. Lead quality guardrails matter as much as lead volume.
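Because revenue per visitor is zero-inflated and heavily skewed, a percentile bootstrap is one common way to put an interval on the lift. A sketch on hypothetical order values (the spend distributions below are invented for illustration):

```python
import random

def bootstrap_rpv_diff(control, variation, n_boot=1000, seed=1):
    """Percentile-bootstrap 95% CI for the difference in revenue per
    visitor (variation minus control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a = rng.choices(control, k=len(control))      # resample with replacement
        b = rng.choices(variation, k=len(variation))
        diffs.append(sum(b) / len(b) - sum(a) / len(a))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]

# Hypothetical per-visitor revenue: most visitors spend nothing.
rng = random.Random(7)
control = [rng.choice([0] * 9 + [60]) for _ in range(2000)]        # RPV ~ $6
variation = [rng.choice([0] * 8 + [80, 80]) for _ in range(2000)]  # RPV ~ $16
lo, hi = bootstrap_rpv_diff(control, variation)
print(f"95% CI for RPV lift: ({lo:.2f}, {hi:.2f})")
```

If the interval excludes zero, the RPV lift is credible despite the skew; a plain t-test on such data can be distorted by a handful of large orders.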
The Behavioral Science Connection
A/B testing is the empirical antidote to the narrative fallacy — our tendency to confuse compelling stories about user behavior with what users actually do. Intuition is reliably wrong in conversion optimization because we underestimate how much framing, anchoring, and defaults shape decisions. A/B testing forces us to let the data, not the story, decide.
Key Takeaway
A/B testing turns opinions into evidence by isolating cause and effect — but only when you pre-register the hypothesis and resist the urge to peek.