A/B Testing
How A/B testing works in practice, common mistakes that invalidate results, and the framework for designing experiments that prove causality.
A/B testing is a controlled experiment where you split traffic between two or more variants to measure which performs better against a defined metric. Variant A is typically the control (current experience), and variant B is the treatment (your hypothesis-driven change).
Why it matters
A/B testing is the only reliable way to prove that a change caused an improvement. Before/after comparisons, time-series analysis, and gut feel all suffer from confounding variables — seasonality, marketing campaigns, competitor actions. A properly run A/B test isolates your change as the only variable.
The anatomy of a good test
Every A/B test needs four elements before launch:
- A hypothesis grounded in evidence. “Users abandon checkout because the form feels too long” is testable. “Let’s try a blue button” is not a hypothesis — it’s a guess.
- A primary metric tied to revenue. Click-through rate is a proxy. Revenue per visitor is the real metric. Define it upfront and don’t change it after launch.
- A pre-calculated sample size. This determines how long the test runs. Calculate it based on your baseline conversion rate, minimum detectable effect, and desired statistical power.
- Guardrail metrics. You might increase conversion rate while tanking average order value. Guardrails catch unintended consequences.
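The sample-size step can be sketched with the standard two-proportion power formula. This is a minimal illustration, not a substitute for a proper power analysis; the function name and the default alpha/power values are assumptions for the example.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-proportion z-test.

    baseline      -- control conversion rate, e.g. 0.04 for 4%
    relative_mde  -- minimum detectable effect as a relative lift, e.g. 0.10
    alpha         -- two-sided significance level (0.05 = 95% confidence)
    power         -- probability of detecting a true effect of that size
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for power=0.8
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# A 4% baseline with a 10% relative lift needs roughly 40,000 visitors
# per variant -- which is why small effects take weeks to detect.
print(sample_size_per_variant(0.04, 0.10))
```

Note how the required sample size grows as the minimum detectable effect shrinks: halving the lift you want to detect roughly quadruples the traffic you need.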
Common mistakes
Stopping early. You see a winner after 3 days and ship it. But you needed 14 days of data for valid results. Early wins often regress to the mean.
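The cost of stopping early is easy to demonstrate by simulation: run A/A tests (both arms identical, so any “win” is a false positive) and compare how often daily peeking declares significance versus testing once at the planned horizon. The traffic numbers and 5% conversion rate below are invented for illustration.

```python
import random
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(peek_daily, rate=0.05, per_day=100, days=14,
                        runs=500, seed=7):
    """Fraction of A/A tests (no real difference) called 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        ca = cb = n = 0
        significant = False
        for _ in range(days):
            for _ in range(per_day):
                n += 1
                ca += rng.random() < rate
                cb += rng.random() < rate
            if peek_daily and p_value(ca, n, cb, n) < 0.05:
                significant = True  # shipped on an early "win"
                break
        if not peek_daily:
            significant = p_value(ca, n, cb, n) < 0.05
        hits += significant
    return hits / runs

print(false_positive_rate(peek_daily=False))  # near the nominal 5%
print(false_positive_rate(peek_daily=True))   # several times higher
```

Checking the p-value every day and stopping at the first significant reading multiplies the false-positive rate well beyond the nominal 5%, which is exactly how 3-day “winners” get shipped.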
Testing too many things. If you change the headline, layout, CTA text, and imagery all at once, you’ll know something worked — but not what. Test one hypothesis at a time, or use multivariate testing with appropriate sample sizes.
Ignoring segments. An overall flat result can hide a massive win in one segment and a loss in another. Always check key segments — device type, traffic source, new vs. returning.
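The masking effect is easy to see with toy numbers (all figures below are invented for illustration): each segment moves sharply while the overall rate stays perfectly flat.

```python
# Hypothetical per-segment results as (conversions, visitors)
segments = {
    "mobile":  {"control": (4000, 10000), "treatment": (4600, 10000)},
    "desktop": {"control": (3000, 10000), "treatment": (2400, 10000)},
}

def rate(conversions, visitors):
    return conversions / visitors

for name, arms in segments.items():
    c, t = rate(*arms["control"]), rate(*arms["treatment"])
    print(f"{name}: {c:.1%} -> {t:.1%} ({(t - c) / c:+.1%} relative)")

# Overall, the arms look identical: the mobile win and the desktop
# loss cancel out exactly, so the blended result reads as "flat".
for arm in ("control", "treatment"):
    conv = sum(s[arm][0] for s in segments.values())
    n = sum(s[arm][1] for s in segments.values())
    print(f"overall {arm}: {rate(conv, n):.1%}")
```

A blended 35% vs. 35% hides a +15% mobile lift and a -20% desktop drop; segment-level reporting is what turns a “no result” into a shippable, targeted decision.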
Practical example
You hypothesize that adding social proof (customer count) near the CTA will reduce purchase anxiety. You run an A/B test: control has the existing CTA, treatment adds “Join 12,000+ customers” above the button. You pre-calculate needing 20,000 visitors per variant at 95% confidence. After reaching full sample size, the treatment shows a 7.3% lift in revenue per visitor with p = 0.02. That’s a valid, actionable result.
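Because the primary metric here is revenue per visitor rather than a conversion rate, the analysis compares per-visitor means. A minimal sketch using a Welch-style z-test, where the normal approximation is reasonable at 20,000 visitors per arm; how the revenue lists are loaded is assumed.

```python
from statistics import NormalDist, fmean, stdev

def revenue_lift(control, treatment):
    """Relative lift in revenue per visitor, with a two-sided p-value.

    control / treatment: one revenue figure per visitor,
    0.0 for visitors who did not purchase.
    """
    m_c, m_t = fmean(control), fmean(treatment)
    se = (stdev(control) ** 2 / len(control)
          + stdev(treatment) ** 2 / len(treatment)) ** 0.5
    z = (m_t - m_c) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return (m_t - m_c) / m_c, p
```

One caveat: revenue per visitor is zero-inflated and heavy-tailed, so at smaller sample sizes a bootstrap confidence interval is a sensible cross-check on the z-test.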