Segmentation in A/B Tests

The practice of analyzing A/B test results across different user subgroups to understand whether treatment effects vary by audience characteristics.

What Is Segmentation in A/B Tests?

Segmentation in A/B testing is the practice of analyzing experiment results across user subgroups — device type, tenure, traffic source, geography, behavioral cohort — to understand whether the treatment effect is heterogeneous. A flat overall result might hide a big lift for new users and a big loss for returning ones; a big overall lift might be driven entirely by one segment while others lose. Segmentation reveals these patterns, and it is also one of the biggest sources of false discovery in experimentation.

Also Known As

  • Marketing teams call it segment analysis, audience analysis, or cut by.
  • Growth teams say segmentation, sub-group analysis, or cohort analysis.
  • Product teams use segment analysis or heterogeneous treatment effects.
  • Engineering teams refer to slicing, filtering, or segment breakdown.
  • Statisticians call it subgroup analysis or heterogeneous treatment effect (HTE) estimation.

How It Works

Suppose a test concludes with an overall lift of +1% (not significant). You slice by device: mobile shows +6% (p=0.02, 30K visitors), desktop shows -3% (p=0.04, 30K visitors). The variant genuinely helps mobile and hurts desktop; the two effects canceled out at the aggregate level. Now you know your options: ship to mobile only, or iterate on the variant until desktop also wins. But if you had sliced by 20 different dimensions instead of pre-registering device as a primary segment, the odds are good that at least one slice would show "significance" by chance alone, and you would have to treat any single "winner" skeptically.
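
The multiple-comparisons risk in that last sentence can be made concrete. Testing each slice at α = 0.05 means the chance of at least one spurious "significant" segment grows quickly with the number of slices. A minimal sketch (it assumes the slices are independent, which real segments only approximate):

```python
def family_wise_error(num_tests: int, alpha: float = 0.05) -> float:
    # Probability that at least one of num_tests independent segment
    # tests comes up "significant" purely by chance.
    return 1 - (1 - alpha) ** num_tests

print(f"2 pre-registered segments: {family_wise_error(2):.1%}")   # ~9.8%
print(f"20 post-hoc slices:        {family_wise_error(20):.1%}")  # ~64.2%
```

At 20 post-hoc slices, a false discovery is more likely than not, which is why a single "winning" segment out of many proves little on its own.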

Best Practices

  • Pre-register 2–3 segments based on your hypothesis before launching the test.
  • Power your primary segments up front — don't slice into segments with 500 users each.
  • Apply Bonferroni corrections for the number of pre-registered segments.
  • Treat post-hoc segment findings as hypotheses for confirmation tests, not ship decisions.
  • Watch for Simpson's Paradox — sometimes a lift reverses direction when you aggregate.
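
Two of the practices above, Bonferroni correction and per-segment power, reduce to short formulas. A minimal sketch, using Lehr's rule of thumb for sample size (roughly 80% power at two-sided α = 0.05); the example rates and lift are illustrative, not from a real test:

```python
import math

def bonferroni_alpha(alpha: float, num_segments: int) -> float:
    # Per-segment significance threshold when num_segments
    # segments are pre-registered.
    return alpha / num_segments

def approx_n_per_arm(base_rate: float, abs_lift: float) -> int:
    # Lehr's rule of thumb: n per arm ~ 16 * sigma^2 / delta^2,
    # with sigma^2 = p * (1 - p) for a conversion rate.
    variance = base_rate * (1 - base_rate)
    return math.ceil(16 * variance / abs_lift ** 2)

# Three pre-registered segments at a family-wise alpha of 0.05:
print(round(bonferroni_alpha(0.05, 3), 4))  # 0.0167, the bar each segment must clear

# Detecting a 1-point absolute lift on a 5% base rate needs roughly
# 7,600 users per arm *within each segment*, not overall:
print(approx_n_per_arm(0.05, 0.01))
```

The second number is the point of the "power your segments" bullet: a segment with 500 users cannot detect an effect of this size, no matter what its p-value says.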

Common Mistakes

  • Slicing into 15 segments post-hoc and shipping the one "winning" segment without confirmation.
  • Underpowering segments — each segment needs its own adequate sample.
  • Ignoring Simpson's Paradox when sub-group trends reverse the aggregate.
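
Simpson's Paradox is easiest to see with numbers. In the hypothetical data below (all counts are invented for illustration), the variant wins within every segment yet loses in aggregate, because the variant's traffic skews toward the lower-converting mobile segment:

```python
# (conversions, visitors) per arm, per segment; all counts hypothetical.
data = {
    "mobile":  {"control": (100, 1000), "variant": (330, 3000)},
    "desktop": {"control": (600, 3000), "variant": (210, 1000)},
}

def rate(conversions, visitors):
    return conversions / visitors

for segment, arms in data.items():
    c, v = rate(*arms["control"]), rate(*arms["variant"])
    print(f"{segment}: control {c:.1%}, variant {v:.1%}")  # variant wins both

# Aggregate across segments: the direction flips.
total = {arm: [0, 0] for arm in ("control", "variant")}
for arms in data.values():
    for arm, (conversions, visitors) in arms.items():
        total[arm][0] += conversions
        total[arm][1] += visitors

print(f"overall: control {rate(*total['control']):.1%}, "
      f"variant {rate(*total['variant']):.1%}")  # control wins overall
```

Here mobile goes 10.0% → 11.0% and desktop 20.0% → 21.0%, yet overall the variant sits at 13.5% against the control's 17.5%. The reversal is driven entirely by the unequal traffic mix between arms, which is why segment-level and aggregate results must always be read together.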

Industry Context

  • SaaS/B2B: Tenure-based segments (new vs. existing customers) often reveal the most heterogeneity.
  • Ecommerce/DTC: Device and traffic source segments drive the biggest heterogeneous treatment effects.
  • Lead gen: Traffic source and intent (branded vs. non-branded search) matter most.

The Behavioral Science Connection

Segmentation reflects a core behavioral truth: the same nudge works differently in different contexts. Loss aversion is stronger for high-stakes users, social proof is stronger for new users, defaults are stronger for low-involvement decisions. Segmentation is how you quantify these context effects.

Key Takeaway

Pre-register your key segments, power them adequately, and treat post-hoc segment findings as hypotheses — not ship decisions.