Selection Bias
A systematic error that occurs when the sample studied is not representative of the population intended to be analyzed, often due to non-random assignment, self-selection, or flawed data collection.
What Is Selection Bias?
Selection bias occurs whenever the people in your sample differ systematically from the people you actually want to learn about. It corrupts conclusions before any statistics are computed. In experimentation, it shows up through sample ratio mismatch, opt-in filters, cookie loss, and tracking gaps that exclude specific segments.
Also Known As
- Data science teams: sample selection bias, SRM (sample ratio mismatch), non-response bias
- Growth teams: tracking bias, attribution gaps
- Marketing teams: audience skew, sample skew
- Engineering teams: SRM, instrumentation bias
How It Works
Imagine an A/B test with 10,000 visitors per variant where Variant B uses a JavaScript feature that fails on Safari. Of 1,500 Safari visitors routed to Variant B, 300 silently bounce without being counted. Variant B's recorded sample is now 9,700, with the worst-performing traffic segment missing. Its reported conversion rate looks artificially high. The lift is an illusion, a pure artifact of selection bias, but the p-value will look wonderfully significant.
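The arithmetic behind this illusion can be worked through directly. In this sketch the browser split (1,500 of 10,000) and the 300 uncounted bounces come from the example above; the per-browser conversion rates are hypothetical and identical in both variants, so the true lift is zero:

```python
n = 10_000
safari, other = 1_500, 8_500        # browser split from the example above
cr_safari, cr_other = 0.02, 0.06    # hypothetical rates, identical in A and B
bounced = 300                       # Safari visitors in B who exit uncounted

# Variant A: every assigned visitor is tracked.
cr_a = (safari * cr_safari + other * cr_other) / n

# Variant B, as recorded: the 300 bouncers (who never get a chance to
# convert) vanish from both numerator and denominator, so the recorded
# sample is 9,700 with the worst-converting segment thinned out.
conv_b = (safari - bounced) * cr_safari + other * cr_other
cr_b_recorded = conv_b / (n - bounced)

# Variant B, intent-to-treat: every assigned visitor stays in the denominator.
cr_b_itt = conv_b / n

print(f"A:            {cr_a:.2%}")           # 5.40%
print(f"B (recorded): {cr_b_recorded:.2%}")  # 5.51% -- illusory lift
print(f"B (ITT):      {cr_b_itt:.2%}")       # 5.34% -- B actually loses conversions
```

The recorded rate for B beats A purely because low-converting traffic vanished from the denominator; the intent-to-treat rate, which keeps every assigned user, shows B performing slightly worse.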
Best Practices
- Do check for sample ratio mismatch (SRM) on every test; flag any assignment split whose goodness-of-fit p-value falls below 0.001.
- Do use intent-to-treat analysis that counts all assigned users.
- Do monitor cross-variant demographics for drift during the test.
- Do not analyze only users who completed an action; this guarantees survivorship bias.
- Do not trust self-selected samples for causal inference.
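The SRM check in the first practice can be sketched as a chi-square goodness-of-fit test on the observed split; the function name and the 50/50 default are assumptions here, and only the standard library is used:

```python
import math

def srm_pvalue(observed_a: int, observed_b: int, expected_ratio: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 df) for a two-variant split."""
    total = observed_a + observed_b
    exp_a = total * expected_ratio
    exp_b = total * (1 - expected_ratio)
    chi2 = (observed_a - exp_a) ** 2 / exp_a + (observed_b - exp_b) ** 2 / exp_b
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))

# The 10,000 vs 9,700 split from the example above:
p = srm_pvalue(10_000, 9_700)
print(f"SRM p-value: {p:.4f}")
print("Investigate before reading any metric" if p < 0.001 else "Split within tolerance")
```

For this particular split the p-value lands around 0.03: suspicious, but above the 0.001 alarm line, which is deliberately strict so the check can run on every test without flooding teams with false positives. That is also why segment-level counts (here, Safari traffic per variant) deserve their own check.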
Common Mistakes
- Ignoring SRM because "the lift is so big it must be real."
- Comparing users who opted into a feature against those who did not and calling the difference causal.
- Excluding "outliers" based on criteria correlated with the treatment.
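The second mistake above can be demonstrated with a toy simulation. Everything in it is hypothetical: a latent "intent" score drives both opting in and converting, and the feature itself has zero causal effect, yet the naive opt-in comparison shows a large gap:

```python
import random

random.seed(1)

# Hypothetical population: "intent" drives BOTH opting in and converting.
# Note there is no feature-effect term anywhere; the true lift is zero.
users = [{"intent": random.random()} for _ in range(100_000)]
for u in users:
    u["opted_in"] = random.random() < u["intent"]               # self-selection
    u["converted"] = random.random() < 0.1 + 0.2 * u["intent"]  # intent only

def rate(group: list[dict]) -> float:
    return sum(u["converted"] for u in group) / len(group)

opt_in = [u for u in users if u["opted_in"]]
opt_out = [u for u in users if not u["opted_in"]]
print(f"opted in:  {rate(opt_in):.1%}")   # noticeably higher
print(f"opted out: {rate(opt_out):.1%}")  # lower, despite zero true effect
```

Randomized assignment breaks the link between intent and group membership; self-selection preserves it, so this comparison measures who opts in, not what the feature does.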
Industry Context
- SaaS/B2B: Tracking gaps on enterprise accounts create silent selection bias.
- Ecommerce/DTC: Ad-blocker users are systematically missing, skewing observed conversion.
- Lead gen/services: Form fills self-select for intent, complicating causal interpretation.
The Behavioral Science Connection
Selection bias is the statistical expression of "what you see is all there is" (WYSIATI), Kahneman's core cognitive shortcut. We reason from the visible sample as if it were the full population. Formal SRM checks are the disciplined antidote.
Key Takeaway
Most A/B test disasters are selection bias in disguise; run SRM and intent-to-treat on every test.