Selection Bias
A systematic error that occurs when the sample studied is not representative of the population intended to be analyzed, often due to non-random assignment, self-selection, or flawed data collection.
What Is Selection Bias?
Selection bias occurs whenever the people in your sample differ systematically from the people you actually want to learn about. It corrupts conclusions before any statistics are computed. In experimentation, it shows up through sample ratio mismatch, opt-in filters, cookie loss, and tracking gaps that exclude specific segments.
Also Known As
- Data science teams: sample selection bias, SRM (sample ratio mismatch), non-response bias
- Growth teams: tracking bias, attribution gaps
- Marketing teams: audience skew, sample skew
- Engineering teams: SRM, instrumentation bias
How It Works
Imagine an A/B test with 10,000 visitors per variant where Variant B uses a JavaScript feature that fails on Safari. Of 1,500 Safari visitors routed to Variant B, 300 silently bounce without being counted. Variant B's recorded sample is now 9,700, with the worst-performing traffic segment missing. Its reported conversion rate looks artificially high. The lift is an illusion, a pure artifact of selection bias, but the p-value will look wonderfully significant.
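The arithmetic behind this illusion can be worked through directly. In this sketch the browser split (1,500 of 10,000) and the 300 uncounted bounces come from the example above; the per-browser conversion rates are hypothetical and identical in both variants, so the true lift is zero:

```python
n = 10_000
safari, other = 1_500, 8_500        # browser split from the example above
cr_safari, cr_other = 0.02, 0.06    # hypothetical rates, identical in A and B
bounced = 300                       # Safari visitors in B who exit uncounted

# Variant A: every assigned visitor is tracked.
cr_a = (safari * cr_safari + other * cr_other) / n

# Variant B, as recorded: the 300 bouncers (who never get a chance to
# convert) vanish from both numerator and denominator, so the recorded
# sample is 9,700 with the worst-converting segment thinned out.
conv_b = (safari - bounced) * cr_safari + other * cr_other
cr_b_recorded = conv_b / (n - bounced)

# Variant B, intent-to-treat: every assigned visitor stays in the denominator.
cr_b_itt = conv_b / n

print(f"A:            {cr_a:.2%}")           # 5.40%
print(f"B (recorded): {cr_b_recorded:.2%}")  # 5.51% -- illusory lift
print(f"B (ITT):      {cr_b_itt:.2%}")       # 5.34% -- B actually loses conversions
```

The recorded rate for B beats A purely because low-converting traffic vanished from the denominator; the intent-to-treat rate, which keeps every assigned user, shows B performing slightly worse.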
Best Practices
- Do check for sample ratio mismatch (SRM) on every test; flag any assignment split whose goodness-of-fit p-value falls below 0.001.
- Do use intent-to-treat analysis that counts all assigned users.
- Do monitor cross-variant demographics for drift during the test.
- Do not analyze only users who completed an action; this guarantees survivorship bias.
- Do not trust self-selected samples for causal inference.
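The SRM check in the first practice can be sketched as a chi-square goodness-of-fit test on the observed split; the function name and the 50/50 default are assumptions here, and only the standard library is used:

```python
import math

def srm_pvalue(observed_a: int, observed_b: int, expected_ratio: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 df) for a two-variant split."""
    total = observed_a + observed_b
    exp_a = total * expected_ratio
    exp_b = total * (1 - expected_ratio)
    chi2 = (observed_a - exp_a) ** 2 / exp_a + (observed_b - exp_b) ** 2 / exp_b
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))

# The 10,000 vs 9,700 split from the example above:
p = srm_pvalue(10_000, 9_700)
print(f"SRM p-value: {p:.4f}")
print("Investigate before reading any metric" if p < 0.001 else "Split within tolerance")
```

For this particular split the p-value lands around 0.03: suspicious, but above the 0.001 alarm line, which is deliberately strict so the check can run on every test without flooding teams with false positives. That is also why segment-level counts (here, Safari traffic per variant) deserve their own check.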
Common Mistakes
- Ignoring SRM because "the lift is so big it must be real."
- Comparing users who opted into a feature against those who did not and calling the difference causal.
- Excluding "outliers" based on criteria correlated with the treatment.
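The second mistake above can be demonstrated with a toy simulation. Everything in it is hypothetical: a latent "intent" score drives both opting in and converting, and the feature itself has zero causal effect, yet the naive opt-in comparison shows a large gap:

```python
import random

random.seed(1)

# Hypothetical population: "intent" drives BOTH opting in and converting.
# Note there is no feature-effect term anywhere; the true lift is zero.
users = [{"intent": random.random()} for _ in range(100_000)]
for u in users:
    u["opted_in"] = random.random() < u["intent"]               # self-selection
    u["converted"] = random.random() < 0.1 + 0.2 * u["intent"]  # intent only

def rate(group: list[dict]) -> float:
    return sum(u["converted"] for u in group) / len(group)

opt_in = [u for u in users if u["opted_in"]]
opt_out = [u for u in users if not u["opted_in"]]
print(f"opted in:  {rate(opt_in):.1%}")   # noticeably higher
print(f"opted out: {rate(opt_out):.1%}")  # lower, despite zero true effect
```

Randomized assignment breaks the link between intent and group membership; self-selection preserves it, so this comparison measures who opts in, not what the feature does.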
Industry Context
- SaaS/B2B: Tracking gaps on enterprise accounts create silent selection bias.
- Ecommerce/DTC: Ad-blocker users are systematically missing, skewing observed conversion.
- Lead gen/services: Form fills self-select for intent, complicating causal interpretation.
The Behavioral Science Connection
Selection bias is the statistical expression of "what you see is all there is" (WYSIATI), Kahneman's core cognitive shortcut. We reason from the visible sample as if it were the full population. Formal SRM checks are the disciplined antidote.
Key Takeaway
Most A/B test disasters are selection bias in disguise; run SRM and intent-to-treat on every test.