Chi-Squared Test
A statistical test that evaluates whether observed frequencies in categorical data differ significantly from expected frequencies, commonly used to compare conversion rates across A/B test variants.
What Is the Chi-Squared Test?
The chi-squared test compares how often things happen (observed frequencies) against how often you would expect them to happen under a null hypothesis (expected frequencies). In conversion rate optimization (CRO), it is the classic way to ask "do these variants have the same conversion rate?" for categorical outcomes such as converted versus did not convert.
Also Known As
- Data science teams: chi-square, Pearson's chi-squared, contingency table test
- Growth teams: conversion significance test
- Marketing teams: the test behind "significant or not" badges
- Engineering teams: X^2, goodness-of-fit test
How It Works
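Imagine a test with 10,000 visitors per variant. Variant A gets 300 conversions (3.00%), Variant B gets 360 (3.60%). Under the null hypothesis of no difference, the pooled rate is 660 / 20,000 = 3.30%, so you would expect 330 conversions and 9,670 non-conversions in each arm. The chi-squared statistic sums (observed - expected)^2 / expected across all four cells of the 2x2 table, yielding roughly 5.64 with 1 degree of freedom, which corresponds to a p-value around 0.018 and is significant at alpha = 0.05. Many tools apply Yates' continuity correction to 2x2 tables by default, which lowers the statistic to roughly 5.45 (p-value around 0.02). Either way, statistical significance does not answer whether a 0.60 percentage-point absolute lift (20% relative) is worth shipping.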
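The arithmetic in this example can be checked with a short sketch. It uses only the Python standard library and assumes a 2x2 table (two variants, converted or not); the function name `chi_squared_2x2` and its `yates` flag are illustrative, not taken from any particular library. The uncorrected sum over all four cells comes out near 5.64, while the Yates-corrected version lands near 5.45.

```python
import math

def chi_squared_2x2(conv_a, n_a, conv_b, n_b, yates=False):
    """Pearson chi-squared test for a 2x2 table (illustrative helper).

    Rows are variants, columns are converted / not converted.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(observed[i][j] - expected)
            if yates:
                # Yates' continuity correction shrinks each deviation by 0.5.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected

    # For 1 degree of freedom, the chi-squared tail probability
    # reduces to erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = chi_squared_2x2(300, 10_000, 360, 10_000)
stat_yates, p_yates = chi_squared_2x2(300, 10_000, 360, 10_000, yates=True)
print(f"uncorrected: chi2={stat:.2f}, p={p:.4f}")
print(f"Yates:       chi2={stat_yates:.2f}, p={p_yates:.4f}")
```

The `erfc` shortcut only holds for 1 degree of freedom; a table with more variants or outcomes would need a general chi-squared distribution function from a statistics library.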
Best Practices
- Do require an expected count of at least 5 in every cell; use Fisher's exact test below that.
- Do use chi-squared for multi-variant tests where you want a single overall significance number.
- Do pair the chi-squared statistic with an effect size (like Cramér's V) to gauge practical meaning.
- Do not use chi-squared on non-independent samples (paired designs, repeat sessions).
- Do not apply chi-squared to continuous metrics like revenue per visitor; use a t-test or the Mann-Whitney U test instead.
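Two of these practices, the expected-count check and the effect size, are easy to compute alongside the statistic. The sketch below is a minimal standard-library version; `contingency_summary` is a hypothetical helper name, the small-sample table is made-up data for illustration, and the code assumes every row and column total is non-zero. Cramér's V is computed as sqrt(chi2 / (n * (min(rows, columns) - 1))).

```python
import math

def contingency_summary(table):
    """Return (chi2, min_expected, cramers_v) for a table of observed counts.

    `table` is a list of rows, e.g. one row per variant with
    [conversions, non-conversions]. Assumes no row or column sums to zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected

    # Cramér's V rescales chi2 by sample size and table shape to a 0..1 range.
    k = min(len(row_totals), len(col_totals)) - 1
    cramers_v = math.sqrt(chi2 / (n * k))
    return chi2, min_expected, cramers_v

# The 10,000-visitor example: 300 vs 360 conversions.
big_chi2, big_min, big_v = contingency_summary([[300, 9_700], [360, 9_640]])
print(f"chi2={big_chi2:.2f}, min expected={big_min:.1f}, Cramér's V={big_v:.4f}")

# Hypothetical small test: 50 visitors per arm, 2 vs 6 conversions.
small_chi2, small_min, small_v = contingency_summary([[2, 48], [6, 44]])
if small_min < 5:
    print(f"min expected count {small_min:.1f} < 5: prefer Fisher's exact test")
```

For the 10,000-visitor example the effect size is very small even though the test is significant, which is the reason to report both numbers together.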
Common Mistakes
- Running chi-squared on small cells where expected counts are below 5.
- Ignoring that with enough traffic, chi-squared will call any tiny difference significant.
- Forgetting that the test is non-directional: it tells you the variants differ, not which one is better, so direction must be read from the observed rates.
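The traffic effect in the second mistake can be illustrated with made-up numbers: hold the conversion rates fixed at 3.00% versus 3.05% and change only the sample size. The sketch below uses the standard closed-form shortcut for a 2x2 table, chi2 = n(ad - bc)^2 / (product of the four marginal totals), without continuity correction; the function name and the traffic levels are assumptions chosen for illustration.

```python
import math

def chi2_2x2_shortcut(conv_a, n_a, conv_b, n_b):
    """Uncorrected 2x2 chi-squared via the closed form n(ad - bc)^2 / marginals."""
    a, b = conv_a, n_a - conv_a   # variant A: converted, not converted
    c, d = conv_b, n_b - conv_b   # variant B: converted, not converted
    n = n_a + n_b
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Tail probability for 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

results = {}
for visitors in (10_000, 10_000_000):
    conv_a = visitors * 300 // 10_000   # 3.00% conversion rate
    conv_b = visitors * 305 // 10_000   # 3.05% conversion rate
    results[visitors] = chi2_2x2_shortcut(conv_a, visitors, conv_b, visitors)
    stat, p = results[visitors]
    print(f"{visitors:>10,} per arm: chi2={stat:.3f}, p={p:.3g}")
```

At fixed rates the statistic grows in proportion to sample size, so the same 0.05 percentage-point gap is nowhere near significant at 10,000 visitors per arm and overwhelmingly significant at 10,000,000.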
Industry Context
- SaaS/B2B: Low trial conversion rates produce small cells; Fisher's exact is often safer.
- Ecommerce/DTC: High-volume checkout tests are the perfect use case for chi-squared.
- Lead gen/services: Sparse form completions often force long runtimes or Bayesian alternatives.
The Behavioral Science Connection
The chi-squared test encodes a key behavioral idea: surprise relative to expectation. Humans intuitively use this logic when we say "I would have expected more from that variant." Kahneman's work on "associative coherence" shows we reason by comparing observations to mental expectations, which is exactly what chi-squared formalizes.
Key Takeaway
Chi-squared is the right test for categorical A/B outcomes, but its p-value alone is never enough to decide.