What is Chi-Squared Test? — Glossary

Atticus Li

Chi-Squared Test

A statistical test that evaluates whether observed frequencies in categorical data differ significantly from expected frequencies, commonly used to compare conversion rates across A/B test variants.

What Is the Chi-Squared Test?

The chi-squared test compares how often things happen (observed frequencies) against how often you would expect them to happen under a null hypothesis (expected frequencies). In CRO, it is the classic way to ask "do these variants have the same conversion rate?" for categorical outcomes.

Also Known As

Data science teams: chi-square, Pearson's chi-squared, contingency table test
Growth teams: conversion significance test
Marketing teams: the test behind "significant or not" badges
Engineering teams: X^2, goodness-of-fit test

How It Works

Imagine a test with 10,000 visitors per variant. Variant A gets 300 conversions (3.00%), Variant B gets 360 (3.60%). Under the null hypothesis of no difference, the pooled rate is 3.30%, so you would expect 330 conversions and 9,670 non-conversions in each arm. The chi-squared statistic sums ((observed - expected)^2 / expected) across all four cells of the 2x2 table: the two conversion cells contribute about 5.45 and the two non-conversion cells about 0.19, for a total of roughly 5.64. With one degree of freedom, that corresponds to a p-value around 0.02, which is significant at alpha = 0.05. Statistical significance here does not answer whether a 0.60-percentage-point lift (20% relative) is worth shipping.
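The arithmetic above can be reproduced with a short Python sketch. The function name `chi_squared_2x2` is illustrative, not from any library, and the p-value shortcut via `math.erfc` is valid only for one degree of freedom (a 2x2 table). No continuity correction is applied; some libraries apply one to 2x2 tables by default, which would give a slightly smaller statistic.

```python
import math

def chi_squared_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-squared for two variants, without continuity correction."""
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate under the null
    observed = [conv_a, n_a - conv_a, conv_b, n_b - conv_b]
    expected = [n_a * pooled, n_a * (1 - pooled),
                n_b * pooled, n_b * (1 - pooled)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi_squared_2x2(300, 10_000, 360, 10_000)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

Summing only the two conversion cells would understate the statistic, so the sketch includes the non-conversion cells explicitly.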

Best Practices

Do require an expected count of at least 5 in every cell; use Fisher's exact test below that.
Do use chi-squared for multi-variant tests where you want a single overall significance number.
Do pair the chi-squared statistic with effect size (like Cramer's V) to gauge practical meaning.
Do not use chi-squared on non-independent samples (paired designs, repeat sessions).
Do not apply chi-squared to continuous metrics like revenue per visitor — use t-tests or Mann-Whitney instead.
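Two of the practices above, checking expected counts and reporting Cramer's V, can be sketched in plain Python. The helper names (`expected_counts`, `chi_squared_stat`, `cramers_v`) are hypothetical, and each table is assumed to hold one row per variant with columns for conversions and non-conversions.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def chi_squared_stat(table):
    """Pearson chi-squared statistic for an r x c table of counts."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))

def cramers_v(table):
    """Effect size: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_squared_stat(table) / (n * k))

big_test = [[300, 9_700], [360, 9_640]]   # rows: variants; cols: converted, not
small_test = [[1, 29], [6, 24]]           # made-up low-traffic example

big_min = min(min(row) for row in expected_counts(big_test))
small_min = min(min(row) for row in expected_counts(small_test))
v = cramers_v(big_test)

print(f"big test: min expected = {big_min:.1f}, Cramer's V = {v:.4f}")
print(f"small test: min expected = {small_min:.1f} (below 5, prefer an exact test)")
```

In the large test the result is statistically significant, yet Cramer's V comes out below 0.02, which is one way to see that the association is weak in absolute terms.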

Common Mistakes

Running chi-squared on small cells where expected counts are below 5.
Ignoring that with enough traffic, chi-squared will call any tiny difference significant.
Forgetting that the test is non-directional: it flags a difference but not which variant is ahead, so direction must be read from the data.
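The traffic effect in the second mistake can be illustrated with a small sketch. The rates (3.0% vs 3.1%) and sample sizes are made up for illustration, and `two_variant_p_value` is a hypothetical helper that assumes equal traffic per arm and one degree of freedom.

```python
import math

def two_variant_p_value(conv_a, conv_b, n_per_arm):
    """Chi-squared p-value for two equal-sized arms (1 degree of freedom)."""
    pooled = (conv_a + conv_b) / (2 * n_per_arm)
    exp_conv = n_per_arm * pooled
    exp_non = n_per_arm * (1 - pooled)
    cells = [(conv_a, exp_conv), (n_per_arm - conv_a, exp_non),
             (conv_b, exp_conv), (n_per_arm - conv_b, exp_non)]
    chi2 = sum((o - e) ** 2 / e for o, e in cells)
    return math.erfc(math.sqrt(chi2 / 2))

# Same observed rates (3.0% vs 3.1%) at three traffic levels.
p_values = {}
for n in (10_000, 100_000, 1_000_000):
    conv_a = n * 30 // 1000   # 3.0% of n
    conv_b = n * 31 // 1000   # 3.1% of n
    p_values[n] = two_variant_p_value(conv_a, conv_b, n)
    print(f"n per arm = {n:>9,}: p = {p_values[n]:.5f}")
```

The observed gap is identical in every row; only the sample size changes, and the p-value drops from clearly non-significant to far below 0.05.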

Industry Context

SaaS/B2B: Low trial conversion rates produce small cells; Fisher's exact is often safer.
Ecommerce/DTC: High-volume checkout tests are the perfect use case for chi-squared.
Lead gen/services: Sparse form completions often force long runtimes or Bayesian alternatives.
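For the small-cell situations described above, Fisher's exact test can be sketched with the standard library alone. The function name and the trial numbers (1 of 30 vs 6 of 30 conversions) are hypothetical. The two-sided p-value here follows one common convention: sum the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(conv_a, n_a, conv_b, n_b):
    """Two-sided Fisher's exact p-value for a 2x2 conversion table."""
    total_conv = conv_a + conv_b
    n = n_a + n_b

    def prob(x):
        # Hypergeometric probability that arm A holds x of the total conversions.
        return comb(n_a, x) * comb(n_b, total_conv - x) / comb(n, total_conv)

    lo = max(0, total_conv - n_b)   # fewest conversions arm A can have
    hi = min(n_a, total_conv)       # most conversions arm A can have
    p_observed = prob(conv_a)
    # Sum every possible table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p)

p_value = fisher_exact_two_sided(1, 30, 6, 30)
print(f"Fisher exact p = {p_value:.3f}")
```

With expected conversions of only 3.5 per arm, the chi-squared approximation is unreliable, which is why an exact calculation is used in this sketch instead.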

The Behavioral Science Connection

The chi-squared test encodes a key behavioral idea: surprise relative to expectation. Humans intuitively use this logic when we say "I would have expected more from that variant." Kahneman's work on "associative coherence" suggests we reason by comparing observations to mental expectations, which is loosely what chi-squared formalizes.

Key Takeaway

Chi-squared is the right test for categorical A/B outcomes, but its p-value alone is never enough to decide.