Mann-Whitney U Test

A non-parametric statistical test that compares two independent groups without assuming the data are normally distributed, testing whether one group tends to have larger values than the other.

What Is the Mann-Whitney U Test?

The Mann-Whitney U test compares two independent groups by ranking every observation from smallest to largest, then checking whether one group's ranks tend to be higher than the other's. It uses only the order of the values, not their magnitudes, which makes it robust to outliers and removes the need to assume normality. It still assumes the observations are independent of one another.

Also Known As

  • Data science teams: Mann-Whitney-Wilcoxon, Wilcoxon rank-sum, non-parametric t-test
  • Growth teams: robust significance test
  • Marketing teams: the test you use when revenue data is weird
  • Engineering teams: rank test, U-test

How It Works

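Imagine an A/B test with 10,000 visitors per variant measuring revenue-per-visitor. Most users contribute $0, and a handful contribute hundreds. A t-test on this data is dragged around by the outliers and may be unreliable. Instead, you rank all 20,000 values from smallest to largest, give tied values (such as the many $0 visitors) the average of the ranks they span, and sum the ranks in each group. Each group's U-statistic is its rank sum minus the smallest rank sum it could possibly have, which is n(n + 1)/2 for a group of n observations. If the ranks in Variant B are systematically higher, B's U lands well above the n1 × n2 / 2 expected when there is no difference, and the p-value will be small. This sidesteps the outlier problem because the top buyer just gets rank 20,000 regardless of whether they spent $500 or $50,000.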
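
The ranking steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names (midranks, mann_whitney_u) and the tiny revenue samples are invented for the example, and tied values are assumed to share the average of the ranks they occupy.

```python
def midranks(values):
    """1-based ranks; tied values share the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b); each U is a rank sum minus its minimum possible value."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a  # the two U values always sum to n_a * n_b
    return u_a, u_b


# Hypothetical revenue-per-visitor samples, invented for illustration.
variant_a = [0, 0, 0, 0, 12, 30]
variant_b = [0, 0, 0, 25, 40, 50000]

print(mann_whitney_u(variant_a, variant_b))               # (13.0, 23.0)
print(mann_whitney_u(variant_a, [0, 0, 0, 25, 40, 500]))  # (13.0, 23.0), same ranks
```

With no difference between the groups, both U values would sit near 6 × 6 / 2 = 18. Swapping the $50,000 buyer for a $500 buyer leaves the result unchanged, which is the outlier robustness described above. In practice a library routine such as SciPy's scipy.stats.mannwhitneyu would normally be used; the sketch only shows the mechanics.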

Best Practices

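  • Do use Mann-Whitney for skewed metrics where you care about which group tends to have larger values rather than which group has the larger mean.
  • Do confirm your tooling handles ties correctly; mishandled ties distort p-values, and revenue data where most visitors spend $0 is dominated by ties.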
  • Do pair the U-statistic with a median difference or Hodges-Lehmann estimator for business context.
  • Do not use Mann-Whitney when your hypothesis is specifically about the mean.
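  • Do not assume Mann-Whitney detects every difference between two distributions; it is sensitive to one group tending to produce larger values than the other (P(A > B) differing from 0.5) and can miss differences in spread or shape.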
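
To make the effect-size advice in the list above concrete, here is a small sketch that computes two direction-aware summaries by brute force over every pair of observations: the share of pairs in which B beats A, and the Hodges-Lehmann shift (the median of all pairwise differences). The function names and the order values are hypothetical, and the all-pairs approach is only practical for small samples.

```python
from statistics import mean, median


def prob_superiority(a, b):
    """Share of (a, b) pairs where b is larger; ties count as half a win."""
    wins = sum((y > x) + 0.5 * (y == x) for x in a for y in b)
    return wins / (len(a) * len(b))


def hodges_lehmann_shift(a, b):
    """Median of all pairwise differences b - a."""
    return median(y - x for x in a for y in b)


# Hypothetical order values; group A contains one very large order.
group_a = [10, 12, 15, 20, 400]
group_b = [14, 18, 22, 25, 30]

print(prob_superiority(group_a, group_b))      # 0.68
print(hodges_lehmann_shift(group_a, group_b))  # 5
print(f"{mean(group_b) - mean(group_a):.1f}")  # -69.6
```

In this made-up data, B beats A in 68% of pairs and is typically about 5 higher, while the difference in means points the other way because of a single large order in A. Reporting the rank-based summaries next to the p-value tells the reader both the direction and the size of the effect.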

Common Mistakes

  • Reporting Mann-Whitney p-values without any measure of effect size or direction.
  • Forgetting that with very large samples, Mann-Whitney will flag trivially small ordering differences.
  • Confusing the U-statistic with a z-score: U is a count that ranges from 0 to n1 × n2, and for large samples implementations convert it to a z-score internally to obtain the p-value.
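
The difference between U and a z-score can be seen by doing the conversion by hand. The sketch below uses the standard large-sample normal approximation with a tie correction and no continuity correction; library implementations may differ in those details, so treat the function names and sample data as illustrative assumptions rather than a reference implementation.

```python
import math
from collections import Counter


def u_statistic(a, b):
    """U for group b: number of (a, b) pairs where b is larger, ties count half."""
    return sum((y > x) + 0.5 * (y == x) for x in a for y in b)


def u_to_z(u, a, b):
    """Normal approximation: z-score and two-sided p-value for a given U."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    mean_u = n1 * n2 / 2  # expected U when there is no difference
    # Tie correction: each run of t equal values shrinks the variance.
    tie_term = sum(t**3 - t for t in Counter(list(a) + list(b)).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical revenue-per-visitor samples, invented for illustration.
variant_a = [0, 0, 0, 0, 12, 30]
variant_b = [0, 0, 0, 25, 40, 50000]

u_b = u_statistic(variant_a, variant_b)
z, p = u_to_z(u_b, variant_a, variant_b)
print(u_b, round(z, 2), round(p, 2))
```

Here U is 23 out of a possible 36, while z is a value below 1 on a completely different scale. The samples are far too small for the normal approximation to be trustworthy; they are kept tiny so the arithmetic can be checked by hand.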

Industry Context

  • SaaS/B2B: Contract-value metrics benefit from rank-based tests because of whale accounts.
  • Ecommerce/DTC: Revenue-per-visitor is the canonical use case, especially for categories with wide price ranges.
  • Lead gen/services: Lead-value distributions are almost always skewed enough to warrant Mann-Whitney.

The Behavioral Science Connection

Rank-based reasoning reflects how humans often process comparisons in the real world: we know one restaurant is better than another without knowing exact ratings. Thaler's work on relative valuation suggests people are more comfortable ranking than quantifying. Mann-Whitney respects that intuition by only asking "is A usually greater than B?"

Key Takeaway

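Mann-Whitney is the outlier-robust cousin of the t-test and should be your default for skewed revenue metrics when the question is whether one variant tends to produce larger values, not whether its mean is higher.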