Clustering Effects
Statistical dependence among observations that belong to the same group; ignoring it in experiment analysis inflates false positive rates.
What Are Clustering Effects?
Clustering effects occur when observations within groups are more similar to each other than to observations in other groups. In experimentation this appears whenever the randomization unit differs from the analysis unit — for example, randomizing at the user level but counting pageviews in the denominator. The observations within each cluster are not independent, which means your effective sample size is smaller than your nominal sample size, and naive statistical tests overstate your confidence.
Also Known As
- Marketing teams rarely use the term; they notice it only as suspiciously narrow confidence intervals.
- Growth teams say clustering, intra-cluster correlation, or ICC.
- Product teams use clustering effects or dependent observations.
- Engineering teams refer to clustering, non-independent observations, or design effect.
- Statisticians use the precise terms ICC (intraclass correlation coefficient) and design effect.
How It Works
You randomize at the user level but measure pageview conversion rate. User 12345 has 50 pageviews in the test. All 50 share this user's preferences, context, and behavior — they're not 50 independent observations. The intra-cluster correlation (ICC) might be 0.3, meaning the design effect is 1 + (average cluster size - 1) × ICC = 1 + 49 × 0.3 = 15.7. Your effective sample size is 1/15.7 of your nominal sample size. Your confidence intervals should be ~4x wider than naive calculations suggest. Ignoring this produces false positives at rates far above your stated alpha.
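The arithmetic above can be sketched in a few lines. The 50-pageview user, the ICC of 0.3, and the 100,000-pageview nominal sample are the example's assumed values, not measurements:

```python
import math

def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish design effect: how much clustering inflates the variance
    of an estimate relative to an i.i.d. sample of the same size."""
    return 1 + (avg_cluster_size - 1) * icc

deff = design_effect(50, 0.3)      # 1 + 49 * 0.3 = 15.7
n_nominal = 100_000                # total pageviews observed
n_effective = n_nominal / deff     # ~6,369 effectively independent observations
ci_widening = math.sqrt(deff)      # CIs should be ~3.96x wider than naive
```

Because standard errors scale with the square root of variance, the confidence interval widens by the square root of the design effect, not the design effect itself.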
Best Practices
- Match analysis unit to randomization unit whenever possible.
- If they must differ, apply cluster-robust standard errors or mixed-effects models.
- Calculate design effects explicitly when sample sizes are determined by clusters.
- For B2B products with heavy account-level clustering, consider account-level randomization.
- Inflate sample size estimates by 1.5–2x when clustering is moderate and unavoidable.
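For the second practice, a minimal sketch of a CR0-style cluster-robust standard error for a simple mean, assuming data arrives as parallel lists of values and cluster IDs (the function names and data layout here are illustrative, not a standard API):

```python
import math
from collections import defaultdict

def naive_se(values):
    """Standard error of the mean assuming independent observations."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(var / n)

def cluster_robust_se(values, cluster_ids):
    """CR0 cluster-robust SE of the sample mean: residuals are summed
    within each cluster before squaring, so positive within-cluster
    correlation inflates the variance estimate instead of vanishing."""
    n = len(values)
    mean = sum(values) / n
    cluster_resid = defaultdict(float)
    for v, c in zip(values, cluster_ids):
        cluster_resid[c] += v - mean
    return math.sqrt(sum(r ** 2 for r in cluster_resid.values())) / n
```

With perfectly correlated clusters (every observation in a cluster identical), the cluster-robust SE is several times the naive one; with one observation per cluster the two nearly coincide. For regression-based analyses, libraries such as statsmodels offer cluster-robust covariance options rather than hand-rolling this.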
Common Mistakes
- Using naive standard errors when randomizing at the user level but analyzing pageviews or events.
- Ignoring account-level clustering in B2B products where a few accounts dominate.
- Treating all clustering as equivalent when intra-cluster correlation varies widely by metric.
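A quick Monte Carlo makes the first mistake concrete: run A/A tests (no true effect) where each user's pageviews share a user-level random effect, then analyze with a naive z-test. All parameters below (cluster count, cluster size, variance components) are illustrative assumptions:

```python
import math
import random

def _diff_and_naive_se(a, b):
    """Difference in means and its naive (i.i.d.) standard error."""
    def mv(x):
        m = sum(x) / len(x)
        return m, sum((xi - m) ** 2 for xi in x) / (len(x) - 1)
    ma, va = mv(a)
    mb, vb = mv(b)
    return ma - mb, math.sqrt(va / len(a) + vb / len(b))

def naive_false_positive_rate(n_sims=200, n_clusters=20, cluster_size=50,
                              between_sd=1.0, within_sd=1.0, seed=0):
    """Fraction of null simulations rejected by a naive z-test at
    nominal alpha = 0.05. With between_sd == within_sd the ICC is 0.5,
    so rejections run far above 5%."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        arms = []
        for _arm in range(2):
            obs = []
            for _user in range(n_clusters):
                u = rng.gauss(0, between_sd)  # effect shared by the cluster
                obs += [u + rng.gauss(0, within_sd)
                        for _ in range(cluster_size)]
            arms.append(obs)
        diff, se = _diff_and_naive_se(arms[0], arms[1])
        if abs(diff) / se > 1.96:             # naive test ignores clustering
            rejections += 1
    return rejections / n_sims
```

Under these assumed parameters the naive test rejects the true null in well over half of simulations, despite a stated alpha of 0.05.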
Industry Context
- SaaS/B2B: Severe clustering — a few enterprise accounts dominate event volume, making account-level randomization often necessary.
- Ecommerce/DTC: Mild clustering from repeat buyers; usually manageable with user-level randomization.
- Lead gen: Minimal clustering — most users visit once before converting.
The Behavioral Science Connection
Clustering reflects that people within groups behave similarly. Employees share workflows, families share purchase patterns, accounts share configurations. Ignoring clustering is, in behavioral terms, pretending social context doesn't shape individual behavior — which we know is false.
Key Takeaway
If your randomization unit differs from your analysis unit, you have clustering — and your standard errors are wrong unless you account for it explicitly.