Heterogeneous Treatment Effects (HTE)
The observation that treatment effects vary across subgroups — the average effect hides winners and losers underneath.
What Are Heterogeneous Treatment Effects (HTE)?
The average treatment effect (ATE) is the headline number from an A/B test. HTE is the reality underneath: the treatment might lift power users by 15% while hurting trialists by 5%, netting out to a +4% ATE that masks both the real win and the real risk. Ignoring HTE means shipping changes that help the majority while silently degrading critical segments, or killing changes that would have been transformative for a subset.
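The netting-out arithmetic can be checked directly. The sketch below assumes a hypothetical traffic mix (45% power users, 55% trialists) and treats the two lifts as directly averageable, which only holds when both segments share a similar baseline rate. Neither assumption comes from a real test.

```python
# Hypothetical segment mix: the shares are an assumption chosen for illustration.
segments = {
    "power_users": {"share": 0.45, "lift": 0.15},
    "trialists": {"share": 0.55, "lift": -0.05},
}


def blended_lift(segments):
    """Average the segment lifts, weighting each by its share of traffic."""
    return sum(s["share"] * s["lift"] for s in segments.values())


ate = blended_lift(segments)
print(f"Blended lift: {ate:+.1%}")
```

Under this mix the two opposing effects blend to roughly +4%, and nothing in that single number reveals that more than half of the traffic was harmed.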
Also Known As
- Data science: HTE, treatment effect heterogeneity, subgroup effects
- Growth: "it worked for some users, not others"
- Marketing: segment-level impact
- Engineering: conditional effects, interaction effects
How It Works
You test a new pricing page. ATE: +3% conversion, p = 0.02, a statistically significant win. You then estimate the conditional average treatment effect (CATE), the effect within each subgroup, by device cohort: mobile lifts +8%, desktop is flat, tablet shows -4%. The win is real but concentrated in mobile. Ship as-is and you get most of the upside but leave a -4% tablet regression in place. Smarter move: ship to mobile only, investigate tablet, and run a targeted test there.
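A per-segment estimate is only useful alongside its uncertainty. The following is a minimal sketch of a segment-level difference in conversion rates with a normal-approximation confidence interval. The counts are hypothetical, chosen to mirror the +8% / flat / -4% pattern, and the function name `diff_in_proportions` is illustrative rather than taken from any library.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions(conv_t, n_t, conv_c, n_c, alpha=0.05):
    """Absolute difference in conversion rate (treatment - control) with a Wald CI."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    diff = p_t - p_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts: (treatment conversions, treatment n, control conversions, control n)
cohorts = {
    "mobile": (2160, 20000, 2000, 20000),
    "desktop": (1200, 12000, 1200, 12000),
    "tablet": (192, 2000, 200, 2000),
}

results = {name: diff_in_proportions(*counts) for name, counts in cohorts.items()}
for name, (diff, low, high) in results.items():
    print(f"{name:8s} diff={diff:+.4f}  CI=({low:+.4f}, {high:+.4f})")
```

With these made-up counts the mobile interval excludes zero while the tablet interval spans it, which is one reason to investigate tablet with a targeted test rather than treat the -4% as established.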
HTE is estimated with pre-registered subgroups, causal forests, meta-learners (T-, S-, X-, and R-learners), or Bayesian hierarchical models. Critical constraint: subgroups used for confirmatory claims must be defined before the test. Subgroups identified afterward, including those surfaced by a model, are exploratory until replicated.
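To make the meta-learner idea concrete, here is a minimal sketch of a T-learner: fit one outcome model on the treated arm, another on the control arm, and take the difference of their predictions as the CATE. Everything in it is an assumption for illustration: the data is synthetic, there is a single covariate `x`, and the outcome models are hand-rolled one-variable linear regressions. A production analysis would use a tested causal-inference library and richer models.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def t_learner(rows):
    """rows are (x, treated, y). Returns a function estimating CATE(x) = mu1(x) - mu0(x)."""
    treated = [(x, y) for x, t, y in rows if t]
    control = [(x, y) for x, t, y in rows if not t]
    a1, b1 = fit_line(*zip(*treated))  # outcome model for the treated arm
    a0, b0 = fit_line(*zip(*control))  # outcome model for the control arm
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)


# Synthetic experiment: the true effect is 0.5*x - 1, negative for low x, positive for high x.
rng = random.Random(0)
rows = []
for _ in range(4000):
    x = rng.uniform(0, 4)
    t = rng.random() < 0.5  # randomized assignment
    y = 2.0 + 0.3 * x + (0.5 * x - 1.0 if t else 0.0) + rng.gauss(0, 0.5)
    rows.append((x, t, y))

cate = t_learner(rows)
print(f"Estimated effect at x=0: {cate(0):+.2f}, at x=4: {cate(4):+.2f}")
```

The synthetic effect changes sign across the covariate range, so the overall average sits near zero even though the treatment clearly helps one end of the population and hurts the other.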
Best Practices
- Pre-register 3–5 subgroups in the test doc before launch.
- Correct for multiple comparisons when reporting HTE across many segments.
- Use causal forests or meta-learners for high-dimensional HTE rather than manual subgroup slicing.
- Require replication for any HTE finding before acting on it in a personalization rule.
- Visualize with a forest plot of segment effects to expose heterogeneity at a glance.
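The multiple-comparisons practice above can be sketched with the Holm-Bonferroni step-down adjustment, one common choice among several (Bonferroni and Benjamini-Hochberg are alternatives). The segment names and raw p-values below are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values keyed by segment name."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    adjusted, running_max = {}, 0.0
    for rank, (name, p) in enumerate(ordered):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        adj = min(1.0, (m - rank) * p)
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, adj)
        adjusted[name] = running_max
    return adjusted


# Hypothetical raw p-values from five pre-registered segments.
raw = {"mobile": 0.004, "desktop": 0.60, "tablet": 0.03, "new_users": 0.04, "returning": 0.20}

adjusted = holm_adjust(raw)
for name, p in adjusted.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name:10s} raw={raw[name]:.3f}  holm={p:.3f}  {verdict}")
```

In this made-up example three segments look significant at 0.05 before adjustment, but only one survives it.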
Common Mistakes
- Post-hoc subgroup fishing. If you slice 20 ways and report the one that's significant, you've found noise.
- Acting on HTE from underpowered segments. An estimate from a segment of 2,000 users has high variance, so an apparent "win" there can easily be noise.
- Ignoring HTE entirely and treating the ATE as if it described every user equally.
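The first two mistakes can be quantified with back-of-envelope arithmetic. The sketch below assumes independent tests at a 0.05 threshold for the fishing case, and a hypothetical 5% baseline conversion rate with the 2,000 users split evenly between arms for the underpowered case.

```python
from math import sqrt


def false_positive_chance(n_slices, alpha=0.05):
    """Chance that at least one slice looks significant when no slice has a true effect."""
    return 1 - (1 - alpha) ** n_slices


def ci_half_width(baseline, n_per_arm, z=1.96):
    """Approximate 95% CI half-width for a difference in conversion rates."""
    return z * sqrt(2 * baseline * (1 - baseline) / n_per_arm)


fishing = false_positive_chance(20)
half = ci_half_width(baseline=0.05, n_per_arm=1000)  # 2,000 users, 1,000 per arm

print(f"Chance of a spurious hit across 20 slices: {fishing:.0%}")
print(f"CI half-width: {half:.2%} absolute, {half / 0.05:.0%} relative to baseline")
```

Under these assumptions, 20 slices give about a 64% chance of at least one false positive, and the 2,000-user segment cannot distinguish effects smaller than roughly ±38% of the baseline rate.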
Industry Context
In SaaS/B2B, HTE along plan tier, company size, and lifecycle stage is usually enormous — a feature change means different things to a trial user versus a year-3 enterprise customer. In ecommerce, HTE along device, new/returning, and AOV tier is almost always present. In lead gen, HTE along traffic source and offer affinity is the reason channel-agnostic tests often mislead.
The Behavioral Science Connection
The ATE is a convenient fiction: it tells a single simple story about a system that is never actually uniform. HTE analysis resists the representativeness heuristic, the temptation to treat the average as typical. In reality, no user is the average user, and treatments affect identifiable subgroups differently in ways that matter for product decisions.
Key Takeaway
The ATE is the first number you look at and the last number you should trust. HTE is where real product understanding lives — and where the biggest wins and hidden regressions hide.