What Are Heterogeneous Treatment Effects (HTE)?

Atticus Li


Heterogeneous Treatment Effects (HTE)

The observation that treatment effects vary across subgroups — the average effect hides winners and losers underneath.

The average treatment effect (ATE) is the headline number from an A/B test. HTE is the reality underneath: the treatment might lift power users by 15% while hurting trialists by 5%, netting out to a +4% ATE (the figure you get if power users make up 45% of the population) that masks both the real win and the real risk. Ignoring HTE can mean shipping changes that help on average while silently degrading critical segments, or killing changes that would have been transformative for a subset.
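The netting-out is just a population-weighted average of segment effects. A minimal sketch, assuming a 45/55 split between power users and trialists (a hypothetical mix chosen because it reproduces the +4% figure) and treating both effects as being in the same units:

```python
# Hypothetical segment mix: the 45/55 split is an assumption chosen to
# reproduce the +4% figure, not a number from a real test.
segments = {
    "power_users": {"share": 0.45, "effect": 0.15},
    "trialists": {"share": 0.55, "effect": -0.05},
}


def average_treatment_effect(segments):
    """Population-weighted average of per-segment effects."""
    total_share = sum(s["share"] for s in segments.values())
    weighted = sum(s["share"] * s["effect"] for s in segments.values())
    return weighted / total_share


ate = average_treatment_effect(segments)
print(f"ATE: {ate:+.1%}")
```

Change the shares and the same two segment effects produce a very different headline number, which is why the ATE alone says little about who is helped or hurt.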

Also Known As

Data science: HTE, treatment effect heterogeneity, subgroup effects
Growth: "it worked for some users, not others"
Marketing: segment-level impact
Engineering: conditional effects, interaction effects

How It Works

You test a new pricing page. The ATE is +3% conversion with p = 0.02, a statistically significant win. You then estimate the conditional average treatment effect (CATE) by device cohort: mobile lifts +8%, desktop is flat, and tablet shows -4%. The win is real but concentrated in mobile. Ship as-is and you capture most of the upside but leave a possible -4% tablet regression in place. The smarter move: ship to mobile only, investigate tablet, and run a targeted test there.
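A per-cohort readout like this can be sketched as a difference in conversion rates with a normal-approximation confidence interval. The counts below are hypothetical, picked only to match the +8% / flat / -4% pattern, and `cohort_effect` is an illustrative helper rather than a library function:

```python
import math

# Hypothetical per-cohort counts: (users, conversions) for each arm.
cohorts = {
    "mobile": {"control": (40000, 2000), "treatment": (40000, 2160)},
    "desktop": {"control": (15000, 900), "treatment": (15000, 900)},
    "tablet": {"control": (2500, 125), "treatment": (2500, 120)},
}


def cohort_effect(control, treatment, z=1.96):
    """Difference in conversion rate with a normal-approximation 95% CI."""
    n_c, x_c = control
    n_t, x_t = treatment
    p_c, p_t = x_c / n_c, x_t / n_t
    diff = p_t - p_c
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    return {
        "relative_lift": diff / p_c,
        "diff": diff,
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
    }


results = {name: cohort_effect(**arms) for name, arms in cohorts.items()}
for name, r in results.items():
    print(
        f"{name:8s} lift {r['relative_lift']:+.1%}  diff {r['diff']:+.4f}  "
        f"95% CI [{r['ci_low']:+.4f}, {r['ci_high']:+.4f}]"
    )
```

With these made-up counts the mobile interval excludes zero while the tablet interval spans it, so the tablet number is a reason to investigate and retest rather than proof of harm.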

HTE is estimated with pre-registered subgroups, causal forests, meta-learners (T-, S-, X-, and R-learners), or Bayesian hierarchical models. Critical constraint: subgroups must be defined before the test; otherwise any subgroup finding is exploratory and needs confirmation in a follow-up test.
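To make the meta-learner idea concrete, here is a toy T-learner: fit one outcome model on treated users, another on control users, and take the difference of their predictions as the CATE estimate. This is a sketch under strong assumptions: a single hypothetical feature `x`, a linear outcome model, and simulated data with a known true effect. Production work would more likely use flexible models from a dedicated library.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def t_learner(rows):
    """rows: (x, treated, y) tuples. Returns a function x -> estimated CATE."""
    treated = [(x, y) for x, t, y in rows if t]
    control = [(x, y) for x, t, y in rows if not t]
    a1, b1 = fit_line(*zip(*treated))  # outcome model for the treated arm
    a0, b0 = fit_line(*zip(*control))  # outcome model for the control arm
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)


# Simulated experiment. The true effect, 0.5 * x - 1.0, is an assumption:
# negative for low-x users and positive for high-x users.
rng = random.Random(0)
rows = []
for _ in range(5000):
    x = rng.uniform(0, 10)
    t = rng.random() < 0.5  # randomized assignment
    y = 2.0 + 0.3 * x + (0.5 * x - 1.0 if t else 0.0) + rng.gauss(0, 1)
    rows.append((x, t, y))

cate = t_learner(rows)
for x in (1, 5, 9):
    print(f"x={x}: estimated CATE {cate(x):+.2f}, true {0.5 * x - 1.0:+.2f}")
```

The estimated CATE changes sign across `x` even though the pooled average effect is positive, which is the pattern an ATE-only readout would hide.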

Best Practices

Pre-register 3–5 subgroups in the test doc before launch.
Correct for multiple comparisons when reporting HTE across many segments.
Use causal forests or meta-learners for high-dimensional HTE rather than manual subgroup slicing.
Require replication for any HTE finding before acting on it in a personalization rule.
Visualize with a forest plot of segment effects to expose heterogeneity at a glance.
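The multiple-comparisons step can be sketched with the Holm step-down adjustment, one common choice among several (Bonferroni and Benjamini-Hochberg are others). The segment names and raw p-values below are hypothetical:

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, keyed by segment name."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    adjusted = {}
    running_max = 0.0
    for rank, (name, p) in enumerate(ordered):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p)
        # Adjusted p-values must never decrease as raw p-values grow.
        running_max = max(running_max, candidate)
        adjusted[name] = running_max
    return adjusted


# Hypothetical raw p-values for five pre-registered segments.
raw = {"mobile": 0.004, "desktop": 0.60, "tablet": 0.03, "new": 0.02, "returning": 0.45}
adjusted = holm_adjust(raw)
for name, p in sorted(adjusted.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} raw {raw[name]:.3f}  adjusted {p:.3f}")
```

In this made-up example three segments clear 0.05 on raw p-values, but only mobile does after adjustment.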

Common Mistakes

Post-hoc subgroup fishing. If you slice 20 ways and report the one that's significant, you've most likely found noise.
Acting on HTE from underpowered segments. A "win" in a segment of 2,000 users comes with a wide confidence interval, so the point estimate can easily be noise.
Ignoring HTE entirely and treating the ATE as if it described every user equally.
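The first two mistakes are easy to quantify. A back-of-the-envelope sketch, assuming independent slices tested at alpha = 0.05, and for the segment example a 5% baseline conversion rate with an even control/treatment split (both assumptions, not figures from a real test):

```python
import math


def familywise_error(k, alpha=0.05):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


def diff_standard_error(segment_size, baseline_rate):
    """Standard error of a difference in conversion rates, even split."""
    per_arm = segment_size / 2
    return math.sqrt(2 * baseline_rate * (1 - baseline_rate) / per_arm)


fwer = familywise_error(20)
se = diff_standard_error(2000, 0.05)
margin = 1.96 * se  # half-width of a 95% confidence interval

print(f"P(at least one false positive in 20 slices): {fwer:.0%}")
print(f"95% margin on a 2,000-user segment: +/-{margin * 100:.1f} percentage points")
```

Under these assumptions, 20 slices give roughly a 64% chance of at least one spurious "significant" result, and a 2,000-user segment carries a margin of about 1.9 percentage points on a 5% baseline, so only very large effects can be distinguished from noise.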

Industry Context

In SaaS/B2B, HTE along plan tier, company size, and lifecycle stage is usually enormous — a feature change means different things to a trial user versus a year-3 enterprise customer. In ecommerce, HTE along device, new/returning, and AOV tier is almost always present. In lead gen, HTE along traffic source and offer affinity is the reason channel-agnostic tests often mislead.

The Behavioral Science Connection

The ATE is a convenient fiction: it tells a single simple story about a system that is never actually uniform. HTE analysis resists the representativeness heuristic, the temptation to treat the average as typical. In reality, no user is the average user, and treatments affect identifiable subgroups differently in ways that matter for product decisions.

Key Takeaway

The ATE is the first number you look at and the last number you should trust. HTE is where real product understanding lives — and where the biggest wins and hidden regressions hide.
