The Independence Assumption Most Teams Ignore
Standard A/B testing relies on a critical assumption: the outcome for one user is independent of the treatment assignments of other users, a condition known in causal inference as the Stable Unit Treatment Value Assumption (SUTVA). When you show User A a new headline, it does not affect User B's experience.
This assumption holds for many product changes. But it collapses entirely in products with network effects — marketplaces, social platforms, communication tools, and any system where users interact with each other.
When users in the treatment group can influence users in the control group, your experiment is contaminated. The estimated treatment effect is biased, usually toward zero, which means you underestimate the true impact of your change. This is not a minor statistical nuance. It is a fundamental threat to the validity of your experiment.
How Interference Actually Works
Interference occurs through several mechanisms, and understanding the specific mechanism in your product determines the correct experimental design.
Direct interaction interference
In a messaging app, if you change the compose experience for treatment users, control users receive differently composed messages. The treatment has leaked into the control group through direct user interaction. Both groups are now partially treated, and the difference between them understates the true effect.
Supply-side interference
In a marketplace, if your treatment changes which items are shown to buyers, it affects inventory availability for all other buyers — including those in the control group. A recommendation algorithm that shows certain items more prominently to treatment users makes those items less available to control users. The control experience has changed even though you did not intend to change it.
Demand-side interference
Similarly, if your treatment encourages more purchasing behavior among treatment users, it can deplete supply that control users would have accessed. The control group's experience degrades because of treatment group behavior.
Social influence interference
In platforms with social features, users in the treatment group who adopt a new behavior may influence their connections in the control group to adopt similar behaviors — not because of the treatment, but through social contagion. This again biases your estimate toward zero because the control group is partially "treated" through social influence.
Equilibrium effects
Some changes shift market equilibrium. A pricing algorithm change tested on a subset of users may alter the supply-demand balance for everyone. The test measures the partial equilibrium effect, but the full rollout would produce a different total equilibrium effect. The test result does not predict the production impact.
Why Standard Metrics Lie Under Interference
When interference is present, the standard treatment effect estimate (difference in means between treatment and control) is biased. The direction and magnitude of bias depend on the interference mechanism.
Positive spillover occurs when treatment benefits leak to control users. Your new recommendation algorithm helps treatment users find better products, but because the improved ranking affects marketplace-wide visibility, control users stumble onto better products too. The measured treatment effect is smaller than the true effect because the control group also benefited.
Negative spillover occurs when treatment benefits come at the expense of control users. Your new matching algorithm connects treatment users with better providers, but those providers are now less available for control users. The measured treatment effect might look larger than the total effect because you are measuring a zero-sum reallocation, not a net creation of value.
In either case, the naive treatment effect estimate is wrong. Decisions made on these biased estimates lead to suboptimal product decisions.
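A small simulation makes the bias concrete. The `spillover` parameter here is a hypothetical knob, not something you can observe directly; it models the fraction of the treatment effect that leaks to control users:

```python
import random
import statistics

def simulate_spillover(n_users=10000, true_effect=1.0, spillover=0.5, seed=0):
    """Simulate an individual-level A/B test where treatment leaks to control.

    `spillover` is the (unobservable) fraction of the true effect that
    control users receive through interference; 0 means no leakage.
    Returns the naive difference-in-means estimate.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_users):
        noise = rng.gauss(0, 1)
        if rng.random() < 0.5:
            treated.append(10.0 + true_effect + noise)
        else:
            # Control users are partially "treated" via spillover.
            control.append(10.0 + spillover * true_effect + noise)
    return statistics.mean(treated) - statistics.mean(control)

biased = simulate_spillover(spillover=0.5)  # measures roughly half the true effect
clean = simulate_spillover(spillover=0.0)   # measures roughly the true effect
```

With 50% leakage, the naive estimate recovers only about half of the true effect of 1.0, exactly the compression described above.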
Experimental Designs That Handle Interference
Several approaches address interference, each with different tradeoffs between statistical power, practical feasibility, and the accuracy of the causal estimate.
Cluster randomization
Instead of randomizing individual users, randomize groups of users who interact with each other. If you can identify clusters where most interaction happens within the cluster and little happens between clusters, assigning entire clusters to treatment or control contains the interference within clusters.
The challenge: identifying independent clusters in a real social or marketplace network. Perfect clusters rarely exist. Partial interaction between clusters still causes some interference, though less than individual-level randomization.
The cost: cluster randomization requires many more users for the same statistical power, because outcomes within a cluster are correlated and therefore carry less information than the same number of independent users. In the limit of high within-cluster correlation, each cluster contributes only one effective observation to your estimate.
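In practice, cluster assignment is usually implemented by hashing the cluster identifier rather than the user identifier, so every member of a cluster deterministically lands in the same arm. A minimal sketch; the function name and salting scheme are illustrative, not a standard API:

```python
import hashlib

def cluster_assignment(cluster_id: str, experiment: str,
                       treat_fraction: float = 0.5) -> str:
    """Deterministically assign an entire cluster to treatment or control.

    Hashing the cluster id (not the user id) guarantees that every user
    in a cluster sees the same arm. The experiment name salts the hash
    so different experiments get independent assignments.
    """
    digest = hashlib.sha256(f"{experiment}:{cluster_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treat_fraction else "control"

# Every user in cluster "nyc-brooklyn" receives the same assignment.
arm = cluster_assignment("nyc-brooklyn", "new-matching-v2")
```

Because the hash is deterministic, assignment needs no shared state: any service can compute a cluster's arm independently and agree with every other service.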
Geographic randomization
Geography provides natural clusters for products with local network effects. A ride-sharing platform can randomize at the city level because riders in one city do not interact with drivers in another. A local marketplace can randomize at the neighborhood or region level.
This works well when network effects are geographically bounded. It fails when your product's network extends nationally or globally.
Time-based (switchback) designs
Instead of splitting users, alternate the entire population between treatment and control over time periods. Everyone gets treatment during Period A, everyone gets control during Period B, and you compare outcomes across periods.
Switchback designs remove cross-user interference because everyone is in the same condition at the same time. Two tradeoffs remain. First, time-based confounders (day of week, seasonality, external events) can bias the comparison. Second, effects can carry over from one period into the next: a driver who repositioned during a treatment period is still repositioned when the control period begins. You need many switchback periods, with carefully chosen period lengths, to manage both, which extends the test duration.
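A switchback schedule can be as simple as a randomized sequence of fixed-length blocks. A sketch, assuming four-hour periods and a single global arm at any moment; randomizing the order rather than strictly alternating guards against periodic confounders like time of day:

```python
import random
from datetime import datetime, timedelta

def switchback_schedule(start, n_periods, period_hours=4, seed=7):
    """Build a randomized switchback schedule: a list of
    (period_start, arm) pairs covering consecutive fixed-length blocks.
    The entire population is in the same arm during each block."""
    rng = random.Random(seed)
    return [(start + timedelta(hours=i * period_hours),
             rng.choice(["treatment", "control"]))
            for i in range(n_periods)]

def arm_at(schedule, ts):
    """Look up which arm is active at timestamp `ts`."""
    for period_start, arm in reversed(schedule):
        if ts >= period_start:
            return arm
    raise ValueError("timestamp precedes schedule start")

sched = switchback_schedule(datetime(2024, 1, 1), n_periods=42)
```

Outcomes are then aggregated per period and compared across arms, with each period (not each user) serving as the unit of analysis.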
Ego-network randomization
For social products, randomize based on ego networks — a user and all their direct connections. Assign entire ego networks to treatment or control so that treatment users only interact with other treatment users within their immediate network.
This approach contains direct interaction interference but requires sophisticated randomization infrastructure and only works when the social graph is relatively sparse.
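A greedy sketch of the idea, assuming the social graph fits in memory as an adjacency mapping; production systems use more careful graph partitioning, and contested users (friends of two differently-assigned egos) need an explicit policy:

```python
import random

def ego_network_assignment(graph, seed=0):
    """Greedy ego-network randomization sketch (illustrative only).

    `graph` maps each user to the set of their direct connections.
    Each not-yet-assigned user is treated as an ego: the ego and all
    still-unassigned neighbors get the same arm, so users mostly
    interact with same-arm users in their immediate network.
    """
    rng = random.Random(seed)
    assignment = {}
    for ego in graph:
        if ego in assignment:
            continue
        arm = rng.choice(["treatment", "control"])
        assignment[ego] = arm
        for friend in graph[ego]:
            # First-come wins: already-assigned friends keep their arm.
            assignment.setdefault(friend, arm)
    return assignment
```

The sparser the graph, the fewer contested users, which is why this design works best when the social graph is relatively sparse.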
Synthetic control methods
When randomization at the appropriate level is impractical, synthetic control methods construct a counterfactual by combining data from untreated units to match the pre-treatment trajectory of treated units. This is common in geographic experiments where you cannot randomize hundreds of regions.
Synthetic controls are not experiments in the traditional sense — they are observational causal inference methods — but they can provide useful estimates when experiments are impossible.
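The core mechanic is fitting convex weights over donor units so the weighted combination tracks the treated unit before the treatment. A deliberately tiny sketch with two donors and a grid search; real implementations solve a constrained optimization over many donors:

```python
def synthetic_control_weights(treated_pre, donor_a, donor_b, step=0.01):
    """Toy synthetic control with exactly two donor units: grid-search
    the convex weight w so that w*donor_a + (1-w)*donor_b best matches
    the treated unit's pre-treatment trajectory (least squares).
    A sketch of the idea, not a production fitter."""
    best_w, best_err = 0.0, float("inf")
    for i in range(int(round(1 / step)) + 1):
        w = i * step
        err = sum((t - (w * x + (1 - w) * y)) ** 2
                  for t, x, y in zip(treated_pre, donor_a, donor_b))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, 1.0 - best_w

# The treated region's pre-period metric sits halfway between two donors.
w_a, w_b = synthetic_control_weights([1.0, 2.0, 3.0],
                                     [0.0, 0.0, 0.0],
                                     [2.0, 4.0, 6.0])
```

After fitting, the post-treatment gap between the treated unit and its weighted synthetic counterpart is the effect estimate.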
Practical Decision Framework
Choosing the right design depends on your product's interference structure.
If interactions are primarily within local groups (neighborhoods, teams, small communities): Use cluster randomization with clusters defined by the interaction boundary.
If interactions are primarily geographic: Use geographic randomization at the appropriate level (city, region, country).
If interference is instantaneous and universal (everyone competes for the same supply): Use switchback designs that alternate the entire population.
If interactions are sparse and social: Consider ego-network randomization.
If randomization at any level is impractical: Use observational causal inference methods (difference-in-differences, synthetic controls) and be explicit about the stronger assumptions required.
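As a toy summary, the framework above can be written as a lookup; the category labels are this article's, not an industry standard:

```python
def choose_design(interference_structure: str) -> str:
    """Map this article's interference categories to the recommended
    experimental design. Labels are illustrative."""
    designs = {
        "local_groups": "cluster randomization",
        "geographic": "geographic randomization",
        "shared_supply": "switchback design",
        "sparse_social": "ego-network randomization",
        "no_randomization": "observational causal inference",
    }
    return designs[interference_structure]
```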
Measuring the Size of Interference
Before choosing a complex design, determine whether interference is actually large enough to matter for your decision. Several approaches help quantify interference.
Randomization saturation designs
Vary the proportion of treated users within clusters. Some clusters get twenty percent treatment, others fifty percent, others eighty percent. If outcomes in the control group vary by cluster treatment intensity, interference is present. The magnitude of this variation tells you how large the interference effect is.
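The analysis reduces to asking whether control outcomes move with cluster treatment intensity. A minimal version is a one-variable regression of per-cluster control means on intensity; the input data here is illustrative:

```python
def interference_slope(pairs):
    """One-variable OLS of control-group mean outcome on cluster
    treatment intensity. `pairs` is [(intensity, control_mean), ...],
    one entry per cluster. A slope far from zero indicates that
    treatment intensity is spilling over into control outcomes."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    return cov / var

# Control outcomes rising with cluster intensity suggest positive spillover.
slope = interference_slope([(0.2, 10.1), (0.5, 10.6), (0.8, 11.0)])
```

A production analysis would add confidence intervals and cluster-robust standard errors, but the diagnostic question is the same: is this slope distinguishable from zero?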
Comparing individual-level and cluster-level estimates
Run an individual-level experiment and a cluster-level experiment simultaneously on different populations. If the estimates diverge, interference is biasing the individual-level estimate.
Pre-post analysis of rollout
After rolling out a change that was previously tested via A/B test, compare the actual impact to the predicted impact from the test. If the production impact differs substantially (especially if it is larger), interference was compressing the treatment effect during the test.
The Business Case for Getting This Right
From an economics perspective, interference-biased experiments cause two types of costly errors.
Under-investment in effective changes. When positive spillover compresses the measured treatment effect, good ideas look mediocre. Teams pass on changes that would have produced substantial impact at full rollout. The opportunity cost is invisible because you never see the impact you did not capture.
Over-investment in zero-sum changes. When negative spillover inflates the measured treatment effect, changes that merely redistribute value between users look like they create new value. You ship changes that show no impact at full rollout because the "improvement" for treatment users came at the expense of control users.
For marketplace businesses and platforms with strong network effects, the cumulative cost of these errors can be substantial. Investing in experimental designs that account for interference is not academic perfectionism. It is sound business practice.
FAQ
How do I know if my product has meaningful interference in experiments?
If users in your product interact with each other, compete for shared resources, or influence each other's behavior, interference is possible. The question is whether it is large enough to meaningfully bias your experiment results. Start with a saturation design or compare individual-level and cluster-level estimates to quantify the effect.
Does interference only matter for two-sided marketplaces?
No. Any product with user-to-user interaction can have interference. Social networks, collaboration tools, multiplayer games, content platforms with algorithmic feeds, and products with referral mechanics all exhibit some degree of interference.
Can I just ignore interference and use standard A/B testing?
You can, but you should understand the consequences. If positive spillover dominates, your estimates are conservative — you underestimate impact. If negative spillover dominates, your estimates are inflated — you overestimate impact. If you consistently make decisions based on biased estimates, you systematically misallocate resources.
How much more traffic do cluster-randomized experiments need?
It depends on the intra-cluster correlation — how similar outcomes are within clusters. Higher correlation means fewer effective independent observations per cluster. As a rough guide, cluster randomization often requires several times more total users than individual randomization to achieve the same statistical power.
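This rough guide has a standard formula: the Kish design effect, DEFF = 1 + (m - 1) * ICC, where m is the average cluster size. Your required sample size scales by this factor relative to individual randomization:

```python
def design_effect(cluster_size, icc):
    """Kish design effect for cluster randomization:
    DEFF = 1 + (m - 1) * ICC, where m is the average cluster size and
    ICC is the intra-cluster correlation of the outcome."""
    return 1 + (cluster_size - 1) * icc

# 50-user clusters with a modest ICC of 0.05 need ~3.45x the users.
deff = design_effect(50, 0.05)
```

Even a small ICC is costly at realistic cluster sizes, which is why measuring the ICC from historical data is a worthwhile step before committing to a cluster-randomized design.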
Are there simpler alternatives to full cluster randomization?
Yes. Partial interference corrections adjust the standard treatment effect estimate using assumptions about the interference structure. They are less robust than full cluster randomization but require less traffic. Another approach is to simply run the individual-level experiment and note that your estimate may be conservative, accepting the bias as a known limitation.