Mobile and desktop are different products. CTA tests that ignore device segmentation make ship/revert decisions on aggregates that hide opposite-direction effects on each device class.
TL;DR
- The same CTA change can produce opposite-direction results on mobile vs desktop. Aggregates blend the two, often producing a noisy "directional" result that doesn't actually exist on either device.
- In a 200+ test portfolio, device-scoped tests (mobile-only, desktop-only, or both devices with segmented reads) outperform unsegmented "combined" tests by a wide margin. Combined tests had the lowest win rate of any platform category.
- Some change types are inherently asymmetric: sticky positioning, hero size, modal-mediated routing, form field changes. These should never be read on aggregate alone.
- The decision matrix below maps eight aggregate × segmented combinations to the right ship/revert action — including the cases where the aggregate is misleading.
Why aggregates lie
Aggregate results combine device classes proportional to their traffic share. If mobile is 70% of traffic, the aggregate weights mobile more heavily — even when the mobile-specific result is wrong for desktop users.
| Mobile result | Desktop result | Aggregate (70% mobile, 30% desktop) | What aggregate tells you |
| ------------- | -------------- | ----------------------------------- | -------------------------------------------------------- |
| +5% | +5% | +5% | Both devices win |
| +5% | -3% | +2.6% | "Directional positive" — but desktop is hurt |
| -2% | +8% | +1% | "Directional positive" — but mobile is hurt |
| -10% | +20% | -1% | "Noisy/flat" — a severe mobile regression cancels a large desktop win |
The last row is the warning sign. With a 70/30 split, -10% on mobile and +20% on desktop net out to roughly -1%: the aggregate reads as flat noise. Segmented by device, mobile is regressing 10% while desktop gains 20%. The right call is "ship on desktop, revert on mobile" — invisible to the aggregate.
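The blend is just a traffic-weighted average. A minimal sketch (assuming the 70/30 split above; adjust `mobile_share` for your own traffic mix) makes it easy to sanity-check any aggregate read:

```python
def aggregate_lift(mobile_lift, desktop_lift, mobile_share=0.70):
    """Traffic-weighted blend of per-device lifts (the math behind the table)."""
    return mobile_share * mobile_lift + (1 - mobile_share) * desktop_lift

# Re-derive the table rows from the per-device numbers.
for mobile, desktop in [(0.05, 0.05), (0.05, -0.03), (-0.02, 0.08), (-0.10, 0.20)]:
    agg = aggregate_lift(mobile, desktop)
    print(f"mobile {mobile:+.0%}, desktop {desktop:+.0%} -> aggregate {agg:+.1%}")
```

Note that the -10%/+20% row nets out near zero: from the aggregate alone, it is indistinguishable from a no-effect test.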
Pattern data: segmented vs combined wins
Looking at 200+ tests across two years of an enterprise CRO portfolio, win rates differ meaningfully by how the test was scoped:
| Test platform | Win rate (range) |
| ---------------------------------------------------- | ---------------- |
| Desktop-only | ~22-26% |
| Mobile-only | ~24-28% |
| Both Desktop AND Mobile (segmented results required) | ~30-34% |
| Combined (no device segmentation) | <10% |
The "Combined" category (tests that did not require segmented results) had the lowest win rate by a wide margin. The mechanism is clear once you have seen it: combined tests bury opposite-direction effects, so even when a change genuinely won on one device, the aggregate often read as inconclusive.
When device asymmetry is most likely
Some CTA changes are device-agnostic; others have device-specific mechanisms. Anticipate which is which before launching the test.
| CTA change type | Asymmetry likelihood | Mechanism |
| ---------------------------------- | -------------------- | ---------------------------------------------------------------------------------- |
| Sticky positioning | High | Mobile viewport real estate is tighter; sticky impact differs |
| Hero size / above-fold restructure | High | Mobile viewport changes what's "above the fold"; desktop has more horizontal space |
| Modal-mediated routing | High | Mobile modal UX is more disruptive than desktop |
| Form field reduction | High | Mobile typing friction higher; desktop users tolerate more fields |
| CTA copy change | Low | Copy semantics travel across devices |
| Visual hierarchy / color | Low | Same visual logic on both |
| Button placement (within-section) | Medium | Depends on whether the section's layout differs by device |
For high-asymmetry change types, segment by device before reading the aggregate.
The asymmetry signatures
Three patterns recur across device-segmented CTA tests:
| Pattern | Mobile | Desktop | Action |
| ------------------------- | ----------------------- | ----------------------- | ---------------------------------------------- |
| Mobile-friendly only | Positive | Flat or slight negative | Ship on mobile only |
| Desktop-friendly only | Flat or slight negative | Positive | Ship on desktop only |
| Mobile-hostile | Strong negative | Positive | Revert on mobile (even if aggregate says ship) |
| Desktop-hostile | Positive | Strong negative | Revert on desktop |
| Universal | Same direction | Same direction | Ship/revert sitewide |
The first four signatures all imply device-conditional shipping. The infrastructure cost of conditional rollout is small relative to the funnel cost of shipping a regression on the wrong device class.
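Under simplifying assumptions, the signature table reduces to a lookup. This sketch uses a hypothetical fixed ±1% "flat" band (in practice you would use each segment's confidence interval) and collapses "slight negative" into flat:

```python
def read(lift, flat_band=0.01):
    """Classify a lift as 'pos' / 'flat' / 'neg'. flat_band is a hypothetical
    threshold; in practice use the segment's confidence interval instead."""
    if abs(lift) < flat_band:
        return "flat"
    return "pos" if lift > 0 else "neg"

def signature(mobile_lift, desktop_lift):
    """Label the device-asymmetry signature from per-segment lifts."""
    table = {
        ("pos", "flat"): "mobile-friendly only",
        ("flat", "pos"): "desktop-friendly only",
        ("neg", "pos"): "mobile-hostile",
        ("pos", "neg"): "desktop-hostile",
    }
    m, d = read(mobile_lift), read(desktop_lift)
    return table.get((m, d), "universal" if m == d else "indeterminate")

print(signature(-0.10, 0.20))  # the -10%/+20% case above: mobile-hostile
```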
Decision matrix: device-segmented shipping
| Aggregate | Mobile segment | Desktop segment | Decision |
| --------- | -------------- | --------------- | ------------------------------------------------------------------ |
| Positive | Positive | Positive | Ship sitewide |
| Positive | Positive | Flat | Ship on mobile; hold on desktop |
| Positive | Flat | Positive | Ship on desktop; hold on mobile |
| Positive | Negative | Positive | Ship on desktop, revert on mobile — aggregate is misleading |
| Flat | Positive | Negative | Ship on mobile, revert on desktop |
| Flat | Negative | Positive | Ship on desktop, revert on mobile |
| Negative | Negative | Negative | Revert sitewide |
| Negative | Positive | Negative | Ship on mobile, revert on desktop — aggregate hides mobile win |
The aggregate is informational only. The decision is determined by the per-segment columns.
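The matrix is mechanical enough to encode directly. A sketch, assuming 'pos'/'flat'/'neg' labels come from each segment's significance read; combinations absent from the matrix above (e.g. flat/flat) are filled in with the obvious reading, and the aggregate is deliberately not a parameter:

```python
def ship_decision(mobile, desktop):
    """Map per-segment reads ('pos' | 'flat' | 'neg') to a rollout action.
    Per the matrix, the per-segment columns alone determine the decision."""
    actions = {
        ("pos", "pos"):   "ship sitewide",
        ("pos", "flat"):  "ship on mobile; hold on desktop",
        ("flat", "pos"):  "ship on desktop; hold on mobile",
        ("neg", "pos"):   "ship on desktop, revert on mobile",
        ("pos", "neg"):   "ship on mobile, revert on desktop",
        ("neg", "neg"):   "revert sitewide",
        ("flat", "neg"):  "revert on desktop; hold on mobile",
        ("neg", "flat"):  "revert on mobile; hold on desktop",
        ("flat", "flat"): "no detectable effect; iterate or archive",
    }
    return actions[(mobile, desktop)]

print(ship_decision("neg", "pos"))  # prints "ship on desktop, revert on mobile"
```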
Worked example: a homepage iteration with strong asymmetry
A homepage hierarchy + offer-placement test produced positive aggregate results, but the device segmentation revealed the win was concentrated on desktop:
| Funnel metric | All Devices | Desktop | Mobile |
| --------------------- | ----------- | ------- | ------ |
| Page-entry rate | +2.4% | +7.4% | -0.7% |
| Mid-funnel completion | +7.0% | +9.4% | +5.6% |
| Downstream conversion | +11.8% | +23.9% | +4.2% |
The desktop segment carried most of the lift. Mobile was directionally positive on mid-funnel but negative on the upstream metric — a signature of a layout change that worked better on the desktop viewport. Decision: ship the change but plan a mobile-specific iteration to recover the upstream metric on mobile.
The suspected mobile cause: leading with form input above the hero on mobile rather than letting the message appear first. The desktop variant didn't have that problem because the wider viewport let both elements coexist above the fold.
Pre-test instrumentation requirements
For high-asymmetry change types, the test needs to be set up to read by device from day one:
| Requirement | Why |
| ------------------------------------------------ | -------------------------------------------------------------------------------- |
| Device class as a primary segmentation dimension | Standard segment, not custom-cut at analysis time |
| Per-device sample size targets | Power mobile and desktop separately; the total may be powered while the segments are not |
| Per-device MDE acceptance | Often need larger MDE on the smaller segment |
| Pre-committed device-conditional shipping plan | Decide before launch whether asymmetric results would ship on one device only |
Without these, a test producing strong asymmetry will be hard to interpret and harder to ship correctly.
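The per-device power requirement is the one most often skipped. A standard two-proportion sample-size approximation (stdlib only; the 2%/4% baseline rates and 10% relative MDE are hypothetical) shows why the lower-converting segment dominates the plan:

```python
from statistics import NormalDist

def n_per_arm(base_rate, rel_mde, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-proportion z-test."""
    p1, p2 = base_rate, base_rate * (1 + rel_mde)
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2

# Hypothetical rates: mobile converts at 2%, desktop at 4%.
for device, rate in [("mobile", 0.02), ("desktop", 0.04)]:
    print(f"{device}: ~{n_per_arm(rate, 0.10):,.0f} visitors per arm "
          f"for a 10% relative MDE")
```

In this sketch the lower-converting segment needs roughly twice the sample per arm, which is why a test powered only on the total often cannot support segment-level ship/revert calls.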
When NOT to segment by device
A few contexts where aggregate reading is sufficient:
| Context | Why aggregate is OK |
| ---------------------------------------------------- | ------------------------------------ |
| Truly device-agnostic change (copy, color, semantic) | Mechanism doesn't differ by viewport |
| Single-device test (mobile-only or desktop-only) | One segment, no asymmetry possible |
| Test with sample size only powered for the aggregate | Segment-level reads will be noise |
For most other CTA tests, device segmentation should be a default report column.
Bottom line
Mobile and desktop are different products on the same site. CTA tests routinely produce opposite-direction effects on the two device classes. Aggregates hide this. The portfolio data shows combined-platform tests (no device segmentation) have the lowest win rate by a wide margin — because they bury the wins inside aggregates that look like noise.
Segment every CTA test by device class, especially for high-asymmetry change types (sticky, hero, modal, form-field). Use the per-device segment results — not the aggregate — to make ship/revert decisions. Conditional rollout is cheap to implement and saves the funnel from device-class regressions disguised as flat aggregates.