Inside the Trade-Off: How Dark Patterns Actually Get Designed

Meta description: Regulators' complaints increasingly include the internal emails and test data behind a dark pattern, not just the pattern itself. A pillar guide to the design trade-offs behind consumer-protection enforcement — for marketers and growth leaders in regulated and reputation-sensitive industries.

Most writing on dark patterns treats them as a UI taxonomy — confirmshaming, roach motels, fake urgency — as if the failure is aesthetic. It isn't. Every case in this series has a paper trail showing a specific person or team making a specific trade-off: a metric they were accountable for, a design choice that moved it, and a point where that choice stopped being persuasion and became something a regulator could act on. That paper trail is now public, in unusual detail, because recent FTC complaints have started attaching the internal emails and A/B test results directly as exhibits.

That changes what's worth writing about. The pattern itself — "cancellation was hard" — is not that instructive; every practitioner already knows friction reduces churn. What's instructive is the trade-off calculus that made hard cancellation look like the correct call at the time, to people who were not stupid or malicious, sitting in a regulated or heavily-lawyered company. That's the gap this series covers, and it's the gap that matters if you're a marketer, founder, or growth leader operating in fintech, healthtech, insurance, or any other space where a design decision doubles as a disclosure decision.

The comparative record

Case | Regulator | What was actually optimized | Outcome
Trivago | ACCC (Australia) | A ranking algorithm's revenue weight (CPC bids) vs. its stated promise ("best deal") | AU$44.7M penalty (2022)
Hotel booking sector (Booking.com, Expedia, Agoda, Hotels.com, ebookers, and others) | CMA (UK) | Conversion-boosting urgency/scarcity messaging vs. its accuracy | Formal undertakings: no fine, but binding sector-wide behavior change (2019)
Amazon (Prime) | FTC (US) | Subscription revenue retention vs. cancellation friction, with the trade-off explicitly debated and decided by named executives | $2.5B: $1B penalty + $1.5B consumer redress (2025)
Epic Games (Fortnite) | FTC (US) | Purchase-path friction (near-zero) vs. refund-path friction (deliberately high) | $520M: $245M dark-patterns refunds + $275M COPPA penalty (2022)
Vonage | FTC (US) | Retention-agent-only cancellation vs. self-service cancellation | ~$100M in consumer refunds (2022–23)
Credit Karma | FTC (US) | Click-through rate of an approval-odds claim vs. the claim's accuracy for the individual seeing it; A/B tested, and the higher-converting variant was known to be less accurate | $3M (2023)
BetterHelp | FTC (US) | Ad-targeting revenue from health-questionnaire data vs. an explicit privacy promise made at intake | $7.8M + data-sharing ban (2023)
Sephora | California AG | Ad-tech partner revenue vs. CCPA's "sale" disclosure and opt-out requirements | $1.2M: first CCPA enforcement action (2022)
Google, Facebook, Amazon (cookie consent) | CNIL (France) | Consent-rate optimization vs. symmetric choice architecture required by French law | €150M (Google), €60M (Facebook), €35M (Amazon) (2020–22)

Read down the "What was actually optimized" column and a shape emerges: in every row, there was a real, measurable, defensible-sounding business metric on one side, and a disclosure or symmetry obligation on the other. Nobody in these cases was optimizing for "deceive the user" as a stated goal. They were optimizing for retention, conversion, consent rate, or click-through, the same metrics every growth team is accountable for, and the deception was what that optimization produced once it ran past the point where the metric and the truth diverged.

What the internal evidence actually shows

Two cases in this table have unusually detailed public records of the reasoning behind the design, and they're worth sitting with because they show the trade-off being made through an organization's normal decision process, not through a single bad actor.

Amazon's Prime cancellation flow — internally named "Iliad," after Homer's epic, reportedly because completing it felt like a similarly long ordeal — is described in the FTC's amended 2023 complaint as a four-page, six-click, fifteen-option sequence, with repeated interstitial offers (discounts, benefit reminders) along the way. The complaint alleges that internal proposals to simplify this flow were repeatedly reviewed and not approved, because simplification was projected to reduce subscription revenue, and cites a draft internal memo stating that clarifying the enrollment process was not the "right approach" because it would cause a "shock" to business performance. (Amazon settled without admitting or denying the FTC's allegations, as is standard in FTC consent orders; the underlying facts here are what the complaint alleges, not adjudicated findings, except where the parties agreed to them in the settlement.) Read as a design-environment story rather than a personnel story, the instructive part is that a retention metric survived multiple internal review cycles in a company with a full legal and compliance function — which says something about how hard it is to unwind a friction-based pattern once it's load-bearing for a reported metric, regardless of how sophisticated the organization reviewing it is.

Credit Karma's "pre-approved" claim is the more useful case for anyone who runs experimentation programs specifically, because the mechanism is a standard A/B test, not an executive memo. The FTC's complaint describes Credit Karma testing "pre-approved" language against more accurate framing like "excellent odds," finding that "pre-approved" converted better — and shipping it, despite the company's own data showing that close to a third of consumers who applied for these "pre-approved" offers were later denied at underwriting. This is a growth team doing everything a growth team is trained to do: forming a hypothesis, running a controlled test, reading a clear win, and shipping the winner. The failure wasn't sloppy testing. It was a success metric — click-through and application rate — that had no mechanism for capturing the cost imposed on the roughly one-in-three users the claim was wrong about. The test was rigorous. The guardrail was missing. (Credit Karma likewise settled via consent order without admitting or denying the FTC's allegations.)

Both of these are going to get their own full treatment later in this series, because a comparative table can show you the pattern but can't show you the decision. The Amazon case is a story about how a retention metric survives repeated internal challenge inside a company with a full legal function watching it. The Credit Karma case is a story about what happens when an A/B testing program measures conversion without measuring the downstream cost of being wrong, a mistake as available to a two-person startup team as to a public company.
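To make the missing guardrail concrete, here is a minimal sketch in Python of a ship rule that asks a second question after "did it win." Everything in it is an assumption for illustration: the class and function names, the 15% guardrail threshold, and the counts are hypothetical, not figures from the FTC complaint or Credit Karma's actual tooling. It also skips statistical significance testing, which a real readout would need.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    """Aggregated outcomes for one arm of a test (all counts are hypothetical)."""
    name: str
    shown: int    # users who saw the claim
    applied: int  # users who clicked through and applied
    denied: int   # applicants later denied at underwriting


def conversion_rate(v: VariantResult) -> float:
    """Primary metric: share of exposed users who applied."""
    return v.applied / v.shown


def denial_rate(v: VariantResult) -> float:
    """Guardrail metric: share of applicants the claim turned out to be wrong for."""
    return v.denied / v.applied if v.applied else 0.0


def pick_winner(control: VariantResult, challenger: VariantResult,
                max_denial_rate: float) -> str:
    """Ship the challenger only if it converts better AND stays under the guardrail."""
    if denial_rate(challenger) > max_denial_rate:
        return control.name  # a conversion win does not override the guardrail
    if conversion_rate(challenger) > conversion_rate(control):
        return challenger.name
    return control.name


# Illustrative numbers only: the stronger claim converts better,
# but roughly a third of its applicants are later denied.
accurate = VariantResult("accurate_wording", shown=10_000, applied=800, denied=80)
stronger = VariantResult("stronger_claim", shown=10_000, applied=1_000, denied=320)

print(pick_winner(accurate, stronger, max_denial_rate=0.15))
```

With the guardrail in place, the higher-converting variant does not ship. Set max_denial_rate to 1.0, which is equivalent to having no guardrail, and the same data produces the opposite decision. The point of the sketch is that the threshold has to be chosen when the test is scoped, not after the readout.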

The diagnostic

After working through enough of these, the trade-off collapses into four checks I run before shipping anything that touches enrollment, pricing, cancellation, consent, or ranking — checks that exist specifically to catch a metric win that's quietly built on an asymmetry or an unmeasured cost.

1. Exit symmetry. Is leaving as easy as entering, measured in the same units (clicks, steps, channels available)? Amazon and Vonage both failed this test at the level of formal company policy, not through a rogue variant.

2. Information parity. Do "accept" and "reject/decline" get the same visual weight, step count, and clarity? This is the CNIL cookie-consent finding, and it's the easiest of the four to walk into by accident, because a variant that visually de-emphasizes "no" will usually win on the metric you're watching — that's exactly the trap, not a coincidence.

3. Comparison honesty. When a claim depends on a ranking, discount, or comparison, is the thing being compared actually equivalent? Trivago and the CMA hotel-sector case both show this test failing at the algorithm level: a "best deal" claim resting on a variable (who pays more) the user never sees.

4. Guardrail coverage. Does your test's success metric capture the cost imposed on users for whom the winning variant is wrong? Credit Karma's test isn't a story about deceptive intent — it's a story about a guardrail metric that didn't exist. If your experimentation platform can tell you conversion rate but not "cost to the segment where this claim doesn't hold," you have the same gap.

None of these require a compliance department to run. They require treating "did it win" as an incomplete question, and building the second question into how a test is scoped before it ships — which is a testing-methodology change, not a legal one.
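The first two checks reduce to comparing two paths in the same units, which can be sketched in a few lines of Python. This is a rough illustration, not a compliance tool: the Path type and asymmetry_flags function are hypothetical names, and the enrollment and cookie-banner numbers are made up. Only the four-page, six-click cancellation figure comes from the FTC's description of the Prime flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Effort needed to complete one direction of a choice."""
    pages: int
    clicks: int
    self_service: bool  # False if the user must call or chat with an agent


def asymmetry_flags(enter: Path, leave: Path) -> list[str]:
    """List every dimension where leaving (or declining) costs more than entering."""
    flags = []
    if leave.pages > enter.pages:
        flags.append(f"pages: {enter.pages} in vs {leave.pages} out")
    if leave.clicks > enter.clicks:
        flags.append(f"clicks: {enter.clicks} in vs {leave.clicks} out")
    if enter.self_service and not leave.self_service:
        flags.append("channel: self-service in, agent-only out")
    return flags


# Exit symmetry: enrollment effort is a hypothetical placeholder;
# the cancellation effort mirrors the four-page, six-click flow described above.
enroll = Path(pages=1, clicks=2, self_service=True)
cancel = Path(pages=4, clicks=6, self_service=True)
print(asymmetry_flags(enroll, cancel))

# Information parity: the same comparison applied to a consent banner
# (hypothetical numbers), with "accept" as the entering path.
accept = Path(pages=1, clicks=1, self_service=True)
decline = Path(pages=2, clicks=3, self_service=True)
print(asymmetry_flags(accept, decline))
```

An empty list means the two paths cost the same on every dimension the sketch measures. It says nothing about visual weight or wording, which still need a human review.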

This is Part 1 of a series

A comparative table and a diagnostic checklist are useful, but they compress exactly the thing that's actually instructive: the specific reasoning, trade-off, and missed guardrail behind each case. The rest of this series takes individual cases and does what this piece can't — walks through the design decision in the amount of detail a growth or product leader would need to recognize the same trade-off forming inside their own roadmap:

  • The Credit Karma A/B test, as an experimentation-methodology case study — what a rigorous, correctly-run test missed, and the guardrail-metric fix.
  • Amazon's "Iliad" cancellation flow, as a subscription-business governance story — how a retention metric and a simplicity proposal kept losing to each other across repeated internal review.
  • Fortnite's purchase and refund design, as a UX-principle-applied-to-the-wrong-context story — what happens when a legitimate "minimize interruption" instinct isn't given an exception for real-money actions.
  • Cookie consent and CCPA's "sale" definition, as a choice-architecture story — the difference between a banner that exists and a banner where accepting and declining take equal effort.
  • The UK hotel-booking sector's undertakings, as the one case in this series with no fine — what a regulator catching a pattern early, sector-wide, with a commitment-first tool, looks like next to the cases that weren't caught early.

If you're building or scaling a growth or marketing function in fintech, healthtech, insurance, or another regulated or reputation-sensitive space, and want a second set of eyes on where a testing program's incentives might be forming one of these gaps before it becomes a finding, that's the kind of engagement I take on. Get in touch.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.