"We improved conversion rate by 8%."

That sentence means completely different things depending on what business you're running. For an e-commerce site, it might mean an 8% increase in transactions—directly tied to revenue. For a SaaS company, it might mean 8% more free trial signups that never convert to paid. For a media company, it might mean 8% more email subscribers who churn in 30 days.

Picking the wrong primary metric is the most expensive mistake you can make in experimentation. You can run a technically perfect test—proper randomization, right sample size, appropriate runtime—and still destroy business value if you're optimizing for the wrong thing.

This guide covers the right metric framework for each major revenue model, with specific Optimizely setup guidance and worked examples.

The Metric Hierarchy: Three Tiers for Every Test

Before getting into business-type specifics, every experiment you run needs three tiers of metrics:

Primary metric — The one number that determines whether the test wins or loses. You get one. If a test can only win on one metric, which one matters most? That's your primary. Optimizely uses this for statistical significance calculations.

Secondary metrics — Directional signals that help you understand why the primary moved. These don't determine the winner, but they help you interpret results and build your next hypothesis. You can have several.

Guardrail metrics — Things you must not harm while optimizing for the primary. If your test wins on conversion rate (CVR) but destroys average order value (AOV), revenue per visitor may actually go down. Guardrail metrics catch these cases before you ship a winner that loses money.

The most common mistake: using a metric as your primary when it should be a guardrail, and vice versa.

**Pro Tip:** Write out all three tiers before building every test. "Primary: checkout completion rate. Secondary: time-to-checkout, add-to-cart rate, page scroll depth. Guardrail: AOV must not decline more than 2%, revenue per visitor must not decline." This takes five minutes and prevents a lot of post-test regret.
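If you want the plan in a form your team can't quietly ignore, even a tiny data structure works. A minimal sketch in Python (the `MetricPlan` class and its fields are illustrative, not an Optimizely construct):

```python
from dataclasses import dataclass, field

@dataclass
class MetricPlan:
    """Three-tier metric plan, written down before the test is built."""
    primary: str                                   # exactly one
    secondary: list[str] = field(default_factory=list)
    guardrails: dict[str, str] = field(default_factory=dict)  # metric -> threshold rule

plan = MetricPlan(
    primary="checkout completion rate",
    secondary=["time-to-checkout", "add-to-cart rate", "page scroll depth"],
    guardrails={
        "AOV": "must not decline more than 2%",
        "revenue per visitor": "must not decline",
    },
)
```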

E-Commerce: CVR vs. Revenue Per Visitor vs. AOV

E-commerce has the most mature experimentation culture and the most traps.

The problem with CVR as primary: Conversion rate is sessions-that-transacted / total-sessions. A test that wins on CVR can still lose on revenue if it shifts the transaction mix toward low-value orders. Example: a discount badge increases CVR by 4% but decreases AOV by 12%. Revenue per visitor goes down, but CVR looks great.

The better primary: Revenue Per Visitor (RPV)

RPV = Total Revenue / Total Sessions. It captures both CVR and AOV in one metric. A test must increase overall revenue efficiency to win—you can't win by cannibalizing order value.

The downside of RPV: it has higher variance than CVR, so you need more traffic to reach significance. For high-traffic sites, this is manageable. For lower-traffic sites, you may need to use CVR as primary and AOV as a guardrail.
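Because RPV factors into CVR × AOV, you can sanity-check any CVR-vs-AOV tradeoff with two lines of arithmetic. Here's the discount-badge example from above as a quick sketch:

```python
# RPV = revenue / sessions = (orders / sessions) * (revenue / orders) = CVR * AOV
cvr_lift = 0.04     # CVR up 4%
aov_change = -0.12  # AOV down 12%

rpv_change = (1 + cvr_lift) * (1 + aov_change) - 1
print(f"RPV change: {rpv_change:+.1%}")  # RPV change: -8.5%
```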

When to use which:

| Traffic Level | Primary | Guardrail |
|---|---|---|
| High (50K+ sessions/month) | Revenue Per Visitor | CVR (must not decline) |
| Medium (10K–50K sessions/month) | CVR | AOV (must not decline >5%) |
| Low (<10K sessions/month) | CVR (but note underpowered risk) | AOV |

Worked example: You're testing two product page layouts on a mid-traffic site. Primary metric: CVR. Guardrail: AOV. Results: Variant A shows +6% CVR with AOV flat, so revenue per visitor is up roughly 6%. Winner is clear. If AOV had declined 8% instead, you'd flag the test as a potential loser despite the CVR win and investigate before shipping.

**Pro Tip:** For seasonal businesses (holiday retail, etc.), segment your RPV analysis by customer type: new vs. returning, coupon users vs. full-price buyers. A test that wins for new customers might tank for returning customers who know your usual price points.

Secondary metrics for e-commerce:

  • Add-to-cart rate (for product page tests)
  • Cart abandonment rate (for checkout tests)
  • Pages per session (for navigation/discovery tests)
  • Product return rate (long-horizon guardrail)

SaaS: The Full Funnel Metric Hierarchy

SaaS is where metric confusion causes the most damage. The funnel has multiple conversion points and each one is optimizable—but optimizing the wrong one is easy.

The SaaS funnel:

  1. Visitor → Free Trial / Signup (acquisition)
  2. Signup → Activation (first meaningful product use)
  3. Activation → Retention (coming back, hitting usage milestones)
  4. Retention → Paid Upgrade (conversion to revenue)

Most SaaS experimentation focuses on step 1. It should focus heavily on steps 2 and 3.

The trap: A test that increases free trial signups by 20% looks incredible—until you see activation rate drops from 45% to 30% because you attracted users who weren't actually ready to start. Net qualified pipeline: down.
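The damage is easy to quantify. Using the numbers above:

```python
signup_lift = 1.20       # free trial signups up 20%
activation_before = 0.45
activation_after = 0.30

pipeline_change = signup_lift * (activation_after / activation_before) - 1
print(f"Activated users: {pipeline_change:+.0%}")  # Activated users: -20%
```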

Where each metric type belongs:

For landing page / top-of-funnel tests:

  • Primary: Free trial signup rate
  • Secondary: Time-on-page, scroll depth, CTA click rate
  • Guardrail: Activation rate of new signups (monitor post-test, not during)

For onboarding / activation tests:

  • Primary: Activation rate (define this precisely: "user completes core action within 7 days")
  • Secondary: Time-to-activation, feature adoption breadth
  • Guardrail: Support ticket rate, churn rate (90-day)

For upgrade / monetization tests:

  • Primary: Trial-to-paid conversion rate
  • Secondary: Time-to-upgrade, plan tier selection (are people upgrading to higher plans?)
  • Guardrail: Refund rate, churn rate at 30/60/90 days

**Pro Tip:** Define "activation" precisely before you run any SaaS experimentation. Activation is not signup. It's the specific behavior that predicts retention. Find yours in your cohort data: what did retained users do in their first session that churned users didn't? That action is your activation event.
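One way to run that cohort analysis, sketched in pandas against a hypothetical events table (the column names and toy data are illustrative, not a standard schema):

```python
import pandas as pd

# One row per (user, first-session action), plus a 90-day retention flag.
events = pd.DataFrame({
    "user_id":      [1, 1, 2, 3, 3, 4],
    "action":       ["created_project", "invited_teammate", "created_project",
                     "viewed_docs", "created_project", "viewed_docs"],
    "retained_90d": [True, True, True, False, False, False],
})

# Pivot to one row per user, one boolean column per candidate action.
users = events.pivot_table(index="user_id", columns="action",
                           aggfunc="size", fill_value=0).astype(bool)
users["retained"] = events.groupby("user_id")["retained_90d"].first()

# Retention rate for users who did vs. didn't do each candidate action.
for action in users.columns.drop("retained"):
    did = users.loc[users[action], "retained"].mean()
    didnt = users.loc[~users[action], "retained"].mean()
    print(f"{action}: retained {did:.0%} if done, {didnt:.0%} if not")
```

The action with the biggest gap between the two retention rates is your best candidate activation event.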

The metric to watch above all others: Net Revenue Retention (NRR)

NRR captures expansion, contraction, and churn in one number. It's not measurable in a single experiment timeframe, but it's the business outcome all your test metrics should be pointing toward. Keep it in your peripheral vision as a sanity check on your optimization strategy.

Lead Generation: Form Completion Rate Is Not Your Real Goal

Lead gen teams optimize form completion rates. They should be optimizing qualified lead rate.

The difference: your contact form converts at 4%. But 70% of those leads are unqualified for your sales team. You're optimizing volume at the expense of quality, which creates downstream problems (wasted SDR time, poor win rates, frustrated sales leadership).

The real metric hierarchy for lead gen:

Primary in Optimizely: Form completion rate (because qualified lead rate requires CRM data that's downstream and delayed).

What you actually care about: SQL (Sales Qualified Lead) rate, opportunity creation rate, win rate from experiment cohorts.

How to proxy quality in Optimizely:

For form tests, use form field completion patterns as secondary metrics. Users who complete the "Company Size" and "Use Case" fields with substantive answers are better proxies for qualified leads than users who fill minimum required fields with garbage data.
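One way to operationalize this is a crude scoring function over the high-signal fields, tracked per variant as a secondary metric. A sketch under stated assumptions (the field names and scoring rules are hypothetical and should be tuned to your own form):

```python
def quality_proxy(submission: dict) -> int:
    """Crude lead-quality proxy: one point per substantive optional field."""
    score = 0
    if submission.get("company_size") not in (None, "", "1"):
        score += 1
    if len(submission.get("use_case", "").split()) >= 5:  # more than a throwaway phrase
        score += 1
    if submission.get("work_email", "").split("@")[-1] not in ("gmail.com", "yahoo.com"):
        score += 1
    return score

# Track mean score (or % of submissions scoring 2+) per experiment variant.
print(quality_proxy({"company_size": "200",
                     "use_case": "automate weekly reporting for our ops team",
                     "work_email": "ana@acme.com"}))  # 3
```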

For gated content tests (whitepapers, webinars), time-on-page post-download and return visit rate within 7 days are better proxies for qualified engagement than raw download count.

**Pro Tip:** If your CRM is Salesforce or HubSpot, you can track UTM parameters through to opportunity creation. Build a report that segments opportunities by experiment variant (set a custom UTM for each variant). It takes setup work, but it lets you validate that your form-completion wins are also qualified-lead wins—a connection most lead gen teams never make.
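The analysis itself is just a join. Assuming you can export leads (with their captured UTM values) and opportunities from your CRM, a sketch like this segments opportunity creation by variant; the column names and data are hypothetical:

```python
import pandas as pd

# Hypothetical CRM exports: leads carry the UTM set per variant; opps link back by lead_id.
leads = pd.DataFrame({"lead_id": [1, 2, 3, 4],
                      "utm_content": ["control", "control", "variant_a", "variant_a"]})
opps = pd.DataFrame({"lead_id": [2, 3], "opportunity_id": ["006A", "006B"]})

merged = leads.merge(opps, on="lead_id", how="left")
by_variant = merged.groupby("utm_content").agg(
    leads=("lead_id", "count"),
    opportunities=("opportunity_id", "count"),  # count() skips non-converting leads' NaNs
)
by_variant["opp_rate"] = by_variant["opportunities"] / by_variant["leads"]
print(by_variant)  # opportunity creation rate per experiment variant
```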

Common guardrail for lead gen: lead quality score (if your CRM scores leads automatically). You don't want a test to win on volume while degrading quality, so set a guardrail that average lead quality score must not fall below a defined threshold.

Media and Publishing: Why CTR Is a Trap

Media properties (publishers, content businesses, newsletters) have unique metric needs because their business model monetizes attention, not transactions.

The metrics that look good but aren't:

  • Pageviews — easily inflated by clickbait
  • Click-through rate on content — tells you what's clickable, not what's valuable
  • Time on page (alone) — inflated by users who left the tab open

The metrics that actually matter:

  • Scroll depth — Did users actually read the content? 70%+ scroll suggests genuine engagement.
  • Return visit rate — Within 7 and 30 days. Retention is the business model.
  • Newsletter subscription rate — Direct audience ownership, not platform-dependent.
  • Content completion rate — For video or audio: did they watch/listen through?
  • Pages per session — For ad-supported models, depth of visit affects revenue.

For ad-supported media:

Primary metric depends on your ad model. CPM (cost per thousand impressions) publishers should optimize pages per session and time on page. CPC publishers should balance CTR with engagement quality—a test that increases CTR while destroying time-on-site is likely driving accidental clicks.
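For a CPM model, the link from pages per session to revenue is direct arithmetic, which is why it belongs in your primary tier. A quick sketch with illustrative numbers:

```python
# Revenue per session under a CPM model (all numbers illustrative).
pages_per_session = 3.2
ad_units_per_page = 4
cpm = 2.50  # $ per 1,000 impressions

revenue_per_session = pages_per_session * ad_units_per_page * cpm / 1000
print(f"${revenue_per_session:.4f} per session")  # $0.0320 per session
```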

For subscription media:

Conversion to subscription is your ultimate metric. Tests should track:

  • Primary: Subscription conversion rate (or paywall click rate, if that's the proximal action)
  • Secondary: Return visit rate, scroll depth, newsletter conversion
  • Guardrail: Subscriber churn rate in 30-day cohorts post-experiment

**Pro Tip:** For media properties, run experiments on article layout and content structure, not just on acquisition pages. The content experience drives return behavior, which drives subscription conversion. Most media CRO focuses too heavily on the top-of-funnel and not enough on the reading experience itself.

Marketplace: Measuring Both Sides

Marketplace businesses (platforms connecting buyers and sellers, renters and owners, workers and employers) have a two-sided metric problem. A test that improves buyer conversion might reduce seller supply quality—or vice versa.

Buyer-side metrics:

  • Search-to-contact rate or booking rate
  • Session-to-purchase rate
  • Repeat purchase rate (retention signal)

Seller-side metrics:

  • Listing completion rate
  • Listing quality score (if applicable)
  • Response rate / response time (affects buyer conversion)
  • Seller retention rate

The challenge: A test that makes it easier for buyers to purchase might also attract low-intent browsing, which drives up contact rate without improving conversion quality—creating noise for sellers and degrading their experience.

Always run marketplace tests with both-side metrics visible. A win on buyer CVR that decreases seller listing rate or response rate is a net negative for the marketplace.

**Pro Tip:** For marketplace experiments, define a "marketplace health" composite guardrail before you start. Include metrics from both sides. A test cannot win if marketplace health degrades, even if buyer CVR looks great.
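That composite can be as simple as a weighted index of each side's metrics relative to baseline. A minimal sketch, assuming you pick the metrics, weights, and failure threshold for your own marketplace (all numbers here are illustrative):

```python
def marketplace_health(metrics: dict, baseline: dict, weights: dict) -> float:
    """Weighted average of each metric's ratio to its baseline (1.0 = no change)."""
    total = sum(weights.values())
    return sum(weights[m] * metrics[m] / baseline[m] for m in weights) / total

baseline = {"buyer_cvr": 0.031, "seller_response_rate": 0.82, "new_listings_per_day": 140}
variant  = {"buyer_cvr": 0.034, "seller_response_rate": 0.74, "new_listings_per_day": 131}
weights  = {"buyer_cvr": 1.0, "seller_response_rate": 1.0, "new_listings_per_day": 1.0}

health = marketplace_health(variant, baseline, weights)
print(f"health index: {health:.3f}")  # 0.978: fails a 0.99 floor despite buyer CVR up ~10%
```

Notice the failure mode this catches: buyer CVR improved nearly 10%, but the composite still degraded because both seller-side metrics slipped.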

The Guardrail Metric Concept in Practice

A guardrail metric is a threshold, not a goal. You're not optimizing for it—you're protecting against it.

Setting guardrails requires judgment calls; the sketch after this list shows one way to encode them:

  • AOV in e-commerce: "Must not decline more than 3% relative" — tighter for low-margin products, looser for high-margin
  • Activation rate in SaaS: "Must remain above 40%" — your historical baseline for qualified traffic
  • Churn rate post-experiment: "30-day churn in the variant cohort must not exceed 10% above control" — this requires post-experiment monitoring, not just during-experiment tracking
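A minimal sketch encoding the three examples above as explicit checks (the thresholds are the same illustrative judgment calls, not recommendations):

```python
def check_guardrails(control: dict, variant: dict) -> list[str]:
    """Return the list of guardrail failures for a variant vs. control."""
    failures = []
    # Relative decline cap: AOV must not fall more than 3% vs. control.
    if variant["aov"] < control["aov"] * 0.97:
        failures.append("AOV declined more than 3%")
    # Absolute floor: activation rate must stay above 40%.
    if variant["activation_rate"] < 0.40:
        failures.append("activation rate below 40%")
    # Relative ceiling: 30-day churn must not exceed control by more than 10%.
    if variant["churn_30d"] > control["churn_30d"] * 1.10:
        failures.append("30-day churn more than 10% above control")
    return failures

print(check_guardrails(
    control={"aov": 58.0, "activation_rate": 0.45, "churn_30d": 0.06},
    variant={"aov": 55.1, "activation_rate": 0.44, "churn_30d": 0.07},
))  # ['AOV declined more than 3%', '30-day churn more than 10% above control']
```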

Optimizely shows all your metrics in the results dashboard—set up your guardrail metrics as tracked metrics so they're visible during analysis. Don't wait until after you've shipped to check them.

Common Mistakes

Optimizing a funnel metric that doesn't connect to revenue — Free trial signups are a funnel metric. Monthly recurring revenue is a business metric. Make sure you have a clear, documented theory of how your test metric connects to the metric leadership cares about.

Changing the primary metric mid-test — If you switch from CVR to RPV halfway through a test because CVR isn't moving, you're data-fishing. Lock your primary metric before launch.

No guardrails for "soft" tests — Teams often set guardrails for revenue-adjacent tests but skip them for UX or content tests. Every test should have at least one guardrail. Even a headline copy test can harm brand perception or increase support volume.

Ignoring long-horizon metrics — Some test effects take months to materialize (churn, LTV, referral behavior). Don't declare wins solely on immediate metrics without planning for cohort analysis 30, 60, 90 days out.

Metric selection by committee — "The team agreed to track 12 metrics." Tracking 12 metrics means you have no primary metric. It also guarantees noise: at a 5% per-metric false positive rate, the chance that at least one of 12 metrics shows significance purely by chance is roughly 46%.
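The arithmetic behind that claim, plus the standard conservative fix if you truly must track many metrics:

```python
alpha = 0.05  # per-metric false positive rate
k = 12        # metrics tracked

# Chance at least one metric looks "significant" purely by chance.
family_wise = 1 - (1 - alpha) ** k
print(f"P(>=1 false positive): {family_wise:.0%}")  # 46%

# Bonferroni correction: the blunt fix, at the cost of statistical power.
print(f"Corrected per-metric alpha: {alpha / k:.4f}")  # 0.0042
```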

What to Do Next

  1. For your current active tests, write out all three tiers: primary, secondary, guardrails. If you don't have all three written down, stop and do this before analyzing.
  2. For your business model, identify your north-star metric—the one that leadership cares about most. Build a documented chain from your test metrics to that north-star.
  3. Review your last five test results with your guardrail metrics in view. Were there any "winners" that harmed guardrail metrics?
  4. Set up your Optimizely experiment template to require all three tiers before a test can be approved for launch.

Once you're measuring the right things, the next challenge is communicating your results to people who don't speak statistics. Read How to Share A/B Test Results With Stakeholders for the one-page template that makes experiment results land with leadership.

Written by Atticus Li

Revenue & experimentation leader — behavioral economics, CRO, and AI. CXL & Mindworx certified. $30M+ in verified impact.