Atticus Li leads Applied Experimentation at NRG Energy (Fortune 150), where he runs 100+ experiments per year and generated $30M in verified revenue impact in 2025. He writes about the operational reality of building experimentation programs that survive contact with organizational politics.

I have a rule on my team that makes some people uncomfortable: if you can't connect your test metric to revenue, you don't run the test. Not "eventually connects to revenue." Not "is a leading indicator that theoretically drives revenue." Directly connects. With a documented path from the metric you're measuring to dollars the company collects.

This rule has killed a lot of test ideas. It's also the reason the program generated $30M in verified impact last year.

The Vanity Metric Trap

Let me describe what most experimentation programs actually measure, because I've audited enough of them to see the pattern.

Click-through rate on a button. Time on page. Scroll depth. Video play rate. "Engagement score" — whatever that means in your organization. Bounce rate reduction. Page load improvements. Form field completion rate.

None of these are bad metrics in isolation. Some of them are genuinely useful diagnostic signals. The problem is when they become the primary success metric for your A/B test. Because when they do, you're optimizing for activity, not outcomes.

Here's the scenario I see over and over. A team runs a test on a landing page. They redesign the hero section, update the copy, add social proof. The variant increases CTR on the primary CTA by 18%. The team celebrates. The test gets shipped.

Three months later, someone pulls the downstream data. Signups didn't change. Revenue didn't change. The 18% CTR lift was real, but it just moved people from clicking one thing to clicking another. The actual conversion — someone becoming a paying customer — was completely flat.

That's not a win. That's a vanity metric masquerading as a business result.

Full-Funnel Accountability

The fix is conceptually simple and operationally hard: measure the whole funnel, not just the step you're optimizing.

When I evaluate a test, I don't just look at the metric closest to the change. I look at what happens downstream. Did the landing page change increase form submissions? Great. Did those form submissions convert to qualified leads at the same rate? Did those leads close at the same rate? Did those customers retain?

Because I've seen every version of the downstream failure.

The form "win" that degrades lead quality. We tested a simplified enrollment flow that removed three form fields. Completion rate jumped 22%. We were ready to ship it. Then we checked — the leads from the simplified form had a 40% lower close rate. We were generating more leads, but worse ones. The form fields we removed were actually qualifying the prospect. Net revenue impact: negative.

The checkout optimization that increases completions but tanks revenue. A team I consulted with optimized their checkout flow and saw a 12% increase in completed orders. But average order value dropped 15% and return rate spiked. Customers were completing purchases faster, but they were also buying impulsively and returning more. Revenue per user actually went down.

The email CTR win that doesn't move subscriptions. An email redesign increased click-through by 25%. Impressive, right? But the clicks were going to a content page, not the signup page. The email was more engaging but less effective at driving the action that actually mattered.

Every one of these would have been celebrated as a win if the team only measured the metric closest to the change. Full-funnel accountability caught the problem.

The Downstream Check

Here's the framework I use. For every test, before it launches, you have to answer these questions in the experiment brief.

What is the primary metric? This should be as close to revenue as your measurement capability allows. If you can measure revenue directly, that's your primary metric. If you can't, it should be the closest reliable proxy — completed purchases, qualified leads submitted, subscriptions activated.

What is the downstream validation metric? Even if your primary metric is a proxy, you should have a plan to check what happens further down the funnel. If you're measuring form submissions, your downstream check is lead quality and close rate. If you're measuring completed purchases, your downstream check is return rate and lifetime value.

What's the revenue-per-user impact? This is where it gets concrete. You need a documented formula that translates your primary metric into dollars. Revenue per user is the simplest version of this: take the total revenue from the test period, divide by the number of users in each variant, and compare.
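
In code, the comparison itself is trivial. Here's a minimal sketch — the function name, dollar amounts, and user counts are made up for illustration, not figures from any real test:

```python
# Illustrative revenue-per-user comparison; all numbers are invented.

def revenue_per_user(total_revenue: float, users: int) -> float:
    """Total revenue attributed to a variant divided by its user count."""
    return total_revenue / users if users else 0.0

control_rpu = revenue_per_user(total_revenue=412_000, users=50_000)  # $8.24 per user
variant_rpu = revenue_per_user(total_revenue=447_500, users=50_200)  # ~$8.91 per user

relative_lift = (variant_rpu - control_rpu) / control_rpu
print(f"control ${control_rpu:.2f}, variant ${variant_rpu:.2f}, lift {relative_lift:.1%}")
```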

Revenue per user cuts through all the noise. A test can increase CTR, increase form submissions, increase page engagement — but if revenue per user doesn't move, the test didn't generate business value. Full stop.

Tying Every Test to a Dollar Value

Here's how we actually do this at NRG, simplified for clarity.

Every test gets a projected revenue impact before it launches. The formula is: (projected lift in primary metric) x (conversion rate through remaining funnel steps) x (average revenue per converted user) x (monthly traffic volume) x (12 months).

That gives you an annualized revenue projection. It's an estimate. It will be wrong. But it forces the team to think about the full path from test metric to revenue before they invest a single sprint in the test.
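
As a rough sketch of that projection math — every input below is a hypothetical planning number, and the function name is mine, not part of any NRG tooling:

```python
# Hypothetical planning numbers only.

def annualized_revenue_projection(
    projected_lift: float,            # absolute lift in the primary metric, e.g. +1.5 pts = 0.015
    downstream_conversion: float,     # conversion rate through the remaining funnel steps
    revenue_per_converted_user: float,
    monthly_traffic: int,
) -> float:
    """Projected lift x downstream conversion x revenue per converted user
    x monthly traffic x 12 months."""
    return (
        projected_lift
        * downstream_conversion
        * revenue_per_converted_user
        * monthly_traffic
        * 12
    )

projection = annualized_revenue_projection(
    projected_lift=0.015,              # +1.5 points on form-submission rate
    downstream_conversion=0.20,        # 20% of submissions become paying customers
    revenue_per_converted_user=900.0,  # average annual revenue per new customer
    monthly_traffic=80_000,
)
print(f"annualized projection: ${projection:,.0f}")  # $2,592,000
```

Swapping in observed data after the test gives you the actual-impact number for the same test, calculated the same way.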

After the test concludes, we calculate the actual revenue impact using the same formula with observed data. The projected vs. actual comparison is one of the most valuable exercises in the program. It calibrates the team's intuition about what drives revenue and what doesn't.

Over time, you build a database of projected vs. actual revenue impacts. That database becomes incredibly powerful for prioritization. You start to learn which types of tests reliably translate to revenue and which types produce vanity lifts that evaporate downstream.
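
That database doesn't need to be fancy. A toy sketch of the idea — the test names, categories, and dollar values here are invented:

```python
# Invented test records, purely for illustration.
from collections import defaultdict

tests = [
    {"name": "hero-redesign",  "category": "landing page", "projected": 1_200_000, "actual": 150_000},
    {"name": "form-simplify",  "category": "lead form",    "projected": 800_000,   "actual": -300_000},
    {"name": "pricing-anchor", "category": "pricing",      "projected": 600_000,   "actual": 900_000},
]

# Calibration by test category: a ratio near 1.0 means projections for that
# category have held up; a low or negative ratio flags test types whose
# lifts tend to evaporate downstream.
totals = defaultdict(lambda: {"projected": 0.0, "actual": 0.0})
for t in tests:
    totals[t["category"]]["projected"] += t["projected"]
    totals[t["category"]]["actual"] += t["actual"]

for category, v in sorted(totals.items()):
    ratio = v["actual"] / v["projected"]
    print(f"{category}: projected ${v['projected']:,.0f}, actual ${v['actual']:,.0f}, ratio {ratio:.2f}")
```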

Why This Is Hard (And Why It Matters Anyway)

I'm not going to pretend this is easy. Measuring full-funnel impact requires instrumentation that most companies don't have on day one. You need to be able to track a user from the experiment through to the revenue event, which might be weeks or months later. You need data pipelines that connect your experimentation platform to your revenue data. You need agreement on what "revenue" means in your context.
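
Conceptually, the core instrumentation problem is a join: connect each user's experiment assignment to whatever revenue events they generate later, then compare revenue per user by variant. A toy sketch, assuming pandas and made-up table and column names:

```python
# Join experiment assignments to later revenue events by user id,
# then compute revenue per user by variant. Schema is assumed for the example.
import pandas as pd

assignments = pd.DataFrame({
    "user_id": [101, 102, 103, 104],
    "variant": ["control", "variant", "control", "variant"],
})

# Revenue events may land weeks or months after the experiment exposure.
revenue_events = pd.DataFrame({
    "user_id": [102, 103],
    "revenue": [950.0, 1_100.0],
})

joined = assignments.merge(revenue_events, on="user_id", how="left").fillna({"revenue": 0.0})
rpu_by_variant = joined.groupby("variant")["revenue"].mean()
print(rpu_by_variant)
```

In practice this join lives in a scheduled data pipeline rather than a notebook, and the attribution window matters, but the shape of the problem is the same.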

At NRG, building this instrumentation took months. It required partnerships with the data engineering team, finance, and IT. It wasn't glamorous work. But it's the foundation that makes everything else possible.

If you can't measure revenue directly today, start with the best proxy you have and build toward full-funnel measurement. Even moving from CTR to form submissions is a meaningful step. The key is to always be pushing the measurement closer to the thing that actually matters: did the company make more money?

The Culture Shift

The biggest change isn't technical. It's cultural. When you switch from vanity metrics to revenue metrics, your win rate drops. Tests that would have been "wins" on CTR or engagement are now flat or losses when measured against revenue. That can be uncomfortable for stakeholders who are used to seeing a steady stream of wins.

This is where leadership storytelling matters. You have to explain that a lower win rate on revenue metrics is worth more than a higher win rate on vanity metrics. A 24% win rate where every win represents verified revenue is infinitely more valuable than a 60% win rate where half the "wins" don't actually impact the business.

The first time you present a $2M revenue impact from a single test — backed by real revenue data, not CTR extrapolation — the culture shifts. People stop caring about the vanity numbers because the real numbers are so much more compelling.

Measure what matters. Connect every test to revenue. Accept the lower win rate. Let the dollars speak.

---

_Need to calculate the revenue impact of your experiments? GrowthLayer's A/B test calculator helps you project sample sizes, estimate revenue lift, and validate statistical significance — all in one place._
