Atticus Li's PRISM Method is a five-step experimentation framework — Probe, Revenue Rank, Implement, Score, Multiply — designed for enterprise teams that need every test to justify its existence in dollars. Developed through 150+ experiments across seven NRG Energy brands, the PRISM Method consistently delivers a 24%+ win rate, roughly double the industry average.
Why I Built the PRISM Method
Most experimentation frameworks I've encountered fall into one of two traps. Either they're so rigid that they slow teams down to the point where you're running 15 tests a year instead of 100. Or they're so loose that every test is basically a coin flip — someone has a hunch, they test it, and win or lose, nobody learns anything systematic.
When I was scaling NRG's experimentation program from 20 tests per year to 100+, I needed something in between. A framework that was rigorous enough to produce consistently high win rates, but flexible enough to handle the realities of enterprise marketing — multiple brands, varying traffic volumes, stakeholders with strong opinions, and a C-suite that wanted to see dollar signs, not p-values.
That's how Atticus Li's PRISM Method was born. Not in a conference talk or a blog post, but in the messy reality of trying to convince a CFO that experimentation deserves more budget.
The Five Steps
P — Probe
The Probe phase is where most teams cut corners, and it's exactly why their win rates are low.
Before you ever write a test hypothesis, you need to understand the problem. Not the solution you want to test. The problem.
This means gathering both qualitative and quantitative data:
Quantitative signals:
- Heat maps showing where users actually click versus where you expect them to click
- Click-rate analysis across the entire page
- Dead-click detection — users clicking on non-interactive elements, which signals confusion about the interface
- Rage clicks — repeated rapid clicking that indicates frustration
- Scroll depth analysis to understand what content users actually see
- Funnel drop-off analysis in Adobe Analytics or your analytics tool of choice
Qualitative signals:
- Session replays watched in bulk, not cherry-picked
- Customer support ticket themes related to the page or flow
- Call center feedback (especially valuable for brands with phone-heavy customer segments)
- User testing sessions when available
- AI-powered session analysis tools like Contentsquare's AI summaries
The cardinal rule of the Probe phase: don't solutionize. You're not here to figure out what to change. You're here to figure out what's broken and why.
I've seen teams skip the Probe phase hundreds of times. A stakeholder says "the button should be green" and the team tests green vs. blue. That's not experimentation. That's decoration.
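If your analytics tool doesn't surface dead clicks or rage clicks out of the box, you can approximate them from a raw click export. Here's a minimal sketch in Python, with hypothetical field names and thresholds, just to show the shape of the logic:

```python
from collections import defaultdict

# Minimal sketch: flag dead clicks and rage clicks from an exported click log.
# Assumes each event carries a timestamp (seconds), a CSS selector, and a flag
# for whether the element is interactive. Field names are hypothetical.
events = [
    {"ts": 10.0, "selector": "#plan-details", "interactive": False},
    {"ts": 10.4, "selector": "#plan-details", "interactive": False},
    {"ts": 10.9, "selector": "#plan-details", "interactive": False},
    {"ts": 42.0, "selector": "#enroll-cta", "interactive": True},
]

RAGE_WINDOW_SECONDS = 2.0   # burst window for rage clicks
RAGE_MIN_CLICKS = 3         # clicks within the window to count as rage

# Dead clicks: clicks on elements that don't do anything.
dead_clicks = [e for e in events if not e["interactive"]]

# Rage clicks: bursts of repeated clicks on the same element.
by_selector = defaultdict(list)
for e in events:
    by_selector[e["selector"]].append(e["ts"])

rage_selectors = []
for selector, times in by_selector.items():
    times.sort()
    for i in range(len(times) - RAGE_MIN_CLICKS + 1):
        if times[i + RAGE_MIN_CLICKS - 1] - times[i] <= RAGE_WINDOW_SECONDS:
            rage_selectors.append(selector)
            break

print(f"Dead clicks: {len(dead_clicks)}, rage-clicked elements: {rage_selectors}")
```

The thresholds are tunable; the point is that these signals are recoverable even from free tooling exports, not just from enterprise platforms.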
R — Revenue Rank
This is the step that changes everything. Before any test gets approved, we calculate its projected revenue impact using pre-test MDE (Minimum Detectable Effect) projections.
Here's the actual process:
- Identify the conversion metric — enrollment starts, form submissions, call initiations, whatever maps to revenue for this specific page and brand
- Pull current performance — baseline conversion rate over the past 30-90 days, accounting for seasonality
- Calculate available sample size — how much traffic this page gets over the planned test duration
- Determine MDE — given the sample size and baseline conversion rate, what's the smallest lift we can reliably detect at 95% confidence?
- Map to revenue per customer — using the brand's actual revenue per converted customer, translate the MDE into dollars
- Calculate projected annual impact — if the test wins at the MDE level, what's the annualized revenue impact?
Tests get ranked by projected impact. High-impact tests get priority. Low-impact tests get deprioritized or redesigned to target higher-leverage pages.
This is how we avoid the trap of running 100 tests that each move the needle by $5K. Instead, we run tests where the winners generate $200K-$500K in projected annual impact.
A critical nuance: these are best estimates with available data. Revenue per customer varies. Traffic fluctuates. Seasonality matters. I'm not pretending these projections are precise to the dollar. But they're directionally correct, and that's enough to make smart prioritization decisions. More accuracy is always possible, but it costs flexibility and speed — and in a fast-paced experimentation program, being directionally right quickly beats being precisely right slowly.
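To make that concrete, here's a rough sketch of the MDE-to-dollars math, using the standard two-proportion normal approximation at 95% confidence and 80% power. The traffic, conversion, and revenue figures are illustrative, not real brand numbers:

```python
from math import sqrt
from scipy.stats import norm

# Rough sketch of the MDE-to-dollars math. All figures are illustrative.
baseline_cr = 0.032           # baseline conversion rate over the past 30-90 days
monthly_visitors = 50_000     # traffic to the page under test
test_months = 2               # planned test duration
revenue_per_customer = 250    # brand's revenue per converted customer

n_per_arm = monthly_visitors * test_months / 2   # 50/50 split, two variants

# Minimum detectable effect (absolute) via the two-proportion normal
# approximation: two-sided alpha = 0.05 (95% confidence), power = 0.80.
z_alpha = norm.ppf(1 - 0.05 / 2)
z_power = norm.ppf(0.80)
se = sqrt(2 * baseline_cr * (1 - baseline_cr) / n_per_arm)
mde_abs = (z_alpha + z_power) * se
mde_rel = mde_abs / baseline_cr

# Translate the MDE into projected annual impact if the test wins at exactly that lift.
annual_visitors = monthly_visitors * 12
projected_annual_impact = annual_visitors * mde_abs * revenue_per_customer

print(f"MDE: {mde_abs:.4f} absolute ({mde_rel:.1%} relative)")
print(f"Projected annual impact at the MDE: ${projected_annual_impact:,.0f}")
```

Swap in your own baseline, traffic, and revenue per customer. The ranking comes from comparing these projections across every candidate test in the backlog.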
I — Implement
Implementation is where you actually design and build the test. But there's a crucial sub-step most teams miss: diverse opinions.
Before finalizing any test design, I bring in perspectives from across disciplines:
- UX researchers who understand user mental models
- Designers who can identify visual hierarchy issues
- Developers who know what's technically feasible and what might introduce bugs
- Analytics engineers who can flag measurement challenges
- Marketing stakeholders who understand brand voice and positioning
Why? Because a hypothesis built from one person's perspective is a sample size of one. And we all know what happens with a sample size of one.
The implementation itself follows strict QA protocols. Every test deployed in Optimizely goes through:
- Visual QA across devices and browsers
- Analytics validation — are events firing correctly?
- Traffic allocation verification
- Edge case testing — what happens when users navigate away and return?
I've seen tests where a measurement bug meant the "winning" variant was actually capturing duplicate conversions. QA isn't optional. It's insurance against making expensive wrong decisions.
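Traffic allocation verification, in particular, is worth automating. A sample ratio mismatch (SRM) check compares the observed split against the configured split with a chi-square test. Here's a minimal sketch with illustrative counts (this is a generic statistical check, not a feature of any specific testing tool):

```python
from scipy.stats import chisquare

# Sample ratio mismatch (SRM) check: does the observed visitor split match the
# configured allocation? Counts are illustrative.
observed = [50_812, 49_131]    # visitors bucketed into control, variant
allocation = [0.5, 0.5]        # configured split

total = sum(observed)
expected = [total * p for p in allocation]
stat, p_value = chisquare(observed, f_exp=expected)

# A very small p-value means the split is off and results shouldn't be trusted
# until the bucketing or measurement issue is found.
if p_value < 0.001:
    print(f"SRM detected (p = {p_value:.2e}): investigate before reading results")
else:
    print(f"No SRM detected (p = {p_value:.3f})")
```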
S — Score
After the test reaches statistical significance (or is called inconclusive), we score it. This goes beyond "winner" or "loser."
The scoring process includes:
Statistical rigor:
- Confidence level achieved (we target 95%, but we document tests that reach 90% as directional learnings)
- Effect size — how big was the actual lift compared to our projected MDE?
- Segment analysis — did the test perform differently across device types, customer segments, or traffic sources?
Revenue projection:
- Actual lift applied to current traffic volume and revenue per customer
- Projected annual impact based on the observed lift, not the pre-test MDE estimate
- Comparison between projected and actual impact — this calibrates our pre-test models over time
Learning documentation:
- What did we learn about user behavior?
- Does this validate or contradict previous test results?
- Are there implications for other brands in the portfolio?
We don't treat losing tests as failures. A well-designed test that loses teaches you something valuable about your users. A poorly designed test that wins teaches you nothing, because you can't trust the result.
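For the statistical and revenue pieces of scoring, here's a sketch of the math: a pooled two-proportion z-test on the observed results, plus an annualized projection from the observed lift rather than the pre-test MDE. Counts and dollar figures are illustrative:

```python
from math import sqrt
from scipy.stats import norm

# Sketch of the statistical-rigor and revenue-projection pieces of the Score step.
# Counts and dollar figures are illustrative.
control = {"visitors": 50_000, "conversions": 1_600}
variant = {"visitors": 50_000, "conversions": 1_744}

p_c = control["conversions"] / control["visitors"]
p_v = variant["conversions"] / variant["visitors"]

# Pooled two-proportion z-test, two-sided.
pooled = (control["conversions"] + variant["conversions"]) / (
    control["visitors"] + variant["visitors"]
)
se = sqrt(pooled * (1 - pooled) * (1 / control["visitors"] + 1 / variant["visitors"]))
z = (p_v - p_c) / se
p_value = 2 * (1 - norm.cdf(abs(z)))
confidence = 1 - p_value  # the common CRO-tool convention for "confidence"

# Revenue projection from the observed lift, not the pre-test MDE.
annual_visitors = 600_000
revenue_per_customer = 250
observed_lift_abs = p_v - p_c
projected_annual_impact = annual_visitors * observed_lift_abs * revenue_per_customer

print(f"Lift: {observed_lift_abs / p_c:.1%} relative, confidence: {confidence:.1%}")
print(f"Projected annual impact from observed lift: ${projected_annual_impact:,.0f}")
```

That last number is what feeds the projected-versus-actual comparison that calibrates our pre-test models over time.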
M — Multiply
The Multiply phase is what separates a testing program from an experimentation program.
When a test wins, we don't just implement the winning variant and move on. We ask:
- Can this insight be applied to other brands? A hero layout win on Green Mountain Energy might translate to Reliant or Stream.
- Can we iterate further? If a 7% lift came from repositioning the CTA, what happens if we also change the copy?
- Should we run a holdout test? For high-impact wins, we hold back a percentage of traffic on the original experience to validate that the lift persists over months, not just weeks.
- What does this tell us about user behavior more broadly? If phone number prominence drove a 300% lift in call sales at GME, what does that reveal about GME's customer demographic that should inform every page on the site?
The Multiply phase is where compounding happens. One good insight, applied across seven brands and iterated over multiple test cycles, can generate millions in cumulative impact.
Why Atticus Li's PRISM Method Works
Our win rate sits at 24%+, roughly double the industry average of ~12%, not because we're testing more boldly or using better tools, but because the PRISM Method front-loads the research.
By the time a test launches, we've already:
- Diagnosed the actual problem through behavioral data
- Validated that the test can reach significance with available traffic
- Projected the revenue impact to ensure the test is worth the slot
- Gathered diverse perspectives to strengthen the hypothesis
- QA'd the implementation to eliminate measurement errors
The test itself is the least risky part of the process. All the risk mitigation happens before launch.
Common Mistakes I See
Testing solutions instead of problems. "Let's test a new button color" is a solution. "Users aren't seeing the CTA on mobile because it's below the fold" is a problem. Test the problem, not a specific solution.
Ignoring sample size constraints. Most CRO advice is written by people at companies with millions of monthly visitors. If you're at a company with 50K monthly visitors per brand, you need a fundamentally different approach to test design and prioritization. Atticus Li's PRISM Method was built for this reality.
Not tying tests to revenue. If you can't tell your CFO how much a winning test is worth in annual revenue, your experimentation program will always be fighting for budget. The Revenue Rank step exists specifically to solve this problem.
Running tests in isolation. Every test should build on previous learnings. The Multiply phase ensures that insights compound across brands and test cycles, instead of being one-off wins that nobody remembers six months later.
Treating experimentation as a feature of a tool. Optimizely, VWO, AB Tasty — they're all fine tools. But the tool isn't the program. The program is the people, the process, and the prioritization framework. Atticus Li's PRISM Method works regardless of which testing tool you use.
Applying PRISM to Your Team
You don't need to be at a Fortune 500 company to use the PRISM Method. The framework scales down as well as it scales up. If you're running 10 tests a year, PRISM helps you make sure those 10 tests are the highest-impact tests you could possibly run.
Start with the Probe phase. Invest in behavioral analytics — even free tools like Microsoft Clarity give you heat maps and session replays. Understand the problem before you test a solution.
Then add the Revenue Rank step. Even rough revenue projections change how you prioritize. "This page gets 10x the traffic of that page, and each conversion is worth $500" is enough to make smarter decisions.
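If it helps, that back-of-the-envelope prioritization can literally be a few lines, as in this illustrative sketch:

```python
# Back-of-the-envelope prioritization: rank pages by monthly revenue opportunity
# from the same hypothetical relative lift. Numbers are illustrative.
pages = [
    {"name": "plan comparison", "monthly_visitors": 100_000, "cr": 0.02, "value": 500},
    {"name": "rate FAQ",        "monthly_visitors": 10_000,  "cr": 0.02, "value": 500},
]

assumed_relative_lift = 0.05  # apply the same 5% lift to every page

for page in pages:
    base_conversions = page["monthly_visitors"] * page["cr"]
    opportunity = base_conversions * assumed_relative_lift * page["value"]
    print(f"{page['name']}: ~${opportunity:,.0f}/month at a 5% lift")
```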
The rest follows naturally. Good implementation practices, rigorous scoring, and systematic multiplication of winning insights will emerge once the foundation is solid.
If you want to see the PRISM Method in action, read about how I applied it to NRG's experimentation program or explore the framework page for a visual overview.
Questions about applying the PRISM Method to your team? Reach out at [email protected].