Atticus Li's PRISM Method is a five-step experimentation framework — Probe, Revenue Rank, Implement, Score, Multiply — designed for enterprise teams that need every test to justify its existence in dollars. Developed through 150+ experiments across seven NRG Energy brands, the PRISM Method consistently delivers a 24%+ win rate, roughly double the industry average.
Why I Built the PRISM Method
Most experimentation frameworks I've encountered fall into one of two traps. Either they're so rigid that they slow teams down to the point where you're running 15 tests a year instead of 100. Or they're so loose that every test is basically a coin flip — someone has a hunch, they test it, and win or lose, nobody learns anything systematic.
When I was scaling NRG's experimentation program from 20 tests per year to 100+, I needed something in between. A framework that was rigorous enough to produce consistently high win rates, but flexible enough to handle the realities of enterprise marketing — multiple brands, varying traffic volumes, stakeholders with strong opinions, and a C-suite that wanted to see dollar signs, not p-values.
That's how Atticus Li's PRISM Method was born. Not in a conference talk or a blog post, but in the messy reality of trying to convince a CFO that experimentation deserves more budget.
The Five Steps
P — Probe
The Probe phase is where most teams cut corners, and it's exactly why their win rates are low.
Before you ever write a test hypothesis, you need to understand the problem. Not the solution you want to test. The problem.
This means gathering both qualitative and quantitative data:
Quantitative signals:
- Heat maps showing where users actually click versus where you expect them to click
- Click-rate analysis across the entire page
- Dead-click detection — users clicking on non-interactive elements, which signals confusion about the interface
- Rage clicks — repeated rapid clicking that indicates frustration
- Scroll depth analysis to understand what content users actually see
- Funnel drop-off analysis in Adobe Analytics or your analytics tool of choice
Qualitative signals:
- Session replays watched in bulk, not cherry-picked
- Customer support ticket themes related to the page or flow
- Call center feedback (especially valuable for brands with phone-heavy customer segments)
- User testing sessions when available
- AI-powered session analysis tools like Contentsquare's AI summaries
The cardinal rule of the Probe phase: don't solutionize. You're not here to figure out what to change. You're here to figure out what's broken and why.
I've seen teams skip the Probe phase hundreds of times. A stakeholder says "the button should be green" and the team tests green vs. blue. That's not experimentation. That's decoration.
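If your analytics tool doesn't surface dead clicks or rage clicks out of the box, you can approximate them from a raw click export. Here's a minimal sketch in Python, with hypothetical field names and thresholds, just to show the shape of the logic:

```python
from collections import defaultdict

# Minimal sketch: flag dead clicks and rage clicks from an exported click log.
# Assumes each event carries a timestamp (seconds), a CSS selector, and a flag
# for whether the element is interactive. Field names are hypothetical.
events = [
    {"ts": 10.0, "selector": "#plan-details", "interactive": False},
    {"ts": 10.4, "selector": "#plan-details", "interactive": False},
    {"ts": 10.9, "selector": "#plan-details", "interactive": False},
    {"ts": 42.0, "selector": "#enroll-cta", "interactive": True},
]

RAGE_WINDOW_SECONDS = 2.0   # burst window for rage clicks
RAGE_MIN_CLICKS = 3         # clicks within the window to count as rage

# Dead clicks: clicks on elements that don't do anything.
dead_clicks = [e for e in events if not e["interactive"]]

# Rage clicks: bursts of repeated clicks on the same element.
by_selector = defaultdict(list)
for e in events:
    by_selector[e["selector"]].append(e["ts"])

rage_selectors = []
for selector, times in by_selector.items():
    times.sort()
    for i in range(len(times) - RAGE_MIN_CLICKS + 1):
        if times[i + RAGE_MIN_CLICKS - 1] - times[i] <= RAGE_WINDOW_SECONDS:
            rage_selectors.append(selector)
            break

print(f"Dead clicks: {len(dead_clicks)}, rage-clicked elements: {rage_selectors}")
```

The thresholds are tunable; the point is that these signals are recoverable even from free tooling exports, not just from enterprise platforms.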
R — Revenue Rank
This is the step that changes everything. Before any test gets approved, we calculate its projected revenue impact using pre-test MDE (Minimum Detectable Effect) projections.
Here's the actual process:
- Identify the conversion metric — enrollment starts, form submissions, call initiations, whatever maps to revenue for this specific page and brand
- Pull current performance — baseline conversion rate over the past 30-90 days, accounting for seasonality
- Calculate available sample size — how much traffic this page gets over the planned test duration
- Determine MDE — given the sample size and baseline conversion rate, what's the smallest lift we can reliably detect at 95% confidence?
- Map to revenue per customer — using the brand's actual revenue per converted customer, translate the MDE into dollars
- Calculate projected annual impact — if the test wins at the MDE level, what's the annualized revenue impact?
Tests get ranked by projected impact. High-impact tests get priority. Low-impact tests get deprioritized or redesigned to target higher-leverage pages.
This is how we avoid the trap of running 100 tests that each move the needle by $5K. Instead, we run tests where the winners generate $200K-$500K in projected annual impact.
A critical nuance: these are best estimates with available data. Revenue per customer varies. Traffic fluctuates. Seasonality matters. I'm not pretending these projections are precise to the dollar. But they're directionally correct, and that's enough to make smart prioritization decisions. More accuracy is always possible, but it costs flexibility and speed — and in a fast-paced experimentation program, being directionally right quickly beats being precisely right slowly.
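To make that concrete, here's a rough sketch of the MDE-to-dollars math, using the standard two-proportion normal approximation at 95% confidence and 80% power. The traffic, conversion, and revenue figures are illustrative, not real brand numbers:

```python
from math import sqrt
from scipy.stats import norm

# Rough sketch of the MDE-to-dollars math. All figures are illustrative.
baseline_cr = 0.032           # baseline conversion rate over the past 30-90 days
monthly_visitors = 50_000     # traffic to the page under test
test_months = 2               # planned test duration
revenue_per_customer = 250    # brand's revenue per converted customer

n_per_arm = monthly_visitors * test_months / 2   # 50/50 split, two variants

# Minimum detectable effect (absolute) via the two-proportion normal
# approximation: two-sided alpha = 0.05 (95% confidence), power = 0.80.
z_alpha = norm.ppf(1 - 0.05 / 2)
z_power = norm.ppf(0.80)
se = sqrt(2 * baseline_cr * (1 - baseline_cr) / n_per_arm)
mde_abs = (z_alpha + z_power) * se
mde_rel = mde_abs / baseline_cr

# Translate the MDE into projected annual impact if the test wins at exactly that lift.
annual_visitors = monthly_visitors * 12
projected_annual_impact = annual_visitors * mde_abs * revenue_per_customer

print(f"MDE: {mde_abs:.4f} absolute ({mde_rel:.1%} relative)")
print(f"Projected annual impact at the MDE: ${projected_annual_impact:,.0f}")
```

Swap in your own baseline, traffic, and revenue per customer. The ranking comes from comparing these projections across every candidate test in the backlog.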
I — Implement
Implementation is where you actually design and build the test. But there's a crucial sub-step most teams miss: diverse opinions.
Before finalizing any test design, I bring in perspectives from across disciplines:
- UX researchers who understand user mental models
- Designers who can identify visual hierarchy issues
- Developers who know what's technically feasible and what might introduce bugs
- Analytics engineers who can flag measurement challenges
- Marketing stakeholders who understand brand voice and positioning
Why? Because a hypothesis built from one person's perspective is a sample size of one. And we all know what happens with a sample size of one.
The implementation itself follows strict QA protocols. Every test deployed in Optimizely goes through:
- Visual QA across devices and browsers
- Analytics validation — are events firing correctly?
- Traffic allocation verification
- Edge case testing — what happens when users navigate away and return?
I've seen tests where a measurement bug meant the "winning" variant was actually capturing duplicate conversions. QA isn't optional. It's insurance against making expensive wrong decisions.
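Traffic allocation verification, in particular, is worth automating. A sample ratio mismatch (SRM) check compares the observed split against the configured split with a chi-square test. Here's a minimal sketch with illustrative counts (this is a generic statistical check, not a feature of any specific testing tool):

```python
from scipy.stats import chisquare

# Sample ratio mismatch (SRM) check: does the observed visitor split match the
# configured allocation? Counts are illustrative.
observed = [50_812, 49_131]    # visitors bucketed into control, variant
allocation = [0.5, 0.5]        # configured split

total = sum(observed)
expected = [total * p for p in allocation]
stat, p_value = chisquare(observed, f_exp=expected)

# A very small p-value means the split is off and results shouldn't be trusted
# until the bucketing or measurement issue is found.
if p_value < 0.001:
    print(f"SRM detected (p = {p_value:.2e}): investigate before reading results")
else:
    print(f"No SRM detected (p = {p_value:.3f})")
```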
S — Score
After the test reaches statistical significance (or is called inconclusive), we score it. This goes beyond "winner" or "loser."
The scoring process includes:
Statistical rigor:
- Confidence level achieved (we target 95%, but we document tests that reach 90% as directional learnings)
- Effect size — how big was the actual lift compared to our projected MDE?
- Segment analysis — did the test perform differently across device types, customer segments, or traffic sources?
Revenue projection:
- Actual lift applied to current traffic volume and revenue per customer
- Projected annual impact based on the observed lift, not the pre-test MDE estimate
- Comparison between projected and actual impact — this calibrates our pre-test models over time
Learning documentation:
- What did we learn about user behavior?
- Does this validate or contradict previous test results?
- Are there implications for other brands in the portfolio?
We don't treat losing tests as failures. A well-designed test that loses teaches you something valuable about your users. A poorly designed test that wins teaches you nothing, because you can't trust the result.
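For the statistical and revenue pieces of scoring, here's a sketch of the math: a pooled two-proportion z-test on the observed results, plus an annualized projection from the observed lift rather than the pre-test MDE. Counts and dollar figures are illustrative:

```python
from math import sqrt
from scipy.stats import norm

# Sketch of the statistical-rigor and revenue-projection pieces of the Score step.
# Counts and dollar figures are illustrative.
control = {"visitors": 50_000, "conversions": 1_600}
variant = {"visitors": 50_000, "conversions": 1_744}

p_c = control["conversions"] / control["visitors"]
p_v = variant["conversions"] / variant["visitors"]

# Pooled two-proportion z-test, two-sided.
pooled = (control["conversions"] + variant["conversions"]) / (
    control["visitors"] + variant["visitors"]
)
se = sqrt(pooled * (1 - pooled) * (1 / control["visitors"] + 1 / variant["visitors"]))
z = (p_v - p_c) / se
p_value = 2 * (1 - norm.cdf(abs(z)))
confidence = 1 - p_value  # the common CRO-tool convention for "confidence"

# Revenue projection from the observed lift, not the pre-test MDE.
annual_visitors = 600_000
revenue_per_customer = 250
observed_lift_abs = p_v - p_c
projected_annual_impact = annual_visitors * observed_lift_abs * revenue_per_customer

print(f"Lift: {observed_lift_abs / p_c:.1%} relative, confidence: {confidence:.1%}")
print(f"Projected annual impact from observed lift: ${projected_annual_impact:,.0f}")
```

That last number is what feeds the projected-versus-actual comparison that calibrates our pre-test models over time.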
M — Multiply
The Multiply phase is what separates a testing program from an experimentation program.
When a test wins, we don't just implement the winning variant and move on. We ask:
- Can this insight be applied to other brands? A hero layout win on Green Mountain Energy might translate to Reliant or Stream.
- Can we iterate further? If a 7% lift came from repositioning the CTA, what happens if we also change the copy?
- Should we run a holdout test? For high-impact wins, we hold back a percentage of traffic on the original experience to validate that the lift persists over months, not just weeks.
- What does this tell us about user behavior more broadly? If phone number prominence drove a 300% lift in call sales at GME, what does that reveal about GME's customer demographic that should inform every page on the site?
The Multiply phase is where compounding happens. One good insight, applied across seven brands and iterated over multiple test cycles, can generate millions in cumulative impact.
Why Atticus Li's PRISM Method Works
Our win rate sits at 24%+, roughly double the industry average of ~12%, not because we're testing more boldly or using better tools, but because the PRISM Method front-loads the research.
By the time a test launches, we've already:
- Diagnosed the actual problem through behavioral data
- Validated that the test can reach significance with available traffic
- Projected the revenue impact to ensure the test is worth the slot
- Gathered diverse perspectives to strengthen the hypothesis
- QA'd the implementation to eliminate measurement errors
The test itself is the least risky part of the process. All the risk mitigation happens before launch.
Common Mistakes I See
Testing solutions instead of problems. "Let's test a new button color" is a solution. "Users aren't seeing the CTA on mobile because it's below the fold" is a problem. Test the problem, not a specific solution.
Ignoring sample size constraints. Most CRO advice is written by people at companies with millions of monthly visitors. If you're at a company with 50K monthly visitors per brand, you need a fundamentally different approach to test design and prioritization. Atticus Li's PRISM Method was built for this reality.
Not tying tests to revenue. If you can't tell your CFO how much a winning test is worth in annual revenue, your experimentation program will always be fighting for budget. The Revenue Rank step exists specifically to solve this problem.
Running tests in isolation. Every test should build on previous learnings. The Multiply phase ensures that insights compound across brands and test cycles, instead of being one-off wins that nobody remembers six months later.
Treating experimentation as a feature of a tool. Optimizely, VWO, AB Tasty — they're all fine tools. But the tool isn't the program. The program is the people, the process, and the prioritization framework. Atticus Li's PRISM Method works regardless of which testing tool you use.
Applying PRISM to Your Team
You don't need to be at a Fortune 500 company to use the PRISM Method. The framework scales down as well as it scales up. If you're running 10 tests a year, PRISM helps you make sure those 10 tests are the highest-impact tests you could possibly run.
Start with the Probe phase. Invest in behavioral analytics — even free tools like Microsoft Clarity give you heat maps and session replays. Understand the problem before you test a solution.
Then add the Revenue Rank step. Even rough revenue projections change how you prioritize. "This page gets 10x the traffic of that page, and each conversion is worth $500" is enough to make smarter decisions.
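If it helps, that back-of-the-envelope prioritization can literally be a few lines, as in this illustrative sketch:

```python
# Back-of-the-envelope prioritization: rank pages by monthly revenue opportunity
# from the same hypothetical relative lift. Numbers are illustrative.
pages = [
    {"name": "plan comparison", "monthly_visitors": 100_000, "cr": 0.02, "value": 500},
    {"name": "rate FAQ",        "monthly_visitors": 10_000,  "cr": 0.02, "value": 500},
]

assumed_relative_lift = 0.05  # apply the same 5% lift to every page

for page in pages:
    base_conversions = page["monthly_visitors"] * page["cr"]
    opportunity = base_conversions * assumed_relative_lift * page["value"]
    print(f"{page['name']}: ~${opportunity:,.0f}/month at a 5% lift")
```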
The rest follows naturally. Good implementation practices, rigorous scoring, and systematic multiplication of winning insights will emerge once the foundation is solid.
If you want to see the PRISM Method in action, read about how I applied it to NRG's experimentation program or explore the framework page for a visual overview.
Questions about applying the PRISM Method to your team? Reach out at [email protected].