Atticus Li designed NRG Energy's EBITDA impact estimation model that translates A/B test results into verified financial impact, changing experimentation from a cost center to a revenue driver. This model powers the financial case for 100+ annual experiments across seven retail energy brands.
The Language Problem That Kills Experimentation Programs
Every experimentation team I've seen struggle has the same root cause. It's not bad ideas. It's not insufficient traffic. It's not the wrong tools.
It's that they can't speak finance's language.
Here's what a typical experimentation report looks like: "We ran a test on the homepage hero banner. The variant achieved a 4.2% conversion rate versus the control's 3.8%. This is a 10.5% relative lift, statistically significant at 95% confidence."
And here's what the CFO thinks when they read that: "So what?"
The CFO doesn't care about relative lift. They don't care about confidence levels. They care about one thing: how does this affect the P&L? And if you can't answer that question in dollars, your experimentation program will always be fighting for budget, fighting for headcount, and fighting for survival.
I learned this lesson at NRG. When I arrived and started building the experimentation program, the first tests I ran were technically solid. Good hypotheses, clean execution, valid statistical methods. But when I brought results to leadership, the reaction was polite indifference. Nice work. Keep it up. But no additional budget, no additional tools, no additional team.
That changed when I built the EBITDA impact model.
The Formula
The model translates test results into projected financial impact using inputs that finance already understands:
EBITDA Impact = Brand Monthly EBITDA x Annualized Traffic x Baseline Conversion Rate x Relative Lift
Let me break each component down.
Brand Monthly EBITDA is the average monthly earnings before interest, taxes, depreciation, and amortization for the specific brand being tested. At NRG, this differs significantly across brands — Reliant's monthly EBITDA is very different from Cirro's or Discount Power's. Using the correct brand-level figure is critical because it ties the test result to the actual economic unit.
Annualized Traffic is the projected annual traffic to the page or flow being tested. Not the test period traffic — the full-year projection. This converts a 2-week or 4-week test into an annualized impact estimate that finance can compare against other line items in the annual plan.
Baseline Conversion Rate is the control group's conversion rate during the test period. This is the starting point that the lift is measured against.
Relative Lift is the percentage improvement the winning variant achieved over the control. A 10% relative lift on a 4% baseline conversion rate means the variant converted at 4.4%.
The output is a projected annual EBITDA impact in dollars. Not engagement. Not lift percentages. Dollars.
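To make the inputs concrete, here is a minimal sketch of the calculation, written exactly as the formula above reads. It's illustrative only, and the unit conventions for the EBITDA and traffic inputs should follow whatever you've agreed on with your finance partners.

```python
def projected_annual_ebitda_impact(brand_monthly_ebitda: float,
                                   annualized_traffic: float,
                                   baseline_cr: float,
                                   relative_lift: float) -> float:
    """Projected annual EBITDA impact, multiplying the four inputs exactly as
    the formula above is written. Conversion rate and relative lift are
    decimals (0.04 and 0.10, not 4 and 10); the EBITDA and traffic inputs
    follow whatever conventions your finance partners expect."""
    return brand_monthly_ebitda * annualized_traffic * baseline_cr * relative_lift
```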
Pre-Test Projections: Knowing the Value Before You Run the Test
The model isn't just retrospective. I use it prospectively to prioritize which tests are worth running.
Before any test gets greenlit, I calculate the projected impact using the MDE (Minimum Detectable Effect) as a stand-in for the relative lift. The MDE is determined by the available traffic and the desired statistical power — it tells you the smallest lift the test can reliably detect.
So the pre-test calculation becomes:
Projected Impact = Brand Monthly EBITDA x Annualized Traffic x Baseline CR x MDE
If a test can detect a 5% lift, and that 5% lift would translate to $150K in annual EBITDA impact, the test is worth running. If the same calculation yields $8K, it might not be — especially if it means giving up a test slot that could go to a higher-value opportunity.
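If you want to compute the MDE yourself rather than pull it from a testing tool, the standard two-proportion normal approximation is enough. The sketch below assumes a two-sided test at 95% confidence and 80% power; the traffic figures are hypothetical.

```python
from scipy.stats import norm

def relative_mde(baseline_cr: float, visitors_per_arm: int,
                 alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest relative lift the test can reliably detect, via the standard
    two-proportion normal approximation (two-sided test)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    se = (2 * baseline_cr * (1 - baseline_cr) / visitors_per_arm) ** 0.5
    return (z_alpha + z_power) * se / baseline_cr

# Hypothetical example: 120k visitors over four weeks, split evenly across two arms.
mde = relative_mde(baseline_cr=0.038, visitors_per_arm=60_000)
print(f"Smallest detectable relative lift: {mde:.1%}")
# Feed this MDE into the projection formula above in place of the relative lift;
# if the resulting dollar figure clears your threshold, the test earns a slot.
```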
This is how I decide what to test across seven brands with different traffic volumes and different revenue profiles. A 3% lift on Reliant's enrollment page is worth more than a 15% lift on a smaller brand's FAQ page, simply because of the traffic and revenue per customer differences. The model makes those trade-offs explicit and quantifiable.
I add revenue per customer data to the projection to make it even more granular. If we know that the average Reliant residential customer generates a specific revenue figure over their lifetime, we can project the customer acquisition impact of a conversion lift and translate it directly into customer lifetime value.
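Here's a sketch of that extension. The traffic, conversion, and lifetime revenue figures below are hypothetical placeholders, not NRG numbers.

```python
def incremental_customers(annualized_traffic: float, baseline_cr: float,
                          relative_lift: float) -> float:
    # Additional enrollments per year attributable to the lift.
    return annualized_traffic * baseline_cr * relative_lift

def lifetime_value_impact(annualized_traffic: float, baseline_cr: float,
                          relative_lift: float, revenue_per_customer: float) -> float:
    # Value those extra enrollments at finance's revenue-per-customer figure.
    extra = incremental_customers(annualized_traffic, baseline_cr, relative_lift)
    return extra * revenue_per_customer

# Hypothetical figures: 500k annual visits, 3.8% baseline CR, 10% lift,
# $900 lifetime revenue per acquired customer.
print(f"${lifetime_value_impact(500_000, 0.038, 0.10, 900):,.0f}")
```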
Post-Test Validation: What Actually Happened
After a test concludes, the model shifts from projection to validation:
- Plug in the actual lift from the test results instead of the MDE
- Calculate the projected annual EBITDA impact based on the real performance
- Run holdout tests on selected winners to verify that the lift sustains over time
The holdout step is important and often skipped. A test can show a statistically significant lift during the test period that doesn't persist after implementation. Novelty effects, seasonal confounders, and test-period anomalies can all inflate results. Holdout testing — keeping a small percentage of traffic on the original experience after the winner is deployed — lets us validate that the lift is real and durable.
Not every test gets a holdout. It depends on the magnitude of the result, the strategic importance of the page, and whether the lift was surprising given our prior expectations. But for any test that's going to be cited in a board presentation or used to justify additional investment, holdout validation is non-negotiable.
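For tests that do get a holdout, the durability check can be as simple as a one-sided two-proportion z-test comparing the deployed winner against the holdout after launch. This is a sketch with made-up post-launch counts, not our production tooling.

```python
import math
from scipy.stats import norm

def holdout_durability_check(deployed_conversions: int, deployed_visitors: int,
                             holdout_conversions: int, holdout_visitors: int):
    """Is the deployed winner still outperforming the holdout post-launch?
    Pooled two-proportion z-test, one-sided (deployed > holdout)."""
    p_dep = deployed_conversions / deployed_visitors
    p_hold = holdout_conversions / holdout_visitors
    pooled = (deployed_conversions + holdout_conversions) / (deployed_visitors + holdout_visitors)
    se = math.sqrt(pooled * (1 - pooled) * (1 / deployed_visitors + 1 / holdout_visitors))
    z = (p_dep - p_hold) / se
    sustained_lift = (p_dep - p_hold) / p_hold
    return sustained_lift, norm.sf(z)  # (relative lift post-launch, one-sided p-value)

# Hypothetical post-launch month: 95% of traffic on the winner, 5% held out.
lift, p = holdout_durability_check(7_900, 190_000, 380, 10_000)
print(f"Sustained lift: {lift:.1%} (p = {p:.3f})")
```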
How This Changed the Conversation
Before the EBITDA model, the experimentation program's narrative was about activity: "We ran 100 tests this year."
After the model, the narrative became about impact: "We generated $30M+ in projected revenue impact through our experimentation program."
That's the difference between a cost center and a revenue driver. And it changed everything downstream.
Budget conversations shifted. Instead of defending the experimentation line item, I was showing ROI that made the program look like one of the highest-return investments in the marketing budget. When you can show that the cost of running the program (tools, team, opportunity cost) is a fraction of the verified revenue impact, the budget conversation writes itself.
Stakeholder engagement increased. Brand marketing teams that previously saw testing as an inconvenience started actively requesting test slots. When they saw their peers getting credit for $500K revenue impact from a single test, they wanted in. Competition for test slots became a feature, not a bug — it meant we could be selective about which tests we ran.
Tool and team investments followed. The EBITDA model justified the business case for Optimizely as our testing platform, Contentsquare for behavioral analytics, and Tealium as our CDP layer. Each tool investment was backed by a projected impact model showing how it would increase the program's capacity or improve its hit rate.
This is the same principle behind Atticus Li's PRISM Method — measurement drives investment, not the other way around. You don't buy tools and hope they produce results. You build the measurement framework first, prove value at small scale, and then use the proven ROI to justify scaling.
Honest About the Limitations
I want to be transparent about what this model is and isn't.
These are estimates. The EBITDA impact model produces projections, not audited financial results. There are assumptions baked in — that traffic levels will hold, that the lift is durable, that the competitive environment stays roughly stable, that the revenue per customer figure is accurate for the projection period.
More accuracy is possible. You can build more sophisticated models with time-decay adjustments, segment-level projections, competitive response modeling, and econometric controls. But more accuracy costs flexibility and speed. The EBITDA model is designed to be fast enough to run on every test and simple enough that a brand marketer or a finance analyst can understand and challenge the inputs.
The trade-off is worth it. The model carries more asterisks than a controlled clinical trial. But it carries far fewer asterisks than most brand marketing attribution — which often amounts to "we think our brand awareness campaign contributed to pipeline, but we can't really prove it."
The EBITDA model sits in a useful middle ground: rigorous enough to be credible with finance, practical enough to be applied consistently across 100+ annual experiments.
The Organizational Impact
Beyond the numbers, the EBITDA model changed how the organization thinks about experimentation.
Before: Experimentation was perceived as a UX optimization activity — something the digital team did to make pages look better. It was evaluated on activity metrics (tests run, pages tested) and subjective assessments ("that new homepage looks great").
After: Experimentation is perceived as a revenue generation function. It sits in the conversation alongside paid media, SEO, and product launches as a driver of measurable business outcomes. Test results are reviewed in the same forums where other P&L line items are discussed.
This perception shift is what ultimately grows the program. More budget, more headcount, more organizational priority — all of it flows from being seen as a revenue driver rather than a cost center.
I've written before about why most A/B tests fail, and organizational dysfunction is the number one killer on that list. The EBITDA model doesn't fix organizational dysfunction directly, but it removes the most common objection dysfunction hides behind: "we don't know if this is worth it."
When you can show exactly what it's worth, the excuses get a lot thinner.
Building the Model for Your Organization
If you want to implement something similar, here's the practical sequence:
Step 1: Get the financial inputs. You need revenue per customer, brand-level EBITDA or contribution margin, and traffic data by page and brand. The finance team has these numbers. If they won't share them, start with revenue per customer and work backward from there.
Step 2: Build the pre-test projection template. A spreadsheet works fine. Inputs: traffic, baseline CR, MDE, revenue per customer. Output: projected annual impact. Use this to prioritize every test in your backlog.
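Here's a sketch of what that template looks like as code rather than a spreadsheet. The backlog entries and dollar figures are made up, and this version values incremental customers at revenue per customer rather than using brand-level EBITDA.

```python
from dataclasses import dataclass

@dataclass
class TestProjection:
    name: str
    annual_traffic: int          # projected full-year traffic to the page or flow
    baseline_cr: float           # control conversion rate (decimal)
    mde: float                   # smallest relative lift the test can detect (decimal)
    revenue_per_customer: float  # finance-supplied value per acquired customer

    @property
    def projected_annual_impact(self) -> float:
        # Extra customers per year at the MDE, valued at revenue per customer.
        return self.annual_traffic * self.baseline_cr * self.mde * self.revenue_per_customer

# Hypothetical backlog -- use the ranking to decide what gets a test slot.
backlog = [
    TestProjection("Enrollment CTA copy", 2_000_000, 0.038, 0.05, 900),
    TestProjection("FAQ page layout", 150_000, 0.012, 0.15, 900),
]
for t in sorted(backlog, key=lambda t: t.projected_annual_impact, reverse=True):
    print(f"{t.name}: ${t.projected_annual_impact:,.0f} projected annual impact")
```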
Step 3: Add post-test validation. After each test, plug in the actual lift and calculate the realized impact. Track cumulative impact across tests over time.
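The post-test version reuses the same arithmetic with the measured lift swapped in for the MDE. A sketch, with hypothetical completed tests:

```python
def realized_annual_impact(annual_traffic: float, baseline_cr: float,
                           actual_lift: float, revenue_per_customer: float) -> float:
    # Same calculation as the pre-test template, with the measured lift in place of the MDE.
    return annual_traffic * baseline_cr * actual_lift * revenue_per_customer

# Hypothetical completed tests: (annual_traffic, baseline_cr, actual_lift, revenue_per_customer)
completed = [
    (2_000_000, 0.038, 0.062, 900),
    (800_000, 0.021, 0.0, 900),   # flat result contributes nothing
]
cumulative = sum(realized_annual_impact(*t) for t in completed)
print(f"Cumulative projected annual impact: ${cumulative:,.0f}")
```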
Step 4: Implement holdout testing for major wins. Pick your top 20% of winning tests by projected impact and run holdouts to verify durability. Report the verified numbers separately from the projected numbers — this builds credibility with finance.
Step 5: Present the results in finance's format. Annual impact. ROI versus program cost. Comparison against other investment alternatives. Don't make finance translate your metrics into theirs. Speak their language from the start.
The model doesn't have to be perfect on day one. My first version at NRG was a Google Sheet with manual inputs. It evolved over time as we refined the financial inputs, added segment-level projections, and automated the post-test calculations. The important thing is starting with a financial framing from the beginning, not bolting it on after you've already been running tests for a year.
What Comes Next
The EBITDA model was the foundation. What I'm building now is more sophisticated — incorporating customer lifetime value projections, multi-touch attribution across test interactions, and predictive models that forecast test outcomes before they run based on historical patterns from our 150+ experiment database.
But the principle stays the same: every experiment must connect to a business outcome. If you can't draw a line from your test result to a dollar figure, you're not running an experimentation program — you're running a hobby.
Questions about building financial models for experimentation programs? Reach me at [email protected].