Why You Need a Pre-Launch Checklist
A/B tests fail silently. Unlike a broken feature that triggers error alerts, a poorly configured test runs for weeks, consumes traffic, and produces results that look valid but are not. By the time you discover the problem, the traffic is spent and the time is gone.
A pre-launch checklist catches these problems before they cost you anything. Every experienced testing team has one. Here are the twenty-seven items yours should include.
Strategy and Hypothesis
1. The hypothesis is written down
Not in someone's head. Written, shared, and agreed upon before the test launches. Format: "If we [change], then [metric] will [direction] because [behavioral reason]."
2. The primary metric is defined
One metric determines the ship-or-revert decision. It is specific, measurable, and agreed upon by all stakeholders. Secondary metrics are listed separately.
3. The success criteria are pre-defined
What result counts as a win? A statistically significant improvement in the primary metric at ninety-five percent confidence is standard. Define this before launching, not after reviewing results.
4. The business case is clear
Why does this test matter? What is the potential impact if the variant wins? This ensures you are spending testing traffic on decisions that actually move the business.
5. The test is not redundant
Check your testing log. Has this hypothesis been tested before? If so, what has changed that justifies retesting? Running the same test twice without new reasoning wastes resources.
Statistical Design
6. Sample size is calculated
Use a sample size calculator with your baseline conversion rate, minimum detectable effect, significance level, and power. Document the required visitors per variant.
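The standard two-proportion sample size formula can be sketched in a few lines. This is an illustrative implementation of the usual normal-approximation formula, not any particular vendor's calculator; the 5% baseline and 10% relative lift below are made-up example inputs.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_relative, alpha=0.05, power=0.8):
    """Visitors needed per variant (two-proportion normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)  # expected variant conversion rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Illustrative inputs: 5% baseline, 10% relative MDE, 95% confidence, 80% power
n = sample_size_per_variant(0.05, 0.10)  # roughly 31,000 visitors per variant
```

Note how sensitive the result is to the MDE: halving the detectable effect roughly quadruples the required sample.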
7. Test duration is estimated
Divide the required sample size by the daily traffic to the test page, then round up to complete business cycles (a minimum of one full week) so that both weekday and weekend behavior are represented.
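The duration arithmetic is simple enough to sketch. The figures below (62,500 total visitors needed, 4,000 visitors per day) are assumed example numbers, not recommendations.

```python
from math import ceil

def estimated_duration_days(total_sample_size, daily_visitors):
    """Days to reach the total sample size, rounded up to complete weeks."""
    raw_days = ceil(total_sample_size / daily_visitors)
    return max(7, ceil(raw_days / 7) * 7)  # minimum one full week

# Illustrative: 62,500 visitors needed across both variants, 4,000/day to the page
days = estimated_duration_days(62_500, 4_000)  # 16 raw days, rounded to 21
```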
8. The MDE is realistic
Is the minimum detectable effect achievable given the change you are making? A minor copy tweak is unlikely to produce a large relative improvement. Mismatched MDE expectations lead to inconclusive tests.
9. Traffic allocation is correct
Fifty-fifty split for a standard two-variant test. If using an unequal split, understand the impact on test duration and document the rationale.
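The duration cost of an unequal split is easy to quantify: the test ends only when the slower-filling arm reaches its required sample, so duration scales with the smallest allocation share. A rough sketch, with assumed example numbers:

```python
from math import ceil

def duration_days(n_per_variant, daily_visitors, variant_share):
    """The test ends when the slower-filling arm hits its sample size."""
    slowest_share = min(variant_share, 1 - variant_share)
    return ceil(n_per_variant / (daily_visitors * slowest_share))

# Illustrative: 31,000 visitors needed per variant, 4,000 visitors/day
even = duration_days(31_000, 4_000, 0.5)    # 50/50 split: 16 days
skewed = duration_days(31_000, 4_000, 0.1)  # 90/10 split: 78 days, ~5x longer
```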
10. No other tests conflict
Check for concurrent tests on the same page or targeting the same users. Overlapping tests can interact and contaminate each other's results. Use mutually exclusive traffic allocation or schedule tests sequentially.
Variant Quality
11. The variant matches the hypothesis
Does the change you built actually test the hypothesis you wrote? It is surprisingly common for the implemented variant to drift from the original hypothesis during development.
12. Only the intended elements are changed
Compare the control and variant carefully. Unintended differences (a missing image, a shifted layout, a different font size) introduce confounds that make results uninterpretable.
13. The variant works on all devices
Test on desktop, tablet, and mobile. Test on the major browsers your audience uses. A variant that is broken on mobile will lose — but that loss measures a technical bug, not your hypothesis.
14. The variant loads correctly
No flicker, no delayed loading, no content shift. Client-side testing tools can cause the original version to flash before the variant renders. This flicker itself affects user behavior and contaminates results.
15. The variant handles edge cases
What happens when the user has a slow connection? When JavaScript fails to load? When the browser window is an unusual size? When the user has an ad blocker? Edge cases that crash the variant invalidate results.
Tracking and Analytics
16. Conversion tracking is implemented correctly
Verify that the primary metric is being tracked accurately for both the control and variant. Test the tracking by completing the conversion action yourself in both versions.
17. Revenue tracking is accurate (if applicable)
If your primary or secondary metric involves revenue, confirm that revenue values are being recorded correctly. A misplaced decimal point or a wrong currency conversion silently corrupts revenue data.
18. Segment data is being captured
If you plan to analyze results by segment (device type, traffic source, new vs returning visitors), verify that segment data is being recorded alongside conversion data.
19. The analytics tool and testing tool agree
Compare visitor counts and conversion counts between your testing platform and your analytics platform. Significant discrepancies indicate a tracking problem.
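This reconciliation can be a one-line calculation. The sketch below flags any gap above an assumed 5% tolerance; what counts as "significant" varies by setup, and the counts shown are made-up examples.

```python
def count_discrepancy(testing_tool_count, analytics_count, tolerance=0.05):
    """Relative gap between the two tools' visitor (or conversion) counts."""
    diff = abs(testing_tool_count - analytics_count)
    relative = diff / max(testing_tool_count, analytics_count)
    return relative, relative > tolerance  # (size of gap, needs investigation?)

# Illustrative: 10,000 visitors in the testing tool vs 10,300 in analytics
gap, flagged = count_discrepancy(10_000, 10_300)  # ~3% gap, within tolerance
```

Small gaps are normal (the tools count visitors differently); large gaps mean one of the two is mis-instrumented.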
20. Bot traffic is filtered
Bots do not convert but they inflate visitor counts, which dilutes your conversion rate and extends test duration. Ensure your testing platform filters known bot traffic.
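If your platform's built-in filtering is limited, a crude user-agent heuristic can serve as a sanity check. This is a toy sketch with an assumed marker list; production filtering relies on maintained bot lists and behavioral signals, not substring matching.

```python
# Toy heuristic: real platforms use maintained bot lists, not substrings
BOT_MARKERS = ("bot", "crawler", "spider", "headless", "scraper")

def is_probable_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

visits = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
]
human_visits = [ua for ua in visits if not is_probable_bot(ua)]
```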
Operational Readiness
21. Stakeholders know the test is running
Anyone who might modify the test page, change pricing, or launch a campaign that affects the test audience should know a test is active. Unannounced changes during a test invalidate results.
22. The test end date is on the calendar
Put the expected completion date on a shared calendar. Assign someone to analyze results on that date. Tests without a scheduled end date tend to run indefinitely or get forgotten.
23. Emergency stop criteria are defined
What would cause you to stop the test early? Define specific thresholds for technical errors, severe conversion drops, or user complaints. If none of those thresholds is crossed, the test keeps running.
24. The implementation plan is ready
If the variant wins, can you implement it permanently? Identify any technical debt or additional work required to make the variant the new default. Winning tests that never get implemented are wasted effort.
Documentation
25. The test is logged in your testing repository
Record the hypothesis, variant description, metrics, sample size, expected duration, and launch date. This log is your institutional memory of what has been tested and learned.
26. Screenshots of both versions are saved
Capture the control and variant as they appear on launch day. Testing tools and page designs change over time. Without screenshots, you lose the visual record of what was actually tested.
27. The analysis plan is written
Decide before launching how you will analyze results. Which statistical test? Which segments? What secondary metrics? Pre-registration of the analysis plan prevents unconscious cherry-picking after results are in.
Using This Checklist in Practice
You do not need to check all twenty-seven items for every test. The list is comprehensive by design so you can tailor it to your context.
For a quick headline test, items one through four, six through seven, eleven through fourteen, sixteen, and twenty-two might be sufficient.
For a major redesign test with significant business impact, every item on this list is relevant.
The value of the checklist is not in its completeness but in its forcing function. It makes you slow down at the one moment when slowing down prevents the most waste: before you commit traffic to a test.
The Meta-Lesson
Experimentation is not just about what you test. It is about how rigorously you test it. A sloppy test that produces a "winner" is worse than no test at all, because it gives you false confidence in a decision that might be wrong.
The checklist is your quality gate. Use it every time, and the results you get will be results you can trust.
FAQ
How long does it take to go through this checklist?
For a straightforward test with good tooling, thirty minutes to an hour. For a complex test with custom tracking, half a day. The time is always worth it compared to the cost of a wasted test.
Who should be responsible for the checklist?
The person who designs the test should complete the checklist, but a second person should review it. Fresh eyes catch assumptions and oversights that the test designer is too close to see.
What if I discover a problem mid-test?
If the problem affects data quality (broken tracking, unintended variant changes), stop the test, fix the issue, and restart with fresh data. Do not patch a running test and combine the before-and-after data.
Should I automate parts of this checklist?
Yes. Technical checks like cross-browser testing, tracking verification, and bot filtering can be automated. Strategic checks like hypothesis quality and MDE realism require human judgment.
Can I skip the checklist for small tests?
Every test consumes traffic, which is a finite resource. Even small tests should pass the minimum checks: hypothesis written, primary metric defined, sample size calculated, tracking verified. Skip the operational items for low-stakes tests if needed.