Why You Need a Pre-Launch Checklist
A/B tests fail silently. Unlike a broken feature that triggers error alerts, a poorly configured test runs for weeks, consumes traffic, and produces results that look valid but are not. By the time you discover the problem, the traffic is spent and the time is gone.
A pre-launch checklist catches these problems before they cost you anything. Every experienced testing team has one. Here are the twenty-seven items yours should include.
Strategy and Hypothesis
1. The hypothesis is written down
Not in someone's head. Written, shared, and agreed upon before the test launches. Format: "If we [change], then [metric] will [direction] because [behavioral reason]."
2. The primary metric is defined
One metric determines the ship-or-revert decision. It is specific, measurable, and agreed upon by all stakeholders. Secondary metrics are listed separately.
3. The success criteria are pre-defined
What result counts as a win? A statistically significant improvement in the primary metric at ninety-five percent confidence is standard. Define this before launching, not after reviewing results.
4. The business case is clear
Why does this test matter? What is the potential impact if the variant wins? This ensures you are spending testing traffic on decisions that actually move the business.
5. The test is not redundant
Check your testing log. Has this hypothesis been tested before? If so, what has changed that justifies retesting? Running the same test twice without new reasoning wastes resources.
Statistical Design
6. Sample size is calculated
Use a sample size calculator with your baseline conversion rate, minimum detectable effect, significance level, and power. Document the required visitors per variant.
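The standard two-proportion sample size formula can be sketched in a few lines. This is an illustrative implementation of the usual normal-approximation formula, not any particular vendor's calculator; the 5% baseline and 10% relative lift below are made-up example inputs.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_relative, alpha=0.05, power=0.8):
    """Visitors needed per variant (two-proportion normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)  # expected variant conversion rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Illustrative inputs: 5% baseline, 10% relative MDE, 95% confidence, 80% power
n = sample_size_per_variant(0.05, 0.10)  # roughly 31,000 visitors per variant
```

Note how sensitive the result is to the MDE: halving the detectable effect roughly quadruples the required sample.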
7. Test duration is estimated
Divide the required sample size by the daily traffic to the test page, then round up to complete business cycles (a minimum of one full week) so that both weekday and weekend behavior are represented.
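The duration arithmetic is simple enough to sketch. The figures below (62,500 total visitors needed, 4,000 visitors per day) are assumed example numbers, not recommendations.

```python
from math import ceil

def estimated_duration_days(total_sample_size, daily_visitors):
    """Days to reach the total sample size, rounded up to complete weeks."""
    raw_days = ceil(total_sample_size / daily_visitors)
    return max(7, ceil(raw_days / 7) * 7)  # minimum one full week

# Illustrative: 62,500 visitors needed across both variants, 4,000/day to the page
days = estimated_duration_days(62_500, 4_000)  # 16 raw days, rounded to 21
```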
8. The MDE is realistic
Is the minimum detectable effect achievable given the change you are making? A minor copy tweak is unlikely to produce a large relative improvement. Mismatched MDE expectations lead to inconclusive tests.
9. Traffic allocation is correct
Fifty-fifty split for a standard two-variant test. If using an unequal split, understand the impact on test duration and document the rationale.
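The duration cost of an unequal split is easy to quantify: the test ends only when the slower-filling arm reaches its required sample, so duration scales with the smallest allocation share. A rough sketch, with assumed example numbers:

```python
from math import ceil

def duration_days(n_per_variant, daily_visitors, variant_share):
    """The test ends when the slower-filling arm hits its sample size."""
    slowest_share = min(variant_share, 1 - variant_share)
    return ceil(n_per_variant / (daily_visitors * slowest_share))

# Illustrative: 31,000 visitors needed per variant, 4,000 visitors/day
even = duration_days(31_000, 4_000, 0.5)    # 50/50 split: 16 days
skewed = duration_days(31_000, 4_000, 0.1)  # 90/10 split: 78 days, ~5x longer
```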
10. No other tests conflict
Check for concurrent tests on the same page or targeting the same users. Overlapping tests can interact and contaminate each other's results. Use mutually exclusive traffic allocation or schedule tests sequentially.
Variant Quality
11. The variant matches the hypothesis
Does the change you built actually test the hypothesis you wrote? It is surprisingly common for the implemented variant to drift from the original hypothesis during development.
12. Only the intended elements are changed
Compare the control and variant carefully. Unintended differences (a missing image, a shifted layout, a different font size) introduce confounds that make results uninterpretable.
13. The variant works on all devices
Test on desktop, tablet, and mobile. Test on the major browsers your audience uses. A variant that is broken on mobile will lose — but that loss measures a technical bug, not your hypothesis.
14. The variant loads correctly
No flicker, no delayed loading, no content shift. Client-side testing tools can cause the original version to flash before the variant renders. This flicker itself affects user behavior and contaminates results.
15. The variant handles edge cases
What happens when the user has a slow connection? When JavaScript fails to load? When the browser window is an unusual size? When the user has an ad blocker? Edge cases that crash the variant invalidate results.
Tracking and Analytics
16. Conversion tracking is implemented correctly
Verify that the primary metric is being tracked accurately for both the control and variant. Test the tracking by completing the conversion action yourself in both versions.
17. Revenue tracking is accurate (if applicable)
If your primary or secondary metric involves revenue, confirm that revenue values are being recorded correctly. A misplaced decimal point or a wrong currency conversion silently corrupts revenue data.
18. Segment data is being captured
If you plan to analyze results by segment (device type, traffic source, new vs returning visitors), verify that segment data is being recorded alongside conversion data.
19. The analytics tool and testing tool agree
Compare visitor counts and conversion counts between your testing platform and your analytics platform. Significant discrepancies indicate a tracking problem.
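This reconciliation can be a one-line calculation. The sketch below flags any gap above an assumed 5% tolerance; what counts as "significant" varies by setup, and the counts shown are made-up examples.

```python
def count_discrepancy(testing_tool_count, analytics_count, tolerance=0.05):
    """Relative gap between the two tools' visitor (or conversion) counts."""
    diff = abs(testing_tool_count - analytics_count)
    relative = diff / max(testing_tool_count, analytics_count)
    return relative, relative > tolerance  # (size of gap, needs investigation?)

# Illustrative: 10,000 visitors in the testing tool vs 10,300 in analytics
gap, flagged = count_discrepancy(10_000, 10_300)  # ~3% gap, within tolerance
```

Small gaps are normal (the tools count visitors differently); large gaps mean one of the two is mis-instrumented.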
20. Bot traffic is filtered
Bots do not convert but they inflate visitor counts, which dilutes your conversion rate and extends test duration. Ensure your testing platform filters known bot traffic.
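If your platform's built-in filtering is limited, a crude user-agent heuristic can serve as a sanity check. This is a toy sketch with an assumed marker list; production filtering relies on maintained bot lists and behavioral signals, not substring matching.

```python
# Toy heuristic: real platforms use maintained bot lists, not substrings
BOT_MARKERS = ("bot", "crawler", "spider", "headless", "scraper")

def is_probable_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

visits = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
]
human_visits = [ua for ua in visits if not is_probable_bot(ua)]
```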
Operational Readiness
21. Stakeholders know the test is running
Anyone who might modify the test page, change pricing, or launch a campaign that affects the test audience should know a test is active. Unannounced changes during a test invalidate results.
22. The test end date is on the calendar
Put the expected completion date on a shared calendar. Assign someone to analyze results on that date. Tests without a scheduled end date tend to run indefinitely or get forgotten.
23. Emergency stop criteria are defined
What would cause you to stop the test early? Define specific thresholds for technical errors, severe conversion drops, or user complaints. If none of those thresholds is crossed, the test keeps running.
24. The implementation plan is ready
If the variant wins, can you implement it permanently? Identify any technical debt or additional work required to make the variant the new default. Winning tests that never get implemented are wasted effort.
Documentation
25. The test is logged in your testing repository
Record the hypothesis, variant description, metrics, sample size, expected duration, and launch date. This log is your institutional memory of what has been tested and learned.
26. Screenshots of both versions are saved
Capture the control and variant as they appear on launch day. Testing tools and page designs change over time. Without screenshots, you lose the visual record of what was actually tested.
27. The analysis plan is written
Decide before launching how you will analyze results. Which statistical test? Which segments? What secondary metrics? Pre-registration of the analysis plan prevents unconscious cherry-picking after results are in.
Using This Checklist in Practice
You do not need to check all twenty-seven items for every test. The list is comprehensive by design so you can tailor it to your context.
For a quick headline test, items one through four, six through seven, eleven through fourteen, sixteen, and twenty-two might be sufficient.
For a major redesign test with significant business impact, every item on this list is relevant.
The value of the checklist is not in its completeness but in its forcing function. It makes you slow down at the one moment when slowing down prevents the most waste: before you commit traffic to a test.
The Meta-Lesson
Experimentation is not just about what you test. It is about how rigorously you test it. A sloppy test that produces a "winner" is worse than no test at all, because it gives you false confidence in a decision that might be wrong.
The checklist is your quality gate. Use it every time, and the results you get will be results you can trust.
FAQ
How long does it take to go through this checklist?
For a straightforward test with good tooling, thirty minutes to an hour. For a complex test with custom tracking, half a day. The time is always worth it compared to the cost of a wasted test.
Who should be responsible for the checklist?
The person who designs the test should complete the checklist, but a second person should review it. Fresh eyes catch assumptions and oversights that the test designer is too close to see.
What if I discover a problem mid-test?
If the problem affects data quality (broken tracking, unintended variant changes), stop the test, fix the issue, and restart with fresh data. Do not patch a running test and combine the before-and-after data.
Should I automate parts of this checklist?
Yes. Technical checks like cross-browser testing, tracking verification, and bot filtering can be automated. Strategic checks like hypothesis quality and MDE realism require human judgment.
Can I skip the checklist for small tests?
Every test consumes traffic, which is a finite resource. Even small tests should pass the minimum checks: hypothesis written, primary metric defined, sample size calculated, tracking verified. Skip the operational items for low-stakes tests if needed.