If your experimentation program feels busy but not productive, the problem often isn’t idea volume. It’s flow. Tests get created, half-built, re-prioritized, and then quietly die in a backlog, a spreadsheet tab, or someone’s memory.

A well-run A/B test repository fixes that by treating experiments like a system with clear states, owners, and exit criteria. When you can see where every test sits (intake, running, analysis, shipped, archived), you can also see what’s blocked and why.

This post outlines a practical workflow state model and the governance that keeps tests moving, prevents duplicates, and turns your experiment library into compounding institutional memory.

Why spreadsheets, Jira, and Notion create “stuck test” gravity

[Image: diagram of experiment failure modes scattered across spreadsheets, Jira, Confluence, and Notion, all feeding into a centralized A/B test repository / experiment knowledge base]

Most teams start with transitional tools: a spreadsheet for the backlog, Jira for build tasks, Confluence for write-ups, Notion for notes. That setup works while the team is small and turnover is low.

Then the cracks show up:

  • A spreadsheet captures the “what,” but not the “why.”
  • Jira captures “done,” but not the result.
  • Confluence captures the story, but it’s hard to query across 200 pages.
  • Notion captures everything, but not in a consistent schema.

Over time, experimentation turns into tribal knowledge, and tribal knowledge doesn’t scale.

This is where an experiment library becomes an operational need, not a documentation hobby. It’s a central experiment knowledge base with the fields you’ll later wish you had: hypothesis, primary metric, guardrail metrics, audience, variants, implementation notes, analysis approach, decision, and follow-ups.
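
To make that schema concrete, here’s a minimal sketch of one such record as a Python dataclass. The field names mirror the list above; the types, defaults, and class name are illustrative assumptions, not a required format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExperimentRecord:
    """One entry in the experiment knowledge base (illustrative schema)."""
    title: str
    hypothesis: str                       # the "why" behind the test
    primary_metric: str                   # e.g. "activation rate"
    guardrail_metrics: List[str] = field(default_factory=list)
    audience: str = ""                    # segment and targeting rules
    variants: List[str] = field(default_factory=list)
    implementation_notes: str = ""        # constraints, tracking details
    analysis_approach: str = ""           # how the result will be evaluated
    decision: Optional[str] = None        # filled in during Analysis
    follow_ups: List[str] = field(default_factory=list)  # linked next tests
```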

If you’re building this as an experimentation center of excellence, the goal is simple: every test should be easy to find, easy to understand, and hard to repeat by accident. For general guidance on setting hypotheses, duration, and checklists, it’s worth aligning your team on a shared baseline like PostHog’s A/B testing best practices.

A practical “next step” when you outgrow your transitional tools is a dedicated experimentation hub such as the Searchable A/B Test Repository, where workflow states and consistent fields make your history usable across teams.

The workflow states that keep experiments moving (and accountable)

[Image: left-to-right workflow states from Intake to Archived, with guardrails such as owner due dates and automated reminders, and a feedback loop from Analysis back to Running]

Workflow states work because they force clarity. “In progress” is vague. “Designed, waiting on QA sign-off” is actionable.

A clean state model for an A/B test repository looks like this:

  • Intake: ideas enter the system with an owner and a due date for the first draft.
  • Prioritized: the test has a score or rationale, plus entry criteria met (hypothesis, metric, target surface area).
  • Designed: spec is complete (variants, tracking plan, segmentation, QA plan).
  • Running: experiment is live, monitoring is scheduled, and automated reminders prevent “set and forget.”
  • Analysis: the run is complete, analysis is assigned, and decision logging is required.
  • Shipped: winning changes are rolled out, or learnings are translated into next actions.
  • Archived: everything is packaged for retrieval, including what you’d do differently next time.

The point isn’t ceremony. It’s removing ambiguity so nothing stalls without showing up as “blocked.”

A simple way to operationalize this is to define entry and exit criteria per state and attach SLAs to the handoffs.
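
As an illustration (the state names come from the model above; the specific criteria, SLA windows, and helper function are assumptions rather than recommendations), that per-state mapping might look like this:

```python
# Illustrative entry/exit criteria and handoff SLAs per workflow state.
# The Analysis -> Running edge encodes the "needs more data" loop.
WORKFLOW = {
    "Intake":      {"entry": ["owner assigned", "first-draft due date set"],
                    "exit":  ["idea written up"],
                    "sla_days": 5,  "next": ["Prioritized"]},
    "Prioritized": {"entry": ["score or rationale", "hypothesis", "primary metric"],
                    "exit":  ["similarity check logged"],
                    "sla_days": 10, "next": ["Designed"]},
    "Designed":    {"entry": ["variants", "tracking plan", "QA plan"],
                    "exit":  ["QA sign-off"],
                    "sla_days": 10, "next": ["Running"]},
    "Running":     {"entry": ["experiment live", "monitoring scheduled"],
                    "exit":  ["planned duration reached"],
                    "sla_days": 21, "next": ["Analysis"]},
    "Analysis":    {"entry": ["analyst assigned"],
                    "exit":  ["decision logged"],
                    "sla_days": 5,  "next": ["Shipped", "Running"]},
    "Shipped":     {"entry": ["decision logged"],
                    "exit":  ["rollout or next actions recorded"],
                    "sla_days": 10, "next": ["Archived"]},
    "Archived":    {"entry": ["write-up packaged for retrieval"],
                    "exit":  [],
                    "sla_days": None, "next": []},
}

def allowed_transition(current: str, target: str) -> bool:
    """Return True if a state change is permitted, e.g. Analysis back to Running."""
    return target in WORKFLOW.get(current, {}).get("next", [])
```

A tracker built on top of this can flag any experiment that has sat in a state past its `sla_days` as blocked, rather than letting it drift.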

A key guardrail is a formal “Needs more data” loop from Analysis back to Running. Without that, teams quietly extend tests, then forget why they extended them.

For debugging issues that can keep tests from reaching clean conclusions (assignment, event counts, feature-flag conflicts), keep a shared reference like PostHog’s experiment troubleshooting guide linked in your analysis checklist.

Prevent duplicates, improve retrieval, and make wins compound over time

[Image: flywheel of compounding experimentation learnings: document, AI-tag, retrieve, synthesize, ship variants, generate more data]

Duplicate tests are rarely exact repeats. They’re “same idea, new words.” That’s why preventing duplicates is a workflow step, not a reminder in someone’s head.

Add a lightweight “similarity check” before anything leaves Prioritized:

  1. The owner searches the experiment library for the top 3 keywords (surface area, intent, mechanism).
  2. The owner filters by segment and metric (for example, “new users” + “activation rate”).
  3. The owner scans summaries of the closest 3 to 5 experiments.
  4. The owner logs one of three outcomes: new, adaptation, or repeat with new conditions.

An AI experimentation system makes this faster by auto-tagging new entries (surface area, audience, metric type, mechanism) and suggesting “similar tests” as you type. The win isn’t automation; it’s recall. You get institutional memory at the moment you need it: during planning.
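
To show the mechanics, here’s a deliberately naive sketch of the retrieval half. A real system would use full-text search or embeddings; the function names, tag sets, and example library are assumptions for illustration, not a reference to any particular tool.

```python
def tag_overlap(a: set, b: set) -> float:
    """Jaccard similarity between two tag sets (0 = no overlap, 1 = identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def suggest_similar(new_tags: set, library: list, top_n: int = 5) -> list:
    """Rank past experiments by tag overlap with a new idea.

    `library` is a list of dicts with "title" and "tags" keys; in practice the
    tags would cover surface area, audience, metric type, and mechanism.
    """
    scored = [(tag_overlap(new_tags, set(exp["tags"])), exp["title"]) for exp in library]
    return sorted(scored, reverse=True)[:top_n]

# Example: a new "shorter checkout form" idea checked against two past tests.
library = [
    {"title": "Checkout form length test", "tags": ["checkout", "form", "conversion"]},
    {"title": "Onboarding checklist test", "tags": ["onboarding", "activation", "new users"]},
]
print(suggest_similar({"checkout", "form", "friction"}, library))
# -> [(0.5, 'Checkout form length test'), (0.0, 'Onboarding checklist test')]
```

Even a crude overlap score like this surfaces the closest prior write-ups at planning time, which is exactly when the recall matters.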

A failure story that shows the cost: a growth team once reran a “shorter checkout” experiment because it sounded obvious and the old results weren’t easy to find. It took two sprints, pulled engineering away from higher-impact work, and ended with the same null result. Later, someone found the original write-up buried in a personal Notion page. The missing detail was the killer: the earlier test had already shown that shipping costs, not form length, was the real driver, and the “short form” change didn’t address it.

Concrete prevention steps in an experiment knowledge base:

  • Decision log required in Analysis: what you chose and why, including confidence and caveats.
  • “What surprised us” field: the one insight a future team member can’t infer from charts.
  • Implementation notes: key constraints (traffic mix, pricing changes, seasonality, tracking gaps).
  • Follow-ups linked: if the result suggests a next test, connect them so the chain stays intact.
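
To make those fields concrete, here’s what one archived entry could look like. Every value is an illustrative stand-in that echoes the checkout story above, not data from a real test.

```python
# Illustrative archived entry; values echo the checkout failure story above.
archived_entry = {
    "name": "Shorter checkout form",
    "decision_log": "Do not ship: null result on checkout conversion; "
                    "form length is not the bottleneck.",
    "what_surprised_us": "Shipping costs, not form length, drove the drop-off.",
    "implementation_notes": "Constraints and tracking caveats recorded in the original write-up.",
    "follow_ups": ["Surface shipping costs earlier in checkout and retest"],
}
```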

This is how learnings compound. Over time, you stop testing random ideas and start testing sharper variants based on patterns. Your win-rate improves because your inputs improve.

Conclusion

Stuck tests aren’t a mystery. They’re what happens when ownership is fuzzy, states are unclear, and decisions aren’t recorded where the next person will look.

A strong A/B test repository with explicit workflow states, SLAs, reminders, and decision logs turns experimentation into an operational system. The payoff is fewer duplicates, faster retrieval, and a compounding experiment library that keeps getting smarter as you run more tests.

Atticus Li

Experimentation and growth leader. Builds AI-powered tools, runs conversion programs, and writes about economics, behavioral science, and shipping faster.