If a new PM asks, “Have we tested trust badges in checkout?”, the answer shouldn’t be a 30-minute Slack archaeology session. It should be a quick search, a clear summary, and links to the original assets, data, and decision.
That’s what an experiment library taxonomy is for. It turns messy, one-off experiment notes into a living A/B test repository that compounds learning. When it’s done right, you can find relevant prior tests in under 10 seconds, even across teams and years.
This post lays out an operational tagging system, a practical checklist, and a retrieval playbook that fits real CRO work, not idealized process diagrams.
Why experiment repositories die in spreadsheets, Jira, Confluence, and Notion
Most teams start with good intentions: a spreadsheet tab, a Jira template, a Confluence page tree, a Notion database. Then the repository fails, slowly and predictably.
Common failure modes show up within a quarter:
- Lost context: Results get recorded, but the “why” disappears. No screenshots of variants, no targeting rules, no notes on traffic anomalies, no decision log.
- Inconsistent documentation: One person writes a novel, another writes “won +3%”. Fields drift, naming conventions drift, and search becomes useless.
- Duplicates and reruns: A team re-tests a change that failed last year, because nobody can find the old test, or they find it but can’t trust it.
- Tribal knowledge wins: The most tenured IC becomes the real database. When they leave, the experiment knowledge base resets.
- No synthesis: Tests live as isolated rows, not as patterns (what tends to work in checkout for new users on mobile?).
An experiment library isn’t just storage; it’s retrieval plus meaning. The moment your A/B test repository can’t answer basic questions fast, people stop using it, and the system decays.
If you want a reference point for what a robust repository can look like in practice, Conversion has shared how they think about an experiment repository as a competitive asset. The key takeaway is simple: the tagging and structure matter as much as the results.
A taxonomy is the difference between “we have docs” and “we have institutional memory.”
A practical experiment library taxonomy CRO teams can adopt this week
A useful taxonomy has two goals that compete with each other: it must be strict enough to make search reliable, and light enough that teams will actually fill it out. The trick is separating “required fields” (few, consistent, enforced) from “recommended tags” (helpful, flexible).
It also helps to borrow a mindset from analytics governance: plan a small set of stable properties, then expand. Amplitude’s guidance on planning a taxonomy maps well to experimentation: define the minimum common language first.
Taxonomy checklist (required fields)
Keep this layer small and non-negotiable; every experiment record gets these fields, every time:
- Hypothesis: the change, the expected effect, and the reasoning behind it.
- Funnel stage and surface area: where the test ran (example: Checkout, Payment step).
- Audience and targeting: segment, device, and eligibility rules.
- Variants: short descriptions of control and treatments, plus screenshots.
- Outcome: Win, Loss, or Inconclusive, with the primary metric and measured effect.
- Decision: what the team did next, and why.
- Owner, team, and run dates: who ran it and when.
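To make the required fields concrete, here is a minimal sketch of what one experiment record might look like as a typed structure. It is illustrative, not a prescribed schema: the field names mirror the checklist above, and the outcome values are an assumption you would adapt to your own conventions.

```python
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class ExperimentRecord:
    """One experiment = one record. Required fields only; tags come later."""
    experiment_id: str
    hypothesis: str                      # change + expected effect + reasoning
    funnel_stage: str                    # e.g. "Checkout"
    surface_area: str                    # e.g. "Payment step"
    audience: str                        # segment + targeting / eligibility rules
    variants: list[str]                  # short descriptions of control and treatments
    screenshot_urls: list[str]           # links to variant screenshots
    primary_metric: str
    outcome: Literal["Win", "Loss", "Inconclusive"]
    decision: str                        # what the team did next, and why
    owner: str
    team: str
    start_date: str                      # ISO date, e.g. "2024-03-01"
    end_date: str
    tags: list[str] = field(default_factory=list)  # recommended tags, covered below
```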
Recommended tags (the “10-second search” layer)
Use tags as chips that support filtering and similarity search. Keep them short and drawn from controlled lists where possible (a small validation sketch follows below).
- UX pattern: social proof, pricing, form simplification, navigation, reassurance copy
- Hypothesis theme: trust, friction, clarity, urgency, value prop
- Segment tags: new vs returning, device, geo, paid vs organic
- Technical notes: latency risk, tracking risk, personalization, eligibility edge cases
- Research input: session replay, survey insight, support tickets, user testing
This structure works as an experimentation hub because it stays stable as volume grows. You can add tags later without breaking old records, but you can’t retrofit missing required fields at scale.
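One lightweight way to keep those lists controlled is to validate tags against short allow-lists at the point of entry. This is a sketch, not a feature of any particular tool: the dimension names and vocabularies below are pulled from the examples in this post, and a real system would manage them in its own settings.

```python
# Controlled vocabularies: short, reviewed lists rather than free text.
TAG_VOCABULARY = {
    "ux_pattern": {"social proof", "pricing", "form simplification", "navigation", "reassurance copy"},
    "hypothesis_theme": {"trust", "friction", "clarity", "urgency", "value prop"},
    "segment": {"new", "returning", "mobile", "desktop", "paid", "organic"},
    "research_input": {"session replay", "survey insight", "support tickets", "user testing"},
}

def validate_tags(tags: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the tags are clean."""
    problems = []
    for dimension, values in tags.items():
        allowed = TAG_VOCABULARY.get(dimension)
        if allowed is None:
            problems.append(f"unknown tag dimension: {dimension}")
            continue
        for value in values:
            if value.lower() not in allowed:
                problems.append(f"{dimension}: '{value}' is not in the controlled list")
    return problems

# Example: catches a typo before it pollutes search.
print(validate_tags({"hypothesis_theme": ["trsut"], "segment": ["mobile"]}))
# -> ["hypothesis_theme: 'trsut' is not in the controlled list"]
```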
Retrieval playbook: finding similar experiments in under 10 seconds (and preventing reruns)
Here’s a scenario most teams recognize.
A growth team wants to improve checkout conversion. Someone proposes adding “secure checkout” badges, plus a short reassurance line under the credit card field. It feels safe. It ships into the experiment queue.
Two weeks later, the test is flat. Engineering time is burned, design time is burned, and the roadmap takes a hit.
After the fact, a senior analyst finds an old Confluence page from 18 months ago. Same idea, same placement, same segment, also flat. The only reason nobody knew is that the old test was titled “Payment step trust experiment v2” and stored under a different squad space, with no consistent tags and no screenshots.
A centralized experiment knowledge base prevents this in two ways:
- Tag filtering gets you to the neighborhood fast: Funnel Stage = Checkout, Hypothesis Theme = Trust, UX Pattern = Reassurance, Segment = Mobile, Outcome = Loss/Inconclusive.
- AI similarity search gets you to “near-duplicates.” Even if the title is different, the system can match on hypothesis text, page location, and variant descriptions.
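To show what “near-duplicate” matching can look like in miniature, here is a pure-Python sketch that scores overlap between hypothesis texts. A real system would use embeddings or a search index; the token-overlap scoring and the example hypotheses here are illustrative assumptions.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase word set; good enough to illustrate overlap scoring."""
    return set(text.lower().split())

def similarity(a: str, b: str) -> float:
    """Jaccard overlap between two hypothesis strings (0.0 to 1.0)."""
    ta, tb = tokenize(a), tokenize(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

new_hypothesis = ("Adding secure checkout badges and reassurance copy under the "
                  "card field will lift mobile checkout conversion")

past_tests = {
    "Payment step trust experiment v2": ("Secure checkout badges and reassurance copy near the "
                                         "card field to increase checkout conversion on mobile"),
    "Homepage hero value prop test": "A benefit-led headline will increase clickthrough to category pages",
}

# Rank past tests by how closely their hypotheses match the new one.
for title, hypothesis in sorted(past_tests.items(), key=lambda kv: -similarity(new_hypothesis, kv[1])):
    print(f"{similarity(new_hypothesis, hypothesis):.2f}  {title}")
# The old trust-badge test scores far higher than the unrelated test,
# even though its title never mentions "badges".
```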
Tagging as a first-class feature is becoming table stakes across experimentation systems. LaunchDarkly’s update on tags for experiments reflects the same operational truth: organization needs consistent labels, or scaling breaks.
The 10-second retrieval workflow (repeatable)
- Start with 2 filters: Funnel stage + surface area (example: Checkout + Payment step); see the sketch after this list.
- Add 1 theme tag: Trust, Friction, Clarity, Value prop.
- Scan outcomes first: sort by Loss and Inconclusive to avoid repeats, then scan Wins for patterns.
- Open the top 2–3 matches: confirm placement and audience, then check screenshots and decision notes.
- Use similarity suggestions: pull in adjacent tests (different copy, different placement) to avoid narrow thinking.
- Write the new hypothesis with citations: link back to prior tests to show what’s changing and why.
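The first three steps translate almost directly into a query. Here is a minimal sketch, assuming records carry the required fields and tags described above; the stub rows and exact field names are illustrative.

```python
# Records would normally come from your repository; a few stub rows
# illustrate the shape (fields match the taxonomy above).
records = [
    {"title": "Payment step trust experiment v2", "funnel_stage": "Checkout",
     "surface_area": "Payment step", "hypothesis_theme": "trust", "outcome": "Inconclusive"},
    {"title": "Express checkout button test", "funnel_stage": "Checkout",
     "surface_area": "Payment step", "hypothesis_theme": "friction", "outcome": "Win"},
    {"title": "Homepage hero value prop test", "funnel_stage": "Landing",
     "surface_area": "Hero", "hypothesis_theme": "value prop", "outcome": "Loss"},
]

# Steps 1-2: two structural filters plus one theme tag.
matches = [r for r in records
           if r["funnel_stage"] == "Checkout"
           and r["surface_area"] == "Payment step"
           and r["hypothesis_theme"] == "trust"]

# Step 3: surface Losses and Inconclusives before Wins, to catch repeats first.
outcome_order = {"Loss": 0, "Inconclusive": 1, "Win": 2}
matches.sort(key=lambda r: outcome_order[r["outcome"]])

for r in matches:
    print(r["outcome"], "-", r["title"])
```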
If you’re moving off transitional tools, an operational system like the Searchable A/B Testing Knowledge Base can act as a dedicated experimentation center of excellence artifact, with structured fields, tags, and AI-assisted retrieval.
The payoff isn’t just “better documentation.” It’s fewer duplicate tests, faster planning, and a library that gets more valuable every quarter.
Conclusion
An experiment library taxonomy is how CRO teams turn scattered test notes into an A/B test repository you can trust. Define a small set of required fields, add tags that match how people actually search, and make retrieval a default step in planning.
When search takes under 10 seconds, teams stop rerunning old failures and start building on what they already know. That’s how institutional memory forms, and how an AI experimentation system becomes more than a storage bin.