If your team has more than a handful of testers, duplicates don’t show up as one obvious mistake. They show up as a slow bleed: the same “new idea” getting shipped again with a slightly different headline, a different Jira ticket, and no memory of why it failed last time.

That’s why experiment naming conventions aren’t a nice-to-have. They’re operational safety rails. Done right, a name becomes a unique identifier, a quick summary, and a search key that helps your team avoid reruns and build on past learning.

This post gives you an enforceable naming standard, a duplicate-prevention workflow, and a simple repository schema you can roll out this quarter.

Why spreadsheets, Jira, Confluence, and Notion fail as experiment repositories

[Diagram: spreadsheets, Jira, Confluence, and Notion consolidating into a centralized experiment library, with lost context and duplicates called out]

These tools are fine for work-in-progress, but they break as a long-term memory system.

Spreadsheets fail because structure drifts. One tester adds “Primary metric,” another adds “KPI,” a third adds a free-text “Success.” Filters break, columns get repurposed, and you can’t reliably search for “pricing page experiments that impacted trial starts.” Context gets separated into other docs, then links rot.

Jira fails because it’s optimized for tasks, not knowledge. Tickets get closed, renamed, moved across projects, and buried. You can’t synthesize learning across quarters because the “why” lives in comments, screenshots, and Slack threads, not in consistent fields. Duplicate tests happen because people search by ticket title, not by intent and pattern.

Confluence fails because pages sprawl. Everyone writes a doc differently, pages get copied, and updates rarely happen after the test ends. The result is tribal knowledge: teams remember the loud experiments, not the representative ones. You also get reruns of failed ideas because results aren’t standardized or easy to scan.

Notion fails for similar reasons. It’s flexible, which becomes the problem at scale. Without strict templates and governance, you end up with inconsistent documentation and weak retrieval. You can store pages, but you can’t reliably compare experiments, roll up patterns, or build a clean decision log.

Naming is the first place this breaks. If events and experiments don’t have consistent names, analytics and search go sideways, a point echoed in Heap’s discussion of naming conventions in analytics.

A naming convention you can enforce (and actually use to prevent duplicates)

Most teams name tests like “Homepage headline test v2.” That’s not a name, it’s a shrug. Your standard should do three jobs: identify, classify, and help search.

The format (required components)

Use a single, human-readable “Experiment Name” plus a stable “Experiment Key” in your testing tool. The key is what systems track, the name is what humans scan. If you want a clear definition of a key, see Statsig’s explanation of an experiment key.

Experiment Name format (kebab-case):

team-product-platform-funnel-surface-pattern-hypothesis-slug-yyyymm-##

Required components:

  • team: short team or squad (growth, checkout, activation)
  • product: app area or product line (core, billing, marketplace)
  • platform: web, ios, android, email
  • funnel: acq, act, rev, ret (keep a fixed set)
  • surface: where it shows (pricing, signup, checkout, onboarding-step2)
  • pattern: UX or offer pattern (cta-copy, form-short, social-proof, discount)
  • hypothesis-slug: 3 to 5 words max (what the change should do)
  • yyyymm: month created (202601)
  • ##: sequence number for that month and surface (01, 02)

Character rules (non-negotiable)

  • Lowercase letters, numbers, and hyphens only
  • No spaces, underscores, emojis, or punctuation
  • Keep the whole name under 90 characters
  • Don’t include “ab,” “test,” “control,” “variant-a,” or tool names
  • If it’s a rerun, add a reason in metadata, not “v3” in the name

Examples and anti-examples

If you adopt only one discipline, make it this: surface + pattern must be present. That pair is what catches most duplicates.
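
To make the rules concrete, here is a minimal lint sketch in Python. The example name is assembled from the components above purely for illustration, and the regex only checks overall shape (surface, pattern, and slug can all contain hyphens, so it can’t split them apart); treat it as a lint, not a parser.

```python
import re

# Minimal lint for the naming convention above. Checks shape and the fixed
# platform/funnel vocabularies; it cannot separate surface, pattern, and slug.
NAME_RE = re.compile(
    r"^[a-z0-9]+-[a-z0-9]+"          # team-product
    r"-(web|ios|android|email)"      # platform
    r"-(acq|act|rev|ret)"            # funnel
    r"-[a-z0-9-]+"                   # surface-pattern-hypothesis-slug
    r"-\d{6}-\d{2}$"                 # yyyymm-##
)
FORBIDDEN = {"ab", "test", "control", "variant-a"}

def is_valid_name(name: str) -> bool:
    has_forbidden = any(f"-{word}-" in f"-{name}-" for word in FORBIDDEN)
    return len(name) < 90 and NAME_RE.match(name) is not None and not has_forbidden

# Illustrative example and anti-example (not from a real library):
print(is_valid_name("growth-core-web-act-pricing-cta-copy-clarify-value-promise-202601-01"))  # True
print(is_valid_name("Homepage headline test v2"))  # False
```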

[Diagram: the experimentation ops flywheel, from hypothesis to test design, results, repository entry, AI synthesis, and back to better hypotheses]

A duplicate-prevention workflow that holds up under pressure

A naming convention reduces duplicates, but it won’t stop them on its own. You need a gate that runs before design and build.

Step 1: Intake search (mandatory, logged)

Before an experiment gets sized, the requester must search the repository by:

  • surface (pricing, checkout, onboarding)
  • pattern (cta-copy, form-short, guarantee)
  • primary metric (trial-start, purchase, activation-rate)

If the search isn’t attached, the experiment doesn’t get scheduled.
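
If your repository exports as structured records, the intake search can be a few lines. A rough sketch, with field names assumed to match the naming components (surface, pattern, primary metric):

```python
# Rough sketch of the intake search. Field names (surface, pattern,
# primary_metric) are illustrative and should match whatever your library stores.
def intake_search(repo: list[dict], surface: str,
                  pattern: str | None = None,
                  primary_metric: str | None = None) -> list[dict]:
    """Return prior experiments touching the same surface / pattern / metric."""
    hits = []
    for exp in repo:
        if exp.get("surface") != surface:
            continue
        if pattern and exp.get("pattern") != pattern:
            continue
        if primary_metric and exp.get("primary_metric") != primary_metric:
            continue
        hits.append(exp)
    return hits
```

Attach the output even when it’s empty; “no prior hits” is itself evidence the librarian can use in the next step.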

Step 2: Similarity check (human plus rules)

Assign an “experiment librarian” role weekly (rotating is fine). They do a 5-minute similarity pass against rules like these, sketched in code after the list:

  • Same surface + pattern within 18 months? Treat as a likely duplicate.
  • Same hypothesis intent, different UI? Still “related,” require linking.
  • Same segment but different platform? Allowed, but must reference prior results.
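
Here’s a rough sketch of that rules pass. Field names like created, hypothesis_slug, and segment are assumptions, and matching slugs is only a crude stand-in for “same hypothesis intent”; the librarian still makes the call.

```python
from datetime import date

# Sketch of the librarian's rules pass; thresholds mirror the list above.
def similarity_flag(candidate: dict, prior: dict, today: date) -> str:
    months_apart = ((today.year - prior["created"].year) * 12
                    + today.month - prior["created"].month)
    same_surface_pattern = (candidate["surface"] == prior["surface"]
                            and candidate["pattern"] == prior["pattern"])
    if same_surface_pattern and months_apart <= 18:
        return "likely-duplicate"           # same surface + pattern within 18 months
    if candidate["hypothesis_slug"] == prior["hypothesis_slug"]:
        return "related-must-link"          # same intent, different UI
    if (candidate["segment"] == prior["segment"]
            and candidate["platform"] != prior["platform"]):
        return "allowed-reference-prior"    # allowed, but must cite prior results
    return "no-flag"
```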

Step 3: Decision log (what you decided, and why)

Every “duplicate” becomes one of three decisions:

  • Merge: combine with existing planned work
  • Repeat with constraint: new segment, new promise, or new traffic source, clearly stated
  • Abort: record why, and what would need to change to revisit

This is where a dedicated experiment library earns its keep. A searchable repository like Growth Layer’s Testing Command Center is built for retrieval and linking, not just storing docs.

A/B test documentation that compounds learning (plus a library schema and AI support)

Good documentation isn’t long. It’s consistent, comparable, and easy to reuse.

A/B test documentation best practices (keep it tight)

  • Hypothesis with direction: “If we add X, primary metric will increase because Y.”
  • One primary metric plus 2 to 4 guardrails (latency, refund rate, churn, CS tickets).
  • Target segment and exposure rules: who sees it, when, and exclusions.
  • Design notes: what changed, what didn’t (avoid hidden scope creep).
  • Decision: ship, iterate, or stop, plus a one-line reason.
  • Learning statement: what you now believe, even if the result is flat.

Also be clear about the test type. People mix the terms, but setups differ across stacks; the distinction is laid out in A/B versus split testing explained.

Recommended experiment library schema (fields and tags)
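
A minimal sketch of the record shape, assembled from the naming components and the documentation fields above; the field names are illustrative, so adapt them to whatever store you use.

```python
from dataclasses import dataclass, field

# One possible record shape; names are a sketch, not a fixed spec.
@dataclass
class Experiment:
    experiment_key: str                     # stable ID your testing tool tracks
    name: str                               # kebab-case Experiment Name
    team: str
    product: str
    platform: str                           # web, ios, android, email
    funnel: str                             # acq, act, rev, ret
    surface: str                            # pricing, signup, checkout, ...
    pattern: str                            # cta-copy, form-short, social-proof, ...
    hypothesis: str                         # "If we add X, primary metric will increase because Y"
    primary_metric: str
    guardrails: list[str] = field(default_factory=list)
    segment: str = ""                       # target segment and exposure notes
    status: str = "planned"                 # planned, running, decided
    decision: str = ""                      # ship, iterate, stop
    learning: str = ""                      # one-line learning statement
    related_keys: list[str] = field(default_factory=list)  # links from the duplicate workflow
    tags: list[str] = field(default_factory=list)           # free-form retrieval tags
```

Keep surface, pattern, and funnel as controlled vocabularies; those three fields are what the intake search, the similarity pass, and any later clustering lean on.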

How AI changes experimentation ops (and what to watch)

[Diagram: a central experiment knowledge base with an AI layer for auto-tagging and theme clustering, feeding playbooks and reusable learnings]

AI makes repositories more than storage. With clean names and fields, you can auto-tag experiments, classify funnel stage, retrieve similar tests, and synthesize themes across quarters.
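
Even before any model gets involved, clean fields make the synthesis step mechanical. A rough sketch of a quarterly theme rollup, with field names assumed from the schema sketch above; an AI layer can draft the narrative and suggest tags on top of output like this.

```python
from collections import Counter, defaultdict

# Sketch of a theme rollup over the library: group by surface + pattern,
# then count decisions. Assumes the illustrative fields from the schema sketch.
def theme_rollup(repo: list[dict]) -> dict[tuple[str, str], Counter]:
    themes: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for exp in repo:
        themes[(exp["surface"], exp["pattern"])][exp.get("decision", "undecided")] += 1
    return dict(themes)

# e.g. {("pricing", "cta-copy"): Counter({"iterate": 3, "ship": 1})}
```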

The cautions are operational, not theoretical:

  • Data hygiene: garbage names and missing fields produce confident nonsense.
  • Taxonomy governance: if “activation” means five different things, AI clustering won’t help.
  • Review loop: treat AI suggestions as drafts, require a human owner to confirm tags and links.

Conclusion

Duplicates are rarely a people problem; they’re a systems problem. With enforceable experiment naming conventions, a simple pre-flight search workflow, and a consistent library schema, teams with more than a handful of testers stop rerunning the past and start compounding learning. Pick the standard, publish it, and make the intake gate real. The first month feels strict; the second feels like relief.

Atticus Li

Experimentation and growth leader. Builds AI-powered tools, runs conversion programs, and writes about economics, behavioral science, and shipping faster.