Ask three tools in the same company how many conversions happened last month and you'll get three different answers. The client-side analytics platform reports one number, the server-side pipeline reports a higher one, and the BI warehouse lands somewhere in between. Everyone knows the numbers don't tie out. Almost nobody reconciles them, and in the vacuum each team quietly cites whichever tool tells the story it needs.

TL;DR

  • Your analytics tools disagree by design, not by accident. Client-side tracking, server-side pipelines, and BI warehouses count with different rules, so their numbers structurally diverge, often by double-digit percentages.
  • The discrepancy is well understood in theory and ignored in practice because reconciliation is expensive, boring, and owned by no one. So it doesn't happen.
  • In the vacuum, the "single source of truth" becomes political: whichever number the most senior person trusts, or whichever supports the argument being made, wins the meeting.
  • The fix isn't picking the "right" tool — none is fully right. It's documenting why they differ, choosing an authoritative source per decision type, and validating instrumentation before trusting any of them.

Source                   | Systematically            | Because
Client-side (GA4-style)  | Under-counts              | Ad blockers, consent rejection, privacy thresholding
Server-side pipeline     | Counts more, differently  | Bypasses blockers; different dedup/timing rules
BI warehouse             | Third number              | Own join logic, retention windows, currency/timing

None of these is "the truth." Each is a defensible measurement under different rules — and the gap between them is where bad decisions hide.

Why the numbers can't agree

The disagreement isn't a bug to be fixed; it's a structural property of measuring the same events three different ways. Take client-side analytics as the baseline. It systematically under-counts, for reasons that are individually well documented:

  • Ad blockers prevent the tracking script from loading for a meaningful share of users — estimates commonly put client-side loss in the 15–30% range depending on audience and geography, and higher in privacy-conscious segments (GA4 ad-blocker impact).
  • Consent rejection blocks events for the large fraction of users (in some regions, a majority) who decline or ignore cookie consent.
  • Privacy thresholding and sampling. Platforms like GA4 will hide data entirely when a metric might reveal individual identities, and may sample large datasets, so the reported number is a processed approximation, not a raw count (auditing GA4 for data accuracy).

Server-side tracking bypasses ad blockers and consent-driven client loss, so it captures more — but it dedupes users differently, attributes over different windows, and times events differently. The BI warehouse then applies its own join logic, retention limits, and currency conversions and produces a third number. Each tool is internally correct under its own rules. There is no tool that is correct in some absolute sense, because "conversions last month" is not one well-defined quantity — it's three, depending on the counting rules.
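To make the "three quantities, not one" point concrete, here is a minimal Python sketch that counts the same handful of events under three rule sets. Everything in it is an illustrative assumption rather than how any real platform works: the `Hit` record, the three counting functions, the sample data, and the UTC-5 reporting timezone are all hypothetical, and real tools apply far more rules than these.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Hit:
    """One conversion hit (hypothetical record shape)."""
    order_id: str
    ts: datetime             # timezone-aware UTC timestamp
    blocked: bool = False    # an ad blocker stopped the browser script
    consented: bool = True   # the user accepted tracking consent


def in_month(ts, year, month, tz):
    """True if the timestamp falls in the given calendar month in timezone tz."""
    local = ts.astimezone(tz)
    return (local.year, local.month) == (year, month)


def client_side(hits, year, month):
    """Counts only hits the browser could send: not blocked, consent given."""
    return sum(
        1 for h in hits
        if in_month(h.ts, year, month, timezone.utc)
        and not h.blocked and h.consented
    )


def server_side(hits, year, month):
    """Sees every hit, dedupes on order_id, uses UTC month boundaries."""
    return len({h.order_id for h in hits
                if in_month(h.ts, year, month, timezone.utc)})


def warehouse(hits, year, month, reporting_tz):
    """Dedupes on order_id but draws the month boundary in a reporting timezone."""
    return len({h.order_id for h in hits
                if in_month(h.ts, year, month, reporting_tz)})


def utc(day, hour=12):
    """Helper: a UTC timestamp in March 2024 (made-up sample period)."""
    return datetime(2024, 3, day, hour, tzinfo=timezone.utc)


# Made-up sample data.
HITS = [
    Hit("o1", utc(5)),
    Hit("o2", utc(10), blocked=True),
    Hit("o2", utc(10), blocked=True),        # retry of the same order
    Hit("o3", utc(12), consented=False),
    Hit("o4", utc(20)),
    Hit("o5", utc(1, hour=2), blocked=True),  # still February in UTC-5
    Hit("o6", utc(25), consented=False),
]
REPORTING_TZ = timezone(timedelta(hours=-5))  # assumed warehouse timezone

counts = {
    "client_side": client_side(HITS, 2024, 3),
    "server_side": server_side(HITS, 2024, 3),
    "warehouse": warehouse(HITS, 2024, 3, REPORTING_TZ),
}
print(counts)  # {'client_side': 2, 'server_side': 6, 'warehouse': 5}
```

Each function is correct under its own rules, yet the same seven rows produce 2, 6, and 5 conversions for March. None of the three answers is a bug.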

There's an old measurement heuristic worth keeping in mind here — Twyman's law: any figure that looks interesting or different is usually wrong. When one of the three tools shows a number that's conveniently better than the others, Twyman's law says the first hypothesis should be "this tool's counting rules flatter this metric," not "this is the real number." The convenient number is the one to distrust first.

Reconciliation is nobody's job, so it doesn't happen

If the discrepancy is so well understood, why does it persist? Because reconciliation is expensive, unglamorous, and unowned. Working out exactly why the client-side number is 22% below the server-side number — decomposing it into ad-block loss, consent loss, dedup differences, and timing lag — is days of tedious analysis that produces no new feature and no new campaign. It's pure infrastructure work, and it sits in the gap between the analytics team, the data-engineering team, and the marketing team, each of whom reasonably considers it one of the others' problems.

So it doesn't get done. The organization runs on three numbers that don't tie out, everyone vaguely knows it, and the discrepancy gets managed socially instead of analytically. This is the same root cause that makes event tracking architecture the thing that quietly makes or breaks data quality — the decisions that determine whether your numbers are trustworthy are invisible, boring, and made once by whoever set up the pipeline, then never revisited.

An unreconciled discrepancy doesn't stay a technical footnote. It becomes a political instrument — because when numbers disagree and no one has decided which is authoritative, the disagreement gets resolved by rank, not by analysis.

The political failure mode: whichever number wins the meeting

Here's where the hidden cost lands. When three tools disagree and no source is designated authoritative for a given decision, the number that gets used is the one that supports the argument being made. A team that wants budget cites the server-side number that shows more conversions. A team defending against a cut cites the client-side number that shows the shortfall isn't their fault. The "single source of truth" becomes, in practice, whichever tool the most senior person in the room happens to trust.

This is corrosive in a specific way: it makes data feel rigorous while functioning as rhetoric. Everyone is citing a real number from a real tool, so every argument sounds evidence-based, but the evidence was selected to fit the conclusion. It's a close relative of the way vanity metrics substitute for revenue metrics — the number is technically true and decision-irrelevant, chosen because it flatters rather than because it's the right measure for the question. And it interacts badly with experimentation: if your tools disagree on baseline conversion, they can disagree on whether a test moved it, which is exactly the kind of instability that experimentation governance and SRM checks exist to surface before a result is trusted.
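For readers who haven't run one, the SRM (sample ratio mismatch) check mentioned above can be sketched in a few lines of Python. This is a minimal version under stated assumptions: a two-arm test, a chi-square goodness-of-fit test with one degree of freedom, and a 0.001 alert threshold picked purely for illustration. The function names and sample counts are hypothetical.

```python
import math


def srm_p_value(control_n, treatment_n, expected_control_share=0.5):
    """P-value of a chi-square goodness-of-fit test on the assignment counts."""
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def has_srm(control_n, treatment_n, expected_control_share=0.5, alpha=0.001):
    """True when the observed split is implausible under the planned split."""
    return srm_p_value(control_n, treatment_n, expected_control_share) < alpha


# Made-up assignment counts for the same test, as seen by two sources.
print(has_srm(5000, 5050))  # small imbalance, not flagged
print(has_srm(5000, 5400))  # large imbalance, flagged
```

Running the check separately on each source's assignment counts is one way to notice that the sources disagree about the experiment itself, not just about the baseline.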

The fix: reconcile once, designate authority, validate instrumentation

You cannot make the three tools agree — they're measuring under different rules and always will. What you can do is remove the ambiguity that lets the discrepancy become political. Three moves:

  1. Do the reconciliation once, and document it. Decompose the gap between your sources into its named causes — ad-block loss, consent loss, dedup differences, timing lag — and write it down. You don't have to eliminate the gap; you have to explain it, so "the numbers don't match" stops being a mystery that anyone can exploit and becomes a known, quantified offset.
  2. Designate an authoritative source per decision type. Server-side or warehouse data for revenue and financial reconciliation; client-side for behavioral and UX analysis where relative patterns matter more than absolute counts. The point isn't that one tool is "right" — it's that for any given question, one source is designated, so the choice of number isn't up for negotiation in the meeting.
  3. Validate the instrumentation before trusting any of it. When I take over a measurement setup, the first work I do is the least exciting: confirming the tracking actually fired the way we thought before believing a single downstream number. On one setup, reconciling three disagreeing numbers surfaced that one of them had been instrumented on a subtly different event definition all along — and once that was found and fixed, two of the three numbers snapped much closer together. The discrepancy that looked like a deep measurement mystery was a definitional mismatch nobody had checked. You cannot trust results you haven't validated the instrumentation for; that principle is boring, and it's the whole game.
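Step 1 is, at bottom, arithmetic: take the gap between two sources and split it into named causes plus whatever is left over. The sketch below shows one possible shape for that record. All figures are made up for illustration (a client-side count 22% below server-side, echoing the example earlier in the article), and the `decompose_gap` helper and cause names are hypothetical. In practice each line item has to come from its own analysis.

```python
def decompose_gap(higher_count, lower_count, named_causes):
    """Split the gap between two sources into named causes plus a remainder.

    Returns the total gap and a list of (cause, events, share_of_higher_count).
    """
    gap = higher_count - lower_count
    unexplained = gap - sum(named_causes.values())
    rows = [(name, lost, lost / higher_count)
            for name, lost in named_causes.items()]
    rows.append(("unexplained", unexplained, unexplained / higher_count))
    return gap, rows


# Illustrative figures only.
SERVER_SIDE = 10_000
CLIENT_SIDE = 7_800
NAMED_CAUSES = {
    "ad_block_loss": 1_200,
    "consent_loss": 700,
    "dedup_difference": 150,
    "timing_lag": 50,
}

gap, rows = decompose_gap(SERVER_SIDE, CLIENT_SIDE, NAMED_CAUSES)
print(f"gap: {gap} ({gap / SERVER_SIDE:.1%} of server-side)")
for name, lost, share in rows:
    print(f"{name:<18}{lost:>6}{share:>8.1%}")
```

The "unexplained" row is the useful part. A small remainder means the offset is understood; a large one suggests a cause nobody has named yet, such as the definitional mismatch described in step 3.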

Do this and the discrepancy stops being a lever anyone can pull. The numbers still differ — they always will — but the difference is documented, the authoritative source is pre-decided, and no one gets to shop for the number that suits them.
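One lightweight way to make the designation in step 2 stick is to write it down as a lookup that fails loudly for anything not yet decided. The mapping below is a hypothetical sketch: the decision-type names and the specific assignments are assumptions loosely based on the split suggested above, not a recommended taxonomy.

```python
# Hypothetical registry: decision type -> designated source.
AUTHORITATIVE_SOURCE = {
    "revenue_reconciliation": "server_side",
    "financial_reporting": "warehouse",
    "behavioral_analysis": "client_side",
    "ux_analysis": "client_side",
}


def source_for(decision_type):
    """Return the designated source, or fail if none was agreed in advance."""
    try:
        return AUTHORITATIVE_SOURCE[decision_type]
    except KeyError:
        raise LookupError(
            f"No authoritative source designated for {decision_type!r}; "
            "decide it before the meeting, not in it."
        ) from None


print(source_for("revenue_reconciliation"))  # server_side
```

The failure case is the point of the design: an undesignated decision type raises an error instead of quietly defaulting to whichever number is most convenient.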

FAQ

Which tool is actually correct?

None, in an absolute sense — and that framing is the trap. "Conversions last month" is defined differently by each tool (different dedup, windows, and loss profiles), so each is correct under its own rules and none is correct universally. The useful question isn't "which is right" but "which is authoritative for this decision" — server-side for financial reconciliation, client-side for behavioral patterns — with the discrepancy between them documented and understood.

Why not just switch everything to server-side and be done?

Server-side captures more data by bypassing ad blockers and consent loss, so it's the better choice for revenue reconciliation — but it isn't automatically "the truth" either. It dedupes and attributes on its own rules, and moving to it doesn't eliminate discrepancies with your BI warehouse or client-side behavioral data. It's an upgrade for specific decisions, not a universal reconciliation. You still need to document why sources differ and designate authority per decision.

How do I get anyone to fund the boring reconciliation work?

Frame it as decision risk, not data hygiene. Every budget decision, forecast, and test readout is currently being made on numbers that disagree by double-digit percentages, with the choice of number effectively unmanaged. The cost of reconciliation is a few days of analysis; the cost of not reconciling is a standing invitation to set strategy on whichever figure someone shopped for. Put that way, it's cheap insurance against expensive decisions made on selected evidence.

Does this discrepancy affect A/B test results too?

It can, indirectly. If your sources disagree on baseline conversion, they can disagree on whether a test moved it — and a test read on an under-counting or inconsistently-instrumented source can mislead. That's why instrumentation validation belongs upstream of both your reporting and your experimentation: the same definitional mismatch that makes three dashboards disagree can make a test result untrustworthy.

Bottom line

Your analytics tools disagree because they count under different rules — client-side under-counts from ad blockers and consent loss, server-side counts more and differently, the warehouse produces a third number — and none is "the truth." The discrepancy is well understood and almost never reconciled, because reconciliation is expensive, boring, and unowned. In that vacuum it becomes political: the number that gets used is the one that supports the argument or the one the senior person trusts, which makes rhetoric feel like rigor. The fix isn't finding the right tool; it's reconciling the gap once and documenting it, designating an authoritative source per decision, and validating instrumentation before trusting any of it. Do that and nobody gets to shop for the number that suits them.

Getting instrumentation and measurement definitions right — so your numbers are trustworthy before you build on them — is exactly the discipline I put into GrowthLayer. For more on the unglamorous data-quality work that decides whether your decisions are sound, subscribe to Lean Experiments.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.