It is the second Tuesday of the month and the executive team is in the boardroom. The CFO has the floor for ten minutes; then the COO; then customer experience. The CX leader stands up, clicks past the agenda slide, and lands on a single number rendered in 96-point font on a clean white background. NPS: 47. A green up-arrow next to it. Two bullets underneath: “+3 vs. last month, +8 vs. last quarter.” The CEO nods. The CFO nods. Someone says “good direction” and the room moves on to the next agenda item. Roughly two minutes have been spent on the quantitative measurement of the company’s customer relationships, and the proxy used was a single number derived from a single question asked to a fraction of the customer base, anchored to a theoretical framework whose empirical foundations have been formally contested for two decades.
That single number is the Net Promoter Score, and the question is the one Frederick Reichheld popularized in his December 2003 Harvard Business Review article titled — with refreshing directness — “The One Number You Need to Grow.” Reichheld, then a Bain & Company fellow, proposed that one survey question delivered to customers would predict a company’s growth rate better than any other customer-measurement instrument: “How likely is it that you would recommend [our company / product / service] to a friend or colleague?” Responses on an 11-point 0-to-10 scale get bucketed: 9–10 are “promoters,” 7–8 are “passives,” 0–6 are “detractors.” Subtract the percentage of detractors from the percentage of promoters and you get a single score, ranging from -100 to +100. That score, Reichheld argued, was the one number an executive team needed to track to know whether they were building or destroying customer-driven growth.
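The bucketing and subtraction described above can be sketched in a few lines of Python. This is a minimal illustration, not Bain's or any vendor's implementation; the function names and the sample responses are hypothetical.

```python
from collections import Counter


def nps_bucket(score: int) -> str:
    """Map a 0-10 response to promoter / passive / detractor."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(responses: list[int]) -> float:
    """Percentage of promoters minus percentage of detractors (-100 to +100)."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(nps_bucket(r) for r in responses)
    return 100.0 * (counts["promoter"] - counts["detractor"]) / len(responses)


# Hypothetical sample: 5 promoters, 3 passives, 2 detractors.
sample = [10, 9, 9, 10, 9, 8, 7, 8, 3, 6]
print(net_promoter_score(sample))  # 30.0
```

Note that the passives drop out of the subtraction entirely: they affect the score only through the denominator.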
The article went viral by the standards of HBR. The Ultimate Question (2006) and The Ultimate Question 2.0 (2011), Reichheld’s book-length expansions (the second co-authored with Bain colleague Rob Markey), sold well into the practitioner-business audience. Bain & Company built a substantial NPS consulting practice. Satmetrix, Medallia, Qualtrics, and a generation of CX measurement platforms productized the framework. By the mid-2010s, NPS was the dominant customer-measurement metric in U.S. corporate practice: Reichheld and Markey reported in the 2011 book that NPS was used by “two-thirds of the Fortune 1000,” and some industry surveys put Fortune 1000 adoption higher still, at roughly 80%. It is the rare academic-adjacent management concept that became operationalized in shareholder-letter language: when a Fortune 100 CEO writes “we are committed to driving NPS into the high 60s,” analysts know what that means.
The empirical reality is much messier than the boardroom slide suggests. Within four years of the original Reichheld article, Timothy Keiningham, Bruce Cooil, Tor Wallin Andreassen, and Lerzan Aksoy published a longitudinal test of Reichheld’s central predictive-validity claim in the Journal of Marketing (2007), covering 21 industries and roughly 15,500 customers, and concluded that NPS did not outperform other commonly used customer metrics in predicting firm revenue growth. Neil Morgan and Lopo Rego, in Marketing Science (2006), had tested a similar question and reached a similar conclusion: customer satisfaction measures predicted firm performance at least as well as recommendation intention measures, and often better. Jenny van Doorn, Peter Leeflang, and Marleen Tijs (2013), in the International Journal of Research in Marketing, replicated the Morgan and Rego analysis on a new Dutch dataset and again found no predictive advantage for NPS. Kai Kristensen and Jacob Eskildsen (2014), in The TQM Journal, tested whether NPS was a trustworthy performance measure and reached the negative answer the title implies. The independent peer-reviewed literature on NPS’s central predictive claim (that NPS is the single best customer metric for predicting growth) has consistently failed to confirm what Reichheld asserted.
This article walks through that gap. As in the CliftonStrengths case, NPS is not a fraud, is not a folk theory, and is not a replication-failed psychology finding. It is something more particular and, for an operator deciding whether to wire NPS into the company’s executive scorecard, more confusing: a polished, extraordinarily successful commercial framework whose central empirical claim was contested by independent academic work within four years of publication and has remained contested ever since, while corporate adoption continued to scale because the framework’s commercial design was excellent regardless of what the academic evidence said. The right operator answer is not “stop using NPS.” It is “stop letting one number do work it cannot do.”
What Reichheld 2003 Actually Claimed
The December 2003 Harvard Business Review article — Reichheld, F. F. (2003), “The one number you need to grow,” Harvard Business Review, 81(12), 46–54 — is short, well-written, and unambiguous about its central claim. Reichheld and his Bain colleagues, working with the customer-survey data platform Satmetrix, tested a battery of survey questions against company-level performance outcomes across approximately 14 industries and 4,000 customers. The candidate questions covered the standard satisfaction-and-loyalty repertoire: overall satisfaction, repurchase intention, willingness to recommend, perceived value, and so forth. For each industry, Reichheld reports correlating customer responses to each candidate question with the company’s revenue growth rate over the following years.
The headline finding: “willingness to recommend” produced the highest correlation with revenue growth in eleven of fourteen industries. From this Reichheld drew the strong inference that “would recommend” was the single best survey question to ask customers, and the simplest possible measurement instrument — promoters minus detractors — was sufficient to capture the growth-relevant signal. The “one number” framing was not metaphorical. Reichheld was making the strong empirical claim that an executive team monitoring NPS was monitoring the customer-side variable most predictive of growth, and adding more questions or more sophisticated metrics would not improve predictive validity proportional to the additional measurement burden.
The HBR article is admirably clear about its methodology. It is also admirably clear about its commercial implication: the right thing for an executive team to do is operationalize NPS, instrument every customer touchpoint to capture it, drive accountability for the score down through the organization, and treat improvements in NPS as the primary leading indicator of growth. Bain’s NPS consulting practice was built around exactly that operational architecture, and tens of thousands of corporate NPS programs were built on the back of it.
There are two things worth flagging about the original article from a methodological standpoint before we look at the independent replications. First, the data were proprietary to Satmetrix and Bain — the underlying datasets used to support the “eleven of fourteen industries” claim were not archived in a publicly accessible repository, and the specific industry-level correlations were not all reported. Second, the article was published in Harvard Business Review — a practitioner magazine, not a peer-reviewed academic journal — and so the analysis did not go through the kind of methodological review that a marketing-science article in Journal of Marketing or Marketing Science would have received. None of this makes the analysis wrong. It does mean the strong-form generalization that NPS is empirically the best customer metric was published with less methodological scrutiny than the conclusion’s strength would warrant.
Keiningham, Cooil, Andreassen & Aksoy (2007) — The First Direct Test
The most important independent test of Reichheld’s central claim is Keiningham, T. L., Cooil, B., Andreassen, T. W., & Aksoy, L. (2007). “A longitudinal examination of net promoter and firm revenue growth.” Journal of Marketing, 71(3), 39–51. DOI: 10.1509/jmkg.71.3.039. The paper was published in a top-tier peer-reviewed marketing journal, written by academic researchers, and used a dataset substantially larger than the original Reichheld analysis: 21 industries and approximately 15,500 customers, drawn from the Norwegian and U.S. customer satisfaction barometers — large, longitudinally tracked, publicly-described industry-level customer survey datasets.
The Keiningham et al. design did something the original Reichheld analysis had not done: it tested the comparative predictive validity of NPS against alternative customer metrics — overall customer satisfaction (the American Customer Satisfaction Index–style measure) and a multi-item recommendation intention index — across a larger and more methodologically transparent dataset. They asked the right comparative question. Does NPS predict firm revenue growth better than the alternatives Reichheld claimed it surpassed?
The answer: no. Across the 21 industries, NPS did not show a systematic predictive advantage over the American Customer Satisfaction Index or over multi-item recommendation indices. In some industries the customer satisfaction measure outperformed NPS as a predictor of subsequent revenue growth; in others NPS was roughly equivalent to alternative measures; in no consistent pattern did NPS dominate. The authors concluded that the empirical evidence did not support the strong “one number” claim — NPS was one customer metric among several reasonable customer metrics, with no demonstrable predictive superiority that would justify treating it as uniquely diagnostic.
The Keiningham et al. paper landed in the marketing-science community as a serious challenge to the NPS framework. It is the most-cited critical analysis of NPS in the academic literature, has been replicated and extended in subsequent papers, and has, in the academic discipline of marketing, substantially settled the methodological question: the strong-form “one number you need” claim is not supported by independent peer-reviewed evidence. In the corporate practitioner community, the Keiningham et al. finding has been less impactful — Bain & Company has continued to promote NPS as foundational, and corporate adoption has continued to scale through the 2010s and into the 2020s. This is the gap between what marketing science has concluded and what corporate practice has institutionalized.
Morgan & Rego (2006) — The Parallel Evidence
A closely related independent study published a year before Keiningham et al. is Morgan, N. A., & Rego, L. L. (2006). “The value of different customer satisfaction and loyalty metrics in predicting business performance.” Marketing Science, 25(5), 426–439. DOI: 10.1287/mksc.1050.0180. Morgan and Rego were not testing NPS specifically — their paper predates Keiningham et al. by a year and was conceived before NPS had become a fully institutionalized practitioner framework — but they tested the broader question of which customer-measurement metrics most strongly predict subsequent firm performance, using the American Customer Satisfaction Index database covering 200 firms across multiple industries.
Their findings cut directly against the “one number” framing in two ways. First, customer satisfaction metrics (in the ACSI family) showed meaningful predictive relationships with subsequent firm performance metrics including cash-flow growth, stock-price returns, and market-share growth — these effects were not dramatic in magnitude but they were statistically reliable and operationally meaningful. Second, the predictive validity differed for different performance outcomes — satisfaction predicted some outcomes better than recommendation intention, and recommendation intention predicted other outcomes better, with no single measure dominating across all outcomes. The pattern Morgan and Rego documented was inconsistent with the strong claim that one customer metric is uniformly superior; it was consistent with the more modest claim that several customer metrics carry signal and that the right composite depends on what business outcome the executive team is trying to predict.
Morgan and Rego is technical and quietly devastating to the “one number” framing. The paper does not name NPS — Reichheld’s article had only been out for three years and NPS had not yet become the dominant practitioner framework — but the empirical logic applies directly. If multiple customer metrics carry independent signal, and the right composite varies by outcome, then collapsing customer measurement to a single number on the executive scorecard is throwing away predictive information, not concentrating it.
van Doorn, Leeflang & Tijs (2013) — The Replication
A particularly important paper for the empirical-evaluation argument is van Doorn, J., Leeflang, P. S. H., & Tijs, M. (2013). “Satisfaction as a predictor of future performance: A replication.” International Journal of Research in Marketing, 30(3), 314–318. DOI: 10.1016/j.ijresmar.2013.04.002. The paper is short, conceived explicitly as a replication, and tests whether the earlier findings on customer satisfaction’s predictive validity replicate in a new dataset.
The replication design is a direct retest of the predictive-validity question on independent data. Van Doorn and colleagues used Dutch customer survey data to test whether the patterns Morgan & Rego had documented — and the patterns Keiningham et al. had shown to undercut the “one number” claim — held up. The headline finding: yes, customer satisfaction predicts subsequent firm performance, and no, NPS does not show a systematic predictive advantage over satisfaction-based measures. The replication confirmed the broader pattern that customer satisfaction is a reasonable predictive metric for firm performance and that the specific NPS framework does not enjoy the unique predictive status Reichheld claimed.
What makes van Doorn et al. (2013) particularly important is that it is a deliberate replication. This hub catalogs dozens of high-profile original findings that did not survive independent retesting, and NPS sits in a particular category within it: the original claim has been tested by multiple independent research groups on multiple independent datasets, and the strong-form claim has consistently failed to replicate. This is not “we have not yet tested it.” This is “we have tested it, repeatedly, and the original claim does not hold.”
Kristensen & Eskildsen (2014) — The Synthesizing Critique
The synthesizing piece in the academic critique of NPS is Kristensen, K., & Eskildsen, J. (2014). “Is the Net Promoter Score a trustworthy performance measure?” The TQM Journal, 26(2), 202–214. Kristensen and Eskildsen, from Aarhus University, conducted a comprehensive review of the NPS validation literature and added their own analyses using European Customer Satisfaction Index data. Their conclusion, signaled in the article’s question-form title, is that NPS does not meet the criteria a serious performance-measurement instrument should meet: the predictive validity is no better than alternative customer metrics, the measurement properties of the 0-to-10 scale with the 9-10/7-8/0-6 categorization are statistically inefficient, and the “one number” framing is operationally misleading in ways that obscure rather than illuminate the underlying customer-experience dynamics.
The Kristensen and Eskildsen paper is worth reading in full for an executive considering an NPS program. It is the most direct articulation of the academic marketing community’s settled view on NPS as a performance measure: it is a reasonable customer-experience metric among several reasonable alternatives; it is not uniquely diagnostic; and the corporate institutionalization of NPS as the one definitive measure has outrun the empirical case for it.
Why The “One Number” Framing Specifically Misleads
The results from Keiningham et al., Morgan & Rego, van Doorn et al., and Kristensen & Eskildsen all point in the same direction: NPS is fine, NPS is not specifically bad, and NPS is not specifically better than the alternatives it claimed to surpass. The deeper methodological problem with the framework, however, is not the predictive-validity question. It is the “one number” framing itself.
A single composite metric (promoters minus detractors) collapses movements in three distinct response segments into one number. A score change from 40 to 50 might mean any of:
- More 9-10 responses (more promoters generated).
- Fewer 0-6 responses (fewer detractors created).
- Shift in the 7-8 passive segment (more passives moved up to promoters or fewer moved down to detractors).
- Some combination of the above with offsetting flows.
The four scenarios have very different operational implications. Scenario 1 might reflect a successful brand-building or delight-creation initiative; scenario 2 might reflect a successful customer-service or service-recovery initiative; scenario 3 might reflect a subtle positioning improvement. An executive team that watches only the headline score is structurally unable to distinguish among these, and the single-metric tunnel vision often leads operational owners to optimize for the score in ways that do not correspond to the underlying customer-experience dynamics the score was supposed to summarize.
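To make the ambiguity concrete, the sketch below splits a headline change into the part that came from the promoter share and the part that came from the detractor share. The class and function names and the bucket percentages are hypothetical, chosen only so that both "after" cases land on the same score of 50.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketShares:
    """Percentage of respondents in each bucket; must sum to 100."""
    promoters: float   # responses of 9-10
    passives: float    # responses of 7-8
    detractors: float  # responses of 0-6

    def __post_init__(self) -> None:
        total = self.promoters + self.passives + self.detractors
        if abs(total - 100.0) > 1e-6:
            raise ValueError(f"shares sum to {total}, expected 100")

    @property
    def nps(self) -> float:
        return self.promoters - self.detractors


def explain_change(before: BucketShares, after: BucketShares) -> dict[str, float]:
    """Split the NPS delta into promoter-side and detractor-side contributions."""
    return {
        "nps_delta": after.nps - before.nps,
        "from_promoters": after.promoters - before.promoters,
        "from_detractors": before.detractors - after.detractors,
    }


before = BucketShares(promoters=55.0, passives=30.0, detractors=15.0)            # NPS 40
more_promoters = BucketShares(promoters=65.0, passives=20.0, detractors=15.0)    # NPS 50
fewer_detractors = BucketShares(promoters=55.0, passives=40.0, detractors=5.0)   # NPS 50

print(explain_change(before, more_promoters))
# {'nps_delta': 10.0, 'from_promoters': 10.0, 'from_detractors': 0.0}
print(explain_change(before, fewer_detractors))
# {'nps_delta': 10.0, 'from_promoters': 0.0, 'from_detractors': 10.0}
```

Both cases report the same +10 on the slide. Even this decomposition only shows net share changes; telling which individual customers moved between buckets would require tracking the same respondents over time.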
The structural problem is that the 0-10 scale is a richer information instrument than its three-category compressed summary, and the 9-10/7-8/0-6 bucketing throws away substantial information from the underlying response distribution. A customer satisfaction system that retained mean scores, distribution shapes, segment-level patterns, and longitudinal trajectories of the full response distribution would carry more information than the headline NPS, even when constructed from the same survey question. The “one number” framing was a deliberate product design choice for executive accessibility, and that accessibility has real value, but the cost is information loss the academic critiques have repeatedly documented.
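A toy example of that information loss: two response sets, invented for illustration, that produce an identical NPS while describing very different customer bases. The helper names are assumptions, not part of any NPS tooling.

```python
def nps(responses: list[int]) -> float:
    """Percent promoters (9-10) minus percent detractors (0-6)."""
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    return 100.0 * (promoters - detractors) / len(responses)


def mean_score(responses: list[int]) -> float:
    """Plain average of the raw 0-10 responses."""
    return sum(responses) / len(responses)


# Hypothetical customer bases of ten respondents each.
polarized = [10] * 6 + [0] * 4   # six enthusiasts, four very unhappy customers
lukewarm = [9] * 2 + [8] * 8     # nobody unhappy, few enthusiasts

for label, data in [("polarized", polarized), ("lukewarm", lukewarm)]:
    print(label, nps(data), mean_score(data))
# polarized 20.0 6.0
# lukewarm 20.0 8.2
```

The headline score is 20 in both cases. The mean and the count of zero-rated responses separate them immediately, which is the kind of signal a dashboard loses when it shows only the bucketed number.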
Why NPS Persists Despite The Independent Critiques
If the academic-marketing literature has been clear for nearly two decades that NPS does not have the unique predictive validity Reichheld claimed, why is NPS still on the executive scorecard at roughly 80% of Fortune 1000 companies? The answer is a mix of factors that, much like in the CliftonStrengths case, have very little to do with psychometrics.
Executive simplicity. “One number you need to grow” is the strongest possible product positioning for an environment where executive attention is scarce. A CEO who has ten minutes for customer-experience review wants one number, not a dashboard. The NPS framework is unbeatable as a vehicle for compressing customer experience into a board-meeting-friendly scalar.
Compensation alignment. Once NPS is wired into executive compensation — and at many large companies it is — the framework is institutionalized in a way that has nothing to do with whether the academic literature supports it. Changing the metric requires changing the compensation plan, which requires committee approvals and board engagement. The switching costs are real.
Bain & Company’s institutional commitment. Bain has built a substantial NPS consulting practice over two decades, and the firm has continued to publish supportive material and to deploy NPS architectures at client engagements. Reichheld and Markey’s The Ultimate Question 2.0 (2011) and Reichheld, Markey, and Darnell’s Winning on Purpose (2021) extended and defended the framework. The consulting-firm institutional commitment carries forward the framework even as the academic critiques accumulate.
Vendor ecosystem. Medallia, Qualtrics, SurveyMonkey, Satmetrix, and dozens of other CX-measurement platforms productized NPS in the late 2000s and 2010s. The vendor infrastructure for collecting, reporting, and benchmarking NPS is mature and well-tooled. Migrating an enterprise CX program off NPS to an alternative architecture is a substantial undertaking with switching costs that often exceed the predictive-validity gains.
Benchmarking value. Once an industry adopts a common metric, the metric becomes more valuable for benchmarking purposes regardless of its underlying validity. “Our NPS is 47, the industry median is 32” is information that has executive utility even if the absolute number is a noisy measure of underlying customer experience.
Reichheld’s reasonable defenses. Reichheld himself has not ignored the academic critiques. In The Ultimate Question 2.0 (2011) and subsequent writing, he has acknowledged measurement limitations, distinguished between the score and the operational system around it (“NPS the system” vs. “NPS the score”), emphasized the importance of root-cause follow-up on detractor segments, and clarified that NPS is most useful when embedded in a broader closed-loop feedback architecture rather than treated as a standalone metric. These are reasonable concessions and they substantially soften the strong-form 2003 claim. The institutionalized boardroom usage of NPS, however, has not absorbed those concessions — most NPS dashboards present a single number, not the operational system Reichheld now emphasizes.
Habit and inertia. Customer experience programs are years-long investments. The cost of switching frameworks is large, the marginal predictive-validity gain from any single alternative is modest, and the path of least resistance for a CX leader is to continue the existing program. The question for a CMO or CXO is rarely “do we adopt NPS for the first time” — it is “do we change the metric we have been using for ten years” — and the answer to that question is almost always no, regardless of what the academic literature says.
None of these factors are reasons NPS is the best customer-experience metric. They are reasons NPS is the institutionally dominant customer-experience metric. The two are different questions.
Alternatives And Better Architectures
If NPS does not have the predictive superiority Reichheld claimed, what should a CMO or CXO actually use? The honest answer is that customer experience is multi-dimensional, no single metric captures it well, and the right answer is an instrumented system of multiple complementary metrics that is anchored in the specific business outcomes the executive team is trying to predict. A few of the better-validated and operationally useful options:
Customer Satisfaction (CSAT) — the broader family of satisfaction measures, including the American Customer Satisfaction Index (ACSI) methodology, has stronger and longer-standing predictive-validity evidence than NPS. Morgan & Rego (2006) and van Doorn et al. (2013) both confirm that satisfaction measures predict firm performance at least as well as recommendation intention. CSAT measured at the transaction level (post-purchase, post-interaction) is also more diagnostic of specific service failures than the company-level “would you recommend” question.
Customer Effort Score (CES) — Matthew Dixon, Karen Freeman, and Nicholas Toman’s 2010 Harvard Business Review article “Stop Trying to Delight Your Customers” proposed CES as an alternative measure of customer experience, asking “how much effort did you have to put forth to handle your request?” CES is particularly useful for service-oriented experiences and has emerging peer-reviewed validation in service-quality literature.
Multi-item satisfaction indices. The ACSI methodology, the SERVQUAL framework (Parasuraman, Zeithaml, Berry 1988), and other multi-item indices retain more measurement signal than single-question scores. They are operationally heavier and harder to communicate to an executive audience, but they are more diagnostic and have stronger predictive-validity records.
Behavioral metrics. For most companies, the most predictive customer metrics are not survey-based at all — they are behavioral: repurchase rates, churn rates, net revenue retention, share of wallet, referral rates measured behaviorally rather than via stated intention. Behavioral data is more reliable than self-report data because actions reveal preferences in ways that survey responses do not. A CX program anchored on behavioral metrics with survey instruments as supplementary diagnostic information is generally a better-designed program than one anchored on NPS as the headline measure.
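Two of the behavioral metrics named above can be computed directly from billing data. The sketch below uses common textbook definitions; exact definitions vary by company (for example, whether new-customer revenue is excluded and which period is used), so the formulas, parameter names, and figures here are assumptions for illustration.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of the starting customer cohort lost during the period."""
    if customers_at_start <= 0:
        raise ValueError("need a positive starting customer count")
    return customers_lost / customers_at_start


def net_revenue_retention(
    start_revenue: float,
    expansion: float,
    contraction: float,
    churned: float,
) -> float:
    """Revenue retained from the starting cohort, including upsell,
    divided by that cohort's starting revenue. New customers are excluded."""
    if start_revenue <= 0:
        raise ValueError("need positive starting revenue")
    return (start_revenue + expansion - contraction - churned) / start_revenue


# Hypothetical period: 200 customers at the start, 10 lost.
print(churn_rate(200, 10))  # 0.05

# Hypothetical recurring revenue for the same cohort.
print(net_revenue_retention(100_000, 12_000, 3_000, 5_000))  # 1.04
```

Neither number depends on what customers say they intend to do, which is the property the paragraph above is pointing at.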
Composite scorecards. The architecture that research-informed CXOs most often arrive at is a small composite scorecard: three to five metrics that include at least one satisfaction-style metric, at least one behavioral metric, and segment-level breakdowns rather than aggregate-only scores. This is operationally heavier than the one-number NPS framing but produces more actionable information.
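A segment-level breakdown is the cheapest of these additions, since it reuses survey data already being collected. The following is a rough sketch with invented segment names and responses, showing how an unremarkable aggregate score can hide two very different segments.

```python
from collections import defaultdict


def nps(scores: list[int]) -> float:
    """Percent promoters (9-10) minus percent detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def nps_by_segment(responses: list[tuple[str, int]]) -> dict[str, float]:
    """Group (segment, score) pairs by segment and score each group."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for segment, score in responses:
        grouped[segment].append(score)
    return {segment: nps(scores) for segment, scores in grouped.items()}


# Hypothetical survey responses tagged with a customer segment.
responses = [
    ("enterprise", 10), ("enterprise", 9), ("enterprise", 9),
    ("enterprise", 10), ("enterprise", 8),
    ("smb", 3), ("smb", 6), ("smb", 9), ("smb", 7), ("smb", 2),
]

print(nps([score for _, score in responses]))  # 20.0
print(nps_by_segment(responses))               # {'enterprise': 80.0, 'smb': -40.0}
```

With only five responses per segment the per-segment figures here are far too noisy to act on; the point is the shape of the report, not the numbers.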
The Strategist Takeaway
For a CMO, CXO, or CEO evaluating whether to continue an NPS program, the calibrated answer is:
NPS is fine as one signal among several. The independent academic critique is not that NPS is invalid or harmful; it is that the strong-form “one number you need” claim is not supported. NPS is a reasonable customer-experience metric. It is not the only reasonable customer-experience metric, and it does not deserve the unique status the boardroom scorecard typically gives it.
The “one number” framing is what is empirically unsupported. If you are using NPS as part of a broader CX measurement architecture (with closed-loop follow-up on detractors, segment-level diagnostics, behavioral metrics, and complementary satisfaction-style measures), you are using NPS responsibly and the academic critique does not undercut what you are doing. If you are using NPS as the headline metric for executive accountability without that surrounding architecture, you are exposed to exactly the criticism that Keiningham et al., Morgan & Rego, van Doorn et al., and Kristensen & Eskildsen have documented.
Do not over-invest in NPS-specific tactical optimization. Programs that drive operational decisions specifically to move the NPS number — service-recovery routines triggered by detractor scores, compensation tied directly to NPS deltas, marketing campaigns designed to elicit promoter behavior — are optimizing a proxy that may or may not correspond to the underlying business outcome (revenue growth, retention, lifetime value). The proxy-optimization trap is the same one documented in the A/B testing surrogate-metric trap article: you can hit your metric and miss your goal.
Pair NPS with behavioral and longitudinal data. Survey-based customer-experience measures should be triangulated with behavioral data (churn, retention, repurchase, referral). Customers’ stated intentions and actual behavior frequently diverge, and the integrated picture is substantially more diagnostic than either signal alone.
If you are building a CX measurement program from scratch, the right design is probably not “deploy NPS first.” It is “instrument behavioral retention and revenue metrics first, then layer in survey-based satisfaction measures at the transaction and relationship level, then choose a single executive-scorecard composite based on which signal is most predictive of your specific growth model.” NPS may end up in that composite. It may not. That choice should be driven by what predicts growth in your specific business, not by what the management-fashion default is.
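The "which signal is most predictive" step can be approximated, as a first pass, by ranking candidate metrics on how strongly they correlate with a later outcome. The sketch below is an assumption-laden illustration, not the method used in any of the cited papers: the function names are hypothetical, the numbers are entirely made up, and a real analysis would need lagged outcomes, far more observations, and out-of-sample checks.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def rank_by_predictive_strength(
    candidates: dict[str, list[float]], outcome: list[float]
) -> list[tuple[str, float]]:
    """Sort candidate metrics by absolute correlation with the outcome."""
    scored = [(name, pearson(values, outcome)) for name, values in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Made-up figures for six hypothetical business units, measured in year t.
candidates = {
    "nps": [32.0, 41.0, 47.0, 28.0, 55.0, 38.0],
    "csat": [4.1, 4.3, 4.4, 3.9, 4.6, 4.0],
    "retention_rate": [0.88, 0.91, 0.90, 0.84, 0.95, 0.87],
}
# Made-up revenue growth for the same units in year t+1.
next_year_growth = [0.03, 0.06, 0.05, 0.01, 0.09, 0.02]

for name, r in rank_by_predictive_strength(candidates, next_year_growth):
    print(f"{name:15s} r = {r:+.2f}")
```

The design choice worth keeping from this sketch is the direction of the comparison: every candidate, NPS included, is scored against the same business outcome, and the scorecard follows the ranking rather than the other way around.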
If you are deep in an existing NPS program, the right move is probably not to rip and replace mid-flight. It is to be internally honest about what NPS is and is not — a useful but not uniquely predictive customer signal, embedded in an operational system that does most of the actual customer-experience work — and to resist over-claiming what the headline number means. The risk of over-claiming is that the boardroom scorecard becomes a coordination object that drives suboptimal operational decisions and obscures rather than illuminates what is actually happening with your customers.
The pattern this article is documenting is the same pattern documented for CliftonStrengths, MBTI, and several other commercially-successful-but-empirically-thin frameworks in this hub: a polished consultancy product that solved a real executive-communication problem, was institutionalized at scale before independent academic validation caught up, and persists in the corporate ecosystem long after the independent academic literature has substantially contested the strong-form claims. The corporate operator who reads only the practitioner literature on NPS will conclude it is a settled best practice. The corporate operator who reads the marketing-science literature on NPS will conclude it is one reasonable signal among several with no special status. The gap between the two is large, persistent, and economically consequential.
Sources
Primary sources (Reichheld and Bain):
- Reichheld, F. F. (2003). The one number you need to grow. Harvard Business Review, 81(12), 46–54.
- Reichheld, F. F. (2006). The Ultimate Question: Driving Good Profits and True Growth. Boston: Harvard Business School Press.
- Reichheld, F. F., & Markey, R. (2011). The Ultimate Question 2.0: How Net Promoter Companies Thrive in a Customer-Driven World. Boston: Harvard Business Review Press.
- Reichheld, F., Markey, R., & Darnell, D. (2021). Winning on Purpose: The Unbeatable Strategy of Loving Customers. Boston: Harvard Business Review Press.
Independent academic critiques:
- Keiningham, T. L., Cooil, B., Andreassen, T. W., & Aksoy, L. (2007). A longitudinal examination of net promoter and firm revenue growth. Journal of Marketing, 71(3), 39–51. DOI: 10.1509/jmkg.71.3.039
- Morgan, N. A., & Rego, L. L. (2006). The value of different customer satisfaction and loyalty metrics in predicting business performance. Marketing Science, 25(5), 426–439. DOI: 10.1287/mksc.1050.0180
- van Doorn, J., Leeflang, P. S. H., & Tijs, M. (2013). Satisfaction as a predictor of future performance: A replication. International Journal of Research in Marketing, 30(3), 314–318. DOI: 10.1016/j.ijresmar.2013.04.002
- Kristensen, K., & Eskildsen, J. (2014). Is the Net Promoter Score a trustworthy performance measure? The TQM Journal, 26(2), 202–214. DOI: 10.1108/TQM-03-2011-0021
- Keiningham, T. L., Aksoy, L., Cooil, B., & Andreassen, T. W. (2008). Linking customer loyalty to growth. MIT Sloan Management Review, 49(4), 51–57.
Alternative customer-measurement frameworks:
- Dixon, M., Freeman, K., & Toman, N. (2010). Stop trying to delight your customers. Harvard Business Review, 88(7/8), 116–122. (Origin of the Customer Effort Score.)
- Parasuraman, A., Zeithaml, V. A., & Berry, L. L. (1988). SERVQUAL: A multiple-item scale for measuring consumer perceptions of service quality. Journal of Retailing, 64(1), 12–40.
- Fornell, C., Johnson, M. D., Anderson, E. W., Cha, J., & Bryant, B. E. (1996). The American Customer Satisfaction Index: Nature, purpose, and findings. Journal of Marketing, 60(4), 7–18. DOI: 10.2307/1251898
Methodological context:
- de Haan, E., Verhoef, P. C., & Wiesel, T. (2015). The predictive ability of different customer feedback metrics for retention. International Journal of Research in Marketing, 32(2), 195–206. DOI: 10.1016/j.ijresmar.2015.02.004 (Extends the Keiningham et al. comparison to retention prediction specifically.)
- Pingitore, G., Morgan, N. A., Rego, L. L., Gigliotti, A., & Meyers, J. (2007). The single-question trap. Marketing Research, 19(2), 9–13. (Practitioner-facing summary of the empirical critique of single-question customer metrics.)
Related
- /replication-crisis/ — Replication Crisis Hub home, covering 80+ canonical findings that did not survive independent scrutiny.
- /replication-crisis/cliftonstrengths-strengthsfinder/ — The other polished consultancy assessment product whose independent academic validation is thin relative to its commercial scale.
- /replication-crisis/ab-testing-surrogate-metric-trap/ — The single-number-on-the-scorecard problem applies to experimentation metrics too. NPS optimization is a surrogate-metric trap by another name.
- /replication-crisis/mehrabian-7-38-55/ — Another foundational practitioner claim — that communication is 7% words, 38% tone, 55% body language — that survived in corporate training decks long after the original research had been clarified.
- /replication-crisis/goleman-emotional-intelligence/ — The “emotional intelligence predicts performance better than IQ” claim is a parallel case: a polished consulting framework whose strong-form empirical claim has not survived independent peer-reviewed scrutiny.
- /replication-crisis/myers-briggs-mbti/ — The original corporate-assessment-product-as-management-fashion case. MBTI has weaker psychometric foundations than NPS but the institutional-adoption-pattern is the same.
FAQ
Q: Should our company stop tracking NPS?
A: Probably not. The cost of switching frameworks is high and the marginal predictive-validity gain from any single alternative is modest. The right move is to stop letting NPS be the headline number on the executive scorecard and to embed it in a broader customer-experience measurement architecture that includes behavioral retention, satisfaction-style measures, and segment-level diagnostics. Keep measuring NPS; stop over-claiming what it tells you.
Q: Is NPS as bad as MBTI?
A: No. NPS stands on meaningfully stronger empirical ground than MBTI. NPS carries real customer-experience signal and correlates with firm performance metrics; the academic critique is that it does not uniquely outperform other reasonable customer metrics. MBTI, by contrast, has poor test-retest reliability and weak construct validity. The two are not in the same category. NPS is “useful but not uniquely useful.” MBTI is “not psychometrically defensible at all.”
Q: What about Reichheld’s argument that critics misunderstand NPS — that it is a “system” not just a score?
A: This is the most reasonable Bain response to the academic critique and it has some merit. NPS-the-system — closed-loop follow-up on detractor segments, root-cause analysis of why customers gave low scores, operational accountability for service-recovery responses — is a defensible customer-experience management architecture. The empirical critique applies specifically to NPS-the-score and to the “one number” headline framing. If your NPS program is rigorously implementing the closed-loop system Reichheld describes in The Ultimate Question 2.0, you are doing something defensible. If your NPS program is a quarterly survey, a dashboard, and an executive bullet point, you are doing the thing the academic critique targets. The distinction matters operationally.
Q: Our compensation plan is tied to NPS. Are you saying that is wrong?
A: Tying compensation to a single composite metric whose predictive validity for the business outcomes you care about is contested is operationally risky. The risk is that you reward decisions that move the metric without moving the underlying outcome: service-recovery campaigns that raise scores by pacifying detractors rather than fixing the underlying issues; marketing campaigns that raise scores by changing who responds to the survey rather than improving the experience; and sample-frame manipulation that raises scores by selectively surveying happier customer segments. If you are going to tie compensation to a metric, the metric should be either a direct business outcome (revenue retention, churn rate, lifetime value) or a composite scorecard that triangulates multiple signals. Single-metric compensation tied to NPS has well-known failure modes of exactly this kind.
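The sample-frame failure mode is easy to see with arithmetic. The sketch below is illustrative only: the segment names and response values are invented, and the only thing taken from the framework itself is the standard bucketing (9–10 promoters, 0–6 detractors). It scores the same hypothetical customer base twice, once with every segment surveyed and once with only the happier segment surveyed.

```python
def nps(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6), on a -100..+100 scale."""
    if not scores:
        raise ValueError("need at least one survey response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical 0-10 responses by segment (made-up numbers, not real data).
responses = {
    "long_tenured": [10, 9, 9, 8, 10, 9, 7, 9],
    "recently_onboarded": [6, 8, 3, 9, 5, 7, 4, 6],
}

# Full sample frame: every segment is surveyed.
full_frame = [s for segment in responses.values() for s in segment]
# Skewed frame: only the happier segment is surveyed.
skewed_frame = responses["long_tenured"]

full_score = nps(full_frame)
skewed_score = nps(skewed_frame)
print(f"full frame: {full_score:+.1f}, skewed frame: {skewed_score:+.1f}")
```

With these invented numbers the score moves from +12.5 to +75.0 without a single customer's experience changing, which is the kind of movement a compensation plan tied to the score would reward.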
Q: Is the Keiningham 2007 paper really that decisive?
A: It is the single most-cited critical analysis of NPS in the academic marketing literature, was published in Journal of Marketing (the discipline’s flagship journal), and used a substantially larger and more methodologically transparent dataset than the original Reichheld analysis. It has been replicated and extended by van Doorn et al. (2013), de Haan et al. (2015), and others. By the standards of academic marketing, the strong-form Reichheld claim has been substantively contested by multiple independent research groups over nearly two decades, and the contesting evidence has not been overturned. On that narrow question it is fair to call the matter settled: NPS does not have the unique predictive validity originally claimed.
Q: What is the single best alternative to NPS?
A: There is no single best alternative — the answer is a small composite scorecard, with the specific composition depending on your business model. For a SaaS company, anchor on net revenue retention and segment-level churn, supplemented by transaction-level CSAT and a CES-style measure on service interactions. For a retailer, anchor on repurchase rate, share of wallet, and basket-level satisfaction. For a B2B services firm, anchor on contract renewal rate, account-level satisfaction, and qualitative voice-of-customer feedback. The point is that the right answer is industry- and business-model-specific, not a universal “use this instead of NPS” prescription.
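For the SaaS case, a minimal sketch of the two anchor metrics might look like the following. Everything here is an assumption made for illustration: the account rows are invented, net revenue retention is taken to mean ending recurring revenue from the starting cohort divided by its starting recurring revenue, and an account counts as churned when its ending revenue is zero. Your finance team's definitions may differ.

```python
from collections import defaultdict

# Hypothetical account rows: (segment, MRR at period start, MRR at period end).
# An ending MRR of 0 is treated as a churned account. Numbers are made up.
accounts = [
    ("enterprise", 1000.0, 1200.0),
    ("enterprise", 2000.0, 2000.0),
    ("smb", 100.0, 0.0),
    ("smb", 200.0, 150.0),
    ("smb", 100.0, 100.0),
    ("smb", 100.0, 0.0),
]


def net_revenue_retention(rows):
    """Ending revenue from the starting cohort divided by its starting revenue."""
    start = sum(start_mrr for _, start_mrr, _ in rows)
    end = sum(end_mrr for _, _, end_mrr in rows)
    if start <= 0:
        raise ValueError("starting revenue must be positive")
    return end / start


def churn_by_segment(rows):
    """Share of accounts in each segment whose ending MRR is zero."""
    total = defaultdict(int)
    churned = defaultdict(int)
    for segment, _, end_mrr in rows:
        total[segment] += 1
        if end_mrr == 0:
            churned[segment] += 1
    return {seg: churned[seg] / total[seg] for seg in total}


nrr = net_revenue_retention(accounts)
churn = churn_by_segment(accounts)
print(f"NRR: {nrr:.1%}")
for seg, rate in churn.items():
    print(f"{seg} account churn: {rate:.0%}")
```

In this invented data the aggregate net revenue retention comes out near 98.6% while half of the SMB accounts churn, which is why the scorecard pairs a revenue anchor with segment-level diagnostics rather than relying on one roll-up number.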
Q: Why has corporate adoption continued to scale even though the academic critique has been around for nearly twenty years?
A: The same reason that other commercially successful but empirically contested frameworks persist — executive simplicity, vendor ecosystem lock-in, compensation alignment, benchmarking utility, switching costs, and the gap between the practitioner and academic literatures. The corporate procurement and management-fashion dynamics that drove NPS adoption are largely orthogonal to the empirical validation question. This is exactly the pattern the CliftonStrengths article documents in a different domain. The right operator-level response is not to be surprised that corporate practice has not absorbed the academic critique; it is to read the academic critique anyway and calibrate your own program accordingly.
Q: Is Bain & Company defending NPS in bad faith?
A: No — Reichheld and Markey have engaged with the academic critiques and have refined the framework’s operational claims over time. The Ultimate Question 2.0 (2011) is a more measured book than the 2003 HBR article, and Winning on Purpose (2021) emphasizes the system over the score. Bain has a real intellectual and commercial commitment to NPS, and the firm has substantial accumulated expertise in deploying NPS-based CX architectures. The critique is not that Bain is being dishonest; it is that the strong-form 2003 claim that one number is sufficient for executive customer-measurement is empirically unsupported, and the operational dynamics of NPS adoption in client organizations have outrun the more measured claims Reichheld now makes.
Q: What is the single best thing to read on this?
A: Read the Keiningham et al. (2007) Journal of Marketing paper and the Reichheld (2003) HBR article side by side. Together they are a clean illustration of how a practitioner-magazine claim with substantial commercial implications and a peer-reviewed empirical retest can reach diverging conclusions about the same underlying question. The Kristensen and Eskildsen (2014) paper is the best synthesizing critique. The Ultimate Question 2.0 (2011) by Reichheld and Markey is the most thorough defense of the operational framework from the Bain side. A reader who works through all four will have a substantially more calibrated view of NPS than most CX practitioners.