Bootstrap Method
A resampling technique that estimates the sampling distribution of a statistic by repeatedly drawing random samples with replacement from the observed data, without assuming a parametric form (such as normality) for the underlying distribution.
What Is the Bootstrap Method?
Bootstrapping estimates the sampling distribution of a statistic by resampling your data with replacement thousands of times and recalculating the statistic each time. The spread of those recalculated values approximates the true sampling distribution without assuming a parametric form such as normality. It still assumes that the observations are independent and that the sample is representative of the population.
Also Known As
- Data science teams: bootstrap, resampling, nonparametric bootstrap
- Growth teams: empirical confidence intervals
- Marketing teams: "resampling to get a range"
- Engineering teams: BCa bootstrap, percentile bootstrap
How It Works
Imagine an A/B test with 10,000 visitors per variant measuring revenue-per-visitor, where revenue is heavily skewed. You draw 10,000 random values with replacement from each variant (a "bootstrap sample", the same size as the original), compute the difference in means, and store that number. Repeat 10,000 times. You now have 10,000 plausible estimates of the true lift. The 2.5th and 97.5th percentiles give you a 95% confidence interval that reflects the actual distribution of your data, with no normality assumption. Outliers still matter: a few very large purchases can dominate the mean, and the bootstrap shows that as a wider interval rather than removing it.
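The procedure above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production implementation: the helper names (`percentile`, `simulate_revenue`, `bootstrap_diff_ci`), the simulated conversion rates, and the log-normal order values are all assumptions made for the example, and the sample and resample counts are reduced so it runs quickly.

```python
import random
from statistics import fmean


def percentile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of already-sorted values."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def simulate_revenue(n, conversion_rate, rng):
    """Hypothetical skewed revenue: most visitors spend 0, buyers spend a log-normal amount."""
    return [
        rng.lognormvariate(3.0, 1.0) if rng.random() < conversion_rate else 0.0
        for _ in range(n)
    ]


def bootstrap_diff_ci(control, treatment, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(treatment) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Each bootstrap sample has the same size as the original and is drawn with replacement.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(fmean(t) - fmean(c))
    diffs.sort()
    return percentile(diffs, alpha / 2), percentile(diffs, 1 - alpha / 2), diffs


data_rng = random.Random(42)
control = simulate_revenue(1_000, 0.05, data_rng)
treatment = simulate_revenue(1_000, 0.06, data_rng)

observed = fmean(treatment) - fmean(control)
low, high, diffs = bootstrap_diff_ci(control, treatment, n_resamples=2_000)
print(f"observed lift: {observed:.3f}, 95% CI: [{low:.3f}, {high:.3f}]")
```

For samples of 10,000 visitors and 10,000 resamples, a vectorized array library would likely be much faster than this pure-Python loop; the logic stays the same.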
Best Practices
- Do use bootstrap for any statistic whose sampling distribution is unclear (medians, quantiles, ratios).
- Do use at least 10,000 resamples for stable tail estimates.
- Do apply BCa (bias-corrected and accelerated) bootstrap when the data or the bootstrap distribution is skewed; plain percentile intervals tend to be off-center in that case.
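- Do not rely on bootstrap when your sample is tiny (roughly n < 30); a handful of observations cannot represent the shape of the population, so the resulting intervals tend to be too narrow.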
- Do not treat bootstrap as a fix for fundamentally biased samples; it only captures sampling uncertainty.
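To make the BCa recommendation concrete, here is a rough sketch of a BCa interval for a mean, written with the standard library only. It follows the usual textbook formulation (a bias correction `z0` taken from the share of bootstrap estimates below the observed statistic, and an acceleration `a` estimated by the jackknife). The function name `bca_interval`, the simulated log-normal sample, and the resample count are assumptions for illustration; a maintained statistics library is the safer choice for real analyses.

```python
import random
from statistics import NormalDist, fmean


def percentile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of already-sorted values."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def bca_interval(data, statistic=fmean, n_resamples=10_000, alpha=0.05, seed=0):
    """BCa bootstrap interval for statistic(data)."""
    data = list(data)
    n = len(data)
    rng = random.Random(seed)
    theta_hat = statistic(data)
    boot = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))

    norm = NormalDist()
    # Bias correction: how far the observed statistic sits from the bootstrap median.
    share_below = sum(b < theta_hat for b in boot) / n_resamples
    z0 = norm.inv_cdf(share_below)

    # Acceleration: skewness of the leave-one-out (jackknife) estimates.
    jack = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = fmean(jack)
    numerator = sum((jack_mean - j) ** 3 for j in jack)
    denominator = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = numerator / denominator

    def adjusted_level(level):
        # Shift the nominal quantile level using z0 and a.
        z = norm.inv_cdf(level)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    return (
        percentile(boot, adjusted_level(alpha / 2)),
        percentile(boot, adjusted_level(1 - alpha / 2)),
    )


data_rng = random.Random(1)
sample = [data_rng.lognormvariate(3.0, 1.0) for _ in range(200)]  # right-skewed toy data
estimate = fmean(sample)
bca_low, bca_high = bca_interval(sample, n_resamples=2_000)
print(f"mean: {estimate:.2f}, 95% BCa CI: [{bca_low:.2f}, {bca_high:.2f}]")
```

This sketch does not guard against degenerate cases, such as constant data (zero denominator) or a statistic for which every bootstrap estimate falls on one side of the observed value.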
Common Mistakes
- Using too few resamples (< 1,000), producing noisy intervals.
- Bootstrapping dependent data (time series, clustered users) without blocking.
- Reporting bootstrap intervals without mentioning they were produced by resampling, or without stating the method (percentile, BCa) and the number of resamples.
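The dependent-data mistake is easiest to see with clustered users. The sketch below, which assumes a hypothetical list of `(user_id, value)` rows with strong within-user correlation, resamples whole users instead of individual rows (often called a cluster bootstrap) and compares the result with a naive row-level bootstrap. The function name `cluster_bootstrap_ci` and the simulated data are illustrative assumptions, and time series would need a block bootstrap instead.

```python
import random
from collections import defaultdict
from statistics import fmean


def percentile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of already-sorted values."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def cluster_bootstrap_ci(rows, n_resamples=1_000, alpha=0.05, seed=0):
    """Percentile interval for the mean, resampling whole clusters (users) with replacement."""
    by_user = defaultdict(list)
    for user_id, value in rows:
        by_user[user_id].append(value)
    clusters = list(by_user.values())

    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sampled = rng.choices(clusters, k=len(clusters))
        means.append(fmean(v for cluster in sampled for v in cluster))
    means.sort()
    return percentile(means, alpha / 2), percentile(means, 1 - alpha / 2)


# Toy data: 50 users with 20 orders each; orders from the same user are highly correlated.
data_rng = random.Random(7)
rows = []
for user_id in range(50):
    user_mean = data_rng.gauss(50, 20)
    for _ in range(20):
        rows.append((user_id, user_mean + data_rng.gauss(0, 2)))

cluster_low, cluster_high = cluster_bootstrap_ci(rows)
# Naive version: give every row its own id, so each row is treated as independent.
naive_rows = [(i, value) for i, (_, value) in enumerate(rows)]
naive_low, naive_high = cluster_bootstrap_ci(naive_rows)

print(f"cluster CI: [{cluster_low:.2f}, {cluster_high:.2f}]")
print(f"naive CI:   [{naive_low:.2f}, {naive_high:.2f}]")
```

With data like this, the row-level interval comes out much narrower than the user-level one, because it treats 1,000 correlated rows as 1,000 independent observations.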
Industry Context
- SaaS/B2B: Bootstrap handles MRR-per-account metrics where parametric assumptions fail.
- Ecommerce/DTC: Revenue-per-visitor is the canonical bootstrap use case.
- Lead gen/services: Pipeline-weighted metrics often require bootstrap for honest intervals.
The Behavioral Science Connection
Bootstrap is intuitive because it mirrors the way humans think about uncertainty: "what if I ran this again?" Kahneman notes people are better at reasoning with concrete simulation than with abstract probability; bootstrap is concrete simulation turned into a statistical method.
Key Takeaway
Bootstrap is the Swiss Army knife for uncertainty estimation when your data does not fit neat parametric molds.