Standard Deviation
A measure of the average amount of variability in a dataset, representing how spread out values are from the mean, calculated as the square root of the variance.
What Is Standard Deviation?
Standard deviation (SD) measures how spread out a set of values is around their average. A small SD means values cluster tightly; a large SD means they scatter widely. It is the workhorse spread statistic because it shares the same units as the data, making it easy to interpret alongside the mean.
Also Known As
- Data science teams: sigma, sd, standard error (for sampling distributions)
- Growth teams: spread, volatility
- Marketing teams: variability, noise level
- Engineering teams: sigma, jitter, deviation
How It Works
Imagine an A/B test measuring revenue-per-visitor with a mean of $12 and an SD of $40 across 10,000 visitors. The SD tells you a typical visitor deviates from the mean by about $40, which is huge relative to the mean itself. This signals the metric is noisy and you will need a large sample to detect a small lift. If a second metric (time on site) has mean 120 seconds and SD 30 seconds, the coefficient of variation is much smaller and the metric is much easier to test.
Best Practices
- Do report SD alongside the mean so readers can gauge noise.
- Do use the coefficient of variation (SD / mean) to compare variability across metrics on different scales.
- Do winsorize or cap outliers when SD is being dragged by a handful of extreme values.
- Do not confuse SD (describing the data) with standard error (describing the estimate of the mean).
- Do not interpret raw SD without thinking about the unit and the mean.
Common Mistakes
- Reporting SD for heavily skewed revenue data as if it meaningfully describes typical behavior.
- Using SD from the full population when the relevant number is the standard error of the mean.
- Ignoring SD when planning experiments, then being surprised by low power.
Industry Context
- SaaS/B2B: Revenue metrics have extreme SDs because of a few whale accounts; median and trimmed means often tell a clearer story.
- Ecommerce/DTC: Order-value SD is typically 1-2x the mean, which sets your minimum sample requirements.
- Lead gen/services: Lead-quality scoring has high variance; SD shapes the confidence you can have in a lift estimate.
The Behavioral Science Connection
Humans systematically underestimate variability. Tversky and Kahneman's "availability heuristic" leads us to think our recent data is representative, so we underweight the true SD. This produces overconfidence in small samples and explains why practitioners keep getting surprised when "winners" fail to replicate.
Key Takeaway
Standard deviation is the single most important number in experiment design; it determines how much traffic you need.