Standard Deviation
What Is Standard Deviation?
Standard deviation is a statistical measure of the dispersion of data points relative to their mean. It gives the "average distance" of each data point from the mean, expressed in the same units as the original data.
Why standard deviation instead of variance? Variance is the average of the squared deviations, which means its units are the square of the original data units (e.g., if data is in meters, variance is in meters²). Standard deviation, by taking the square root of variance, returns to the original units, making it directly comparable to and interpretable alongside the data. For instance, if a set of height measurements has a standard deviation of 5 cm, we intuitively understand that most heights are within about 5 cm of the average.
The concept traces back to Carl Friedrich Gauss (1777–1855) and his work on error theory, but the term "standard deviation" was formally introduced by English statistician Karl Pearson in 1894.
Intuition
Imagine shooting at a target: the mean is the bullseye, and the standard deviation measures how "spread out" your shots are. A small standard deviation means tight clustering (high precision); a large one means wide scattering.
Population vs Sample Standard Deviation
This is one of the most commonly confused distinctions in statistics. The two calculations are nearly identical, differing only in the denominator:
| | Population Std Dev (σ) | Sample Std Dev (s) |
|---|---|---|
| When to use | You have all the data | You have a subset of the population |
| Divide by | N (total count) | n - 1 (sample size minus 1) |
| Symbol | σ (sigma) | s |
| Example | Exam scores of all 30 students | Income of 200 randomly sampled residents |
Why does sample standard deviation divide by n-1 instead of n? This is Bessel's correction, named after the German astronomer Friedrich Bessel (1784–1846), who discovered its necessity in the 1820s while studying errors in astronomical observations.
The core reason: when we use the sample mean x̄ instead of the unknown population mean μ to compute deviations, we systematically underestimate the true variance. This happens because x̄ is the value that minimizes Σ(xi - x̄)²—it is "pulled toward" the data—so the computed sum of squared deviations is always too small. Dividing by n-1 instead of n exactly corrects this bias, making the sample variance an unbiased estimator of the population variance.
Formulas in Detail
Population standard deviation:
σ = √[ Σ(xi - μ)² / N ]
Sample standard deviation:
s = √[ Σ(xi - x̄)² / (n - 1) ]
Where:
- xi — the i-th data point
- μ — population mean (average of all data)
- x̄ — sample mean
- N — population size (total count)
- n — sample size
- Σ — summation symbol (sum over all data points)
Manual calculation steps:
- Compute the mean: x̄ = Σxi / n
- Compute each deviation from the mean: xi - x̄
- Square each deviation: (xi - x̄)²
- Sum all squared deviations: SS = Σ(xi - x̄)²
- Divide by N (population) or n-1 (sample) to get the variance
- Take the square root of the variance to get the standard deviation
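The six steps above can be sketched in Python; the example scores here are purely illustrative:

```python
import math

def std_dev(data, sample=True):
    """Standard deviation computed via the manual steps above."""
    n = len(data)
    mean = sum(data) / n                      # step 1: the mean
    ss = sum((x - mean) ** 2 for x in data)   # steps 2-4: sum of squared deviations
    divisor = n - 1 if sample else n          # step 5: n-1 (sample) or N (population)
    return math.sqrt(ss / divisor)            # step 6: square root of the variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(scores, sample=False))  # population: 2.0
print(std_dev(scores, sample=True))   # sample: ≈ 2.138
```

Note how the only difference between the two results is the divisor chosen in step 5.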
Bessel's Correction: Why n-1?
Bessel's correction is a classic result in statistics. Here is a simplified version of the mathematical proof.
Core theorem: If x1, x2, ..., xn are independent random samples from a population with mean μ and variance σ², then:
E[ Σ(xi - x̄)² ] = (n - 1) σ²
Proof sketch:
Write xi - x̄ as (xi - μ) - (x̄ - μ), expand the square, and sum:
Σ(xi - x̄)² = Σ(xi - μ)² - n(x̄ - μ)²
Taking expectations, E[Σ(xi - μ)²] = n σ², while
E[n(x̄ - μ)²] = n · Var(x̄) = n · σ²/n = σ²
so E[Σ(xi - x̄)²] = n σ² - σ² = (n - 1) σ².
Therefore, to make the estimator's expected value equal to the true variance σ² (i.e., unbiased), we must divide by (n-1), not n:
E[ Σ(xi - x̄)² / (n - 1) ] = σ²
Degrees of freedom intuition: After computing the mean x̄ from n data points, only n-1 values are free to vary (the last one is determined by the constraint that they must sum to n·x̄). This is the concept of "degrees of freedom"—n observations minus 1 estimated parameter.
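A quick simulation makes the bias visible (the sample size, trial count, and seed below are chosen arbitrarily): dividing by n systematically undershoots the true variance, while dividing by n-1 centers on it.

```python
import random
import statistics

# Draw many small samples from a standard normal population (true σ² = 1)
# and average the two competing variance estimators across trials.
random.seed(42)  # illustrative seed for reproducibility
n, trials = 5, 100_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    m = statistics.fmean(sample)
    ss = sum((x - m) ** 2 for x in sample)
    biased_sum += ss / n          # divide by n: biased low
    unbiased_sum += ss / (n - 1)  # Bessel's correction
print(biased_sum / trials)    # ≈ 0.8, i.e. (n-1)/n of the true variance
print(unbiased_sum / trials)  # ≈ 1.0, the true variance
```

The biased estimator's average lands near (n-1)/n · σ² = 0.8, exactly the shortfall the theorem predicts.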
The 68-95-99.7 Rule (Empirical Rule)
For data that follows a normal distribution (also called the bell curve or Gaussian distribution), the proportion of data within standard deviation ranges follows a precise pattern:
- 68.27% of data falls within μ ± 1σ
- 95.45% of data falls within μ ± 2σ
- 99.73% of data falls within μ ± 3σ
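These percentages can be verified numerically from the standard normal distribution's error function:

```python
import math

def normal_within(k):
    """P(|X - μ| ≤ kσ) for a normal distribution, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"±{k}σ: {normal_within(k):.2%}")
# ±1σ: 68.27%, ±2σ: 95.45%, ±3σ: 99.73%
```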
Historical note: The normal curve was first described by the French mathematician Abraham de Moivre in 1733, as an approximation to the binomial distribution for large samples. Gauss later applied it to the analysis of astronomical observation errors, which is why the normal distribution is also called the "Gaussian distribution."
Practical example
If a set of exam scores has a mean of 75 and a standard deviation of 10, and follows an approximately normal distribution, then about 68% of students scored between 65–85, about 95% scored between 55–95, and scores below 45 or above 105 are extremely rare (less than 0.3%).
Caveat: The empirical rule only applies to normal or approximately normal data. For skewed or heavy-tailed distributions, use Chebyshev's inequality: regardless of distribution shape, at least 1 - 1/k² of data falls within μ ± kσ. For example, k=2 gives at least 75%, k=3 gives at least 88.9%.
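Comparing the Chebyshev lower bound with the exact normal proportion shows how conservative the distribution-free bound is:

```python
import math

# Chebyshev's guaranteed minimum vs the exact normal proportion within ±kσ
for k in (2, 3, 4):
    chebyshev = 1 - 1 / k**2
    normal = math.erf(k / math.sqrt(2))
    print(f"k={k}: Chebyshev ≥ {chebyshev:.1%}, normal = {normal:.2%}")
```

For k=2 the bound guarantees only 75% even though a normal distribution actually delivers 95.45%; the gap is the price of making no distributional assumption.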
Coefficient of Variation (CV)
The coefficient of variation was introduced by the English statistician Karl Pearson in 1896. It is the ratio of the standard deviation to the mean, typically expressed as a percentage.
Why do we need CV? Standard deviation is an absolute measure, affected by the scale of the data. If one dataset has a mean of 1000 and standard deviation of 50, and another has a mean of 10 and standard deviation of 5, which is "more dispersed"? Looking at standard deviation alone (50 vs 5), the first seems more variable. But computing CV (5% vs 50%), the second has a relative dispersion 10 times higher!
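The comparison in the paragraph above takes one line each to compute:

```python
def cv(mean, sd):
    """Coefficient of variation as a percentage (assumes mean ≠ 0)."""
    return sd / mean * 100

print(cv(1000, 50))  # 5.0  -> first dataset: 5% relative dispersion
print(cv(10, 5))     # 50.0 -> second dataset: 50%, ten times more dispersed
```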
Typical applications of CV include:
- Cross-domain comparison: comparing variability of data with different units (e.g., stock prices vs temperatures)
- Laboratory quality control: a CV < 5% is generally considered good repeatability
- Financial analysis: used to compare risk-adjusted returns across assets (the Sharpe ratio is closely related to CV)
Limitations
CV is only reliable when the mean is meaningful and not close to zero. For data that can be negative or near zero, such as temperature in Celsius, CV can be misleading or meaningless. In such cases, use standard deviation itself for comparisons.
Applications
Quality Control — Six Sigma
Six Sigma methodology was developed by Motorola engineer Bill Smith in 1986. Its core idea is to keep process variation within μ ± 6σ, corresponding to 3.4 defects per million (99.99966% yield); this widely quoted figure assumes the conventional 1.5σ long-term shift of the process mean. Standard deviation is the central metric for measuring process capability.
Finance — Volatility
In finance, the standard deviation of asset returns is called "volatility," the core measure of risk. Annualized volatility = daily return std dev × √252 (252 trading days per year). The σ in the Black-Scholes option pricing model is precisely this standard deviation.
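A minimal sketch of the annualization step; the daily returns below are made-up values:

```python
import math
import statistics

# Hypothetical daily returns (fractions, e.g. 0.012 = +1.2%)
daily_returns = [0.012, -0.008, 0.005, -0.015, 0.009, 0.003, -0.004]
daily_vol = statistics.stdev(daily_returns)   # sample std dev of daily returns
annualized_vol = daily_vol * math.sqrt(252)   # scale by √252 trading days
print(f"annualized volatility ≈ {annualized_vol:.1%}")  # ≈ 15% for these values
```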
Science — Measurement Error
In experimental science, the standard deviation of repeated measurements quantifies measurement precision. Results are typically reported as "mean ± std dev" (e.g., 9.81 ± 0.02 m/s²). Standard Error of the Mean (SEM = s/√n) measures uncertainty in the mean estimate.
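A minimal sketch, with assumed repeated measurements of g:

```python
import math
import statistics

# Hypothetical repeated measurements of g, in m/s²
measurements = [9.79, 9.82, 9.80, 9.83, 9.81]
mean = statistics.fmean(measurements)
sd = statistics.stdev(measurements)       # spread of individual measurements
sem = sd / math.sqrt(len(measurements))   # uncertainty of the mean itself
print(f"{mean:.3f} ± {sd:.3f} m/s² (SEM = {sem:.4f})")
```

Note that SEM shrinks as more measurements are taken, while the standard deviation does not: more data pins down the mean, not the intrinsic scatter.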
Education — Z-scores
Z-score = (x - x̄) / s, converting raw scores to "how many standard deviations from the mean." Standardized tests like SAT and GRE use Z-scores to make scores comparable across years. A Z of 1.0 means one standard deviation above the mean, roughly the 84th percentile.
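The z-to-percentile conversion follows from the standard normal CDF; here it is applied to the exam example used earlier (mean 75, std dev 10):

```python
import math

def z_to_percentile(z):
    """Standard normal CDF: fraction of scores below a given z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = (85 - 75) / 10          # a score of 85 with mean 75, sd 10 -> z = 1.0
print(z_to_percentile(z))   # ≈ 0.841, i.e. roughly the 84th percentile
```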
Meteorology — Anomaly Detection
Meteorologists use standard deviation to define "anomalous" weather. For example, if a location's average July temperature is 25°C with a std dev of 3°C, temperatures above 31°C (> 2σ) can be classified as "significantly above normal."
Sports Analytics
The standard deviation of an athlete's performance reflects consistency. A runner averaging 10.2s in the 100m with std dev 0.1s is more reliable than one averaging 10.1s with std dev 0.5s—though the latter may have a higher peak performance.
Related Tools
- Variance Calculator — variance is the square of standard deviation; explore its properties and derivation
- Mean / Median / Mode Calculator — measures of central tendency, complementing standard deviation (dispersion)
- Normal Distribution Calculator — use mean and standard deviation to compute normal probabilities and quantiles
- Percentile Calculator — understand data position in combination with standard deviation
- Z-Score Calculator — standardize data into standard deviation units
Frequently Asked Questions
What does a standard deviation of 0 mean?
A standard deviation of 0 means all values in the dataset are identical—there is no variation whatsoever. Every data point equals the mean, and all deviations are zero. For example, the dataset {5, 5, 5, 5} has a standard deviation of 0.
Can the standard deviation be larger than the mean?
Absolutely. This often occurs with highly skewed data (e.g., income distributions). When CV > 100%, the standard deviation exceeds the mean. This indicates extremely high relative variability. For example: {1, 1, 1, 1, 100} has a mean of about 20.8 but a standard deviation of about 44.3.
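Those figures can be checked directly:

```python
import statistics

data = [1, 1, 1, 1, 100]
mean = statistics.fmean(data)   # 20.8
sd = statistics.stdev(data)     # ≈ 44.27
print(mean, sd, f"CV = {sd / mean:.0%}")  # CV well above 100%
```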
Should I use the population or the sample standard deviation?
Use the population standard deviation (σ) if your data includes every individual you care about. Example: grades of all 30 students in a class. Use the sample standard deviation (s) if your data is a subset of a larger population and you want to infer population characteristics. Example: surveying 500 consumers to infer national behavior. In practice, sample standard deviation is appropriate in the vast majority of cases, as we are almost always working with sample data.
Is standard deviation sensitive to outliers?
Yes, very much so. Because standard deviation is based on squared deviations, the effect of outliers is amplified. For example, {10, 12, 11, 13, 11} has a sample std dev of about 1.14; adding an outlier to get {10, 12, 11, 13, 11, 100} makes it jump to about 36.2—over 30 times larger. If outliers are present, consider more robust measures of dispersion such as the Median Absolute Deviation (MAD) or the Interquartile Range (IQR).
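The effect, and a robust alternative, can be checked numerically:

```python
import statistics

clean = [10, 12, 11, 13, 11]
dirty = clean + [100]
print(statistics.stdev(clean))   # ≈ 1.14
print(statistics.stdev(dirty))   # ≈ 36.2 — the single outlier dominates

# Robust alternative: median absolute deviation, computed directly
def mad(data):
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)

print(mad(clean), mad(dirty))    # MAD barely moves: 1 vs 1.0
```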
What is the difference between standard deviation and standard error?
Standard deviation measures the dispersion of individual data points; standard error (SEM = s/√n) measures the uncertainty of the sample mean. SEM is always smaller than SD (for n > 1) and decreases as sample size grows. Intuitively: even if individual variation is large (large SD), with a large enough sample, our estimate of the mean can still be precise (small SEM). Error bars in scientific papers should specify whether they show SD or SEM, as they have very different meanings.