Variance Calculator

What Is Variance?

Variance is a statistical measure of how far data points are spread out from their mean, defined as the average of the squared deviations. A larger variance indicates more dispersion; a variance of zero means all data points are identical.

σ² = (1/N) Σi=1..N (xi - μ)²

Historical background: The concept of variance was formally introduced by the English statistician and geneticist Ronald A. Fisher (1890–1962) in his landmark 1918 paper "The Correlation Between Relatives on the Supposition of Mendelian Inheritance." In this paper, Fisher needed a precise way to quantify the variation of genetic traits, so he coined the term "variance" and defined it as the mean of the squared deviations.

Fisher chose the word "variance" deliberately—it derives from the Latin variare (to change), concisely conveying the concept of "degree of variation." Before this, statisticians had used verbose phrases like "mean squared deviation."

Intuition

Consider two classes with math scores: Class A scores {70, 72, 68, 71, 69} and Class B scores {40, 100, 60, 90, 60}. Both classes have the same mean (70), but Class B has a much larger variance—scores are "more spread out." Variance precisely quantifies this degree of spread.
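
The two classes above can be checked with Python's standard statistics module (a quick sketch; pvariance computes population variance):

```python
from statistics import mean, pvariance

class_a = [70, 72, 68, 71, 69]
class_b = [40, 100, 60, 90, 60]

# Both classes share the same mean of 70...
print(mean(class_a), mean(class_b))

# ...but their variances differ enormously: 2 vs 480
print(pvariance(class_a))
print(pvariance(class_b))
```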

Why Are Deviations Squared?

This is one of the most commonly asked questions. Why not just use absolute values of deviations? Why square them? There are three deep reasons:

Reason 1: Avoiding cancellation of positives and negatives

Deviations (xi - x̄) can be positive or negative, and their sum is always exactly zero; this is a mathematical property of the mean: Σ(xi - x̄) = 0. If we simply averaged the raw deviations, the result would always be 0 and convey no information. Squaring makes every deviation non-negative, preventing cancellation.
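
A two-line check of the cancellation property (any data set works; these numbers are arbitrary):

```python
from statistics import mean

data = [3, 7, 7, 19]
m = mean(data)  # 9
deviations = [x - m for x in data]

# Raw deviations always cancel to zero...
print(sum(deviations))
# ...while squared deviations accumulate real information about spread
print(sum(d ** 2 for d in deviations))
```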

Reason 2: Mathematical convenience

The square function f(x) = x² is a smooth, everywhere-differentiable function, making variance very easy to work with in calculus and optimization. By contrast, the absolute value function |x| is not differentiable at x = 0, complicating many mathematical derivations.

For example, Least Squares minimizes the sum of squared residuals rather than the sum of absolute residuals precisely because differentiating the squared sum yields an analytic solution (the normal equations), while optimizing the absolute sum has no closed-form solution.
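The contrast can be seen in the simplest least-squares problem: minimizing Σ(xi - c)² over c has the closed-form solution c = x̄ (set the derivative to zero), while minimizing Σ|xi - c| is solved by the median, with no analogous derivative-based derivation. A sketch with arbitrary data:

```python
from statistics import mean, median

data = [1, 2, 3, 4, 100]

def sse(c):  # sum of squared residuals
    return sum((x - c) ** 2 for x in data)

def sae(c):  # sum of absolute residuals
    return sum(abs(x - c) for x in data)

# d/dc Σ(x-c)² = 0 gives the analytic solution c = mean
best_sq = mean(data)     # 22
# The absolute-error objective is minimized by the median instead
best_abs = median(data)  # 3

# Grid check: no candidate in [0, 100] beats the analytic minimizers
candidates = [i / 10 for i in range(0, 1001)]
assert all(sse(best_sq) <= sse(c) + 1e-9 for c in candidates)
assert all(sae(best_abs) <= sae(c) + 1e-9 for c in candidates)
```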

Reason 3: Variance of independent variables is additive

Var(X + Y) = Var(X) + Var(Y) (when X, Y are independent)
This is one of the most important mathematical properties of variance. The "mean absolute deviation" (MAD) does not have this additivity. Variance's additivity allows us to compute the variance of a combined system from its components—essential in portfolio theory, error propagation, and statistical inference.
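Additivity is easy to verify by simulation (a sketch using only the standard library; the distributions, seed, and sample size are arbitrary choices):

```python
import random

random.seed(42)
N = 200_000

# Two independent variables: X ~ Normal(0, 2²), Y ~ Uniform[0, 10]
xs = [random.gauss(0, 2) for _ in range(N)]
ys = [random.uniform(0, 10) for _ in range(N)]

def pvar(data):
    m = sum(data) / len(data)
    return sum((v - m) ** 2 for v in data) / len(data)

# Theory: Var(X) = 4, Var(Y) = 100/12 ≈ 8.33, so Var(X+Y) ≈ 12.33
sums = [x + y for x, y in zip(xs, ys)]
print(pvar(xs) + pvar(ys))
print(pvar(sums))  # agrees up to sampling noise
```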

Alternative: Mean Absolute Deviation (MAD)

Using absolute values instead of squares gives the "Mean Absolute Deviation" (MAD). MAD is more robust to outliers and preferred in some applications (e.g., median regression). However, due to the lack of the mathematical properties above, variance and standard deviation remain dominant in statistics.
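The robustness difference is visible with a single outlier. A sketch (the data is arbitrary; mad is a hypothetical helper, since the statistics module has no MAD function):

```python
from statistics import mean, pstdev

def mad(data):
    """Mean absolute deviation from the mean."""
    m = mean(data)
    return mean(abs(x - m) for x in data)

clean = [9, 10, 11, 9, 10, 11, 9, 10, 11]
dirty = clean + [100]  # one extreme outlier

# Squaring amplifies the outlier more than taking absolute values does
sd_ratio = pstdev(dirty) / pstdev(clean)
mad_ratio = mad(dirty) / mad(clean)
print(sd_ratio, mad_ratio)  # SD inflates by a larger factor than MAD
```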

Population vs Sample Variance — Bessel's Correction Proof

Population Variance (σ²):
- Formula: σ² = Σ(xi - μ)² / N
- When to use: the data includes every individual in the population
- Divide by: N
- Bias: exact value, no bias

Sample Variance (s²):
- Formula: s² = Σ(xi - x̄)² / (n-1)
- When to use: the data is a subset of the population
- Divide by: n - 1
- Bias: unbiased estimator of σ² (Bessel's correction)
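
Both versions are available in Python's statistics module; a minimal comparison on a small data set:

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean = 5, sum of squared deviations = 32

print(pvariance(data))  # population variance: 32 / 8 = 4
print(variance(data))   # sample variance:     32 / 7 ≈ 4.571
```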

Why divide by n-1? Here is the rigorous mathematical proof:

Key Identity
Σ(xi - x̄)² = Σ(xi - μ)² - n(x̄ - μ)²
Proof: rewrite xi - x̄ as (xi - μ) - (x̄ - μ), expand the square, and simplify using Σ(xi - μ) = n(x̄ - μ).
Take Expectation E[·]
Left side: E[Σ(xi - x̄)²] = ? (what we want)
Right side, term 1: E[Σ(xi - μ)²] = nσ² (since E[(xi - μ)²] = σ² for each i)
Right side, term 2: E[n(x̄ - μ)²] = n·Var(x̄) = n · σ²/n = σ²
Substitute
E[Σ(xi - x̄)²] = nσ² - σ² = (n-1)σ²
Conclusion
Therefore E[Σ(xi - x̄)² / (n-1)] = σ²
Dividing by (n-1) produces an estimator whose expected value exactly equals the population variance σ²: an unbiased estimator.
Dividing by n instead gives E[Σ(xi - x̄)² / n] = (n-1)σ²/n < σ², systematically underestimating the population variance.
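
The bias can also be seen empirically. Below is a Monte Carlo sketch (standard library only; the population N(0, 3²), the sample size, and the trial count are arbitrary choices):

```python
import random

random.seed(0)
TRUE_VAR = 9.0           # population: Normal(0, 3²), so σ² = 9
n, trials = 5, 100_000

sum_div_n, sum_div_n1 = 0.0, 0.0
for _ in range(trials):
    sample = [random.gauss(0, 3) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    sum_div_n += ss / n          # biased: divides by n
    sum_div_n1 += ss / (n - 1)   # Bessel-corrected: divides by n-1

# Theory predicts (n-1)/n · σ² = 7.2 for the first, σ² = 9 for the second
print(sum_div_n / trials)
print(sum_div_n1 / trials)
```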

Degrees of freedom interpretation: After computing the mean x̄ from n data points, only n-1 values are free to vary (the last one is uniquely determined by Σxi = nx̄). This "lost degree of freedom" is exactly the 1 subtracted in the denominator.

Friedrich Bessel (1784–1846) was a German astronomer, famous for the first accurate measurement of stellar parallax. In the 1820s, while studying errors in astronomical observations, he first recognized the need to use n-1 rather than n as the denominator to obtain a fair estimate of the true measurement error.

Properties of Variance

Variance is a cornerstone of statistics because of its elegant mathematical properties. Below, each property is explained with its intuitive meaning:

Property 1: Translation invariance

Var(X + b) = Var(X)
Why? Adding a constant b to every data point shifts the entire distribution without changing the relative relationships between points. The mean shifts by b, but each deviation (xi + b) - (x̄ + b) = xi - x̄ remains unchanged. Intuitively: if every student gets 10 bonus points, the "spread" of scores does not change.

Property 2: Scaling squares the factor

Var(aX) = a²·Var(X)
Why? Multiplying every data point by a constant a also multiplies each deviation by a. Since variance is the average of squared deviations, a gets squared. For example, converting data from meters to centimeters (multiplying by 100) makes variance 10,000 times larger (100²). This is why standard deviation (square root of variance) scales by |a|: SD(aX) = |a|·SD(X).

Property 3: Combined (linear transform)

Var(aX + b) = a²·Var(X)
Combining properties 1 and 2: translation does not affect variance, scaling multiplies variance by the square of the scale factor.
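
Properties 1 through 3 can be confirmed with a small numerical check (arbitrary data and constants):

```python
from statistics import pvariance

x = [2, 4, 6, 8]
a, b = 3, 100

transformed = [a * v + b for v in x]

# Var(aX + b) = a² · Var(X): the shift b drops out, the scale a is squared
print(pvariance(x))            # 5
print(pvariance(transformed))  # 3² · 5 = 45
```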

Property 4: Additivity for independent variables

Var(X + Y) = Var(X) + Var(Y) (when X, Y are independent)
Why? Expanding Var(X + Y) = E[(X+Y-E[X+Y])²], the cross term 2·E[(X-E[X])(Y-E[Y])] is 2·Cov(X,Y). When X and Y are independent, covariance is zero, the cross term vanishes, leaving only Var(X) + Var(Y).

Practical application: If a portfolio contains two independent assets, the portfolio's risk (variance) is the simple sum of the individual risks. This is the foundation of Modern Portfolio Theory (Harry Markowitz, 1952).

Property 5: General case (not independent)

Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y)
When X and Y are not independent, the covariance term must be included. Positive covariance (moving together) increases combined variance; negative covariance (moving opposite) decreases it—this is the mathematical basis for diversification reducing portfolio risk.

Property 6: Computational shortcut formula

Var(X) = E[X²] - (E[X])²
This identity, "the expectation of the square minus the square of the expectation," is extremely useful in theoretical derivations. It is sometimes easier to compute than the definition directly.
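
A worked check of the shortcut formula against the definitional computation (arbitrary data):

```python
from statistics import pvariance

data = [1, 2, 3, 4]

e_x = sum(data) / len(data)                   # E[X]  = 2.5
e_x2 = sum(v * v for v in data) / len(data)   # E[X²] = 7.5

shortcut = e_x2 - e_x ** 2  # E[X²] - (E[X])²
print(shortcut)             # 1.25
print(pvariance(data))      # 1.25, matching the definition
```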

Variance Decomposition & ANOVA

Variance can be decomposed into contributions from different sources. This idea is the core of ANOVA (Analysis of Variance), systematically developed by Fisher in his 1925 book Statistical Methods for Research Workers.

SSTotal = SSBetween + SSWithin

Total Variation = Between-group Variation + Within-group Variation

Core logic of ANOVA: If between-group variation is much larger than within-group variation (large F-value), the differences between groups are unlikely to be due to chance, and we have reason to believe the groups come from different populations (i.e., the treatment has an effect).

Example

Testing the effect of 3 fertilizers on tomato yield: randomly assign 30 plants to 3 groups with different fertilizers. ANOVA decomposes total yield variation into "differences caused by fertilizer type" (between) and "natural variation among plants within the same fertilizer group" (within). If the F-test p-value < 0.05, at least one fertilizer has a significantly different effect.
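
The decomposition behind this example can be sketched from scratch (the yield numbers below are hypothetical, invented purely for illustration; the p-value step is omitted since it requires the F-distribution CDF):

```python
from statistics import mean

# Hypothetical tomato yields (kg) for three fertilizer groups
groups = {
    "A": [4.2, 4.8, 4.5, 4.9, 4.6],
    "B": [5.8, 6.1, 5.5, 6.0, 5.6],
    "C": [4.4, 4.7, 4.3, 4.8, 4.5],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = mean(all_values)
k, n = len(groups), len(all_values)

# Between-group: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within-group: natural variation inside each group
ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# SS_total = SS_between + SS_within (up to float rounding)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(ss_total, ss_between + ss_within, f_stat)
```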

ANOVA extends to multiple factors (two-way ANOVA, MANOVA) and more complex experimental designs. The philosophical idea of variance decomposition—breaking total variation into explainable components—is a cornerstone of all modern statistics.

Bias-Variance Tradeoff (Machine Learning)

In machine learning, model prediction error can be decomposed into three components:

E[(y - ŷ)²] = Bias(ŷ)² + Var(ŷ) + σ²_noise

Bias

Systematic error in predictions. High bias = underfitting (model too simple to capture true patterns)

Variance

Sensitivity to training data changes. High variance = overfitting (model too complex, treating noise as signal)

Irreducible Error (σ²_noise)

Random noise inherent in the data that no model can eliminate

The core tradeoff: Reducing bias typically requires a more complex model (e.g., higher polynomial degree, deeper neural network), which tends to increase variance. Conversely, simplifying the model reduces variance but may increase bias. The optimal model balances both to minimize total error.
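
The decomposition can be verified in a deliberately simplified setting: estimating a known mean μ with a shrunken estimator (biased, but lower variance). This sketch omits the irreducible-noise term, since we estimate a fixed parameter rather than predict noisy targets; all numbers are arbitrary choices:

```python
import random

random.seed(1)
MU, SIGMA, n, trials = 10.0, 2.0, 10, 50_000

estimates = []
for _ in range(trials):
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    xbar = sum(sample) / n
    estimates.append(0.8 * xbar)  # shrinkage: introduces bias, cuts variance

e_hat = sum(estimates) / trials
bias = e_hat - MU                                          # ≈ 0.8·10 - 10 = -2
var = sum((e - e_hat) ** 2 for e in estimates) / trials    # ≈ 0.64·σ²/n
mse = sum((e - MU) ** 2 for e in estimates) / trials

# The two sides of MSE = Bias² + Variance agree
print(mse, bias ** 2 + var)
```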

Practical strategies:
- Cross-validation: estimate out-of-sample error to detect overfitting or underfitting
- Regularization (L1/L2, dropout, early stopping): trades a small increase in bias for a large reduction in variance
- More training data: reduces variance without increasing bias
- Ensembling (e.g., bagging, random forests): averages many high-variance models to lower overall variance

Frequently Asked Questions

What are the units of variance?

Variance is measured in the square of the original data units. For example, if data is in kilograms (kg), variance is in kg². This is a drawback of variance—its units are not intuitive. This is why standard deviation (the square root of variance) is often preferred, as it shares the same units as the original data.

Can variance be negative?

Never. Variance is the average of squared deviations, and squared values are always ≥ 0, so variance is always ≥ 0. Variance equals 0 if and only if all data points are identical. If your calculation produces a negative result, there is an error in the computation.

What is the relationship between variance and covariance?

Variance is a special case of covariance: Var(X) = Cov(X, X). Covariance Cov(X, Y) = E[(X - E[X])(Y - E[Y])] measures the direction and strength of co-movement between two variables. When Y = X, covariance reduces to variance. Normalizing covariance (dividing by both standard deviations) gives the correlation coefficient r = Cov(X,Y) / (SD(X) · SD(Y)), which always lies in [-1, 1].
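
Both relationships can be checked directly (cov is a small helper written from the definition; the data is arbitrary):

```python
def cov(xs, ys):
    """Population covariance from the definition E[(X-E[X])(Y-E[Y])]."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Var(X) = Cov(X, X)
print(cov(x, x))  # 2.0

# Correlation = Cov / (SD·SD), guaranteed to lie in [-1, 1]
r = cov(x, y) / (cov(x, x) ** 0.5 * cov(y, y) ** 0.5)
print(r)
```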

What about n=1 (only one data point)?

Population variance is 0 when N=1 (the single point is the mean, deviation is zero). However, sample variance is undefined when n=1 because the denominator n-1 = 0, causing division by zero. This is also intuitively sensible: with only one observation, we have absolutely no information about the population's variability—a single point cannot reveal anything about "spread." At least 2 data points are needed to compute sample variance.

What is the difference between VAR and VARP in Excel?

In Excel: VAR (or VAR.S) computes sample variance (divides by n-1); VARP (or VAR.P) computes population variance (divides by N). Similarly, STDEV/STDEV.S is sample standard deviation, STDEVP/STDEV.P is population standard deviation. Google Sheets and LibreOffice Calc follow the same naming convention. Mnemonic: the "P" stands for Population; no "P" means Sample.