Variance Calculator
What Is Variance?
Variance is a statistical measure of how far data points are spread out from their mean, defined as the average of the squared deviations. A larger variance indicates more dispersion; a variance of zero means all data points are identical.
Historical background: The concept of variance was formally introduced by the English statistician and geneticist Ronald A. Fisher (1890–1962) in his landmark 1918 paper "The Correlation Between Relatives on the Supposition of Mendelian Inheritance." In this paper, Fisher needed a precise way to quantify the variation of genetic traits, so he coined the term "variance" and defined it as the mean of the squared deviations.
Fisher chose the word "variance" deliberately—it derives from the Latin variare (to change), concisely conveying the concept of "degree of variation." Before this, statisticians had used verbose phrases like "mean squared deviation."
Intuition
Consider two classes with math scores: Class A scores {70, 72, 68, 71, 69} and Class B scores {40, 100, 60, 90, 60}. Both classes have the same mean (70), but Class B has a much larger variance—scores are "more spread out." Variance precisely quantifies this degree of spread.
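The two classes can be checked numerically with Python's `statistics` module (`pvariance` computes population variance, treating each class as a complete population):

```python
from statistics import mean, pvariance

class_a = [70, 72, 68, 71, 69]
class_b = [40, 100, 60, 90, 60]

# Both classes share the same mean of 70...
print(mean(class_a), mean(class_b))

# ...but their population variances differ enormously:
# Class A: squared deviations (0, 4, 4, 1, 1) average to 2
# Class B: squared deviations (900, 900, 100, 400, 100) average to 480
print(pvariance(class_a), pvariance(class_b))
```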
Why Are Deviations Squared?
This is one of the most commonly asked questions. Why not just use absolute values of deviations? Why square them? There are three deep reasons:
Reason 1: Avoiding cancellation of positives and negatives
Deviations from the mean always sum to zero (Σ(xi - x̄) = 0), so averaging raw deviations would give 0 for every dataset. Squaring makes every deviation contribute positively, so large deviations cannot hide behind opposite-signed ones.
Reason 2: Mathematical convenience
The square function f(x) = x² is a smooth, everywhere-differentiable function, making variance very easy to work with in calculus and optimization. By contrast, the absolute value function |x| is not differentiable at x = 0, complicating many mathematical derivations.
For example, Least Squares minimizes the sum of squared residuals rather than the sum of absolute residuals precisely because differentiating the squared sum yields an analytic solution (the normal equations), while optimizing the absolute sum has no closed-form solution.
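As a concrete illustration, the one-variable least-squares line has a closed-form solution obtained by differentiating the squared-error sum and setting the derivatives to zero (a minimal sketch; the function name and data are illustrative):

```python
from statistics import mean

def ols_line(xs, ys):
    """Closed-form least-squares fit y = a*x + b (from the normal equations)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², the solution of the normal equations
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
        / sum((x - x_bar) ** 2 for x in xs)
    b = y_bar - a * x_bar
    return a, b

# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1
print(ols_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

No analogous formula pops out when the objective is the sum of absolute residuals; that problem is typically solved iteratively (e.g., by linear programming).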
Reason 3: Variance of independent variables is additive
For independent X and Y, Var(X + Y) = Var(X) + Var(Y). No comparably clean rule holds for mean absolute deviation, and this additivity underlies key results such as the variance of a sample mean being σ²/n.
Alternative: Mean Absolute Deviation (MAD)
Using absolute values instead of squares gives the "Mean Absolute Deviation" (MAD). MAD is more robust to outliers and preferred in some applications (e.g., median regression). However, due to the lack of the mathematical properties above, variance and standard deviation remain dominant in statistics.
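The difference in outlier sensitivity is easy to demonstrate (a minimal sketch; the datasets are arbitrary):

```python
from statistics import mean, pvariance

def mad(data):
    """Mean absolute deviation: average distance from the mean."""
    m = mean(data)
    return sum(abs(x - m) for x in data) / len(data)

clean   = [1, 2, 3, 4, 5]
outlier = [1, 2, 3, 4, 50]   # one extreme value

# Variance jumps from 2 to 362 (181x), because squaring amplifies
# the large deviation; MAD grows from 1.2 to 15.2 (about 13x).
print(pvariance(clean), mad(clean))
print(pvariance(outlier), mad(outlier))
```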
Population vs Sample Variance — Bessel's Correction Proof
| | Population Variance (σ²) | Sample Variance (s²) |
|---|---|---|
| Formula | σ² = Σ(xi - μ)² / N | s² = Σ(xi - x̄)² / (n-1) |
| When to use | Data includes all individuals | Data is a subset of the population |
| Divide by | N | n - 1 |
| Bias | Exact population value, no estimation | Unbiased estimator (Bessel's correction) |
Why divide by n-1? Here is the rigorous mathematical proof:
Proof: rewrite xi - x̄ as (xi - μ) - (x̄ - μ) and expand the square; the cross term simplifies via Σ(xi - μ) = n(x̄ - μ), giving:
Σ(xi - x̄)² = Σ(xi - μ)² - n(x̄ - μ)²
Taking expectations of the right side:
- Term 1: E[Σ(xi - μ)²] = nσ² (since E[(xi - μ)²] = σ² for each i)
- Term 2: E[n(x̄ - μ)²] = n·Var(x̄) = n·σ²/n = σ²
Therefore E[Σ(xi - x̄)²] = nσ² - σ² = (n-1)σ². Dividing by (n-1) produces an estimator whose expected value exactly equals the population variance σ²: an unbiased estimator.
Dividing by n instead gives E[Σ(xi - x̄)² / n] = (n-1)σ²/n < σ², systematically underestimating the population variance.
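The underestimation is easy to verify by simulation (a sketch; the sample size, trial count, and seed are arbitrary):

```python
import random
from statistics import mean

random.seed(42)
n, trials = 5, 20000   # small samples from a standard normal, true σ² = 1

biased, unbiased = [], []
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    m = mean(sample)
    ss = sum((x - m) ** 2 for x in sample)   # Σ(xi - x̄)²
    biased.append(ss / n)          # divide by n
    unbiased.append(ss / (n - 1))  # divide by n-1 (Bessel's correction)

# E[Σ(xi - x̄)²] = (n-1)σ², so the /n average lands near 0.8,
# while the /(n-1) average lands near the true value 1.0.
print(round(mean(biased), 3), round(mean(unbiased), 3))
```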
Degrees of freedom interpretation: After computing the mean x̄ from n data points, only n-1 values are free to vary (the last one is uniquely determined by Σxi = nx̄). This "lost degree of freedom" is exactly the 1 we subtract from the denominator.
Friedrich Bessel (1784–1846) was a German astronomer, famous for the first accurate measurement of stellar parallax. In the 1820s, while studying errors in astronomical observations, he first recognized the need to use n-1 rather than n as the denominator to obtain a fair estimate of the true measurement error.
Properties of Variance
Variance is a cornerstone of statistics because of its elegant mathematical properties. Below, each property is explained with its intuitive meaning:
Property 1: Translation invariance
Var(X + c) = Var(X). Adding a constant c shifts every data point equally, so the spread is unchanged.
Property 2: Scaling squares the factor
Var(aX) = a²Var(X). Scaling the data by a scales every deviation by a, and squaring turns that into a².
Property 3: Combined (linear transform)
Var(aX + b) = a²Var(X). The shift b drops out; only the scale factor a matters.
Property 4: Additivity for independent variables
If X and Y are independent, Var(X + Y) = Var(X) + Var(Y).
Practical application: If a portfolio contains two independent assets, the portfolio's risk (variance) is the simple sum of the individual risks. This is the foundation of Modern Portfolio Theory (Harry Markowitz, 1952).
Property 5: General case (not independent)
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y). The covariance term captures how the two variables move together.
Property 6: Computational shortcut formula
Var(X) = E[X²] - (E[X])². The "mean of squares minus square of the mean" lets variance be computed in a single pass over the data.
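Properties 3, 5, and 6 can be checked numerically with small population-variance helpers (a sketch; the data values and constants are arbitrary):

```python
from statistics import mean

def pvar(xs):
    """Population variance: mean squared deviation from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pcov(xs, ys):
    """Population covariance of paired data."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 2.0, 5.0, 7.0]
a, b = 3.0, 7.0

# Property 3: Var(aX + b) = a²·Var(X)
assert abs(pvar([a * x + b for x in X]) - a**2 * pvar(X)) < 1e-12
# Property 5: Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y)
lhs = pvar([x + y for x, y in zip(X, Y)])
rhs = pvar(X) + pvar(Y) + 2 * pcov(X, Y)
assert abs(lhs - rhs) < 1e-12
# Property 6: Var(X) = E[X²] - (E[X])²
assert abs(pvar(X) - (mean(x**2 for x in X) - mean(X)**2)) < 1e-12
print("all properties hold")
```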
Variance Decomposition & ANOVA
Variance can be decomposed into contributions from different sources. This idea is the core of ANOVA (Analysis of Variance), systematically developed by Fisher in his 1925 book Statistical Methods for Research Workers.
Total Variation = Between-group Variation + Within-group Variation
- SSTotal (Total Sum of Squares): sum of squared deviations of all data points from the grand mean
- SSBetween (Between-group SS): sum of squared deviations of group means from the grand mean (weighted by group size)—reflects differences between groups
- SSWithin (Within-group SS): sum of squared deviations of data points from their own group mean—reflects random variation within groups
Core logic of ANOVA: If between-group variation is much larger than within-group variation (large F-value), the differences between groups are unlikely to be due to chance, and we have reason to believe the groups come from different populations (i.e., the treatment has an effect).
Example
Testing the effect of 3 fertilizers on tomato yield: randomly assign 30 plants to 3 groups with different fertilizers. ANOVA decomposes total yield variation into "differences caused by fertilizer type" (between) and "natural variation among plants within the same fertilizer group" (within). If the F-test p-value < 0.05, at least one fertilizer has a significantly different effect.
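The decomposition behind this example can be sketched on made-up numbers (all yields below are illustrative, not real data):

```python
from statistics import mean

# Toy yields for three fertilizer groups
groups = {
    "A": [20.0, 22.0, 19.0, 21.0],
    "B": [25.0, 27.0, 26.0, 24.0],
    "C": [20.0, 21.0, 22.0, 21.0],
}

all_data = [y for g in groups.values() for y in g]
grand = mean(all_data)

ss_total   = sum((y - grand) ** 2 for y in all_data)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
ss_within  = sum(sum((y - mean(g)) ** 2 for y in g) for g in groups.values())

# The decomposition SS_total = SS_between + SS_within holds exactly
assert abs(ss_total - (ss_between + ss_within)) < 1e-9

# F = (SS_between / df_between) / (SS_within / df_within)
k, n = len(groups), len(all_data)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 2))   # a large F hints that group means truly differ
```

Turning the F-statistic into a p-value requires the F-distribution (e.g., `scipy.stats.f.sf`), which is outside the standard library.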
ANOVA extends to multiple factors (two-way ANOVA, MANOVA) and more complex experimental designs. The philosophical idea of variance decomposition—breaking total variation into explainable components—is a cornerstone of all modern statistics.
Bias-Variance Tradeoff (Machine Learning)
In machine learning, model prediction error can be decomposed into three components:
Bias
Systematic error in predictions. High bias = underfitting (model too simple to capture true patterns)
Variance
Sensitivity to training data changes. High variance = overfitting (model too complex, treating noise as signal)
Irreducible Error (σ²)
Random noise inherent in the data that no model can eliminate
The core tradeoff: Reducing bias typically requires a more complex model (e.g., higher polynomial degree, deeper neural network), which tends to increase variance. Conversely, simplifying the model reduces variance but may increase bias. The optimal model balances both to minimize total error.
Practical strategies:
- Regularization (L1/L2): penalizes model complexity to reduce variance, at the cost of slightly higher bias
- Cross-validation: estimates true generalization error by evaluating on different data subsets
- Ensemble methods (Bagging, Random Forest): reduce variance by averaging multiple high-variance models
- Boosting (XGBoost, AdaBoost): reduce bias by iteratively correcting errors
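The tradeoff can be made visible with a minimal simulation (an illustrative setup: a quadratic truth, Gaussian noise, and two toy models, one too simple and one very flexible):

```python
import random
from statistics import mean, pvariance

random.seed(0)
xs = [i / 10 for i in range(11)]        # fixed design points 0.0 .. 1.0
def f(x): return x * x                  # the true (normally unknown) function
NOISE, TRIALS, x0 = 0.3, 5000, 0.8      # evaluate both models at x0

simple, flexible = [], []
for _ in range(TRIALS):
    ys = [f(x) + random.gauss(0, NOISE) for x in xs]   # fresh training set
    simple.append(mean(ys))               # degree-0 model: predict mean(y)
    flexible.append(ys[xs.index(x0)])     # 1-NN model: copy the nearest y

for name, preds in [("simple", simple), ("1-NN", flexible)]:
    bias_sq = (mean(preds) - f(x0)) ** 2  # systematic error at x0
    var = pvariance(preds)                # sensitivity to the training set
    print(f"{name}: bias^2 = {bias_sq:.3f}, variance = {var:.3f}")
```

The constant model has large bias (it cannot follow the curve) but tiny variance; the nearest-neighbor model is nearly unbiased at x0 but inherits the full noise variance.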
Related Tools
- Standard Deviation Calculator — the square root of variance, in original units for intuitive interpretation
- Mean / Median / Mode Calculator — measures of central tendency, complementing variance (dispersion)
- Normal Distribution Calculator — the normal distribution is fully determined by its mean and variance
- Percentile Calculator — non-parametric approach to describing data position
- Z-Score Calculator — standardize data using standard deviation
Frequently Asked Questions
What units is variance measured in?
Variance is measured in the square of the original data units. For example, if data is in kilograms (kg), variance is in kg². This is a drawback of variance—its units are not intuitive. This is why standard deviation (the square root of variance) is often preferred, as it shares the same units as the original data.
Can variance be negative?
Never. Variance is the average of squared deviations, and squared values are always ≥ 0, so variance is always ≥ 0. Variance equals 0 if and only if all data points are identical. If your calculation produces a negative result, there is an error in the computation.
How is variance related to covariance?
Variance is a special case of covariance: Var(X) = Cov(X, X). Covariance Cov(X, Y) = E[(X - E[X])(Y - E[Y])] measures the direction and strength of co-movement between two variables. When Y = X, covariance reduces to variance. Normalizing covariance (dividing by both standard deviations) gives the correlation coefficient: r = Cov(X,Y) / (SD(X) · SD(Y)), which ranges over [-1, 1].
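Both identities are easy to verify with a small helper (a sketch; the data are arbitrary and chosen to be perfectly linearly related):

```python
from statistics import mean

def pcov(xs, ys):
    """Population covariance Cov(X, Y) = E[(X - E[X])(Y - E[Y])]."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]   # Y = 2X: a perfect linear relationship

# Var(X) is the special case Cov(X, X)
var_x = sum((x - mean(X)) ** 2 for x in X) / len(X)
assert abs(pcov(X, X) - var_x) < 1e-12

# r = Cov(X, Y) / (SD(X) · SD(Y)), and here r ≈ 1
r = pcov(X, Y) / (pcov(X, X) ** 0.5 * pcov(Y, Y) ** 0.5)
print(r)
```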
What happens when there is only one data point?
Population variance is 0 when N=1 (the single point is the mean, deviation is zero). However, sample variance is undefined when n=1 because the denominator n-1 = 0, causing division by zero. This is also intuitively sensible: with only one observation, we have absolutely no information about the population's variability—a single point cannot reveal anything about "spread." At least 2 data points are needed to compute sample variance.
Which spreadsheet function should I use?
In Excel: VAR (or VAR.S) computes sample variance (divides by n-1); VARP (or VAR.P) computes population variance (divides by N). Similarly, STDEV/STDEV.S is sample standard deviation, STDEVP/STDEV.P is population standard deviation. Google Sheets and LibreOffice Calc follow the same naming convention. Mnemonic: the "P" stands for Population; no "P" means Sample.