Mean, Median & Mode

What Are Measures of Central Tendency?

Imagine you have a dataset: exam scores of 100 students, daily temperatures for the past month, or salaries of every employee at a company. These raw numbers could be dozens or millions of values — you need a way to summarize where the "center" of the dataset lies with a single representative value. That is the core goal of measures of central tendency.

Why do we need three different measures? Because no single number can perfectly capture every aspect of a data distribution. Each measure defines "center" from a different mathematical perspective:

Mean — minimizes the sum of squared distances from all data points to the center. It uses every value in the dataset but is sensitive to extreme values.
Median — minimizes the sum of absolute distances from all data points to the center. It is robust to outliers and more representative in skewed distributions.
Mode — the most frequently occurring value. It is the only measure of central tendency applicable to categorical data (e.g., colors, brands).
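As a minimal sketch of the three definitions above, the following uses Python's standard `statistics` module. The sample lists `scores` and `colors` are hypothetical and not taken from the text.

```python
from statistics import mean, median, multimode

# Hypothetical samples, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]
colors = ["blue", "red", "blue", "green"]

center_mean = mean(scores)         # 30 / 6 = 5
center_median = median(scores)     # average of the middle pair 3 and 5 = 4.0
center_modes = multimode(scores)   # [3], the most frequent value
color_modes = multimode(colors)    # ['blue'], works on non-numerical data

print(center_mean, center_median, center_modes, color_modes)
```

Only the mode is computed for `colors`, since a mean or median of color names has no meaning.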

Understanding the differences between these three — and when to use each — is one of the foundational skills in data analysis, statistics, and machine learning.

The Mean (Arithmetic Average)

The mean is the most commonly used measure of central tendency. Its calculation is straightforward: add all values together, then divide by the count.

x̄ = (x₁ + x₂ + … + x_n) / n = (∑x_i) / n

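Historical origin

The idea of averaging can be traced back to ancient Babylonian times (around 300 BC), when astronomers used the average of multiple observations to improve the accuracy of their predictions of celestial positions. In the modern era, Carl Friedrich Gauss and Adrien-Marie Legendre formalized it in the early 19th century as a core concept of the method of least squares, laying a foundation for modern statistics.

Why the mean works: a mathematical proof

The mean is not an arbitrary definition: it is the unique value that minimizes the sum of squared deviations from all data points to the "center". This can be shown with calculus: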

Objective

Minimize f(c) = ∑(x_i − c)²

Differentiate

df/dc = ∑ 2(x_i − c)(−1) = −2 ∑(x_i − c)

Set to zero

−2 ∑(x_i − c) = 0 ⇒ ∑x_i − nc = 0 ⇒ c = (∑x_i) / n = x̄

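Because the second derivative d²f/dc² = 2n is positive, this critical point is a minimum. This is why the mean is so important in regression analysis and least-squares fitting: it is naturally tied to the goal of minimizing squared error.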
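The calculus result can also be checked numerically. The sketch below is an illustration, not a proof: it evaluates f(c) on a grid of candidate centers and confirms that the smallest value occurs at the mean. The dataset and the grid range are assumptions chosen for the example.

```python
from statistics import mean

def sum_squared_deviations(data, c):
    """f(c) = sum of (x_i - c)^2 over the dataset."""
    return sum((x - c) ** 2 for x in data)

data = [82, 85, 88, 90, 91]   # illustrative dataset
x_bar = mean(data)            # 87.2

# Candidate centers from 80.0 to 95.0 in steps of 0.1 (an arbitrary grid).
candidates = [80 + k / 10 for k in range(151)]
best = min(candidates, key=lambda c: sum_squared_deviations(data, c))

print(x_bar, best)
```

Any candidate other than the mean gives a larger sum, by exactly n times the squared distance from the mean.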

Variants of the mean

Weighted Mean

x̄_w = ∑(w_i · x_i) / ∑w_i

Why it exists: Used when observations have different importance. For example, GPA calculation weights courses by credit hours; portfolio returns weight assets by allocation percentage.
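A minimal sketch of the weighted mean for the GPA case. The function name `weighted_mean`, the grade points, and the credit hours are all hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical transcript: grade points and the credit hours of each course.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 1]

gpa = weighted_mean(grade_points, credit_hours)   # (12 + 12 + 2) / 8 = 3.25
unweighted = sum(grade_points) / len(grade_points)  # 3.0

print(gpa, unweighted)
```

The weighted result is higher than the plain average here because the lowest grade belongs to the course with the fewest credit hours.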

Geometric Mean

GM = (∏ x_i)^(1/n) = (x₁ · x₂ · … · x_n)^(1/n)

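Why it exists: Ideal for ratios and growth rates. If an investment grows 10% in year 1, 20% in year 2, and drops 5% in year 3, the geometric mean of the growth factors 1.10, 1.20, and 0.95 gives the compound annual growth rate (about 7.8%), whereas the arithmetic mean of the percentages (about 8.3%) overstates it. For positive values, GM ≤ AM always holds (the AM-GM inequality, for which Cauchy gave a well-known proof).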
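The growth-rate example can be worked through with `statistics.geometric_mean` (Python 3.8+). Note that the geometric mean is taken over growth factors (1 + rate), not over the percentages themselves. Variable names are illustrative.

```python
from statistics import geometric_mean, mean

# Growth factors for +10%, +20%, and -5%.
growth_factors = [1.10, 1.20, 0.95]

cagr = geometric_mean(growth_factors) - 1   # compound annual growth rate, about 0.078
naive = mean(growth_factors) - 1            # arithmetic average, about 0.083

# Compounding the geometric-mean rate for three years reproduces the
# actual total growth factor 1.10 * 1.20 * 0.95 = 1.254.
total_growth = (1 + cagr) ** 3

print(cagr, naive, total_growth)
```

Compounding the arithmetic average instead would overshoot the actual three-year growth.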

Harmonic Mean

HM = n / ∑(1/x_i)

Why it exists: Used for averaging rates and ratios. Classic example: a car travels a distance at 60 km/h and returns over the same distance at 40 km/h. The average speed for the round trip is not 50 km/h (arithmetic mean) but 48 km/h (harmonic mean), because more time is spent at the slower speed. Also used for averaging price-earnings ratios.
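The round-trip example can be verified two ways: with `statistics.harmonic_mean`, and directly as total distance divided by total time. The one-way distance of 120 km is an assumption made only to get concrete times; any distance gives the same average.

```python
from statistics import harmonic_mean

speeds = [60, 40]                    # km/h out and back
avg_speed = harmonic_mean(speeds)    # 2 / (1/60 + 1/40) = 48.0

# Cross-check from first principles with an assumed one-way distance.
one_way_km = 120
total_time_h = one_way_km / 60 + one_way_km / 40   # 2 h + 3 h = 5 h
check = 2 * one_way_km / total_time_h              # 240 / 5 = 48.0

print(avg_speed, check)
```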

Limitation of the Mean

The mean is highly sensitive to outliers. The reason is straightforward: the mean uses the exact numerical value of every data point — a single extreme value directly shifts the sum and pulls the mean away from where most data cluster.

Practical example: exam scores

Suppose 5 students score: 82, 85, 88, 90, 91. Mean = 87.2 — a good representation of the center.

But add one student who scored 20: the set becomes 20, 82, 85, 88, 90, 91. The mean drops to 76.0 — a value that does not represent most students' performance. Here the median of 86.5 is far more representative.
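The same numbers can be reproduced in a few lines. This sketch uses the scores from the example and the standard `statistics` module.

```python
from statistics import mean, median

scores = [82, 85, 88, 90, 91]
with_outlier = scores + [20]

mean_before = mean(scores)              # 87.2
median_before = median(scores)          # 88
mean_after = mean(with_outlier)         # 456 / 6 = 76
median_after = median(with_outlier)     # (85 + 88) / 2 = 86.5

print(mean_before, median_before, mean_after, median_after)
```

One extra score moves the mean by more than 11 points, while the median moves by 1.5.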

The Median

The median is the value that splits a sorted dataset into two equal halves. If the number of data points is odd, the median is the middle value; if even, it is the average of the two middle values.

Odd n: Median = x_((n+1)/2)
Even n: Median = (x_(n/2) + x_(n/2+1)) / 2

Here x_(k) denotes the k-th smallest value in the sorted data.
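The two cases translate directly into code. The function below is a minimal sketch (the name `median_of` is hypothetical); Python lists are 0-indexed, so the 1-based positions in the formulas shift down by one.

```python
def median_of(values):
    """Median by sorting: middle value for odd n, mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # 1-based positions n/2 and n/2 + 1 are 0-based indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([5, 1, 3]))                    # 3
print(median_of([20, 82, 85, 88, 90, 91]))     # 86.5
```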

Historical origin

Francis Galton is usually credited with introducing the term "median" into English-language statistics in 1881. The intuitive concept of a "middle value" is much older, but Galton argued for its value as a measure of central tendency, especially when dealing with skewed data.

Why the median exists: mathematical intuition

Just as the mean minimizes the sum of squared deviations, the median minimizes the sum of absolute deviations:

Median optimality

The median m minimizes ∑|x_i − m|

This is the fundamental reason the median is insensitive to outliers: the absolute value function (unlike squaring) does not amplify the influence of extreme values.
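A small numerical check of this property, as a sketch rather than a proof: among integer candidates from 0 to 100, the one with the smallest sum of absolute deviations is the median. The dataset is illustrative and has an odd number of values, so the minimizer is unique.

```python
from statistics import mean, median

def sum_absolute_deviations(data, m):
    """Sum of |x_i - m| over the dataset."""
    return sum(abs(x - m) for x in data)

data = [20, 82, 85, 88, 90]   # illustrative, with one low outlier
candidates = range(0, 101)    # integer candidate centers (an arbitrary grid)

best = min(candidates, key=lambda m: sum_absolute_deviations(data, m))

print(best, median(data), mean(data))   # 85 85 73
```

The mean (73) is dragged toward the outlier, while the minimizer of the absolute deviations stays at 85.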

Why income and house prices use the median

Income and house price distributions are typically right-skewed: most values cluster at the low-to-middle range while a few extremely high values pull the mean far upward. This is why economists and government statistics agencies report median income rather than mean income — it more accurately reflects the economic situation of the "typical" citizen.

Bill Gates walks into a bar

A bar has 10 people, each earning about $50,000/year. Mean = Median ≈ $50,000.

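Now Bill Gates walks in. For illustration, count his net worth of roughly $100 billion as his figure, although net worth is strictly not the same thing as annual income. The mean jumps to roughly $9.1 billion, about 180,000 times the original. The median stays at about $50,000, essentially unchanged.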
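The arithmetic behind this example, using the same illustrative figures (ten people at exactly $50,000 and one value of $100 billion):

```python
from statistics import mean, median

bar = [50_000] * 10
bar_with_gates = bar + [100_000_000_000]

mean_before = mean(bar)                  # 50000
mean_after = mean(bar_with_gates)        # roughly 9.09 billion
median_after = median(bar_with_gates)    # 50000, the 6th of 11 sorted values

ratio = mean_after / mean_before         # roughly 180,000

print(mean_after, median_after, ratio)
```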

This classic example perfectly illustrates why the median should be used in the presence of extreme outliers.

The Mode

The mode is the value that occurs most frequently in a dataset. Unlike the mean and median, the mode does not depend on numerical magnitude — only on frequency of occurrence.

Mode = argmax_v count(v), i.e., the value with the highest frequency
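The argmax definition can be sketched with `collections.Counter`: count each value, find the highest count, and keep every value that reaches it. The list `colors` is a hypothetical categorical sample.

```python
from collections import Counter

colors = ["blue", "red", "blue", "green", "blue", "red"]   # hypothetical data

counts = Counter(colors)              # count(v) for every distinct value v
top_count = max(counts.values())      # highest frequency
modes = [value for value, count in counts.items() if count == top_count]

print(counts["blue"], counts["red"], counts["green"])   # 3 2 1
print(modes)                                            # ['blue']
```

Returning a list rather than a single value keeps ties visible when several values share the highest frequency.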

Historical origin

Karl Pearson coined the term "mode" in 1895 (from the French la mode, meaning "fashion" or "trend" — i.e., "the most fashionable value"). Pearson was one of the founders of modern statistics, also responsible for the chi-squared test, correlation coefficient, and many other core concepts.

Why the mode exists

The mode is the only measure of central tendency applicable to categorical (nominal) data. You cannot calculate the "mean" or "median" of colors, but you can say "the most common color is blue" — that is the mode.

Unimodal, bimodal, and multimodal distributions

Unimodal: one mode — data comes from a single population. Example: heights of adult males.
Bimodal: two modes — usually indicates the data is a mixture of two distinct populations. Example: height data for all adults (males and females each form a peak).
Multimodal: more than two modes — may indicate multiple subgroups or discrete preference categories.

Discovering that data is multimodal is often more informative than the mode value itself — it signals that distinct subpopulations may exist and should be analyzed separately.

When to Use Which Measure?

Choosing the right measure of central tendency depends on the type and distribution shape of your data. The following decision table can guide your choice:

Situation	Best Measure	Why
Symmetric numerical data	Mean	Mean ≈ Median (and ≈ Mode when the data are unimodal); the mean uses the most information
Skewed data (income, house prices)	Median	Robust to outliers, reflects the "typical" value
Categorical data (colors, brands)	Mode	The only option for non-numerical data
Growth rates, investment returns	Geometric mean	Correctly handles compounding; does not overestimate annual returns
Rates and ratios (speed, P/E ratio)	Harmonic mean	Correctly averages "per-unit" quantities
Has outliers but you do not want to discard them entirely	Trimmed mean	Remove top and bottom percentages, then average — balances robustness and information
Need to identify subgroups in data	Mode	Multimodal distributions reveal mixture populations
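The trimmed mean in the table can be sketched as follows. The function name `trimmed_mean` and the convention of cutting the same proportion from each end, rounded down, are assumptions; libraries differ in how they round the cut.

```python
def trimmed_mean(values, proportion):
    """Drop `proportion` of the values from each end of the sorted data, then average."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    if not values:
        raise ValueError("trimmed mean of an empty dataset is undefined")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)        # values cut from each end, rounded down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

scores = [20, 82, 85, 88, 90, 91]
print(trimmed_mean(scores, 0.2))   # drops 20 and 91, averages the rest: 86.25
print(trimmed_mean(scores, 0.0))   # plain mean: 76.0
```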

Quick decision rule

Step 1: Is the data numerical or categorical? If categorical → use the mode.

Step 2: Is the distribution symmetric? If yes → use the mean.

Step 3: Is there obvious skewness or outliers? → use the median.
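The three steps can be written as a tiny helper. This is only a sketch of the rule of thumb: the function name and its two boolean inputs are hypothetical, and judging whether data are "roughly symmetric" is left to the caller.

```python
def suggest_measure(is_categorical, is_roughly_symmetric):
    """Apply the quick decision rule in order: categorical, symmetric, otherwise skewed."""
    if is_categorical:
        return "mode"       # Step 1
    if is_roughly_symmetric:
        return "mean"       # Step 2
    return "median"         # Step 3: skewness or outliers present

print(suggest_measure(is_categorical=True, is_roughly_symmetric=False))    # mode
print(suggest_measure(is_categorical=False, is_roughly_symmetric=True))    # mean
print(suggest_measure(is_categorical=False, is_roughly_symmetric=False))   # median
```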

Relationship Between Mean, Median, and Mode

The relative positions of the three measures depend on the skewness of the distribution. Understanding this relationship lets you quickly infer the shape of a distribution just by comparing the three values.

Symmetric distribution

Mean = Median = Mode

The normal distribution is the classic example. All three measures coincide at the center.

Right-skewed (positive skew)

Mode < Median < Mean

The long right tail pulls the mean to the right. Typical examples: income distribution, house prices.

Left-skewed (negative skew)

Mean < Median < Mode

The long left tail pulls the mean to the left. Typical examples: retirement age, exam scores on an easy test (most students score high, a few score very low).
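The two orderings for skewed data can be seen on small samples. Both lists below are made up for illustration; the left-skewed one is simply the mirror image of the right-skewed one.

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]    # long tail to the right
left_skewed = [16 - x for x in right_skewed]      # mirrored: long tail to the left

# Right skew: mode 2 < median 3.0 < mean 4.6
print(mode(right_skewed), median(right_skewed), mean(right_skewed))

# Left skew: mean 11.4 < median 13.0 < mode 14
print(mean(left_skewed), median(left_skewed), mode(left_skewed))
```

Real datasets do not always follow these orderings exactly, so treat the comparison as a hint about shape rather than a guarantee.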

Pearson's empirical rule

Karl Pearson proposed an approximate relationship linking all three:

Mean − Mode ≈ 3 (Mean − Median)

Equivalently: Mode ≈ 3 × Median − 2 × Mean

This is an approximation that holds for moderately skewed unimodal distributions. It may be inaccurate for heavily skewed or multimodal distributions, but it is remarkably useful as a quick estimation tool — if you know the mean and median, you can roughly estimate where the mode lies.
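As a quick illustration of the rule, the helper below estimates the mode from a given mean and median. The function name and the input values are hypothetical, and the result is only a rough estimate under the moderate-skew, unimodal assumption stated above.

```python
def estimate_mode(mean_value, median_value):
    """Pearson's empirical rule: Mode ≈ 3 × Median − 2 × Mean."""
    return 3 * median_value - 2 * mean_value

# Hypothetical right-skewed summary: mean 52, median 50.
estimated = estimate_mode(52, 50)   # 150 - 104 = 46

# Consistent with the first form: Mean - Mode = 6 = 3 * (Mean - Median).
print(estimated, 52 - estimated, 3 * (52 - 50))
```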

Frequently Asked Questions

What is the difference between "mean" and "average"?

In everyday language, "average" and "mean" usually refer to the same thing — the arithmetic mean. However, in strict statistical terminology, "average" is a broader concept that can include the arithmetic mean, geometric mean, harmonic mean, median, or even the mode. "Mean" typically refers specifically to the arithmetic mean x̄ = ∑x_i / n.

What is the mode if all values occur equally often?

If every value in the dataset occurs the same number of times (e.g., 1, 2, 3, 4, 5 each appearing once), then there is no mode (the distribution is called "amodal"). Some textbooks say "all values are modes," but this is not practically meaningful. This calculator lists all values that share the highest frequency.
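Python's `statistics.multimode` (3.8+) follows the same "list every value with the highest frequency" convention, so it gives a feel for this behavior. That this calculator works the same way internally is an assumption.

```python
from statistics import multimode

all_unique = [1, 2, 3, 4, 5]     # every value occurs once
two_way_tie = [1, 1, 2, 2, 3]    # 1 and 2 both occur twice

print(multimode(all_unique))     # [1, 2, 3, 4, 5]
print(multimode(two_way_tie))    # [1, 2]
```

When the returned list contains every distinct value, the result is best read as "no meaningful mode".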

When are the mean and median equal?

The mean and median are equal when the distribution is perfectly symmetric. The normal (Gaussian) distribution is the most common example. The uniform distribution also satisfies this. If you find a large gap between mean and median, it typically indicates skewness or outliers — in such cases, the median is the better choice for describing a "typical" value.

Can you use mean, median, and mode to check if data is normally distributed?

It can serve as a rough preliminary check, but not as a formal test. If mean ≈ median ≈ mode, the data may be symmetric, but not necessarily normal: a uniform distribution also has mean equal to median. Formal checks use tests such as Shapiro-Wilk or Kolmogorov-Smirnov, usually alongside a graphical check such as a Q-Q plot. The differences among the three measures primarily indicate the direction and degree of skewness.

How do variance and standard deviation relate to central tendency?

Measures of central tendency (mean, median, mode) describe the center of the data, while variance and standard deviation describe how spread out the data is around that center. The two are complementary: knowing only that the mean is 50 does not tell you whether the data cluster between 48 and 52 or are spread across 0 to 100. Standard deviation is the square root of variance and has the same units as the original data, making it more practical. This calculator computes both sets of measures.
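The point about equal means with different spread can be made concrete. Both datasets below are hypothetical, and `pstdev` is the population standard deviation from the standard library.

```python
from statistics import mean, pstdev

tight = [48, 49, 50, 51, 52]     # clustered near 50
wide = [0, 25, 50, 75, 100]      # spread across 0 to 100

print(mean(tight), mean(wide))   # 50 50

tight_sd = pstdev(tight)         # sqrt(2), about 1.41
wide_sd = pstdev(wide)           # sqrt(1250), about 35.36

print(tight_sd, wide_sd)
```

The means are identical, so only the standard deviation distinguishes the two datasets.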
