What is Pearson's r: Measuring Linear Dependence Between Two Variables?
Mathematical Foundation
Laws & Principles
- Correlation ≠ Causation: The single most critical caveat in all of statistics. A perfect r = 0.99 between two variables does NOT prove one causes the other. Both could be driven by an unseen confounding variable. Ice cream sales and drowning deaths are highly correlated — the hidden variable is summer heat.
- r² (Coefficient of Determination): Squaring Pearson's r gives the proportion of variance in Y that is linearly explained by X. If r = 0.80, then r² = 0.64, meaning 64% of Y's variation is accounted for by X. The remaining 36% is unexplained (noise, nonlinearity, or other variables).
- Linearity Assumption: Pearson's r only detects linear relationships. A perfect parabolic curve (Y = X²) can produce r ≈ 0 despite a strong, deterministic relationship. For non-linear data, use Spearman's rank correlation instead.
Step-by-Step Example Walkthrough
" A researcher collects 5 paired observations of study hours (X) and exam scores (Y): (2,65), (4,72), (6,80), (8,85), (10,92). "
- 1. Compute sums: Σx = 30, Σy = 394, Σxy = 2494, Σx² = 220, Σy² = 31458, n = 5.
- 2. Numerator: n·Σxy - Σx·Σy = 5(2494) - 30(394) = 12470 - 11820 = 650.
- 3. Denominator: √[(5·220 - 900)(5·31458 - 155236)] = √[(200)(1054)] = √210800 = 459.13.
- 4. r = 650 / 459.13 = 0.9955.