Pearson Correlation Coefficient (r) Calculator

What Is Pearson's r? Measuring Linear Dependence Between Two Variables

The Pearson correlation coefficient (r) quantifies the strength and direction of a linear relationship between two continuous variables. It is the ratio of their covariance to the product of their standard deviations, which normalizes the result to the range [-1, +1]. A value of +1 means perfect positive linearity, -1 means perfect negative linearity, and 0 means no linear relationship exists.
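The covariance-over-standard-deviations ratio described above can be written out explicitly. The form below is a sketch in conventional notation: the symbols \bar{x}, \bar{y} (sample means) and s_X, s_Y (sample standard deviations) are assumed here and are not taken from the calculator itself. The common 1/(n-1) factors in the covariance and the standard deviations cancel, leaving only sums of deviations.

```latex
r = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \; \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```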

Mathematical Foundation

Pearson Correlation Coefficient

r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}

r

= Pearson's r — ranges from -1 (perfect negative) to +1 (perfect positive). Zero means no linear relationship.

n

= Number of paired observations in the dataset.

\sum xy

= Sum of the products of each paired (x, y) observation.

\sum x, \sum y

= Sums of the individual x and y values.

\sum x^2, \sum y^2

= Sums of the squared x and y values.
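As a minimal sketch, the raw-sums formula can be translated almost term by term into Python. The function name `pearson_r` and its error handling are illustrative choices, not part of the calculator.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from raw sums, following the computational formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    spread_x = n * sum_x2 - sum_x ** 2
    spread_y = n * sum_y2 - sum_y ** 2
    if spread_x == 0 or spread_y == 0:
        # A constant variable has zero variance, so the ratio is undefined.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / math.sqrt(spread_x * spread_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # points on an increasing line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # points on a decreasing line
```

The raw-sums form is convenient for hand calculation, but it subtracts two large, similar numbers, so with very large values a deviation-from-the-mean version is usually more numerically stable.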

Laws & Principles

Correlation ≠ Causation: One of the most important caveats in statistics. Even a near-perfect r = 0.99 between two variables does NOT prove that one causes the other; both could be driven by an unseen confounding variable. Ice cream sales and drowning deaths are the classic example: both rise with summer heat, the hidden variable.
r² (Coefficient of Determination): Squaring Pearson's r gives the proportion of variance in Y that is linearly explained by X. If r = 0.80, then r² = 0.64, meaning 64% of Y's variation is accounted for by X. The remaining 36% is unexplained (noise, nonlinearity, or other variables).
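Linearity Assumption: Pearson's r only measures linear association. A perfect parabola (Y = X²) sampled symmetrically around X = 0 produces r ≈ 0 despite a strong, deterministic relationship. For non-linear but monotonic data, use Spearman's rank correlation instead; for non-monotonic patterns such as this parabola, rely on a scatter plot or fit a curve.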
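The linearity caveat is easy to check numerically. The sketch below uses a small hypothetical helper, `pearson_r`, and made-up X values: sampled symmetrically around zero, Y = X² gives r = 0, while sampling only the right-hand arm of the same curve gives a high r.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# X symmetric around zero: the rising and falling arms cancel out.
xs_symmetric = [-3, -2, -1, 0, 1, 2, 3]
r_symmetric = pearson_r(xs_symmetric, [x ** 2 for x in xs_symmetric])

# Only the right-hand arm: the same curve now looks nearly linear.
xs_right = [0, 1, 2, 3]
r_right = pearson_r(xs_right, [x ** 2 for x in xs_right])

print(f"symmetric sample: r = {r_symmetric:.4f}")
print(f"right arm only:   r = {r_right:.4f}")
```

The same function Y = X² yields very different r values depending on the range of X sampled, which is one reason to inspect a scatter plot before interpreting r.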

Step-by-Step Example Walkthrough

A researcher collects 5 paired observations of study hours (X) and exam scores (Y): (2,65), (4,72), (6,80), (8,85), (10,92).

1. Compute sums: Σx = 30, Σy = 394, Σxy = 2498, Σx² = 220, Σy² = 31498, n = 5.
2. Numerator: n·Σxy - Σx·Σy = 5(2498) - 30(394) = 12490 - 11820 = 670.
3. Denominator: √[(5·220 - 900)(5·31498 - 155236)] = √[(200)(2254)] = √450800 ≈ 671.42.
4. r = 670 / 671.42 ≈ 0.9979.

Final Result: r ≈ 0.9979 indicates a near-perfect positive linear relationship. r² ≈ 0.9958, meaning about 99.6% of the variance in exam scores is linearly explained by study hours in this sample.
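Hand arithmetic on sums like these is easy to get wrong, so it can help to recompute each intermediate quantity directly from the five pairs. The sketch below does that step by step; the variable names are illustrative.

```python
import math

# The five (study hours, exam score) pairs from the walkthrough.
pairs = [(2, 65), (4, 72), (6, 80), (8, 85), (10, 92)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)
sum_x2 = sum(x * x for x, _ in pairs)
sum_y2 = sum(y * y for _, y in pairs)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"n={n}, Σx={sum_x}, Σy={sum_y}, Σxy={sum_xy}, Σx²={sum_x2}, Σy²={sum_y2}")
print(f"numerator={numerator}, denominator={denominator:.2f}")
print(f"r={r:.4f}, r²={r * r:.4f}")
```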

Correlation Strength Classification (Reference)

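Common rule-of-thumb bands for interpreting |r|. Thresholds vary by field; Cohen (1988), for example, treats 0.10, 0.30, and 0.50 as small, medium, and large effects in the behavioral sciences.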

|r| Range	Strength	r² Interpretation	Example
0.90 – 1.00	Very Strong	81–100% explained	Height vs. shoe size
0.70 – 0.89	Strong	49–79% explained	SAT scores vs. GPA
0.40 – 0.69	Moderate	16–48% explained	Exercise vs. weight loss
0.00 – 0.39	Weak / None	0–15% explained	Shoe size vs. IQ
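The bands in the table translate directly into a lookup on |r|. The function below is a sketch; the name `correlation_strength` is hypothetical, and the cut-offs are simply the ones listed in the table.

```python
def correlation_strength(r):
    """Map r to the strength label from the reference table, using |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    magnitude = abs(r)  # strength ignores the sign; direction is reported separately
    if magnitude >= 0.90:
        return "Very Strong"
    if magnitude >= 0.70:
        return "Strong"
    if magnitude >= 0.40:
        return "Moderate"
    return "Weak / None"

for value in (0.95, -0.75, 0.40, 0.10):
    direction = "positive" if value > 0 else "negative"
    print(f"r = {value:+.2f}: {correlation_strength(value)} ({direction})")
```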

Research Use Cases

Clinical Research

Medical researchers use Pearson's r to quantify whether a biomarker (e.g., blood pressure) is linearly associated with a patient outcome (e.g., stroke risk). A strong correlation justifies further causal investigation through controlled trials.

Financial Portfolio Analysis

Portfolio managers calculate r between asset returns. If two stocks have r ≈ +1, they move together (no diversification benefit). If r ≈ -1, they hedge each other. Modern Portfolio Theory combines these correlations with each asset's volatility (together, the covariance matrix) to optimize risk-adjusted returns.
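The diversification effect can be illustrated with the standard two-asset portfolio variance formula, σp² = w1²σ1² + w2²σ2² + 2·w1·w2·r·σ1·σ2. The weights and volatilities below are hypothetical numbers chosen for illustration, and `portfolio_volatility` is an assumed helper name.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, r):
    """Volatility of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2 * w1 * w2 * r * sigma1 * sigma2
    )
    return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error

# Equal weights, both assets with 20% volatility (hypothetical values).
for r in (1.0, 0.0, -1.0):
    vol = portfolio_volatility(0.5, 0.20, 0.20, r)
    print(f"r = {r:+.1f}: portfolio volatility = {vol:.4f}")
```

With r = +1 the portfolio is exactly as volatile as either asset, while with r = -1 and these equal weights the two positions offset each other almost completely.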

Correlation Analysis Best Practices (Pro Tips)

Do This

✓ Always plot your data first. Anscombe's Quartet consists of four visibly different datasets that share nearly identical summary statistics, including r ≈ 0.816. A scatter plot instantly reveals nonlinearity, outliers, and clustering that r alone cannot detect.

Avoid This

✗ Don't use Pearson's r on ordinal or ranked data. Pearson's r treats both variables as continuous, interval-scale measurements, and its standard significance tests additionally assume approximate bivariate normality. For Likert scales (1-5 ratings), survey ranks, or any ordinal data, use Spearman's rank correlation (ρ) instead.
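Spearman's ρ is Pearson's r computed on ranks rather than raw values, with tied values sharing the average of their ranks. The sketch below shows that idea; the helper names `average_ranks` and `spearman_rho` and the sample data are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho as Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Monotonic but non-linear data: Y = X cubed.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]
print(f"Pearson r    = {pearson_r(xs, ys):.4f}")
print(f"Spearman rho = {spearman_rho(xs, ys):.4f}")
```

Because the ranks of X and Y agree exactly for any strictly increasing relationship, ρ reaches 1 here while Pearson's r stays below 1.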

Frequently Asked Questions

What is the difference between r and r²?

r measures both direction (+/-) and strength of a linear relationship. r² (the coefficient of determination) tells you what percentage of the variation in Y is explained by X. If r = 0.80, then r² = 0.64 — meaning 64% of Y's variability is accounted for by X.
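The "variance explained" reading of r² can be verified against a least-squares line: for simple linear regression, r² equals 1 minus the ratio of residual to total sum of squares. The five data points below are hypothetical.

```python
import math

# Hypothetical data, chosen so the arithmetic is easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares of Y

r = sxy / math.sqrt(sxx * syy)

# Least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_res / syy

print(f"r = {r:.4f}, r² = {r * r:.4f}")
print(f"1 - SS_res / SS_tot = {explained:.4f}")
```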

How many data points do I need?

Mathematically, r can be computed from as few as 2 pairs, but two points always give r = ±1, so 3 pairs is the minimum for a non-trivial value. As a rule of thumb, 30+ pairs are recommended for reliable results. With very few data points, a single outlier can swing r from near zero to near one, making the result misleading.
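The outlier sensitivity of small samples is easy to demonstrate. In the sketch below, five made-up points with almost no linear trend are joined by a single extreme point, and r jumps from weak to very strong. The data and the `pearson_r` helper are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Five points with little linear trend (hypothetical data).
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]
r_clean = pearson_r(xs, ys)

# The same points plus one extreme observation at (20, 20).
r_with_outlier = pearson_r(xs + [20], ys + [20])

print(f"without outlier: r = {r_clean:.4f}")
print(f"with outlier:    r = {r_with_outlier:.4f}")
```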

Can Pearson's r detect curved relationships?

No. Pearson's r specifically measures linear relationships. A perfect U-shaped curve (Y = X², with X centered on 0) produces r ≈ 0. Spearman's rank correlation captures monotonic non-linear relationships, but it also misses a U shape; for such patterns, rely on scatter plots, residual analysis, or an explicit curve fit.

Does correlation prove causation?

Never on its own. Correlation identifies statistical association, not causation. Even r = 0.99 between two variables could be produced by a hidden third variable (confounding). Establishing causation generally requires controlled experiments or a carefully designed causal analysis, not observational correlation alone.
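A small simulation makes confounding concrete. In the sketch below, a simulated "heat" variable drives both ice cream sales and drownings, which have no direct link to each other. All series are synthetic, the noise level is an arbitrary choice, and subtracting heat directly is possible only because the data-generating process is known by construction.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 500

# Heat is the hidden driver; each outcome is heat plus independent noise.
heat = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [h + rng.gauss(0, 0.3) for h in heat]
drownings = [h + rng.gauss(0, 0.3) for h in heat]

r_raw = pearson_r(ice_cream, drownings)

# Remove the contribution of heat from both series, then correlate what is left.
ice_resid = [i - h for i, h in zip(ice_cream, heat)]
drown_resid = [d - h for d, h in zip(drownings, heat)]
r_adjusted = pearson_r(ice_resid, drown_resid)

print(f"raw correlation:          r = {r_raw:.3f}")
print(f"after removing heat:      r = {r_adjusted:.3f}")
```

The raw correlation is strong even though neither variable influences the other, and it largely disappears once the shared driver is accounted for.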
