Hassan Ijaz

Ai, Web & Design
Descriptive Statistics (Topic 10 of 58)

Correlation and causation

Scatter plot generator with hidden confounders that users must discover, showing how correlation can be misleading

Concept Overview

Understanding the difference between correlation and causation is crucial for interpreting data correctly. Correlation measures how strongly two variables move together; causation means that changing one variable directly produces a change in the other.

Correlation

Correlation measures the strength and direction of a linear relationship between two variables.

r = Cov(X,Y) / (σ_X × σ_Y)

where Cov(X,Y) is the covariance of X and Y, and σ_X and σ_Y are their standard deviations.

Pearson correlation coefficient: -1 ≤ r ≤ 1

  • r = 1: Perfect positive linear relationship
  • r = 0: No linear relationship (a nonlinear relationship may still exist)
  • r = -1: Perfect negative linear relationship
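To make the formula concrete, here is a minimal Python sketch that computes r directly from the definition above. The function name `pearson_r` is illustrative, not from any library; in practice `statistics.correlation` (Python 3.10+) or a NumPy/pandas equivalent does the same job.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: r = Cov(X, Y) / (sigma_X * sigma_Y).

    Uses population moments (divide by n); the 1/n factors cancel,
    so the sample (n - 1) convention would give the same r.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]
print(round(pearson_r(xs, [2, 4, 6, 8, 10]), 6))      # 1.0  (perfect positive)
print(round(pearson_r(xs, [10, 8, 6, 4, 2]), 6))      # -1.0 (perfect negative)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0  (y = x**2: dependent, but not linearly)
```

Because the formula treats X and Y symmetrically, `pearson_r(a, b)` always equals `pearson_r(b, a)`. A correlation alone carries no information about direction, which is one reason reverse causation cannot be ruled out from r. The last example also shows that r = 0 does not mean independence: there, y is fully determined by x.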

Correlation ≠ Causation

Common reasons for non-causal correlations:

  1. Confounding Variables: A third variable affects both X and Y
  2. Reverse Causation: Y causes X, not X causes Y
  3. Coincidence: Random chance, especially in small samples
  4. Selection Bias: The way cases were selected creates an association that does not hold in the full population

Classic Examples

Ice Cream Sales vs. Drownings

Both increase in summer (confounded by temperature)

Shoe Size vs. Reading Ability

Both increase with age in children (confounded by age)

Nobel Prizes vs. Chocolate Consumption

Countries that consume more chocolate per capita tend to have more Nobel laureates per capita (confounded by national wealth and education)

Establishing Causation

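Bradford Hill criteria for judging causation (six of the nine; the others are coherence, experiment, and analogy):

  • Temporal precedence: The cause must precede the effect
  • Strength: Stronger associations are more likely to be causal
  • Consistency: The association is replicated across studies and populations
  • Specificity: A specific cause leads to a specific effect
  • Biological gradient: Greater exposure leads to a greater effect (dose-response)
  • Plausibility: A plausible mechanism explains the link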

Methods for Causal Inference

Randomized Experiments

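The gold standard: randomly assigning treatment makes it independent of all confounders, measured or not, so systematic differences in outcome reflect the treatment itself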
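Why randomization works can be seen in a small simulation. Under the assumed model below, a hidden `health` variable raises the outcome and, in the observational version, also makes people more likely to take the treatment. Comparing group means then overstates the true effect (set to 1.0 here), while random assignment recovers it. All names and coefficients are hypothetical; `statistics.fmean` needs Python 3.8+.

```python
import random
import statistics


def difference_in_means(outcomes, treated):
    """Average outcome of the treated group minus that of the untreated group."""
    treated_y = [y for y, d in zip(outcomes, treated) if d]
    control_y = [y for y, d in zip(outcomes, treated) if not d]
    return statistics.fmean(treated_y) - statistics.fmean(control_y)


def estimated_effect(randomized, n=20_000, true_effect=1.0, seed=7):
    rng = random.Random(seed)
    health = [rng.gauss(0, 1) for _ in range(n)]  # confounder: also raises the outcome
    if randomized:
        # Coin flip: assignment is unrelated to health.
        treated = [rng.random() < 0.5 for _ in range(n)]
    else:
        # Self-selection: healthier people are more likely to take the treatment.
        treated = [h + rng.gauss(0, 1) > 0 for h in health]
    outcomes = [true_effect * d + 2.0 * h + rng.gauss(0, 1)
                for d, h in zip(treated, health)]
    return difference_in_means(outcomes, treated)


print(round(estimated_effect(randomized=False), 2))  # well above the true effect of 1.0
print(round(estimated_effect(randomized=True), 2))   # close to 1.0
```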

Natural Experiments

Exploit variation that arises as if randomly assigned, such as lotteries, policy changes, or natural events

Instrumental Variables

Use a variable (the instrument) that affects X, is unrelated to the confounders, and influences Y only through X

Regression Discontinuity

Compare units just above and just below a threshold that determines who receives treatment

Remember: "Correlation does not imply causation" is one of the most important principles in statistics. Always look for alternative explanations before claiming causality!

The interactive visualization below generates scatter plots with hidden confounding variables. Try to discover what's really causing the correlations you see!

Interactive Visualization

