Hassan Ijaz
Ai, Web & Design
Correlation and causation
Scatter plot generator with hidden confounders that users must discover, showing how correlation can be misleading
Concept Overview
Understanding the difference between correlation and causation is crucial for proper data interpretation. While correlation measures how variables move together, causation implies that one variable directly influences another.
Correlation
Correlation measures the strength and direction of a linear relationship between two variables.
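r = Cov(X,Y) / (σ_X × σ_Y), where Cov(X,Y) is the covariance of X and Y and σ_X, σ_Y are their standard deviations
The Pearson correlation coefficient always lies in the range -1 ≤ r ≤ 1: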
- r = 1: Perfect positive linear relationship
- r = 0: No linear relationship
- r = -1: Perfect negative linear relationship
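To make the formula concrete, here is a minimal Python sketch that computes r directly from the definition: population covariance divided by the product of population standard deviations. The 1/n factors cancel, so the sample version gives the same r. The function name `pearson_r` is our own and not from any library. The third call shows that r = 0 rules out only a linear relationship: y = x² depends entirely on x, yet has r = 0 on this symmetric range.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation r = Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations (the 1/n factors cancel in r)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))                # perfect positive line
print(pearson_r(xs, [10, 8, 6, 4, 2]))                # perfect negative line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # y = x**2: no linear trend, yet y depends on x
```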
Correlation ≠ Causation
Common reasons for non-causal correlations:
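- Confounding Variables: A third variable influences both X and Y
- Reverse Causation: Y causes X, not X causes Y
- Coincidence: Random chance, especially in small samples
- Selection Bias: Non-representative sampling can create or distort an association that does not hold in the wider population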
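Confounding is the easiest of these to simulate. The sketch below uses assumed, illustrative numbers: a hidden variable Z drives both X and Y, and X has no effect on Y at all. X and Y still correlate (about 0.5 under this model). Once the linear effect of Z is regressed out of both, the correlation of the residuals (the partial correlation) falls to roughly zero. The helper names `pearson_r`, `residuals`, and `simulate` are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, zs):
    """Residuals of y after a least-squares fit y = a + b*z (removes z's linear effect)."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    b = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
    a = my - b * mz
    return [y - (a + b * z) for y, z in zip(ys, zs)]


def simulate(n=5000, seed=0):
    rng = random.Random(seed)
    # Hidden confounder Z (e.g. temperature) drives both X and Y;
    # in this assumed model X has NO causal effect on Y.
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + rng.gauss(0, 1) for zi in z]
    return x, y, z


x, y, z = simulate()
raw_r = pearson_r(x, y)                                  # clearly positive
partial_r = pearson_r(residuals(x, z), residuals(y, z))  # near 0 once Z is controlled for
print(f"r(X, Y) = {raw_r:.3f}; r(X, Y | Z) = {partial_r:.3f}")
```

In a real dataset the confounder is usually not recorded, which is exactly why it has to be discovered rather than simply regressed out.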
Classic Examples
Ice Cream Sales vs. Drownings
Both increase in summer (confounded by temperature)
Shoe Size vs. Reading Ability
Both increase with age in children (confounded by age)
Nobel Prizes vs. Chocolate Consumption
Countries with higher chocolate consumption tend to have more Nobel laureates per capita (likely confounded by national wealth and education)
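One way to expose a confounder such as age is stratification: compute the correlation separately within groups that share the same value of the suspected confounder. The sketch below uses made-up data-generating rules, not real measurements: shoe size and reading score each grow linearly with age, with independent noise. The pooled correlation is strong, while the within-age correlations stay close to zero. The function `simulate_children` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_children(n=6000, seed=1):
    """Hypothetical data: shoe size and reading score both depend on age only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(6, 11)                # age in years, 6 to 11 inclusive
        shoe = 0.8 * age + rng.gauss(0, 0.5)    # shoe size grows with age
        reading = 10 * age + rng.gauss(0, 5)    # reading score grows with age
        rows.append((age, shoe, reading))
    return rows


rows = simulate_children()
overall_r = pearson_r([row[1] for row in rows], [row[2] for row in rows])

# Stratify: group children by age and correlate within each group
by_age = defaultdict(list)
for age, shoe, reading in rows:
    by_age[age].append((shoe, reading))
within_r = {
    age: pearson_r([s for s, _ in pairs], [r for _, r in pairs])
    for age, pairs in sorted(by_age.items())
}

print(f"pooled r = {overall_r:.2f}")
for age, r in within_r.items():
    print(f"age {age}: r = {r:.2f}")
```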
Establishing Causation
Bradford Hill criteria for assessing causation (six of the nine):
- Temporal precedence: Cause must precede effect
- Strength: Stronger associations are more likely to be causal
- Consistency: Replicated across studies
- Specificity: Specific cause → specific effect
- Biological gradient: Dose-response relationship
- Plausibility: Makes sense theoretically
Methods for Causal Inference
Randomized Experiments
Gold standard: randomly assign treatment so that confounders are balanced across groups on average
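Randomization works because a coin flip cannot depend on anything about the unit, so confounders end up balanced between groups on average. The following sketch compares a naive difference in means when units choose their own treatment with the same comparison under random assignment. The true effect (2.0), the confounder, and the opt-in probabilities are illustrative assumptions, not values from any study.

```python
import random

TRUE_EFFECT = 2.0  # assumed causal effect of treatment on the outcome


def diff_in_means(treated, outcomes):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    t = [y for d, y in zip(treated, outcomes) if d]
    c = [y for d, y in zip(treated, outcomes) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def run(randomize, n=20000, seed=2):
    rng = random.Random(seed)
    treated, outcomes = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden confounder, e.g. baseline health
        if randomize:
            d = rng.random() < 0.5  # coin flip, independent of z
        else:
            d = rng.random() < (0.8 if z > 0 else 0.2)  # high-z units opt in more often
        y = TRUE_EFFECT * d + 3 * z + rng.gauss(0, 1)
        treated.append(d)
        outcomes.append(y)
    return diff_in_means(treated, outcomes)


observational = run(randomize=False)  # biased upward: treated units tend to have higher z
randomized = run(randomize=True)      # close to TRUE_EFFECT
print(f"self-selected: {observational:.2f}, randomized: {randomized:.2f}")
```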
Natural Experiments
Exploit naturally occurring, as-if-random variation in who is exposed
Instrumental Variables
Use a variable (the instrument) that affects X, influences Y only through X, and is unrelated to the confounders
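As an illustration of the instrumental-variable idea, the sketch below uses the simple Wald ratio Cov(Z, Y) / Cov(Z, X) for a single instrument Z. The model is hypothetical: U is an unobserved confounder, Z shifts X, and Z reaches Y only through X. An ordinary regression slope of Y on X is pulled away from the truth by U, while the ratio recovers the assumed effect of 1.5.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20000, seed=3, beta=1.5):
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                        # unobserved confounder
        zi = rng.gauss(0, 1)                       # instrument, independent of u
        xi = zi + u + rng.gauss(0, 1)              # instrument shifts X (relevance)
        yi = beta * xi + 2 * u + rng.gauss(0, 1)   # Z reaches Y only via X (exclusion)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
ols_slope = cov(x, y) / cov(x, x)  # naive regression slope, biased by u
iv_slope = cov(z, y) / cov(z, x)   # Wald / IV estimate of beta
print(f"OLS slope: {ols_slope:.2f}, IV estimate: {iv_slope:.2f}")
```

The exclusion and independence conditions are built into this simulation by construction; with real data they are assumptions that must be argued for, since they cannot be fully tested.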
Regression Discontinuity
Compare units just above and just below a threshold that determines who is treated
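A regression discontinuity can be sketched as two straight-line fits, one on each side of the threshold, both evaluated at the cutoff. The gap between the two fitted values estimates the effect at the threshold. This local linear version is one common choice, not the only one, and the cutoff of 50, the bandwidth of 10, and the linear outcome model are assumptions for the example. Comparing all treated units with all untreated units instead mixes the effect with the underlying trend in the score.

```python
import random

CUTOFF = 50.0  # assumed threshold: scores at or above it receive treatment


def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def simulate(n=20000, seed=4, jump=5.0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        score = rng.uniform(0, 100)  # running variable, e.g. a test score
        treated = score >= CUTOFF    # treatment is decided by the threshold
        outcome = 0.1 * score + jump * treated + rng.gauss(0, 1)
        data.append((score, outcome))
    return data


def rd_estimate(data, bandwidth=10.0):
    """Gap between local linear fits at the cutoff (one fit per side)."""
    # Center the score at the cutoff so each intercept is the fitted value there
    below = [(s - CUTOFF, y) for s, y in data if CUTOFF - bandwidth <= s < CUTOFF]
    above = [(s - CUTOFF, y) for s, y in data if CUTOFF <= s <= CUTOFF + bandwidth]
    a_below, _ = fit_line([s for s, _ in below], [y for _, y in below])
    a_above, _ = fit_line([s for s, _ in above], [y for _, y in above])
    return a_above - a_below


def mean(values):
    return sum(values) / len(values)


data = simulate()
naive = mean([y for s, y in data if s >= CUTOFF]) - mean([y for s, y in data if s < CUTOFF])
rd = rd_estimate(data)
print(f"naive difference: {naive:.2f}, RD estimate: {rd:.2f}")
```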
Remember: "Correlation does not imply causation" is one of the most important principles in statistics. Always look for alternative explanations before claiming causality!
The interactive visualization below generates scatter plots with hidden confounding variables. Try to discover what's really causing the correlations you see!
Interactive Visualization