Hassan Ijaz


Linear regression

Drag-and-drop interface where users place points and see regression line update, with residual visualization and R-squared display

Concept Overview

Linear regression models the relationship between a continuous response variable and one or more predictor variables using a linear equation. It's one of the most fundamental and widely used statistical techniques.

The Linear Model

Simple: y = β₀ + β₁x + ε

One predictor variable

Multiple: y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε

Multiple predictors

  • β₀: Intercept (expected value of y when all xᵢ = 0)
  • βᵢ: Slope coefficients (expected change in y per unit change in xᵢ, holding the other predictors fixed)
  • ε: Random error term (ε ~ N(0, σ²))
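To make the roles of β₀, β₁, and ε concrete, the simple model above can be simulated. This is a minimal sketch using only the Python standard library; the function name, the predictor range of 0 to 10, and the parameter values are illustrative assumptions, not part of the original material.

```python
import random

def simulate_simple_linear(n, beta0, beta1, sigma, seed=0):
    """Draw n points from y = beta0 + beta1 * x + eps, with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # Predictor values drawn uniformly from an assumed range of 0 to 10.
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Each response is the line value plus an independent normal error.
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

xs, ys = simulate_simple_linear(n=200, beta0=2.0, beta1=0.5, sigma=1.0)
```

Setting sigma to 0 removes the error term, so every point falls exactly on the line; larger values scatter the points around it.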

Key Assumptions

Linearity

The expected value of y is a linear function of the predictors

Independence

Observations are independent of each other

Homoscedasticity

Constant variance of errors across all levels of x

Normality

Errors are normally distributed

Estimation: Least Squares

Minimize sum of squared residuals:

minimize: Σ(yᵢ - ŷᵢ)²

  • Closed-form solution: β̂ = (X'X)⁻¹X'y
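  • Unique solution when X'X is invertible (X has full column rank, so no predictor is an exact linear combination of the others)
  • BLUE: Best Linear Unbiased Estimator (Gauss-Markov theorem)
  • Minimizes variance among linear unbiased estimators when errors have zero mean, constant variance, and are uncorrelated (normality is not required for this result)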
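The closed-form solution can be sketched by solving the normal equations (X'X)β = X'y directly. The example below is a hedged illustration in plain Python: fit_ols is a hypothetical helper name, the Gaussian elimination is one of several ways to solve the system, and the singularity check uses a crude absolute tolerance chosen for this example.

```python
def fit_ols(rows, y):
    """Solve the normal equations (X'X) b = X'y.

    rows: one list of predictor values per observation (no intercept column)
    y:    list of responses
    Returns [b0, b1, ..., bp], where b0 is the intercept.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))]
         for a in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:  # crude tolerance, assumed for this sketch
            raise ValueError("X'X is singular: predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(A[r][c] * beta[c] for c in range(r + 1, k))
        beta[r] = (A[r][k] - s) / A[r][r]
    return beta

# Noise-free data generated from y = 1 + 2*x1 - 3*x2, so the fit should recover it.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * a - 3 * b for a, b in rows]
beta = fit_ols(rows, y)
```

Solving the linear system avoids forming the explicit inverse (X'X)⁻¹. When two predictors are exact multiples of each other, X'X is not invertible and the helper raises an error instead of returning an arbitrary solution.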

Model Evaluation

R-squared (R²)

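R² = 1 - SS_res/SS_tot, where SS_res = Σ(yᵢ - ŷᵢ)² and SS_tot = Σ(yᵢ - ȳ)²

  • Proportion of variance in y explained by the model
  • Range: 0 to 1 for a least-squares fit with an intercept, evaluated on the data used for fitting; it can be negative otherwise (for example, on held-out data)
  • Higher values mean a closer fit, but R² never decreases when predictors are added
  • Adjusted R² penalizes for additional predictors: 1 - (1 - R²)(n - 1)/(n - p - 1)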
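As a worked illustration of these quantities, the sketch below computes R² and adjusted R² from observed and fitted values. The function name and the fitted values are hypothetical, chosen so the arithmetic is easy to follow; the adjusted form used is 1 - (1 - R²)(n - 1)/(n - p - 1).

```python
def r_squared(y, y_hat, n_predictors):
    """Return (R^2, adjusted R^2) for observed y and fitted values y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj

y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.2, 1.9, 3.1, 3.8, 5.0]  # hypothetical fitted values
r2, adj = r_squared(y, y_hat, n_predictors=1)

# Predictions that are worse than simply predicting the mean give a negative R^2.
r2_bad, _ = r_squared(y, [5.0, 4.0, 3.0, 2.0, 1.0], n_predictors=1)
```

Here SS_res is 0.10 and SS_tot is 10, so R² is 0.99 and the adjusted value is slightly lower. The second call shows that the formula itself is not bounded below by 0; that bound comes from fitting by least squares with an intercept.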

Residual Analysis

  • Plot residuals vs fitted values (check homoscedasticity)
  • Q-Q plot of residuals (check normality)
  • Cook's distance (identify influential points)
  • Leverage (identify outliers in x-space)
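For a single predictor, leverage and Cook's distance have simple closed forms, which the sketch below uses: hᵢ = 1/n + (xᵢ - x̄)²/Σ(xⱼ - x̄)² and Dᵢ = eᵢ²/(k·s²) · hᵢ/(1 - hᵢ)², with k = 2 fitted parameters. The function name and the data are illustrative assumptions; the last point is deliberately placed far from the others in x.

```python
def simple_diagnostics(xs, ys):
    """Residuals, leverage, and Cook's distance for a one-predictor OLS fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    k = 2                                      # fitted parameters: intercept and slope
    s2 = sum(e * e for e in resid) / (n - k)   # residual variance estimate
    # Leverage: diagonal of the hat matrix, large for points far from x_bar.
    leverage = [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]
    # Cook's distance combines residual size with leverage.
    cooks = [e * e / (k * s2) * h / (1.0 - h) ** 2
             for e, h in zip(resid, leverage)]
    return resid, leverage, cooks

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]   # hypothetical data; x = 15 is isolated
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0]
resid, leverage, cooks = simple_diagnostics(xs, ys)
```

The leverages sum to the number of fitted parameters (2 here). In this data the isolated point has both the highest leverage and the largest Cook's distance, even though its residual is not the largest, because it pulls the line toward itself.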

Inference

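Test individual coefficients (H₀: βᵢ = 0):

t = β̂ᵢ / SE(β̂ᵢ)

Under H₀, t follows a t distribution with n - p - 1 degrees of freedom.

Test overall model significance (H₀: β₁ = ... = βₚ = 0):

F = (SS_reg/p) / (SS_res/(n-p-1))

Here SS_reg = SS_tot - SS_res, n is the number of observations, and p is the number of predictors. Under H₀, F follows an F distribution with p and n - p - 1 degrees of freedom.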
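The two statistics can be computed by hand for a one-predictor fit, where SE(β̂₁) = √(s²/Σ(xᵢ - x̄)²) and s² = SS_res/(n - p - 1). The sketch below is a minimal illustration with hypothetical data and a hypothetical function name. It stops at the test statistics; turning them into p-values needs the t and F distribution functions, which the Python standard library does not provide.

```python
import math

def simple_inference(xs, ys):
    """Slope t statistic and overall F statistic for a one-predictor OLS fit."""
    n = len(xs)
    p = 1                                   # number of predictors
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    ss_reg = ss_tot - ss_res                # variation explained by the fit
    s2 = ss_res / (n - p - 1)               # residual variance estimate
    se_b1 = math.sqrt(s2 / sxx)             # standard error of the slope
    t_stat = b1 / se_b1
    f_stat = (ss_reg / p) / (ss_res / (n - p - 1))
    return t_stat, f_stat

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # hypothetical data
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
t_stat, f_stat = simple_inference(xs, ys)
```

With a single predictor the two tests are equivalent: the F statistic equals the square of the slope's t statistic, which serves as a quick sanity check on the arithmetic.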

Key Insight: Linear regression is interpretable and provides uncertainty quantification, but it assumes the response is a linear function of the predictors. Consider transforming the variables or using non-linear methods when the pattern is more complex.

The drag-and-drop interface below lets you place points and see the regression line update in real time. Watch the residuals and R-squared change as you modify the data.

Interactive Visualization

[Interactive visualization: drag-and-drop plot where placed points update the regression line, the residuals, and the R-squared display]