Hassan Ijaz

Ai, Web & Design
Descriptive Statistics, Topic 11 of 58

Outlier detection

Interactive scatter plot where users click to add points and see various outlier detection methods highlight anomalies in real time

Concept Overview

Outliers are data points that differ significantly from other observations. Detecting them is crucial as they can indicate data quality issues, interesting anomalies, or rare but important events.

Statistical Methods

Z-Score Method

z = (x - μ) / σ

Points with |z| > 3 are often considered outliers

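Assumes approximately normally distributed data; the mean (μ) and standard deviation (σ) are themselves pulled toward extreme values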
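The z-score rule above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function name `zscore_outliers`, the sample data, and the choice of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values are identical, so nothing can be flagged
    return [x for x in values if abs((x - mu) / sigma) > threshold]

# Hypothetical sample: fifteen similar readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 10, 11, 13, 12, 95]
print(zscore_outliers(data))  # [95]
```

Note that with the population standard deviation, no z-score in a sample of n points can exceed √(n − 1), so the |z| > 3 rule cannot flag anything in samples of 10 or fewer points.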

IQR Method (Tukey Fences)

Lower fence: Q1 - 1.5 × IQR, where IQR = Q3 - Q1

Upper fence: Q3 + 1.5 × IQR

Does not assume normality; the quartiles are barely affected by extreme values
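A short sketch of the fences follows. The names `tukey_fences` and `iqr_outliers` and the sample data are illustrative assumptions. Quartile conventions differ between tools; this example uses the "inclusive" method of Python's `statistics.quantiles`, so other software may give slightly different fences for the same data.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (lower fence, upper fence) computed from the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [x for x in values if x < lower or x > upper]

# Hypothetical sample: Q1 = 4, Q3 = 8, so IQR = 4 and the fences are -2 and 14.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(tukey_fences(data))  # (-2.0, 14.0)
print(iqr_outliers(data))  # [30]
```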

Modified Z-Score (MAD)

M = 0.6745 × (x - median) / MAD

MAD = median(|x - median|)

More robust than the standard z-score because the median and MAD resist extreme values; |M| > 3.5 is a commonly used cutoff
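The two formulas above translate directly into code. This is a sketch under stated assumptions: the function names and sample data are invented for the example, the 3.5 threshold is a common convention rather than a fixed rule, and raising an error when MAD is zero is one possible choice among several.

```python
from statistics import median

def modified_z_scores(values):
    """Return M = 0.6745 * (x - median) / MAD for every value."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # More than half the values equal the median; M is undefined.
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (x - med) / mad for x in values]

def mad_outliers(values, threshold=3.5):
    """Return the values whose |M| exceeds `threshold`."""
    scores = modified_z_scores(values)
    return [x for x, m in zip(values, scores) if abs(m) > threshold]

# Hypothetical sample: median = 12 and MAD = 1, so 95 scores about 56.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]
print(mad_outliers(data))  # [95]
```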

Machine Learning Methods

Isolation Forest

Isolates points with random splits in an ensemble of trees; anomalies need fewer splits to isolate

Local Outlier Factor

Compares each point's local density with that of its nearest neighbors; a much lower density marks an outlier

DBSCAN

Density-based clustering; points that belong to no cluster are labeled noise
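For outlier detection, only DBSCAN's noise label matters, and that part can be sketched without building the clusters themselves. In DBSCAN a point is a core point if at least `min_samples` points (itself included) lie within distance `eps`, and a point is noise if it is neither a core point nor within `eps` of one. The function name `dbscan_noise` and the sample points are assumptions for illustration; the brute-force neighbor search is O(n²) and suits only small inputs.

```python
from math import dist

def dbscan_noise(points, eps, min_samples):
    """Return the indices of points that DBSCAN would label as noise."""
    n = len(points)
    # Each neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_samples for nb in neighbors]
    # Noise: not a core point and not within eps of any core point.
    return [
        i for i in range(n)
        if not core[i] and not any(core[j] for j in neighbors[i])
    ]

# Hypothetical sample: a tight square of four points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(dbscan_noise(points, eps=1.5, min_samples=3))  # [4]
```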

One-Class SVM

Learns a boundary around normal data; points outside it are flagged

Types of Outliers

Point Outliers

Individual data points far from others

Contextual Outliers

Normal globally but abnormal in a specific context (for example, a summer-like temperature reading in winter)

Collective Outliers

Groups of data points that together are anomalous

Handling Outliers

  • Investigate: Understand why they exist
  • Keep: If they represent valid extreme cases
  • Remove: If they're errors or irrelevant
  • Transform: Apply a transformation (such as a log) or use robust methods
  • Cap/Winsorize: Replace with less extreme values
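The capping option can be sketched as clipping every value to a lower and upper bound. The name `cap_values` and the sample data are assumptions for illustration, and the bounds are left to the analyst: here they are the 1.5 × IQR fences for this sample, but percentile-based limits are another common choice.

```python
def cap_values(values, lower, upper):
    """Replace values below `lower` or above `upper` with the nearest bound."""
    return [min(max(x, lower), upper) for x in values]

# Hypothetical sample; -2 and 14 are its 1.5 x IQR fences (Q1 = 4, Q3 = 8).
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(cap_values(data, lower=-2, upper=14))
# [2, 4, 4, 5, 6, 7, 8, 9, 14]
```

Capping keeps the sample size unchanged, which is the main reason to prefer it over removal when the extreme value is plausible but overly influential.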

Important: Outliers aren't always bad! They might represent:

  • Fraud in financial transactions
  • Rare diseases in medical data
  • Equipment failures in sensor data
  • Breakthrough performances in sports

Click on the scatter plot below to add data points. Watch as different outlier detection methods highlight anomalies in real time, each using its own criterion.

Interactive Visualization

[Interactive scatter plot: click to add points and see outlier detection methods highlight anomalies in real time.]