Hassan Ijaz
Ai, Web & Design
Outlier detection
Interactive scatter plot where users click to add points and see various outlier detection methods highlight anomalies in real-time
Concept Overview
Outliers are data points that differ significantly from other observations. Detecting them matters because they can indicate data quality issues, interesting anomalies, or rare but important events.
Statistical Methods
Z-Score Method
z = (x - μ) / σ
Points with |z| > 3 are often considered outliers
Assumes roughly normal data; the mean and σ are themselves pulled by extreme values
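As a minimal sketch of the z-score rule above, the following uses only Python's standard library. The function name `zscore_outliers`, the sample data, and the choice of the population standard deviation (`pstdev`) are illustrative assumptions, not part of the original page.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose |z| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values identical, so nothing can stand out
    return [x for x in values if abs((x - mu) / sigma) > threshold]

# Hypothetical data: nineteen identical readings and one extreme value.
data = [10.0] * 19 + [100.0]
print(zscore_outliers(data))
```

One caveat worth keeping in mind for small samples: with the population standard deviation, no |z| can exceed √(n − 1), so with ten or fewer points this rule can never flag anything at a threshold of 3.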
IQR Method (Tukey Fences)
Lower fence: Q1 - 1.5 × IQR
Upper fence: Q3 + 1.5 × IQR
Does not assume normality; quartiles are barely affected by extreme values
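The fences can be computed with `statistics.quantiles` from the standard library. This is a sketch under assumptions: quartile conventions differ between tools, and `method="inclusive"` is just one of them, so the exact fence values may not match another library. The function names and sample data are illustrative.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [x for x in values if x < lower or x > upper]

# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(tukey_fences(data))   # (-3.5, 14.5)
print(iqr_outliers(data))   # [50]
```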
Modified Z-Score (MAD)
M = 0.6745 × (x - median) / MAD
MAD = median(|x - median|)
More robust than the standard z-score, because the median and MAD are barely affected by extreme values
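A minimal sketch of the modified z-score, again with the standard library only. The cutoff of 3.5 is a commonly used convention and an assumption here (the page does not state one); the function names and sample data are illustrative. The score is undefined when MAD is zero, which this sketch reports as an error rather than guessing a fallback.

```python
from statistics import median

def modified_zscores(values):
    """Return M = 0.6745 * (x - median) / MAD for each value."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # More than half the values equal the median; the score is undefined.
        raise ValueError("MAD is zero; modified z-score is undefined")
    return [0.6745 * (x - med) / mad for x in values]

def mad_outliers(values, threshold=3.5):
    """Return the values whose |M| exceeds the threshold (3.5 is an assumed default)."""
    scores = modified_zscores(values)
    return [x for x, m in zip(values, scores) if abs(m) > threshold]

# Hypothetical data with one extreme value.
data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 60]
print(mad_outliers(data))  # [60]
```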
Machine Learning Methods
Isolation Forest
Isolates points with random splits in an ensemble of trees; anomalies need fewer splits to separate
Local Outlier Factor
Compares local density to neighbors
DBSCAN
Density-based clustering; points that belong to no cluster (noise) are treated as outliers
One-Class SVM
Learns boundary of normal data
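To make the density idea behind Local Outlier Factor concrete, here is a simplified pure-Python sketch. It is not the implementation of any particular library: it takes exactly k neighbors (ignoring distance ties), assumes the data has no duplicate points, and the name `lof_scores`, the value k=3, and the sample points are all illustrative choices.

```python
from math import dist

def lof_scores(points, k=3):
    """Simplified Local Outlier Factor: about 1 means inlier, well above 1 means outlier."""
    n = len(points)
    # For each point, its k nearest other points as (distance, index) pairs.
    neighbors = []
    for i, p in enumerate(points):
        ds = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append(ds[:k])
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]

    def local_reach_density(i):
        # Reachability distance from neighbor j to point i: max(k_dist[j], d(i, j)).
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        return len(reach) / sum(reach)

    lrd = [local_reach_density(i) for i in range(n)]
    # LOF: average density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i])
        for i in range(n)
    ]

# Hypothetical data: a tight cluster plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = lof_scores(points)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

In this example the clustered points score close to 1, while the distant point scores far higher because its neighbors sit in a much denser region than it does.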
Types of Outliers
Point Outliers
Individual data points far from others
Contextual Outliers
Values that are normal globally but abnormal in a specific context (for example, a summer-like temperature reading in winter)
Collective Outliers
Groups of data points that together are anomalous
Handling Outliers
- Investigate: Understand why they exist
- Keep: If they represent valid extreme cases
- Remove: If they're errors or irrelevant
- Transform: Apply a transformation (such as a log) or switch to robust methods
- Cap/Winsorize: Replace with less extreme values
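The capping option in the list above can be sketched as follows. This variant clips values to the Tukey fences; classic winsorizing instead replaces extremes with a chosen percentile value, so treat the fence-based bounds, the function name `cap_at_fences`, and the sample data as assumptions made for illustration.

```python
from statistics import quantiles

def cap_at_fences(values, k=1.5):
    """Clip each value into [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences are returned unchanged.
    return [min(max(x, lower), upper) for x in values]

# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
capped = cap_at_fences(data)
print(capped)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 14.5]
```

Capping keeps the sample size intact, which can matter when removing rows would bias a later analysis.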
Important: Outliers aren't always bad! They might represent:
- Fraud in financial transactions
- Rare diseases in medical data
- Equipment failures in sensor data
- Breakthrough performances in sports
Click on the scatter plot below to add data points. Watch as different outlier detection methods highlight anomalies in real-time, each using different criteria.
Interactive Visualization