
Comprehensive Overview of Common Anomaly Detection Methods with Python Code Examples

This article compiles and explains various common anomaly detection techniques—including distribution‑based, distance‑based, density‑based, clustering, tree‑based, dimensionality‑reduction, classification, and prediction methods—providing theoretical descriptions, algorithmic steps, advantages, limitations, and Python code examples for each approach.

Python Programming Learning Circle

The article presents a comprehensive collection of common anomaly (outlier) detection methods, organized into several categories: distribution‑based (3‑sigma, Z‑score, boxplot, Grubbs test), distance‑based (K‑Nearest Neighbors), density‑based (Local Outlier Factor, Connectivity‑Based Outlier Factor), clustering‑based (DBSCAN), tree‑based (Isolation Forest), dimensionality‑reduction (PCA, AutoEncoder), classification‑based (One‑Class SVM), and prediction‑based approaches.

Distribution‑based methods

import numpy as np

def three_sigma(s):
    # Flag values outside mean ± 3 standard deviations
    mu, std = np.mean(s), np.std(s)
    lower, upper = mu - 3*std, mu + 3*std
    return lower, upper

def z_score(s):
    # Standardize the series; |z| > 3 is a common outlier threshold
    return (s - np.mean(s)) / np.std(s)

def boxplot(s):
    # Tukey's fences on a pandas Series: values beyond
    # Q1 - 1.5*IQR or Q3 + 1.5*IQR are treated as outliers
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
    return lower, upper

Grubbs’ test assumes the data are approximately normally distributed. It tests the null hypothesis that the sample contains no outliers against the alternative that it contains exactly one; if the most extreme value is rejected, it is removed and the test is repeated on the remaining data until no further outlier is found.
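The iterative procedure above can be sketched as follows (a minimal implementation using `scipy.stats` for the t-distribution critical value; function names and the significance level `alpha=0.05` are illustrative choices, not from the article):

```python
import numpy as np
from scipy import stats

def grubbs_test(x, alpha=0.05):
    """One round of Grubbs' test: return the index of the outlier, or None.

    H0: the sample contains no outlier; H1: it contains exactly one.
    Assumes approximately normal data.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean, std = x.mean(), x.std(ddof=1)
    # Test statistic: largest absolute deviation from the mean, in std units
    idx = np.argmax(np.abs(x - mean))
    g = abs(x[idx] - mean) / std
    # Critical value derived from the Student's t distribution
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return idx if g > g_crit else None

def grubbs_iterative(x, alpha=0.05):
    """Repeatedly apply Grubbs' test, removing one outlier per round."""
    x = list(x)
    outliers = []
    while len(x) > 2:
        idx = grubbs_test(x, alpha)
        if idx is None:
            break
        outliers.append(x.pop(idx))
    return outliers, x
```

For example, `grubbs_iterative([1, 2, 2, 3, 2, 100])` removes the extreme value 100 and then stops, since the remaining points pass the test.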

Distance‑based method (KNN)

from pyod.models.knn import KNN

# method='mean' scores each point by its average distance
# to its k nearest neighbors; larger distances mean more anomalous
clf = KNN(method='mean', n_neighbors=3)
clf.fit(X_train)
labels = clf.labels_  # 0: normal, 1: outlier

Density‑based methods

from sklearn.neighbors import LocalOutlierFactor as LOF

# LOF compares each point's local density to that of its neighbors
clf = LOF(n_neighbors=2)
labels = clf.fit_predict(X)  # -1: outlier, 1: normal

Connectivity‑Based Outlier Factor (COF) refines LOF by measuring density along a set‑based nearest path — a chain built by repeatedly linking each point set to its closest remaining neighbor — which makes it better suited to data whose normal points form low‑density, connected (e.g., linear) patterns.

Clustering‑based method (DBSCAN)

DBSCAN groups points into dense clusters; any point that is neither a core point nor density‑reachable from one is labeled noise and can be treated as an outlier.
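A minimal sketch with scikit-learn's DBSCAN (the two synthetic clusters and the `eps`/`min_samples` values are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(42)
# two dense clusters plus one far-away point
X = np.vstack([
    rng.normal(0, 0.3, size=(50, 2)),
    rng.normal(5, 0.3, size=(50, 2)),
    [[10.0, 10.0]],
])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
outlier_mask = labels == -1  # DBSCAN marks noise points with label -1
```

Here the isolated point at (10, 10) has no dense neighborhood and ends up labeled -1, while the two blobs form clusters 0 and 1.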

Tree‑based method (Isolation Forest)

from sklearn.ensemble import IsolationForest

# contamination sets the expected fraction of outliers in the data
iforest = IsolationForest(n_estimators=100, contamination=0.05, random_state=1)
iforest.fit(X)
labels = iforest.predict(X)  # -1: outlier, 1: normal

The algorithm isolates anomalies with fewer splits, yielding higher anomaly scores.

Dimensionality‑reduction methods

from sklearn.decomposition import PCA

pca = PCA()
pca.fit(data)
transformed = pca.transform(data)  # project data onto the principal components

PCA can detect outliers by large deviations in principal component space or by high reconstruction error. AutoEncoder, a non‑linear counterpart, is trained on normal data and flags samples with large reconstruction loss.
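The reconstruction‑error idea can be sketched as follows (the low‑rank synthetic data, the single retained component, and the 3‑sigma threshold are illustrative assumptions, not from the article):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# mostly low-rank data: points near a line in 3-D, plus one off-direction outlier
X = rng.normal(size=(100, 1)) @ np.array([[1.0, 2.0, 3.0]])
X += rng.normal(scale=0.05, size=X.shape)
X = np.vstack([X, [[0.0, 5.0, -5.0]]])  # far from the principal direction

pca = PCA(n_components=1)
# project onto the first component, then map back to the original space
X_proj = pca.inverse_transform(pca.fit_transform(X))

# reconstruction error: distance between each point and its projection
errors = np.linalg.norm(X - X_proj, axis=1)
threshold = errors.mean() + 3 * errors.std()
outliers = np.where(errors > threshold)[0]
```

Points that lie near the principal subspace reconstruct almost perfectly, while the off‑direction point produces a large residual and is flagged. An AutoEncoder applies the same reconstruction‑error criterion with a non‑linear encoder/decoder.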

Classification‑based method (One‑Class SVM)

from sklearn import svm

# nu upper-bounds the fraction of training points treated as outliers
clf = svm.OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1)
clf.fit(X)
pred = clf.predict(X)  # -1: outlier, 1: normal

One‑Class SVM (or SVDD) learns a boundary that encloses the majority of data, treating points outside as anomalies.

Prediction‑based method

For time‑series, a forecasting model predicts future values; residuals are then analyzed (e.g., using K‑sigma) to identify abnormal points.
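The residual‑screening step can be sketched with a simple moving‑average forecaster standing in for the prediction model (the function name, window size, and K=3 are illustrative assumptions; in practice the forecaster could be ARIMA, Prophet, an LSTM, etc.):

```python
import numpy as np

def residual_outliers(series, window=5, k=3):
    """Forecast each point as the mean of the previous `window` values,
    then flag points whose residual exceeds k standard deviations."""
    series = np.asarray(series, dtype=float)
    # one-step-ahead moving-average forecast for each point after the warm-up
    preds = np.array([series[i - window:i].mean()
                      for i in range(window, len(series))])
    residuals = series[window:] - preds
    mu, sigma = residuals.mean(), residuals.std()
    # indices (in the original series) whose residual is beyond k-sigma
    flagged = np.where(np.abs(residuals - mu) > k * sigma)[0] + window
    return flagged
```

On a flat series with a single spike, only the spike produces a residual beyond the K‑sigma band, so only its index is returned.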

The article also discusses the strengths and weaknesses of each technique, providing guidance on when to choose a particular method based on data dimensionality, distribution assumptions, and computational cost.

Tags: machine learning, Python, anomaly detection, statistical methods, outlier detection
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
