Comprehensive Overview of Common Anomaly Detection Methods with Python Code Examples
This article compiles common anomaly (outlier) detection techniques, organized by category: distribution‑based (3‑sigma, Z‑score, boxplot, Grubbs' test), distance‑based (K‑Nearest Neighbors), density‑based (Local Outlier Factor, Connectivity‑Based Outlier Factor), clustering‑based (DBSCAN), tree‑based (Isolation Forest), dimensionality‑reduction (PCA, AutoEncoder), classification‑based (One‑Class SVM), and prediction‑based approaches. For each method it provides a theoretical description, algorithmic steps, advantages, limitations, and Python code examples.
Distribution‑based methods
import numpy as np

def three_sigma(s):
    mu, std = np.mean(s), np.std(s)
    lower, upper = mu - 3*std, mu + 3*std
    return lower, upper

def z_score(s):
    return (s - np.mean(s)) / np.std(s)

def boxplot(s):
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
    return lower, upper

Grubbs' test is described with its hypothesis and iterative removal steps.
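A single Grubbs iteration can be sketched as follows; the toy data, alpha level, and helper name `grubbs_test` are illustrative assumptions, and repeating the test after removing each flagged point gives the iterative variant described above.

```python
import numpy as np
from scipy import stats

def grubbs_test(s, alpha=0.05):
    """One Grubbs iteration: return the index of the outlier, or None."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    mean, std = s.mean(), s.std(ddof=1)
    # G statistic: largest absolute deviation from the mean, in std units
    idx = np.argmax(np.abs(s - mean))
    G = abs(s[idx] - mean) / std
    # Critical value derived from the t distribution
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    G_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return idx if G > G_crit else None

data = [5.1, 4.9, 5.0, 5.2, 4.8, 12.0]
print(grubbs_test(data))  # flags index 5 (the value 12.0)
```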
Distance‑based method (KNN)
from pyod.models.knn import KNN
clf = KNN(method='mean', n_neighbors=3)
clf.fit(X_train)
labels = clf.labels_  # 0: normal, 1: outlier

Density‑based methods
from sklearn.neighbors import LocalOutlierFactor as LOF
clf = LOF(n_neighbors=2)
labels = clf.fit_predict(X)  # -1: outlier, 1: normal

Connectivity‑Based Outlier Factor (COF) is introduced with its set‑based nearest path concept.
Clustering‑based method (DBSCAN)
DBSCAN treats points that cannot belong to any dense cluster as noise (outliers).
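A minimal scikit-learn sketch of this idea; the toy data and the `eps`/`min_samples` values are chosen purely for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, 1.1], [1.0, 0.9],   # cluster 1
              [8.0, 8.0], [8.1, 8.0], [7.9, 8.1], [8.0, 7.9],   # cluster 2
              [4.5, 12.0]])                                     # isolated point

# Points that belong to no dense cluster are labeled -1 (noise/outliers)
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
```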
Tree‑based method (Isolation Forest)
from sklearn.ensemble import IsolationForest
iforest = IsolationForest(n_estimators=100, contamination=0.05, random_state=1)
iforest.fit(X)
labels = iforest.predict(X)  # -1: outlier, 1: normal

The algorithm isolates anomalies with fewer random splits than normal points, so shorter average path lengths translate into higher anomaly scores.
Dimensionality‑reduction methods
from sklearn.decomposition import PCA
pca = PCA()
pca.fit(data)
transformed = pca.transform(data)

PCA can detect outliers by large deviations in principal‑component space or by high reconstruction error after projecting onto the leading components and mapping back. AutoEncoder, a non‑linear counterpart, is trained on normal data and flags samples with large reconstruction loss.
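The reconstruction‑error route can be sketched as follows; the synthetic near‑planar data and the choice of two components are illustrative assumptions, and the same score (per‑sample reconstruction loss) is what an AutoEncoder would produce non‑linearly.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
# Make the data nearly planar: third feature is a noisy sum of the first two
X[:, 2] = X[:, 0] + X[:, 1] + rng.normal(scale=0.05, size=100)
X[0] = [5.0, -5.0, 10.0]  # one sample that breaks the linear structure

pca = PCA(n_components=2).fit(X)
X_rec = pca.inverse_transform(pca.transform(X))       # project and map back
errors = np.sum((X - X_rec) ** 2, axis=1)             # per-sample reconstruction error
outlier_idx = np.argmax(errors)                       # largest error -> most anomalous
```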
Classification‑based method (One‑Class SVM)
from sklearn import svm
clf = svm.OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1)
clf.fit(X)
pred = clf.predict(X)  # -1: outlier, 1: normal

One‑Class SVM (or SVDD) learns a boundary that encloses the majority of the data and treats points outside it as anomalies.
Prediction‑based method
For time‑series, a forecasting model predicts future values; residuals are then analyzed (e.g., using K‑sigma) to identify abnormal points.
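A minimal sketch of this idea, using a naive rolling‑mean forecast as a stand‑in for a real forecasting model; the synthetic series, the window size, and K = 3 are all illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(42)
series = np.sin(np.linspace(0, 6 * np.pi, 200)) + rng.normal(scale=0.1, size=200)
series[120] += 3.0  # inject an anomalous spike

window = 10
# Naive forecast: each point predicted as the mean of the previous `window` points
pred = np.array([series[i - window:i].mean() for i in range(window, len(series))])
resid = series[window:] - pred

# K-sigma rule on the residuals (K = 3)
mu, sigma = resid.mean(), resid.std()
anomalies = np.where(np.abs(resid - mu) > 3 * sigma)[0] + window
```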
The article also discusses the strengths and weaknesses of each technique, providing guidance on when to choose a particular method based on data dimensionality, distribution assumptions, and computational cost.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.