
AI‑Powered Anomaly Detection Algorithms for Observability Metrics

This article explains how AI‑powered anomaly detection (statistical 3‑sigma and Z‑score methods, unsupervised machine‑learning techniques such as Isolation Forest, and deep‑learning models such as LSTM, Transformer, and Pyraformer) overcomes the limits of threshold‑based monitoring by preprocessing historical data and reducing false alerts, delivering high‑precision anomaly detection for observability metrics.

DeWu Technology

In reliability engineering, fault management relies on four core functions: discovery, reachability, localization, and recovery. Fault discovery, the first step, includes metric prediction, anomaly detection, and fault prediction. This article introduces AI‑based anomaly detection algorithms applied to metric monitoring.

Traditional threshold‑based methods suffer from heavy reliance on expert experience, complex configuration, frequent adjustments due to business changes, and sensitivity to outliers caused by promotions or abnormal values.

AI detection algorithms overcome these issues, especially for sudden spikes or drops. Before applying AI models, historical data must be pre‑processed: outlier removal (using the box‑plot/IQR method) and missing‑value imputation (forward/backward fill, mean/median, or linear/polynomial interpolation).
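The preprocessing steps above can be sketched with pandas. This is a minimal illustration, not the article's actual pipeline: the `preprocess` helper name, the 1.5×IQR whisker factor, and the choice of linear interpolation with edge filling are all assumptions.

```python
import numpy as np
import pandas as pd

def preprocess(series: pd.Series) -> pd.Series:
    """Box-plot (IQR) outlier removal followed by missing-value imputation."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Points outside the box-plot whiskers become NaN (treated as missing)
    cleaned = series.where((series >= lower) & (series <= upper))
    # Impute gaps: linear interpolation, then forward/backward fill at the edges
    return cleaned.interpolate(method="linear").ffill().bfill()

ts = pd.Series([10.0, 11.0, 10.5, 300.0, 10.8, np.nan, 11.2])
print(preprocess(ts).tolist())
```

The same helper handles both concerns at once: outliers are converted to missing values, so a single imputation pass repairs gaps from either source.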

Statistical methods include the 3‑sigma rule and Z‑score change‑point detection. The 3‑sigma algorithm assumes a normal distribution and flags points beyond three standard deviations from the mean as anomalies. The Z‑score method uses a sliding window: it smooths the data, computes differences, and evaluates the right‑tail probability via the normal survival function; a probability below 0.01 indicates an anomaly.
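A sketch of both statistical checks follows. The article describes the Z‑score pipeline only at a high level, so the window size, the 3‑point smoothing kernel, and the `zscore_changepoint` helper are illustrative assumptions; the 0.01 probability threshold comes from the text.

```python
import numpy as np
from scipy.stats import norm

def three_sigma(values: np.ndarray) -> np.ndarray:
    """Flag points more than 3 standard deviations from the mean."""
    mu, sigma = values.mean(), values.std()
    return np.abs(values - mu) > 3 * sigma

def zscore_changepoint(values: np.ndarray, window: int = 30,
                       p_threshold: float = 0.01) -> bool:
    """Is the latest point a change point relative to the sliding window?
    Pipeline: smooth -> difference -> z-score -> normal survival function."""
    smoothed = np.convolve(values[-window - 1:], np.ones(3) / 3, mode="valid")
    diffs = np.diff(smoothed)
    # Compare the latest difference against the distribution of earlier ones
    z = (diffs[-1] - diffs[:-1].mean()) / (diffs[:-1].std() + 1e-9)
    return norm.sf(abs(z)) < p_threshold
```

Because the survival function is evaluated on the magnitude of the z‑score, the check fires on both sudden spikes and sudden drops.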

Machine‑learning methods focus on unsupervised techniques due to the lack of labeled data. The Isolation Forest algorithm isolates points by random partitioning; fewer splits indicate higher isolation (anomaly). Key parameters such as n_estimators, max_samples, and contamination are tuned (e.g., contamination = 0.01) to control detection strictness.
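With scikit-learn this looks roughly as follows; contamination = 0.01 is the value the article cites, while the n_estimators and max_samples settings and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=50.0, scale=2.0, size=(500, 1))  # typical metric values
spikes = np.array([[90.0], [5.0]])                       # injected anomalies
X = np.vstack([normal, spikes])

# contamination controls strictness: roughly 1% of points are flagged
model = IsolationForest(n_estimators=100, max_samples=256,
                        contamination=0.01, random_state=0)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal
print(np.where(labels == -1)[0])
```

Raising contamination loosens the threshold and flags more points; lowering it keeps only the most isolated ones, which is how detection strictness is tuned in practice.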

Deep‑learning methods explore LSTM, Transformer, and the pyramidal‑attention‑based Pyraformer. Pyraformer predicts metric values; comparing predictions with observations yields an error series. The error series' standard deviation provides a dynamic sigma, allowing prediction intervals (prediction ± 3·sigma) to serve as anomaly bounds.
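The dynamic-sigma interval is model-agnostic: any forecaster (LSTM, Transformer, or Pyraformer) can supply the predictions. The `prediction_interval` helper below is a hypothetical sketch of that final step, assuming the forecasts already exist as an array.

```python
import numpy as np

def prediction_interval(predictions: np.ndarray, observations: np.ndarray,
                        k: float = 3.0):
    """Derive dynamic anomaly bounds from a forecaster's error series.

    The error series' standard deviation gives a dynamic sigma;
    points outside prediction ± k*sigma are flagged as anomalies.
    """
    errors = observations - predictions
    sigma = errors.std()
    lower, upper = predictions - k * sigma, predictions + k * sigma
    anomalies = (observations < lower) | (observations > upper)
    return lower, upper, anomalies
```

Because sigma is recomputed from recent errors rather than fixed by hand, the bounds widen for noisy metrics and tighten for stable ones, replacing static thresholds.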

Experimental results show that statistical methods are fast and interpretable but may produce many false positives for low‑variance metrics (e.g., memory usage). Isolation Forest reduces false alarms in such cases, while deep‑learning models further improve accuracy for critical scenarios, achieving over 50% reduction in false alerts across 50+ core services.

In practice, a hybrid strategy combines vertical (historical same‑time‑point) 3‑sigma detection with horizontal (cross‑time‑window) Z‑score detection, and supplements them with Isolation Forest and deep‑learning predictions to achieve high‑precision anomaly alerts.
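One way to sketch the vertical/horizontal combination is shown below. The article does not specify the exact voting rule, so `hybrid_alert`, its two-check AND logic, and the use of plain 3‑sigma for both axes (the text pairs 3‑sigma vertically with Z‑score horizontally) are simplifying assumptions.

```python
import numpy as np

def hybrid_alert(history_same_slot: np.ndarray,
                 recent_window: np.ndarray,
                 latest: float) -> bool:
    """Vertical check: is `latest` extreme versus the same time slot on
    previous days? Horizontal check: is it extreme versus the recent
    window? Alert only when both checks agree (hypothetical voting rule)."""
    def beyond_3sigma(ref: np.ndarray, x: float) -> bool:
        mu, sigma = np.mean(ref), np.std(ref) + 1e-9
        return abs(x - mu) > 3 * sigma

    vertical = beyond_3sigma(history_same_slot, latest)    # across days
    horizontal = beyond_3sigma(recent_window, latest)      # within window
    return vertical and horizontal
```

Requiring agreement between the two axes suppresses alerts on metrics that are merely seasonal (high every day at this hour) or merely trending (rising across the whole window), which is the intent of the hybrid strategy.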

machine learning, AI, deep learning, Observability, statistics, Anomaly Detection
Written by DeWu Technology

A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.