Intelligent Anomaly Detection for Ctrip Operations: LSTM Forecasting, Trend Analysis, Adaptive Thresholds, and Periodic Anomaly Filtering

The article describes Ctrip's AIOps approach to improving alert quality by combining statistical and machine-learning techniques, namely LSTM forecasting, trend analysis, adaptive threshold calculation, and periodic anomaly detection based on dynamic time warping (DTW). Together these achieve significant gains in alert precision and fault recall.

Ctrip Technology

Aug 3, 2023

Background

Ctrip, a large online travel platform, faces stability challenges from traffic spikes, code releases, and operational changes. Meeting its "1-5-10" fault-handling goal (detect a fault within 1 minute, locate it within 5 minutes, and resolve it within 10 minutes) requires a robust, low-cost, high-accuracy anomaly detection system for key metrics such as order volume.

2.1 More Accurate Prediction

Time-series anomaly detection predicts a metric's expected value and flags significant deviations from that prediction. Several models (ARIMA, Holt-Winters, LSTM) were evaluated, and LSTM performed best on Ctrip's strongly periodic order data. The LSTM takes a sliding window of the latest 10 points as input. When a metric declines slowly, feeding the declining points into the window would let the prediction drift along with the decline, so a hypothesis test (Mann-Whitney U) checks the window for a short-term trend. If a trend is detected, the previous window is retained instead of being updated with the new points, which lowers the MAE.

Table 1 (Model Prediction Errors, MAE) shows that LSTM-Adjust achieves the lowest error across three business lines (AA, BB, CC).
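
A minimal sketch of the trend-aware window update follows. It assumes that the 10-point window is split into two halves, that the halves are compared with a two-sided Mann-Whitney U test (normal approximation), and that the significance level is a hypothetical `alpha=0.05`; the article does not state these details. The names `mann_whitney_p` and `TrendAwareWindow` are illustrative, and the LSTM is replaced by a pluggable `predict` callable so the example runs with the standard library only.

```python
import math
from typing import Callable, List, Sequence


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value via the normal approximation
    (no continuity or tie correction in the variance). Both samples must be non-empty."""
    n1, n2 = len(a), len(b)
    ranks = _average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


class TrendAwareWindow:
    """Sliding input window that refuses to absorb points forming a short-term trend."""

    def __init__(self, predict: Callable[[List[float]], float],
                 size: int = 10, alpha: float = 0.05) -> None:
        self.predict = predict  # stand-in for the trained LSTM (assumption)
        self.size = size
        self.alpha = alpha
        self.window: List[float] = []

    def forecast(self) -> float:
        """Predict the next point from the current window."""
        return self.predict(self.window)

    def update(self, value: float) -> bool:
        """Slide the window forward; return False if a trend kept the old window."""
        if len(self.window) < self.size:
            self.window.append(value)  # still warming up
            return True
        candidate = self.window[1:] + [value]
        half = self.size // 2
        if mann_whitney_p(candidate[:half], candidate[half:]) < self.alpha:
            return False  # trend detected: retain the previous window
        self.window = candidate
        return True
```

For example, with a mean predictor, warming the window up with 10, 9, ..., 1 and then calling `update(0)` returns `False`: the halves [9, 8, 7, 6, 5] and [4, 3, 2, 1, 0] differ at p ≈ 0.009, so the window stays unchanged. A production version would also need a rule for eventually accepting a persistent level shift, which the article does not describe.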

2.2 Adaptive Threshold Calculation

Manually configured rule-based thresholds are overly sensitive and costly to maintain. Instead, the system computes thresholds adaptively from each metric's own volatility. It defines the statistic Z = (actual − prediction)/σ and treats the resulting Z series as stationary. Non-parametric kernel density estimation (KDE) fits the distribution of Z, and the 99.99th percentile of the fitted distribution serves as the anomaly cutoff. Periods are split into high- and low-volatility groups based on the coefficient of variation, and each group receives its own threshold, which reduces false alarms during low-traffic periods.
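
The threshold step can be sketched with the standard library alone. This sketch assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth and uses a bisection search on the KDE's cumulative distribution function to find the 99.99th percentile. The helper names, the grouping of points into period buckets (for example hours of the day), and the `cv_cut` value are hypothetical; the article does not say which kernel, bandwidth, or coefficient-of-variation cutoff Ctrip uses.

```python
import math
import statistics
from typing import Dict, List, Sequence


def silverman_bandwidth(xs: Sequence[float]) -> float:
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * len(xs) ** -0.2


def kde_cdf(x: float, samples: Sequence[float], h: float) -> float:
    """CDF of a Gaussian KDE: the average of each kernel's normal CDF at x."""
    return sum(0.5 * math.erfc((s - x) / (h * math.sqrt(2))) for s in samples) / len(samples)


def kde_quantile(samples: Sequence[float], q: float = 0.9999) -> float:
    """Find x with KDE CDF(x) = q by bisection (fixed iteration count)."""
    h = silverman_bandwidth(samples)
    if h <= 0:
        return max(samples)  # degenerate case: all samples identical
    lo, hi = min(samples) - 10 * h, max(samples) + 10 * h
    for _ in range(100):
        mid = (lo + hi) / 2
        if kde_cdf(mid, samples, h) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def label_volatility(buckets: Dict[int, List[float]], cv_cut: float) -> Dict[int, str]:
    """Label each period bucket 'high' or 'low' by coefficient of variation
    (stdev / mean); assumes positive metric values such as order counts."""
    labels = {}
    for key, values in buckets.items():
        cv = statistics.stdev(values) / statistics.fmean(values)
        labels[key] = "high" if cv > cv_cut else "low"
    return labels


def regime_thresholds(z_by_bucket: Dict[int, List[float]], labels: Dict[int, str],
                      q: float = 0.9999) -> Dict[str, float]:
    """Pool Z values per volatility regime and fit one KDE cutoff per regime."""
    pooled: Dict[str, List[float]] = {"low": [], "high": []}
    for key, zs in z_by_bucket.items():
        pooled[labels[key]].extend(zs)
    return {regime: kde_quantile(zs, q) for regime, zs in pooled.items() if len(zs) >= 2}
```

Because a drop in order volume yields a negative Z, a deployment would presumably also check the lower tail (for example `q=0.0001`); that is an assumption, since the article only mentions the 99.99th percentile.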

2.3 Business Trend Analysis

For many metrics, a single anomaly detector is not enough. Huber regression, a linear regression that is robust to outliers, models the short-term trend, and the distance of points from the fitted line (the residuals) measures volatility. Combining this trend analysis with the LSTM-based predictor filters out alerts caused by metric jitter, raising alert precision by about 30%.
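
A minimal sketch of the trend step follows. It assumes that Huber regression is fitted by iteratively reweighted least squares (IRLS) with the conventional tuning constant 1.345 and a residual scale based on the median absolute residual. `huber_line` is a hypothetical helper; a library implementation such as scikit-learn's `HuberRegressor` could be used instead. The article does not specify how the trend verdict is combined with the LSTM verdict, so that step is omitted.

```python
import statistics
from typing import Sequence, Tuple


def huber_line(xs: Sequence[float], ys: Sequence[float], delta: float = 1.345,
               iters: int = 50, tol: float = 1e-10) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x with Huber weights via IRLS.

    Returns (slope, intercept, scale); scale is the robust spread of the
    residuals, used here as the volatility measure around the trend line.
    """
    w = [1.0] * len(xs)
    slope = intercept = scale = 0.0
    for _ in range(iters):
        # Weighted least-squares line for the current weights.
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        new_slope = sxy / sxx if sxx > 0 else 0.0
        new_intercept = my - new_slope * mx
        resid = [y - (new_intercept + new_slope * x) for x, y in zip(xs, ys)]
        # Robust scale: median absolute residual rescaled to a normal sigma.
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        cutoff = delta * max(scale, 1e-12)
        # Huber weights: 1 inside the cutoff, cutoff/|r| outside (down-weights outliers).
        w = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in resid]
        done = abs(new_slope - slope) < tol and abs(new_intercept - intercept) < tol
        slope, intercept = new_slope, new_intercept
        if done:
            break
    return slope, intercept, scale
```

On the line y = 2x + 1 over x = 0..9 with the last point replaced by 100, ordinary least squares gives a slope of about 6.4, whereas this fit converges close to 2, so a single spike does not distort the estimated trend.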

2.4 Periodic Anomaly Detection

Periodic anomalies are regular spikes that the baseline model does not expect. They are filtered by using Dynamic Time Warping (DTW) to align the current window with historical windows; features such as period, amplitude, and phase are then extracted and classified by a supervised model. This reduces periodic false alarms by about 80%.
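
DTW itself is a short dynamic program. The sketch below computes the classic DTW distance with an absolute-difference local cost and no warping-window constraint, plus a hypothetical `nearest_history_distance` feature (the smallest DTW distance between the current window and a set of historical windows). The article's period, amplitude, and phase features and its supervised classifier are not reproduced here.

```python
from typing import Iterable, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with |a_i - b_j| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = cost of the best warping path aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def nearest_history_distance(current: Sequence[float],
                             history: Iterable[Sequence[float]]) -> float:
    """Hypothetical feature: DTW distance to the most similar historical window."""
    return min(dtw_distance(current, past) for past in history)
```

For example, `dtw_distance([0, 0, 5, 0, 0], [0, 5, 0, 0, 0])` is 0 because the warping absorbs the one-step shift of the spike, whereas a point-by-point absolute difference would be 10. This tolerance to small phase shifts is what makes DTW suitable for matching a recurring spike against its historical counterparts.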

Conclusion

The intelligent anomaly detection system consists of two parts: offline training on 14 days of preprocessed data, and online real-time detection that combines baseline prediction with unsupervised methods such as boxplot, K-sigma, and KDE. Over three years of deployment, alert accuracy and recall have improved markedly, and most faults are discovered within one minute.
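
The two simplest unsupervised checks named above can be sketched as follows, assuming they are applied to a recent history of values (or of prediction residuals) for the metric. The 1.5 × IQR whisker and k = 3 are conventional defaults, not values taken from the article.

```python
import statistics
from typing import Sequence


def boxplot_anomaly(history: Sequence[float], value: float, whisker: float = 1.5) -> bool:
    """Flag value outside [Q1 - whisker*IQR, Q3 + whisker*IQR] of the history."""
    q1, _, q3 = statistics.quantiles(history, n=4, method="inclusive")
    iqr = q3 - q1
    return value < q1 - whisker * iqr or value > q3 + whisker * iqr


def k_sigma_anomaly(history: Sequence[float], value: float, k: float = 3.0) -> bool:
    """Flag value more than k sample standard deviations from the history mean."""
    mean = statistics.fmean(history)
    sd = statistics.stdev(history)
    return abs(value - mean) > k * sd
```

The boxplot rule depends only on quartiles, so it tolerates a few extreme points in the history, while K-sigma is simpler but assumes roughly normal data; running several such detectors side by side is consistent with the article's description of the online stage.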

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
