How 360 Detects Network Anomalies with AI‑Powered Time‑Series Algorithms
This article explains how 360’s network operations team uses time‑series analysis, statistical thresholds, EWMA, dynamic limits, and machine‑learning models such as K‑Means and Isolation Forest to automatically detect, locate, and remediate traffic anomalies across massive data‑center exits.
Foreword
Thanks to the Efficient Operations community for the platform. I was a network engineer at 360 and lived through the company's architectural transformation, which shifted my focus to network monitoring, automation, visualization, and AI applications. This presentation is divided into four parts: project background, time‑series algorithms, machine learning, and future outlook.
1. Project Background
The project targets ISP‑level traffic anomalies at DC exits, aiming to automatically discover, locate, and notify the responsible business when abnormal traffic occurs.
360’s services span search, smart hardware, mobile, dashcams, children’s watches, robots, and cloud, supporting 865 million monthly active users across 122 data‑center sites with 3.5 Tbps of ISP bandwidth.
Zero tolerance for service interruption requires real‑time insight into DC‑exit traffic, detection of any abnormal patterns, and immediate response.
The challenges are twofold: attributing an anomaly to the responsible business when an alert arrives as a black box, and moving beyond fixed‑threshold monitoring, which misses subtle deviations.
We stored traffic from hundreds of thousands of ports as time‑series data, extracting dozens of features (server, domain, business owner, region, etc.) to enable downstream anomaly detection.
2. Time‑Series Algorithms
After data collection we verify stationarity and apply differencing when needed.
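Differencing is the standard way to turn a trending series into a stationary one before threshold-based checks. A minimal sketch (the function name is illustrative):

```python
def difference(series, lag=1):
    """First-order differencing: y[t] = x[t] - x[t-lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# A steadily rising (non-stationary) series becomes constant after differencing.
trend = [10, 12, 14, 16, 18, 20]
print(difference(trend))  # -> [2, 2, 2, 2, 2]
```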
2.1 3‑Sigma
Assuming normal distribution, data points beyond three standard deviations from the mean are flagged as anomalies.
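In practice the mean and standard deviation come from a baseline window and each new point is tested against them. A minimal sketch (the baseline values are illustrative):

```python
from statistics import mean, stdev

def is_three_sigma_anomaly(history, point):
    """Flag a point more than three standard deviations from the
    mean of the baseline window."""
    mu, sigma = mean(history), stdev(history)
    return abs(point - mu) > 3 * sigma

baseline = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101]
print(is_three_sigma_anomaly(baseline, 500))  # -> True
print(is_three_sigma_anomaly(baseline, 101))  # -> False
```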
2.2 EWMA (Exponentially Weighted Moving Average)
EWMA weights recent data more heavily (parameter λ between 0 and 1). We compute a 15‑minute window EWMA over seven days, using the latest EWMA value as the mean for 3‑sigma comparison, which captures recent trends while smoothing noise.
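The recursion behind this is simple: each new observation is blended with the running average, and the result replaces the plain mean in the 3‑sigma check above. A sketch with an illustrative λ:

```python
def ewma(series, lam=0.3):
    """Exponentially weighted moving average; lam (0 < lam < 1)
    is the weight given to the newest observation."""
    s = series[0]
    for x in series[1:]:
        s = lam * x + (1 - lam) * s
    return s

# The EWMA responds to a level shift faster than a plain mean,
# while still smoothing single-point noise.
window = [100, 100, 100, 100, 200]
print(round(ewma(window), 1))  # -> 130.0
```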
2.3 Dynamic Threshold
Using the second‑smallest and second‑largest values from the past 14 days, multiplied by 0.6 and 1.2 respectively, we create adaptive lower and upper bounds that track each port's own traffic pattern.
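The rule maps directly to code; taking the *second* extremes rather than the absolute min/max discards a single outlier day. A sketch with made-up daily values:

```python
def dynamic_bounds(history):
    """Adaptive bounds: second-smallest value * 0.6 as the lower bound,
    second-largest value * 1.2 as the upper bound. Using the second
    extremes discards a single-day spike or dip."""
    s = sorted(history)
    return s[1] * 0.6, s[-2] * 1.2

# 14 days of traffic; day 1 (dip) and day 14 (spike) are ignored.
daily = [50, 120, 110, 115, 105, 118, 112, 108, 111, 114, 109, 113, 116, 300]
low, high = dynamic_bounds(daily)
print(round(low, 1), round(high, 1))  # -> 63.0 144.0
```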
2.4 Small‑Flow Monitoring Optimization
We apply a logarithmic function to give higher sensitivity to low‑volume flows, allowing a steep curve for small traffic and a gentler slope for larger volumes.
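One way to realize this (the exact function 360 uses is not given in the article) is to compare flows in log space, where the curve is steep near zero and flattens for large values, so the same absolute deviation stands out far more on a low-volume flow:

```python
import math

def log_scale(x):
    """Map traffic volume into log space; steep for small flows,
    gentle for large ones."""
    return math.log1p(x)

small_shift = log_scale(15) - log_scale(10)      # +5 units on a small flow
large_shift = log_scale(1005) - log_scale(1000)  # +5 units on a large flow
print(small_shift > large_shift)  # -> True
```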
3. Machine Learning
When statistical methods struggle, we turn to machine learning.
3.1 Architecture
An automatically updating model handles evolving traffic patterns; offline training produces a model, while online inference classifies real‑time traffic as normal or abnormal.
3.2 Supervised vs Unsupervised Learning
Supervised learning requires balanced, manually labeled samples; unsupervised learning discovers patterns without labels but needs parameter tuning.
3.3 Feature Extraction
Features include normalized traffic volume, quantile‑based distributions, period‑over‑period ratios, and coefficient of variation; these are fed to the model for classification.
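The feature list above can be sketched as a single vector-builder per traffic window; the exact production feature set is not spelled out in the article, so the selection here is illustrative:

```python
from statistics import mean, stdev, quantiles

def extract_features(window, prev_window):
    """Build one feature vector per traffic window."""
    peak = max(window) or 1
    normalized = mean(x / peak for x in window)        # normalized volume
    q25, q50, q75 = quantiles(window, n=4)             # quantile distribution
    ratio = mean(window) / (mean(prev_window) or 1)    # period-over-period
    cv = stdev(window) / (mean(window) or 1)           # coefficient of variation
    return [normalized, q25, q50, q75, ratio, cv]

features = extract_features([10, 12, 11, 13, 12, 11], [10, 11, 10, 12, 11, 10])
print(len(features))  # -> 6
```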
3.4 Model Selection
We evaluated K‑Means clustering (with distance‑based anomaly thresholds) and Isolation Forest (tree‑based anomaly scoring). Isolation Forest proved easier to use and performed better with limited features, so we adopted it.
Each port‑direction pair receives its own model, updated daily, with a 10‑minute sliding window for inference.
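The offline-train / online-infer split can be sketched with scikit-learn's `IsolationForest`; the article does not name a specific library, and the feature values and `contamination` setting here are assumptions:

```python
# Sketch assuming scikit-learn; feature vectors are made up for illustration.
from sklearn.ensemble import IsolationForest

# Offline training: one model per port-direction pair, refit daily
# on that port's recent feature vectors (e.g. [normalized volume, CV]).
train = [[1.0, 0.02], [1.1, 0.03], [0.9, 0.02], [1.0, 0.04]] * 25
model = IsolationForest(contamination=0.01, random_state=0).fit(train)

# Online inference over the latest 10-minute window: 1 = normal, -1 = anomaly.
preds = model.predict([[1.0, 0.03], [9.0, 0.9]])
print(preds)  # the far-off second point is flagged as -1
```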
Ensembling multiple algorithms (four statistical methods plus the ML model) and voting improves detection accuracy to over 98%.
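The vote itself reduces to counting detector verdicts; the quorum of 3 below is an assumed value, not stated in the article:

```python
def ensemble_vote(flags, quorum=3):
    """Majority vote over detector verdicts (True = anomaly). flags holds
    the outputs of the four statistical methods plus the ML model."""
    return sum(flags) >= quorum

# Order: 3-sigma, EWMA, dynamic threshold, small-flow check, Isolation Forest
print(ensemble_vote([True, True, False, True, False]))   # -> True
print(ensemble_vote([False, True, False, False, False])) # -> False
```

Voting trades a little sensitivity for far fewer false positives: a single noisy detector can no longer page anyone on its own.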
4. Present and Future
Detection alone is insufficient without business‑level attribution. We built a C‑based tool to map abnormal traffic to IP, protocol, and business owner, sending automated alerts.
Correlation analysis using Pearson coefficients helps identify related traffic curves; a similarity-search tool (“千里眼”, literally “thousand‑mile eye”) automatically finds the DC ports most correlated with a given anomaly.
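The ranking step can be sketched in a few lines; the port names and curves below are illustrative:

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length curves."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def most_correlated_port(anomaly_curve, port_curves):
    """Rank candidate DC ports by similarity to the anomalous curve."""
    return max(port_curves, key=lambda p: pearson(anomaly_curve, port_curves[p]))

curves = {"port-a": [2, 4, 6, 8], "port-b": [8, 6, 4, 2]}
print(most_correlated_port([1, 2, 3, 4], curves))  # -> port-a
```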
Future work focuses on linking diverse monitoring metrics, suppressing redundant alerts, pinpointing root causes, and automating remediation through pre‑defined playbooks.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.