Machine Learning‑Based Threshold‑Free Monitoring for Business Metrics

This article describes a monitoring system that leverages machine learning to perform threshold‑free, real‑time anomaly detection on macro business indicators such as network traffic and access volume, detailing its architecture, sample labeling, model training, and multi‑level alarm strategies.

58 Tech

Mar 25, 2019

In the practice of monitoring business services, macro‑level business indicators (e.g., data‑center network traffic, business access volume) often reflect system health more accurately than low‑level metrics, but their daily periodicity and volatility make fixed‑threshold monitoring ineffective.

To address this, a machine‑learning‑driven solution was introduced to achieve threshold‑free monitoring of key business metrics, enabling personalized monitoring based on actual indicator behavior.

The architecture has three layers: a data layer that stores historical data and trains models offline; a core layer that distributes real‑time data, runs detection, and generates alerts; and a presentation layer that visualizes metric curves and detection results and lets operators annotate them for manual verification.

In the offline module, unlabeled samples are automatically labeled using a combination of statistical discrimination and unsupervised learning, creating a labeled sample pool for feature extraction and model training (using LightGBM). The online module loads the model to perform real‑time anomaly detection, and confirmed anomalies are fed back into the labeled pool to continuously improve the model.

Sample labeling relies on statistical and unsupervised methods to vote for high‑confidence normal and abnormal samples, forming the training dataset.
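The voting idea can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the three detectors (z‑score, IQR fences, median absolute deviation) and their cut‑offs are assumptions chosen to keep the example dependency‑free, and a real unsupervised model would simply be one more voter in the `detectors` tuple. Only points on which every detector agrees receive a label; the rest stay unlabeled (`None`).

```python
from statistics import mean, median, quantiles, stdev

def zscore_votes(values, k=3.0):
    """Vote abnormal when a point is more than k standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > k for v in values]

def iqr_votes(values, k=1.5):
    """Vote abnormal when a point falls outside the fences q1 - k*IQR .. q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < lo or v > hi for v in values]

def mad_votes(values, k=3.5):
    """Vote abnormal when the MAD-based modified z-score exceeds k."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > k for v in values]

def vote_labels(values, detectors=(zscore_votes, iqr_votes, mad_votes)):
    """Return 1 (abnormal) or 0 (normal) only when all detectors agree, else None."""
    ballots = [detect(values) for detect in detectors]
    labels = []
    for votes in zip(*ballots):
        if all(votes):
            labels.append(1)
        elif not any(votes):
            labels.append(0)
        else:
            labels.append(None)  # detectors disagree: keep out of the training pool
    return labels

# Toy series: 20 ordinary points, one borderline point (110), one clear spike (500).
values = [100, 102, 98, 101, 99] * 4 + [110, 500]
labels = vote_labels(values)
```

Requiring unanimity trades coverage for confidence: fewer samples are labeled, but the ones that enter the training set are less likely to be mislabeled.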

Model training handles class imbalance by down‑sampling normal samples and constructs contrast features (e.g., standard scores, year‑over‑year, week‑over‑week) based on historical windows, including special handling for holidays.
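A rough sketch of these two steps, under stated assumptions: the function names, the `ratio` of normal to abnormal samples, the window length, and the lag offsets are all hypothetical, and holiday handling is left out. Each contrast feature compares the current point with a reference taken from history, either a trailing window (standard score) or the same position one or more periods earlier (relative change).

```python
import random
from statistics import mean, pstdev

def downsample_normals(samples, labels, ratio=1.0, seed=0):
    """Keep every abnormal sample and at most ratio * (abnormal count) normal ones."""
    rng = random.Random(seed)
    abnormal = [i for i, y in enumerate(labels) if y == 1]
    normal = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(normal), int(len(abnormal) * ratio))
    chosen = sorted(abnormal + rng.sample(normal, keep))
    return [samples[i] for i in chosen], [labels[i] for i in chosen]

def contrast_features(series, t, window, lags):
    """Compare series[t] with a trailing window and with earlier periods.

    lags maps a feature name to an offset in points, for example one day or
    one week of samples (the offsets used below are toy values).
    """
    if t < window or any(lag > t for lag in lags.values()):
        raise ValueError("not enough history for the requested window or lags")
    history = series[t - window:t]
    mu, sigma = mean(history), pstdev(history)
    feats = {"zscore": (series[t] - mu) / sigma if sigma > 0 else 0.0}
    for name, lag in lags.items():
        base = series[t - lag]
        # Relative change against the same position one lag earlier.
        feats[name] = (series[t] - base) / base if base != 0 else 0.0
    return feats

# Toy series with a period of 4 points; the last point doubles its usual level.
series = [10, 12, 11, 13, 10, 12, 11, 13, 10, 24]
feats = contrast_features(series, t=9, window=4,
                          lags={"prev_period": 4, "prev_2_periods": 8})

# 95 normal and 5 abnormal samples reduced to a 2:1 ratio.
pool_x, pool_y = downsample_normals(list(range(100)), [0] * 95 + [1] * 5, ratio=2.0)
```

The resulting feature dictionaries would then be turned into rows of the training matrix for the LightGBM model mentioned above.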

After training, the model labels all original unlabeled data, and its accuracy is validated through the system’s visualization tools, ensuring labeling precision above 90%.

Alert grading distinguishes between ordinary anomalies, severe anomalies, and steep‑change anomalies, applying different notification strategies (e.g., low‑priority email, medium‑priority SMS, high‑priority voice calls) based on the anomaly’s severity and persistence.
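One way such grading could look is sketched below. The article does not give the exact rules, so everything concrete here is an assumption for illustration: the `severe_after` streak length, the `steep_ratio` jump threshold, and the mapping of grades to channels in `CHANNELS`.

```python
# Hypothetical mapping from grade to notification channel.
CHANNELS = {"ordinary": "email", "severe": "sms", "steep": "voice_call"}

def grade(flags, values, severe_after=3, steep_ratio=0.5):
    """Grade the newest point of a metric.

    flags:  recent model outputs, oldest first (True means anomalous).
    values: metric values aligned with flags.
    Returns "steep", "severe", "ordinary", or None when the newest point is normal.
    """
    if not flags or not flags[-1]:
        return None
    # Steep change: a large relative jump between the last two points.
    if len(values) >= 2 and values[-2] != 0:
        change = abs(values[-1] - values[-2]) / abs(values[-2])
        if change >= steep_ratio:
            return "steep"
    # Persistence: count consecutive anomalous points ending at the newest one.
    streak = 0
    for flagged in reversed(flags):
        if not flagged:
            break
        streak += 1
    return "severe" if streak >= severe_after else "ordinary"

def channel_for(flags, values):
    """Return the notification channel for the newest point, or None."""
    level = grade(flags, values)
    return CHANNELS[level] if level else None
```

For example, `channel_for([False, True], [100, 30])` returns `"voice_call"` because the value dropped by 70% in one step, while a single mildly anomalous point only produces an email.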

The solution demonstrates that machine‑learning‑based, threshold‑free monitoring can efficiently and accurately handle diverse business metrics, is adaptable to various scales and patterns, and is already deployed across numerous critical services, with future plans to further integrate AI techniques into operations.
