
Machine Learning‑Based Threshold‑Free Monitoring for Business Metrics

This article describes a monitoring system that leverages machine learning to perform threshold‑free, real‑time anomaly detection on macro business indicators such as network traffic and access volume, detailing its architecture, sample labeling, model training, and multi‑level alarm strategies.

58 Tech

When monitoring business services, macro-level indicators such as data-center network traffic and business access volume often reflect system health more accurately than low-level metrics. However, these indicators follow a daily cycle and fluctuate considerably, so a single fixed threshold cannot fit both peak and off-peak periods.

To address this, a machine-learning-driven solution was introduced that monitors key business metrics without manually configured thresholds and adapts detection to each indicator's actual behavior.

The system architecture has three layers. The data layer stores historical data and runs offline model training. The core layer handles real-time data distribution, detection, and alert generation. The presentation layer visualizes metric curves and detection results and provides an annotation feature for manual verification.

In the offline module, unlabeled samples are labeled automatically by combining statistical discrimination with unsupervised learning, which produces a labeled sample pool for feature extraction and model training with LightGBM. The online module loads the trained model to detect anomalies in real time, and confirmed anomalies are fed back into the labeled pool so that the model keeps improving.

Sample labeling lets several statistical and unsupervised methods vote on each sample. Only the samples that the vote marks as normal or abnormal with high confidence enter the training dataset.
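
The voting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the three detectors (a 3-sigma rule, a median-absolute-deviation rule, and a k-nearest-neighbor distance rule standing in for the unsupervised method), their cutoffs, and the unanimous-vote rule are all hypothetical choices made for the example.

```python
from statistics import mean, median, pstdev


def three_sigma_flags(values, k=3.0):
    """Flag points more than k standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) > k * sigma for v in values]


def mad_flags(values, cutoff=3.5):
    """Flag points whose modified z-score (based on the MAD) exceeds cutoff."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > cutoff for v in values]


def knn_flags(values, k=3, factor=5.0):
    """Unsupervised stand-in: flag points whose distance to their k-th
    nearest neighbor is far above the typical (median) such distance."""
    kth = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        kth.append(dists[k - 1])
    scale = median(kth)
    if scale == 0:
        return [False] * len(values)
    return [d > factor * scale for d in kth]


def vote_labels(values, detectors):
    """Return 1 (abnormal) when every detector flags a point, 0 (normal)
    when none does, and None when the detectors disagree (low confidence,
    so the sample is left out of the training set)."""
    flag_lists = [detect(values) for detect in detectors]
    labels = []
    for votes in zip(*flag_lists):
        if all(votes):
            labels.append(1)
        elif not any(votes):
            labels.append(0)
        else:
            labels.append(None)
    return labels
```

Requiring a unanimous vote trades coverage for label quality: fewer samples are labeled, but those that are labeled are less likely to be wrong.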

Model training handles class imbalance by down-sampling normal samples. Contrast features such as standard scores, year-over-year, and week-over-week comparisons are computed over historical windows, with special handling for holidays.
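
A rough sketch of both steps is shown below. The function names, the sampling ratio, the window length, and the lag offsets are assumptions made for illustration; the lags are passed in as a parameter because the exact comparison periods depend on the metric's sampling interval. Holiday handling is not covered here.

```python
import random
from statistics import mean, pstdev


def downsample_normals(samples, ratio=1.0, seed=0):
    """Keep every abnormal sample and at most ratio * (number of abnormal
    samples) randomly chosen normal ones.

    samples: list of (features, label) pairs, label 1 = abnormal, 0 = normal.
    """
    rng = random.Random(seed)
    abnormal = [s for s in samples if s[1] == 1]
    normal = [s for s in samples if s[1] == 0]
    keep = min(len(normal), int(ratio * len(abnormal)))
    return abnormal + rng.sample(normal, keep)


def contrast_features(series, t, window, lags):
    """Compare the point series[t] with its own history.

    window: number of preceding points used for the standard score.
    lags:   mapping of name -> offset in points, e.g. one day or one week
            expressed in sampling intervals (hypothetical values).
    """
    history = series[max(0, t - window):t]
    if len(history) >= 2 and pstdev(history) > 0:
        zscore = (series[t] - mean(history)) / pstdev(history)
    else:
        zscore = 0.0  # not enough history, or a flat window
    features = {"zscore": zscore}
    for name, lag in lags.items():
        past = series[t - lag] if t - lag >= 0 else None
        # Relative change against the same position one lag earlier;
        # None when the reference point is missing or zero.
        features[f"change_vs_{name}"] = series[t] / past - 1.0 if past else None
    return features
```

For a metric sampled once per minute, a call might pass `lags={"day": 1440, "week": 10080}` so that each point is compared with the same minute on the previous day and in the previous week.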

After training, the model labels all of the original unlabeled data. The labels are then checked with the system's visualization tools to ensure that labeling precision stays above 90%.
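
One way to compute such a figure from manually reviewed samples is sketched below. It assumes that "precision" means the share of model-flagged anomalies that a reviewer confirms; the function name and the label encoding are hypothetical.

```python
def labeling_precision(model_labels, reviewed_labels):
    """Share of points labeled abnormal (1) by the model that a human
    reviewer also marked abnormal. Unreviewed points (None) are skipped.
    Returns None when no reviewed point was flagged by the model."""
    confirmed = rejected = 0
    for model, reviewed in zip(model_labels, reviewed_labels):
        if reviewed is None or model != 1:
            continue
        if reviewed == 1:
            confirmed += 1
        else:
            rejected += 1
    total = confirmed + rejected
    return confirmed / total if total else None
```

A result below 0.9 on the reviewed sample would suggest that the labels need another round of correction before they are used for training.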

Alert grading distinguishes ordinary anomalies, severe anomalies, and steep-change anomalies. Each grade uses a different notification strategy (for example, low-priority email, medium-priority SMS, or high-priority voice calls), chosen according to the anomaly's severity and persistence.
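
The grading logic might look roughly like the sketch below. The source does not specify how grades map to channels, so the cutoffs (`severe_score`, `steep_ratio`, `persist`) and the mapping itself are assumptions chosen only to show how severity and persistence can combine.

```python
def grade(score, change_ratio, severe_score=0.9, steep_ratio=0.5):
    """Classify a detected anomaly.

    score:        model anomaly probability in [0, 1]
    change_ratio: relative change versus the previous point
    """
    if abs(change_ratio) >= steep_ratio:
        return "steep_change"
    if score >= severe_score:
        return "severe"
    return "ordinary"


def channel_for(kind, consecutive, persist=3):
    """Pick a notification channel from the grade and from how many
    consecutive points have been anomalous (persistence)."""
    if kind == "steep_change":
        return "voice"  # sudden jumps or drops escalate immediately
    if kind == "severe":
        return "voice" if consecutive >= persist else "sms"
    return "sms" if consecutive >= persist else "email"


def route(score, change_ratio, consecutive):
    """Return (grade, channel) for one detected anomaly."""
    kind = grade(score, change_ratio)
    return kind, channel_for(kind, consecutive)
```

Under these assumed cutoffs, an ordinary anomaly seen once goes out by email, while the same anomaly persisting for three consecutive points is escalated to SMS.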

The solution shows that machine-learning-based, threshold-free monitoring can handle diverse business metrics efficiently and accurately and can adapt to different scales and patterns. It is already deployed across numerous critical services, and there are plans to integrate AI techniques further into operations.
