How AI-Driven Automation Transforms Tencent Game Operations
This article explains how Tencent Game operations moved from manual, threshold‑based monitoring to an AI‑powered, data‑driven workflow that automates scaling, improves online‑curve monitoring, enables full‑dimensional analysis, and reduces time, labor, and cost while enhancing player experience.
1. Improvements After Intelligent Intervention
Traditional Tencent Game operations relied on manual processes and simple automation built on the BlueKing platform, with the goal of completing an entire incident task flow in a single click. After intelligent automation was introduced, decisions such as demand initiation, scaling, and resource allocation are driven by machine-learning models and real-time data, cutting the workflow's turnaround from days to minutes.
Key benefits include faster incident response, data‑assisted decision making, and the ability to predict PCU (peak concurrent users) two hours in advance, enabling proactive scaling.
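To make the proactive-scaling idea concrete, here is a minimal sketch of how a two-hour-ahead PCU forecast could be turned into a scale-out decision. The article does not describe the actual sizing rule, so the function name, the per-server capacity, and the headroom factor are all illustrative assumptions.

```python
import math


def servers_to_add(predicted_pcu, current_servers,
                   capacity_per_server=5000, headroom=0.2):
    """Return how many servers to add before the predicted peak arrives.

    All parameters are hypothetical: the article does not state per-server
    capacity, safety headroom, or how the forecast feeds the scaler.
    """
    # Size the fleet for the forecast plus a safety margin.
    required = math.ceil(predicted_pcu * (1 + headroom) / capacity_per_server)
    # Never return a negative number; scale-in is out of scope for this sketch.
    return max(0, required - current_servers)


if __name__ == "__main__":
    # A forecast of 101,000 concurrent players against a 20-server fleet.
    print(servers_to_add(predicted_pcu=101_000, current_servers=20))
```

Because the forecast arrives roughly two hours ahead, a rule of this shape would leave time to provision capacity before the peak instead of reacting after players are affected.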
2. Case Sharing
2.1 Online Curve Monitoring
Traditional monitoring used static thresholds or simple ratio comparisons, requiring manual adjustments. The intelligent solution leverages historical and anomalous data to train neural‑network models (Res‑DNN), eliminating the need for manual formula tuning and improving detection accuracy.
Feature engineering creates first‑order and second‑order difference sequences, which are fed into multi‑class classifiers to pinpoint abnormal points across time and business dimensions.
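The difference features described above can be sketched as follows. This is only an illustration of first-order and second-order differencing on an online-count series; the function names and the choice to pair each point with its raw value are assumptions, and the classifier that consumes these features is not shown.

```python
def difference(series):
    """Return the first-order difference: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]


def difference_features(online_counts):
    """Build (value, first difference, second difference) for each point.

    The first two points are skipped because a second-order difference
    needs two earlier samples.
    """
    first = difference(online_counts)
    second = difference(first)
    return [
        (online_counts[t], first[t - 1], second[t - 2])
        for t in range(2, len(online_counts))
    ]


if __name__ == "__main__":
    # Hypothetical online counts sampled at a fixed interval, with a sudden dip.
    for row in difference_features([100, 110, 121, 60, 125]):
        print(row)
```

A sudden drop shows up as a large negative first difference and an even larger swing in the second difference, which is the kind of signal a classifier can learn without a hand-tuned threshold.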
2.2 Full‑Dimension Monitoring
Game telemetry can generate up to 50 GB per hour across 13 dimensions (province, carrier, login channel, platform, etc.). The pipeline first cleans the data, then applies coefficient-of-variation (variance-coefficient) analysis to detect abnormal changes in the components of each dimension. Using the BlueKing data platform and Kafka, the raw data is reduced to about 70 MB per 5-minute window for real-time analysis and alerting.
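As a rough illustration of the variance-coefficient step, read here as the coefficient of variation (standard deviation divided by mean), the sketch below flags components whose share of total traffic is unusually unstable across recent windows. The data layout, the function names, and the 0.2 threshold are assumptions made for the example, not details from the article.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def unstable_components(share_history, cv_threshold=0.2):
    """Return components whose traffic share fluctuates beyond the threshold.

    share_history maps a component (for example a province) to its share of
    total online users in each recent 5-minute window. The threshold is a
    hypothetical value chosen for illustration.
    """
    return sorted(
        name
        for name, shares in share_history.items()
        if coefficient_of_variation(shares) > cv_threshold
    )


if __name__ == "__main__":
    history = {
        "Guangdong": [0.30, 0.31, 0.29, 0.30],  # stable share
        "Sichuan": [0.10, 0.10, 0.10, 0.02],    # share collapses in last window
    }
    print(unstable_components(history))
```

Working on shares rather than raw counts keeps the check insensitive to the normal daily rise and fall of total players, so only a shift in the mix within a dimension stands out.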
2.3 Latency Monitoring
Traditional latency monitoring relied on partitioned statistics and pie charts, which cannot handle high-dimensional data. The intelligent system extracts 15 features and applies three algorithms (Logistic Regression, Random Forest, and Support Vector Machine), combined in a hierarchical classifier, to generate precise latency alerts.
Precision‑Recall curves guide the selection of a model with >90% precision and ~60% recall, balancing false positives and missed incidents.
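One way to read the precision-recall trade-off above is as a threshold search: among all alert thresholds that keep precision at or above a target, pick the one with the highest recall. The sketch below shows that selection on hypothetical labels and scores; it is not the article's actual procedure, and the 0.9 target simply mirrors the >90% precision figure quoted above.

```python
def precision_recall(labels, scores, threshold):
    """Compute precision and recall when alerting on scores >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y and p)
    fp = sum(1 for y, p in zip(labels, predicted) if not y and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(labels, scores, min_precision=0.9):
    """Return (threshold, precision, recall) with the best recall that still
    meets the precision target, or None if no threshold qualifies."""
    best = None
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall(labels, scores, threshold)
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (threshold, precision, recall)
    return best


if __name__ == "__main__":
    # Hypothetical ground truth (1 = real latency incident) and model scores.
    labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
    scores = [0.95, 0.9, 0.8, 0.75, 0.7, 0.6, 0.5, 0.4, 0.3, 0.1]
    print(pick_threshold(labels, scores))
```

Favoring precision over recall in this way means some incidents go undetected, but the alerts that do fire are rarely false, which keeps on-call engineers from learning to ignore them.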
3. Summary
The purpose of intelligent automation is to raise quality and efficiency while cutting labor and cost. By integrating AI, data analytics, and automated scaling, traditional operational scenarios gain new vitality, handling more complex cases with less human intervention.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.