
How AI-Driven Automation Transforms Tencent Game Operations

This article explains how Tencent Game operations moved from manual, threshold‑based monitoring to an AI‑powered, data‑driven workflow that automates scaling, improves online‑curve monitoring, enables full‑dimensional analysis, and reduces time, labor, and cost while enhancing player experience.

Efficient Ops

1. Improvements After Intelligent Intervention

Traditionally, Tencent Game operations relied on manual processes and simple automation built on the BlueKing platform, with the goal of completing an entire incident task flow in a single click. With intelligent automation, decisions such as demand initiation, scaling, and resource allocation are driven by machine‑learning models and real‑time data, which cuts workflow execution time from days to minutes.

Key benefits include faster incident response, data‑assisted decision making, and the ability to predict PCU (peak concurrent users) two hours in advance, enabling proactive scaling.
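
As a rough illustration of how a two‑hour PCU forecast could feed a scaling decision, the sketch below converts a predicted peak into a server count. The per‑server capacity, the headroom percentage, and both function names are hypothetical; the article does not describe the actual sizing rule used in production.

```python
def servers_needed(predicted_pcu: int, capacity_per_server: int, headroom_pct: int = 20) -> int:
    """Servers required to carry the forecast PCU plus spare headroom.

    All parameters are illustrative assumptions, not values from the article.
    Integer arithmetic is used so the ceiling is exact.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    target = predicted_pcu * (100 + headroom_pct)   # forecast scaled by headroom, in percent units
    denom = 100 * capacity_per_server
    return -(-target // denom)                      # ceiling division


def scaling_delta(current_servers: int, predicted_pcu: int,
                  capacity_per_server: int, headroom_pct: int = 20) -> int:
    """Positive result: servers to add before the peak; negative: servers that could be released."""
    return servers_needed(predicted_pcu, capacity_per_server, headroom_pct) - current_servers
```

Because the forecast arrives two hours ahead, a positive delta can be acted on before the peak rather than after an alert fires.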

2. Case Sharing

2.1 Online Curve Monitoring

Traditional monitoring used static thresholds or simple ratio comparisons, both of which had to be adjusted by hand. The intelligent solution trains neural‑network models (Res‑DNN) on historical and anomalous data, which removes the need for manual formula tuning and improves detection accuracy.

Feature engineering creates first‑order and second‑order difference sequences, which are fed into multi‑class classifiers to pinpoint abnormal points across time and business dimensions.
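
The difference features described above can be sketched in a few lines. This is a minimal illustration, assuming the online curve is a plain list of per‑interval online counts; the function names and the row layout are hypothetical, and the real pipeline presumably adds further features before classification.

```python
def diff(series):
    """Return the sequence of differences between consecutive points."""
    return [b - a for a, b in zip(series, series[1:])]


def difference_features(online_counts):
    """Build (value, first_difference, second_difference) rows.

    The first two points are dropped because they have no second-order
    difference, so every returned row is fully populated.
    """
    first = diff(online_counts)    # first[i-1]  == x[i] - x[i-1]
    second = diff(first)           # second[i-2] == first[i-1] - first[i-2]
    rows = []
    for i in range(2, len(online_counts)):
        rows.append((online_counts[i], first[i - 1], second[i - 2]))
    return rows
```

A sudden drop in the online curve shows up as a large negative first difference and an even larger negative second difference, which gives a classifier a much clearer signal than the raw value alone.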

2.2 Full‑Dimension Monitoring

Game telemetry can generate up to 50 GB per hour across 13 dimensions (province, carrier, login channel, platform, etc.). The pipeline first cleans the data, then applies coefficient‑of‑variation analysis to detect abnormal changes in individual components of each dimension. Using the BlueKing data platform and Kafka, the raw data is reduced to roughly 70 MB per 5‑minute window for real‑time analysis and alerting.
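
Taking the analysis above to be based on the coefficient of variation (standard deviation divided by mean), a minimal sketch might look like the following. The input layout (one list of per‑window counts per component), the 0.3 threshold, and the function names are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is zero)."""
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / m


def flag_unstable_components(windows, threshold=0.3):
    """Return {component: cv} for components whose counts vary more than `threshold`.

    `windows` maps a component of one dimension (for example a province)
    to its counts over recent 5-minute windows. The threshold is hypothetical.
    """
    flagged = {}
    for component, counts in windows.items():
        cv = coefficient_of_variation(counts)
        if cv > threshold:
            flagged[component] = cv
    return flagged
```

Because the coefficient of variation is normalized by the mean, the same threshold can be applied to components with very different traffic volumes.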

2.3 Latency Monitoring

Traditional latency monitoring relied on partitioned statistics and pie charts, which cannot handle high‑dimensional data. The intelligent system extracts 15 features and applies three algorithms—Logistic Regression, Random Forest, and Support Vector Machine—combined in a hierarchical classifier to generate precise latency alerts.

Precision‑Recall curves guide the selection of a model with >90% precision and ~60% recall, balancing false positives and missed incidents.
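
One way to read an operating point off a precision‑recall curve is to sweep the alert threshold and keep the setting with the highest recall that still meets a precision floor. The sketch below assumes the model emits a score per sample and that ground‑truth labels are available; the function names and the 0.9 floor as a parameter are illustrative, not the article's actual selection procedure.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when samples with score >= threshold are alerted."""
    tp = fp = fn = 0
    for score, is_incident in zip(scores, labels):
        alerted = score >= threshold
        if alerted and is_incident:
            tp += 1
        elif alerted and not is_incident:
            fp += 1
        elif not alerted and is_incident:
            fn += 1
    # Convention: precision is 1.0 when nothing is alerted.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_precision=0.9):
    """Return (threshold, precision, recall) with the best recall meeting the precision floor.

    Returns None if no candidate threshold reaches `min_precision`.
    """
    best = None
    for t in sorted(set(scores)):          # each observed score is a candidate threshold
        p, r = precision_recall_at(scores, labels, t)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best
```

Fixing precision first and maximizing recall second reflects the trade‑off described above: on‑call staff tolerate a missed low‑impact incident more readily than a stream of false alerts.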

3. Summary

The purpose of intelligent automation is to raise quality and efficiency while cutting labor and cost. By integrating AI, data analytics, and automated scaling, traditional operational scenarios gain new vitality, handling more complex cases with less human intervention.
