How Tencent Secures Game Operations: Real Cases, Challenges, and Data‑Driven Solutions
This article shares a comprehensive overview of game operation security at Tencent, covering personal background, real‑world incident cases, the inherent challenges of large‑scale game services, past monitoring efforts, and a new data‑driven alerting framework that dramatically reduces false alarms while protecting game economies.
1. Personal Introduction
I joined Tencent in 2008, initially working on DNF operations. As player concurrency grew, I built a Ruby‑based configuration management tool that generated server configuration and start/stop scripts, embodying early CMDB, automation, and auto‑generation concepts.
Over the years I have also been responsible for operations and management of multiple PC and mobile games such as DNF, Yulong, Feiche, and Huoying.
Currently I lead operation security in the Operations Department, covering application operation security, game economy security, and technical support for the audit team across all Tencent games.
2. Topic
Developing a successful game involves countless evaluations, data analyses, and optimizations. Once a game launches, the real challenge lies in the dynamic operation phase, where issues such as planning bugs, client‑side exploits, and internal mis‑operations can disrupt the game economy, cause public‑relations crises, or even force a re‑launch.
A healthy game economy requires stable producers, consumers, and well‑defined rules.
3. Operation Security Cases
3.1 Case 1
A shop allowed the client‑supplied purchase quantity to be set to a small negative number; the resulting integer underflow let players mass‑extract any item, piece of equipment, or gem.
Attackers can capture client/server packets, replay them, and repeatedly modify game data.
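To make the failure mode concrete, here is a minimal sketch of the kind of bug described in Case 1. The function names and numbers are illustrative, not Tencent's code: the point is that a server which trusts a client‑supplied quantity lets a negative value flip the cost calculation, crediting currency instead of charging it.

```python
def buy_unchecked(balance: int, price: int, qty: int) -> tuple[int, int]:
    """Vulnerable version: trusts the client-supplied qty."""
    cost = price * qty          # negative qty -> negative cost
    return balance - cost, qty  # balance INCREASES; items still granted


def buy_checked(balance: int, price: int, qty: int, max_qty: int = 99) -> tuple[int, int]:
    """Hardened version: validate the range server-side before charging."""
    if not 1 <= qty <= max_qty:
        raise ValueError(f"invalid purchase quantity: {qty}")
    cost = price * qty
    if cost > balance:
        raise ValueError("insufficient balance")
    return balance - cost, qty
```

With the unchecked version, `buy_unchecked(100, 10, -5)` returns a balance of 150: the player is paid to "buy" a negative quantity. The hardened version rejects the request outright, which is why quantity validation must live on the server, never only in the client UI.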
3.2 Case 2
A recharge‑rebate event was mistakenly configured with thousands of gift packs instead of the intended 100, letting players obtain a hundred times the intended value for the same payment.
3.3 Case 3
Two backend Redis clusters failed during a failover; one recovered but the other did not, causing partial data loss and letting players repeatedly claim recharge rewards that should have been claimable only once.
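The repeated‑claim symptom in Case 3 is ultimately an idempotency failure. Below is a hedged sketch of a "claim once" guard; a real deployment would use an atomic primitive such as Redis `SET key value NX` backed by durable replication, but here a plain dict stands in for the keyspace so the logic is self‑contained.

```python
class RewardClaims:
    """Idempotent once-only reward claims (in-memory stand-in for Redis)."""

    def __init__(self) -> None:
        self._claimed: dict[str, bool] = {}  # stand-in for the Redis keyspace

    def claim(self, player_id: str, reward_id: str) -> bool:
        """Return True only on the first claim; all repeats are rejected."""
        key = f"claim:{reward_id}:{player_id}"
        if key in self._claimed:      # SET ... NX would return nil here
            return False
        self._claimed[key] = True     # SET ... NX succeeded: grant the reward
        return True
```

The design point: if the claim record itself is lost (as in the failed cluster), the guard silently re‑grants rewards, so the claim store needs the same durability guarantees as the reward it protects.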
3.4 Case 4
A well‑known New Year bug let players complete a dungeon without consuming tickets, repeatedly earning high‑value items and experience, resulting in millions of RMB worth of imbalance.
3.5 A Thoughtful Reflection
Community members have discussed testing, mis‑operation prevention, early warning, and rapid response mechanisms.
3.6 Small Goal
Undisclosed bugs stay hidden until someone exploits them; because attackers can monetize these weaknesses, they pose the greatest threat to game‑economy security.
4. Challenges
4.1 Game Operation Security Challenges
The two main challenges are complexity and scale: a handful of problems is manageable, but at large scale many unknowns emerge.
4.2 Lengthy Operation Process
Every release passes many stages; issues can arise from code bugs, testing gaps, or operational mistakes such as failed failover causing unintended diamond generation.
4.3 Dynamic Operation
Games must continuously adjust to player behavior and monetization demands, leading to frequent version updates (about 400 per year) and new bugs.
4.4 Human “Carelessness”
Both accidental and intentional mis‑operations by internal staff can create exploitable vulnerabilities.
4.5 Massive Business Scale
Our CMDB contains over 1,000 entries, supporting games with record‑breaking PCU such as League of Legends and Honor of Kings.
Each title's service stack spans several tiers:
- Access layer
- Logic layer
- Storage layer
- Log platform
- Big data platform
5. Past Efforts
Since 2010 we built basic guarantees and monitoring alerts: standardized logs, added trace IDs, and established a rapid‑response incident handling process to avoid PR crises.
However, fixed‑threshold alerts generated massive noise (more than a thousand alerts per service per week), and operators stopped trusting them.
6. New Solution and Effects
6.1 New Idea
We treat the game as a society where most players follow normal statistical patterns. By continuously learning the dominant patterns (item flow per channel), we can detect outliers that indicate abuse.
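This "most players are statistically normal" idea can be illustrated with a small outlier check on per‑channel item flow. This is an assumed, simplified sketch (not Tencent's model): it uses a robust z‑score built from the median and the median absolute deviation, so a few abusers cannot drag the learned baseline toward themselves, and the threshold of 6.0 is an arbitrary illustrative value.

```python
import statistics


def flag_outliers(flows: dict[str, float], threshold: float = 6.0) -> list[str]:
    """Flag players whose item flow is far outside the population pattern.

    flows maps player id -> items produced through one channel in a period.
    Uses median + MAD so the baseline is robust to the abusers themselves.
    """
    values = list(flows.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1.0  # avoid /0
    return [player for player, v in flows.items()
            if abs(v - med) / (1.4826 * mad) > threshold]
```

For example, if most characters produce around 100 items through a channel and one produces 10,000, only that character is flagged; the baseline itself barely moves because it is learned from the median, not the mean.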
6.2 Stage 1 – Monitoring Capability
We rebuilt the architecture, adding many monitoring models on top of a big‑data stack (Kafka, Spark, Elasticsearch, etc.) and a custom algorithm layer.
Typical workloads: 500 billion log entries per hour processed on 300 CPU cores, each running above 80 % utilization.
Key models:
Frequency anomaly: detects high‑frequency repeated requests that may indicate client‑side hacks.
Trend anomaly: monitors per‑character maximum production values that shift with events and versions.
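As a rough illustration of the frequency model, the sketch below counts one character's requests to an interface inside a sliding time window and flags rates no human player could produce. The window size and limit here are assumed values for illustration, not the production configuration.

```python
from collections import defaultdict, deque


class FrequencyMonitor:
    """Sliding-window request counter for frequency-anomaly detection."""

    def __init__(self, window_s: float = 10.0, max_hits: int = 50) -> None:
        self.window_s = window_s
        self.max_hits = max_hits
        self._hits: dict[str, deque] = defaultdict(deque)

    def record(self, key: str, now: float) -> bool:
        """Record one request; return True if the key is now anomalous."""
        q = self._hits[key]
        q.append(now)
        while q and now - q[0] > self.window_s:  # evict events outside window
            q.popleft()
        return len(q) > self.max_hits
```

A client‑side hack replaying a captured packet in a tight loop trips the limit within seconds, while normal play never approaches it; the trend model complements this by tracking values that legitimately shift with events and versions.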
6.3 Stage 2 – Alert Analysis Capability
We built tools to triage alerts, filter false positives, and provide detailed user‑level context for operators.
6.4 Stage 3 – Fine‑Grained Operations
By continuously refining alert rules and adding white‑list mechanisms, weekly alert counts dropped from thousands to single digits, earning trust from product and operation teams.
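A whitelist mechanism of the kind mentioned above can be sketched as a filter at the front of the alert pipeline: alerts whose (rule, subject) pair matches a known‑benign entry are suppressed before they reach operators. The rule and subject names below are hypothetical examples, not real entries.

```python
# Known-benign (rule, subject) pairs; in practice this would be maintained
# by operators as false positives are triaged.
WHITELIST = {
    ("trend_anomaly", "event_npc_shop"),    # e.g. a planned promotion
    ("frequency_anomaly", "gm_tool"),       # e.g. internal GM operations
}


def should_page(alert: dict) -> bool:
    """Suppress whitelisted alerts; page operators on everything else."""
    return (alert["rule"], alert["subject"]) not in WHITELIST
```

Each suppressed false positive both cuts noise immediately and encodes the triage decision, which is how weekly alert counts can fall from thousands to single digits without losing real detections.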
Detection rate now exceeds 88 % and continues to improve as new models are added.
The solution requires only minutes to integrate a new game, regardless of revenue or genre, and has been deployed across dozens of titles.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely read original technical articles. We focus on operations transformation and hope to accompany you throughout your operations career, growing together.