
Scaling Game Server Ops: Managing 10,000+ Cloud Instances Efficiently

This article details YOOZOO Network's evolution from physical to virtualized and clustered game server architectures, the automation of operations across three generations, the design of the UJOBS job platform, robust database backup strategies, and the step-by-step migration of more than 1,000 servers to Alibaba Cloud.

MaGe Linux Operations

Game Product Architecture Evolution

Over nearly seven years, YOOZOO's game servers grew from 100 to over 10,000 machines, passing through three architectural stages.

First generation: database, compute, and front-end roles ran on physical servers, with 2 to 4 machines per game zone; resource efficiency was low.

Second generation: virtualization on OpenStack with all-in-one virtual machines; efficiency improved, but high availability was limited.

Third generation: a service-cluster model that combines physical and virtual resources, scales elastically, launches new game servers within seconds, and provides strong high availability.

Shift in Operations

First generation: Manual operations, logging into each server individually.

Second generation: automated batch scripts (for example, expect-based scripts) executed from a central control server.

Third generation: systematic operations built on coordinated tools such as a CMDB, a business tree, and a job platform, so that a new game zone can be created almost instantly from a web UI.

YOOZOO Job Platform UJOBS

UJOBS uses a client/server (C/S) architecture made up of a central task scheduler and an agent on each server. Together they provide a command channel to every server, record the output of each command, and expose a web interface for monitoring jobs in real time.
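The scheduler/agent split can be illustrated with a minimal Python sketch. The names `Agent`, `Scheduler`, and `JobResult` are hypothetical, not the actual UJOBS API, and the agents are simulated in-process (each runs the command locally with `subprocess`) instead of receiving it over the network. The scheduler fans one command out to many agents in parallel and keeps every result so that a web UI could display it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobResult:
    host: str
    exit_code: int
    output: str


@dataclass
class Agent:
    """Hypothetical stand-in for a UJOBS agent; this one runs commands locally."""
    host: str

    def execute(self, command: str, timeout: float = 60.0) -> JobResult:
        # A real agent would receive `command` from the scheduler over the network.
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        return JobResult(self.host, proc.returncode, proc.stdout + proc.stderr)


@dataclass
class Scheduler:
    """Hypothetical scheduler: fans a command out to agents and keeps every result."""
    max_workers: int = 32
    history: List[JobResult] = field(default_factory=list)

    def dispatch(self, command: str, agents: List[Agent]) -> Dict[str, JobResult]:
        # Run the command on all agents in parallel; map() preserves agent order.
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            results = list(pool.map(lambda agent: agent.execute(command), agents))
        self.history.extend(results)  # retained so a web UI could show the output
        return {r.host: r for r in results}


if __name__ == "__main__":
    agents = [Agent("zone-001"), Agent("zone-002")]
    for host, result in Scheduler().dispatch("echo hello", agents).items():
        print(host, result.exit_code, result.output.strip())
```

A production channel would stream output line by line for real-time viewing, rather than collecting it when the command finishes as this sketch does.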

A typical version update runs through UJOBS as follows.

A version change committed to the repository triggers a build.

The build pulls the changed files from the repository.

The built artifacts are pushed to a distributed cluster of version-control servers.

UJOBS agents receive the update instruction together with the server addresses, the change list, and the version information.

The agents pull the updated files and execute the predefined update scripts.

Execution logs can be viewed in real time, and two rollback methods are supported: keeping historical versions on each server and switching back to one of them, or overwriting the current files with the previous version.
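The first rollback method, keeping historical versions on each server, is commonly implemented with one directory per release plus a `current` symlink. The sketch below assumes that layout; `ReleaseManager` and the `releases/<version>` paths are hypothetical, and the step that runs the predefined update script is left as a comment.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, Optional


class ReleaseManager:
    """Hypothetical layout: <root>/releases/<version>/ plus a <root>/current symlink."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.releases = self.root / "releases"
        self.current = self.root / "current"
        self.releases.mkdir(parents=True, exist_ok=True)

    def active_version(self) -> Optional[str]:
        if not self.current.is_symlink():
            return None
        return Path(os.readlink(self.current)).name

    def deploy(self, version: str, changed: Dict[str, bytes]) -> None:
        """Build a release from the active one plus the change list, then switch to it."""
        target = self.releases / version
        active = self.active_version()
        if active is None:
            target.mkdir()
        else:
            # Unchanged files are carried over from the active release.
            shutil.copytree(self.releases / active, target)
        for rel_path, content in changed.items():
            dest = target / rel_path
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(content)
        # A real agent would run the predefined update script here.
        self._switch(version)

    def rollback(self, version: str) -> None:
        """Point `current` back at a release that is still kept on this server."""
        if not (self.releases / version).is_dir():
            raise FileNotFoundError(f"release {version} is not kept on this server")
        self._switch(version)

    def _switch(self, version: str) -> None:
        # Create the link under a temporary name, then rename it over `current`;
        # rename() replaces the old link atomically on POSIX file systems.
        tmp = self.root / "current.tmp"
        if tmp.is_symlink():
            tmp.unlink()
        os.symlink(os.path.join("releases", version), tmp)
        os.replace(tmp, self.current)
```

The second method, overwriting with the previous version, would roughly correspond to calling `deploy` with the previous version's full file set under a new release name; it needs no history on disk but has to transfer those files again.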

Database Backup and Recovery

For its thousands of MySQL instances, YOOZOO uses XtraBackup to take physical backups of the data files directly on the primary server, stores each backup locally, and then copies it to a remote server. Backups run on an hourly cycle, split into a half-hour local backup phase and a half-hour remote copy phase. Under worst-case I/O constraints, this approach can lose up to one hour of data.
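One hourly cycle can be sketched as two commands per instance. This assumes the half-hour split means a local `xtrabackup --backup` run starting on the hour and an `rsync` transfer to the remote host starting at half past; the paths and the `backup01` host are hypothetical, and credentials are expected to come from a MySQL option file rather than the command line. The function only builds the argument lists, so a scheduler (for example, a job platform like UJOBS) decides how to run them.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def backup_plan(instance: str, cycle_start: datetime,
                local_root: str = "/data/backup",        # hypothetical local path
                remote: str = "backup01:/data/backup",   # hypothetical remote host:path
                ) -> List[Tuple[datetime, List[str]]]:
    """Return (start time, argv) pairs for one assumed hourly cycle of one instance."""
    stamp = cycle_start.strftime("%Y%m%d%H")
    local_dir = f"{local_root}/{instance}/{stamp}"
    return [
        # Phase 1 (assumed to start on the hour): physical hot backup on the primary.
        (cycle_start, ["xtrabackup", "--backup", f"--target-dir={local_dir}"]),
        # Phase 2 (assumed to start at half past): copy the finished backup off-host.
        (cycle_start + timedelta(minutes=30),
         ["rsync", "-a", f"{local_dir}/", f"{remote}/{instance}/{stamp}/"]),
    ]


if __name__ == "__main__":
    for start, argv in backup_plan("db-101", datetime(2024, 5, 1, 3)):
        print(start.strftime("%H:%M"), " ".join(argv))
```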

Recovery uses a one-click tool: the operator enters the instance IP, the time range, and the database name. Data from the last 24 hours is restored locally by replaying binary logs, while older data is restored from the remote backups.
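The decision the one-click tool makes can be sketched as follows, assuming hourly backups, local backups and binary logs retained for 24 hours, and only full backups kept remotely beyond that (so a remote restore returns to the backup point without replay). `plan_recovery` and `RecoveryPlan` are hypothetical names. For simplicity the replay starts at the backup's timestamp with `mysqlbinlog --start-datetime`; a precise restore would start at the binlog position that XtraBackup records in its `xtrabackup_binlog_info` file.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

LOCAL_RETENTION = timedelta(hours=24)  # assumption: local backups and binlogs kept 24 h


@dataclass
class RecoveryPlan:
    instance_ip: str
    source: str                       # "local" or "remote"
    base_backup: datetime             # full backup restored first
    binlog_cmd: Optional[List[str]]   # point-in-time replay; None for remote restores


def plan_recovery(ip: str, database: str, target: datetime, now: datetime,
                  backups: List[datetime]) -> RecoveryPlan:
    """Choose the newest backup taken at or before `target` and describe the replay."""
    candidates = [b for b in backups if b <= target]
    if not candidates:
        raise ValueError("no backup precedes the requested time")
    base = max(candidates)
    if now - base > LOCAL_RETENTION:
        # Assumption: beyond 24 hours only remote full backups remain, so no replay.
        return RecoveryPlan(ip, "remote", base, None)
    fmt = "%Y-%m-%d %H:%M:%S"
    cmd = [
        "mysqlbinlog",
        f"--database={database}",
        f"--start-datetime={base.strftime(fmt)}",
        f"--stop-datetime={target.strftime(fmt)}",
        # The caller appends the binlog file names for this instance.
    ]
    return RecoveryPlan(ip, "local", base, cmd)
```

Before either kind of restore, an XtraBackup copy has to be prepared with `xtrabackup --prepare` and can then be put back in place with `xtrabackup --copy-back`; the replayed binlog output is then piped into the `mysql` client.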

Cloud Migration Journey

YOOZOO migrated several legacy games to Alibaba Cloud with minimal downtime, following these steps:

Prepare resources in Alibaba Cloud and establish VPC connectivity with the on-premises network.

Synchronize data by seeding MySQL on ECS instances from XtraBackup backups and keeping it current through master-slave replication.

Cut over during a regular maintenance window (0.5 to 2 hours). Each product needed 3 to 5 days of preparation and 1 to 2 hours for the switch itself; more than 1,000 ECS instances are now in use.
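For the synchronization and cut-over steps, an ECS replica can start replicating from the binlog file and position that XtraBackup records in `xtrabackup_binlog_info` (optionally followed by a GTID set). The sketch below builds the `CHANGE MASTER TO` statement from that file and checks `SHOW SLAVE STATUS` fields before cut-over. The function names and the host, user, and password values are placeholders; newer MySQL releases also offer `CHANGE REPLICATION SOURCE TO` and `SHOW REPLICA STATUS`, whose column names differ.

```python
from typing import Any, Dict, Tuple


def parse_binlog_info(text: str) -> Tuple[str, int]:
    """xtrabackup_binlog_info: binlog file, position, and optionally a GTID set."""
    fields = text.split()
    return fields[0], int(fields[1])


def _quote(value: str) -> str:
    # Minimal SQL string-literal escaping for the placeholder values used here.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_sql(host: str, port: int, user: str, password: str,
                      binlog_info_text: str) -> str:
    """Build the statement that points an ECS replica at the on-premises primary."""
    log_file, log_pos = parse_binlog_info(binlog_info_text)
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={_quote(host)}, MASTER_PORT={int(port)}, "
        f"MASTER_USER={_quote(user)}, MASTER_PASSWORD={_quote(password)}, "
        f"MASTER_LOG_FILE={_quote(log_file)}, MASTER_LOG_POS={log_pos}"
    )


def ready_for_cutover(status: Dict[str, Any], max_lag_s: int = 0) -> bool:
    """`status` is one row of SHOW SLAVE STATUS, keyed by column name."""
    lag = status.get("Seconds_Behind_Master")
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and lag is not None
        and int(lag) <= max_lag_s
    )
```

Waiting until both replication threads run and lag reaches zero before stopping writes on the old primary keeps the final switch short, which is what lets it fit inside a regular maintenance window.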

After the migration, game logic runs on ECS, and traffic is served through the VPC and ULB. Future plans include running the clustered mode on Alibaba Cloud, with RDS for data storage, SLB for load balancing, and LOG and MaxCompute for big-data processing.

Choosing the Right Database

Instance count: many low-traffic instances favor self-hosted MySQL, while fewer high-load instances favor RDS.

Data size: large datasets with complex backup requirements are better served by RDS; smaller datasets can stay self-hosted.

Cost: RDS is more expensive, but its stronger security and stability can justify the premium for critical workloads.
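The three criteria can be encoded as a simple vote. This is only an illustration: the article gives no numeric thresholds, so `MANY_INSTANCES`, `HIGH_QPS`, and `LARGE_DATA_GB` are hypothetical values, and ties default to self-hosted MySQL as the cheaper option.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the article gives no numbers.
MANY_INSTANCES = 500
HIGH_QPS = 5_000
LARGE_DATA_GB = 500


@dataclass
class Workload:
    instance_count: int
    peak_qps_per_instance: int
    data_gb_per_instance: float
    complex_backup_needs: bool
    business_critical: bool


def recommend(w: Workload) -> str:
    """Tally the three criteria; ties fall back to the cheaper self-hosted option."""
    votes_rds = 0
    votes_self = 0
    # Instance count: many low-traffic instances vs. fewer high-load instances.
    if w.instance_count >= MANY_INSTANCES and w.peak_qps_per_instance < HIGH_QPS:
        votes_self += 1
    elif w.peak_qps_per_instance >= HIGH_QPS:
        votes_rds += 1
    # Data size: large datasets with complex backup requirements favor RDS.
    if w.data_gb_per_instance >= LARGE_DATA_GB and w.complex_backup_needs:
        votes_rds += 1
    else:
        votes_self += 1
    # Cost: RDS costs more, but criticality can justify the premium.
    if w.business_critical:
        votes_rds += 1
    else:
        votes_self += 1
    return "RDS" if votes_rds > votes_self else "self-hosted MySQL"
```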

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
