Artificial Intelligence 7 min read

Distributed Network Embedding Algorithm for Billion‑Scale Graph Data in Tencent Games

Tencent’s Game Social Algorithm Team presents a Spark‑based distributed network embedding framework that recursively partitions hundred‑billion‑edge game graphs into manageable subgraphs, runs node2vec locally, and fuses results, enabling efficient link prediction and node classification across multiple games within hours.

Tencent Cloud Developer
Tencent Cloud Developer
Tencent Cloud Developer
Distributed Network Embedding Algorithm for Billion‑Scale Graph Data in Tencent Games

The article introduces a distributed network embedding algorithm developed by Tencent Game Social Algorithm Team that can process graph data at the hundred‑billion edge scale. It explains the background of representing various game‑related entities (friends, interactions, items) as graph nodes and edges, and outlines two typical tasks: link prediction (e.g., friend recommendation) and node classification (e.g., churn or payment prediction).

Traditional network embedding methods (random‑walk‑based and matrix‑factorization‑based) become computationally prohibitive on such massive graphs because the whole graph cannot fit into a single machine’s memory and the algorithms require intensive computation. To overcome these challenges, the authors propose a recursive graph‑partitioning approach that splits a large graph into multiple induced subgraphs and border subgraphs. The partitioning is performed repeatedly until each subgraph has a comparable size.

After partitioning, the embedding computation is carried out independently on each subgraph using an existing algorithm (node2vec in the presented implementation). The embeddings from all subgraphs are then fused to obtain a global representation that approximates the original optimization objective.

The complete pipeline consists of three stages: (1) recursive graph partitioning, (2) per‑subgraph embedding with node2vec, and (3) fusion of subgraph embeddings. The algorithm is implemented on Apache Spark to exploit distributed resources.

Extensive experiments on more than five games (including titles from different genres) demonstrate the practicality of the approach. Table 1 shows the runtime (in hours) for each game’s social network, ranging from 5 h to 23 h despite edge counts exceeding hundreds of billions. Figure 6 illustrates the relative performance gains of the distributed embedding over baseline methods across various business scenarios such as friend recommendation and item recommendation.

The team also lists related publications, including a KDD 2021 paper on large‑scale network embedding in Spark and a WSDM 2020 paper on graph‑partition‑based initialization for embedding.

Big Datadistributed computingSparkgraph embeddinggame analyticsnetwork representationnode2vec
Tencent Cloud Developer
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.