Artificial Intelligence 7 min read

Distributed Network Embedding Algorithm for Billion‑Scale Graph Data in Tencent Games

Tencent’s Game Social Algorithm Team presents a Spark‑based distributed network embedding framework that recursively partitions hundred‑billion‑edge game graphs into manageable subgraphs, runs node2vec locally, and fuses results, enabling efficient link prediction and node classification across multiple games within hours.

Tencent Cloud Developer

May 29, 2024

Distributed Network Embedding Algorithm for Billion‑Scale Graph Data in Tencent Games

The article introduces a distributed network embedding algorithm developed by Tencent Game Social Algorithm Team that can process graph data at the hundred‑billion edge scale. It explains the background of representing various game‑related entities (friends, interactions, items) as graph nodes and edges, and outlines two typical tasks: link prediction (e.g., friend recommendation) and node classification (e.g., churn or payment prediction).

Traditional network embedding methods (random‑walk‑based and matrix‑factorization‑based) become computationally prohibitive on such massive graphs because the whole graph cannot fit into a single machine’s memory and the algorithms require intensive computation. To overcome these challenges, the authors propose a recursive graph‑partitioning approach that splits a large graph into multiple induced subgraphs and border subgraphs. The partitioning is performed repeatedly until each subgraph has a comparable size.

After partitioning, the embedding computation is carried out independently on each subgraph using an existing algorithm (node2vec in the presented implementation). The embeddings from all subgraphs are then fused to obtain a global representation that approximates the original optimization objective.

The complete pipeline consists of three stages: (1) recursive graph partitioning, (2) per‑subgraph embedding with node2vec, and (3) fusion of subgraph embeddings. The algorithm is implemented on Apache Spark to exploit distributed resources.

Extensive experiments on more than five games (including titles from different genres) demonstrate the practicality of the approach. Table 1 shows the runtime (in hours) for each game’s social network, ranging from 5 h to 23 h despite edge counts exceeding hundreds of billions. Figure 6 illustrates the relative performance gains of the distributed embedding over baseline methods across various business scenarios such as friend recommendation and item recommendation.

The team also lists related publications, including a KDD 2021 paper on large‑scale network embedding in Spark and a WSDM 2020 paper on graph‑partition‑based initialization for embedding.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Distributed computing Spark graph embedding Game Analytics network representation node2vec

Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.