
Angel Graph: A Scalable Graph Computing Platform – Architecture, Optimizations, and Applications

The article introduces Angel Graph, a large‑scale graph computing platform built on Angel's parameter‑server architecture and Spark, detailing its evolution, framework components (including Spark‑on‑Angel and PyTorch‑on‑Angel), data and model partitioning strategies, communication and computation optimizations, stability mechanisms, usability features, and real‑world applications across recommendation, risk control, social and gaming domains.

DataFunSummit

Speaker Xu Jie, a senior engineer at Tencent, presents Angel Graph, a large-scale graph computing platform that evolved from Angel 1.0, a machine-learning platform released in 2017, into a mature graph platform by version 3.2, integrating high-performance parameter-server (PS) capabilities with the Spark ecosystem.

Framework Overview

Angel PS serves as a high‑performance parameter server for storing and accessing graph models.

Spark provides a robust ETL and distributed computation layer.

Spark on Angel

Combining Angel PS and Spark enables efficient handling of high‑dimensional sparse graph data. Angel PS manages frequent updates and random accesses (e.g., neighbor lists), while Spark reads raw edges from HDFS, forms edge RDDs, and performs graph operations such as neighbor intersection. Custom PS functions allow computation (e.g., neighbor sampling) to be offloaded to the PS, reducing communication volume.
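To make this division of labor concrete, here is a minimal sketch in plain Python (the names and the in-memory store are illustrative stand-ins, not the actual Angel API): neighbor lists live in a PS-like store, and a worker pulls only the two lists it needs to intersect.

```python
# Hypothetical in-memory stand-in for Angel PS: vertex id -> neighbor set.
ps_store = {
    1: {2, 3, 4},
    2: {1, 3, 5},
    3: {1, 2},
}

def pull_neighbors(vertex):
    """Pull one vertex's adjacency list from the (mock) parameter server."""
    return ps_store.get(vertex, set())

def common_neighbors(u, v):
    """Neighbor intersection, e.g. for common-friends style features."""
    return pull_neighbors(u) & pull_neighbors(v)

print(common_neighbors(1, 2))  # {3}
```

In the real system the pull happens over RPC and the edge RDD drives which pairs are intersected; the sketch only shows the access pattern.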

PyTorch on Angel

To support graph neural networks, PyTorch is integrated as a runtime on top of Angel PS. The PS stores model parameters, while PyTorch handles forward and backward passes, allowing users to define their own PyTorch models without dealing with the underlying distributed infrastructure.
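The pull/compute/push split described above can be sketched as follows. This is an illustrative toy, assuming a linear model with squared loss; `MockPS`, `pull`, and `push_grad` are invented names, not the PyTorch-on-Angel API.

```python
class MockPS:
    """Stand-in for Angel PS: holds parameters, applies pushed gradients."""
    def __init__(self, dim):
        self.weights = [0.0] * dim

    def pull(self):
        return list(self.weights)

    def push_grad(self, grads, lr=0.1):
        # The PS applies the update, so workers stay stateless.
        for i, g in enumerate(grads):
            self.weights[i] -= lr * g

def train_step(ps, x, y):
    w = ps.pull()                                   # 1. pull parameters
    pred = sum(wi * xi for wi, xi in zip(w, x))     # 2. forward pass
    err = pred - y
    grads = [2 * err * xi for xi in x]              # 3. backward pass
    ps.push_grad(grads)                             # 4. push gradients

ps = MockPS(dim=2)
for _ in range(50):
    train_step(ps, x=[1.0, 2.0], y=3.0)
print(round(ps.weights[0] * 1.0 + ps.weights[1] * 2.0, 2))  # 3.0
```

In PyTorch-on-Angel, step 2 and 3 are the user's PyTorch model; only the pull/push boundary is managed by the platform.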

Data and Model Partitioning

Data partitioning supports edge, vertex, mixed, and block cuts, letting users choose the scheme that matches their data distribution.

Model partitioning offers Range and Hash strategies; Range yields low memory usage for contiguous IDs, while Hash balances load for non‑contiguous IDs.
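The two model-partitioning strategies can be sketched as simple mapping functions (illustrative, not Angel's implementation): Range maps contiguous ID blocks to servers with compact metadata, while Hash spreads arbitrary IDs evenly.

```python
NUM_PS = 4  # assumed number of parameter servers

def range_partition(vertex_id, max_id):
    """Range: contiguous IDs -> contiguous blocks; low memory overhead."""
    block = (max_id + NUM_PS - 1) // NUM_PS
    return min(vertex_id // block, NUM_PS - 1)

def hash_partition(vertex_id):
    """Hash: non-contiguous IDs (e.g. 64-bit hashes) -> even load."""
    return hash(vertex_id) % NUM_PS

print(range_partition(5, max_id=100))  # 0 (IDs 0..24 share a server)
print(hash_partition(5))
```

The trade-off follows directly: Range needs only block boundaries as metadata but skews load if IDs are sparse; Hash balances load at the cost of per-key lookup.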

Stability

Angel PS provides checkpointing for both system‑level and algorithm‑level fault tolerance. Global checkpoints ensure correctness for graph mining algorithms, whereas per‑PS checkpoints suffice for graph representation learning, and a hybrid approach is recommended for graph neural networks.

Usability

The platform bundles a rich library of graph algorithms (mining, representation learning, GNNs) and abstracts common graph operators, making it easy for both out‑of‑the‑box use and custom development.

Communication Optimizations

The platform reduces the number of RPC connections by partitioning data to align with PS partitions (the Square Partitioner) and by merging kernels so that each data partition contacts fewer PS instances.
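The effect of aligning data with PS partitions can be sketched as follows (illustrative names; the partition function is an assumption): updates destined for the same PS partition are merged into a single request.

```python
from collections import defaultdict

NUM_PS_PARTITIONS = 3

def ps_partition(vertex_id):
    # Assumed placement rule for the sketch.
    return vertex_id % NUM_PS_PARTITIONS

def merged_requests(updates):
    """Group (vertex, value) updates into one request per PS partition."""
    batches = defaultdict(list)
    for vertex, value in updates:
        batches[ps_partition(vertex)].append((vertex, value))
    return dict(batches)  # partition id -> single merged RPC payload

updates = [(0, 1.0), (1, 2.0), (3, 0.5), (4, 0.5), (6, 1.5)]
rpcs = merged_requests(updates)
print(len(rpcs))  # 2 merged RPCs instead of 5 individual ones
```

When Spark partitions already match PS partitions, each data partition talks to at most a handful of servers rather than all of them.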

Custom PS functions perform neighbor sampling on the server side, shrinking data transfer from edge-level to vertex-level volume.
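A minimal sketch of the saving (mock store and function names are assumptions, not the Angel API): pulling a full adjacency list moves edge-level data, whereas a PS-side sample moves only k vertex IDs.

```python
import random

# Mock PS storage: one high-degree vertex with 10,000 neighbors.
ps_adjacency = {0: list(range(10_000))}

def pull_full(vertex):
    """Naive pull: transfers the whole edge-level neighbor list."""
    return ps_adjacency[vertex]

def sample_on_ps(vertex, k, seed=0):
    """Custom PS function: samples on the server, transfers only k IDs."""
    rng = random.Random(seed)
    return rng.sample(ps_adjacency[vertex], k)

print(len(pull_full(0)), len(sample_on_ps(0, k=25)))  # 10000 25
```

For GNN-style mini-batch training, where each step only needs a small neighbor sample, this turns a degree-proportional transfer into a constant-size one.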

Computation Optimizations

Computation for super-vertices (high-degree nodes) is scattered across workers to avoid load imbalance.
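A sketch of the scatter idea (illustrative, with a trivial per-chunk computation standing in for the real algorithm step): one huge adjacency list is split into near-equal chunks, each chunk is processed independently, and the partial results are merged.

```python
def split_chunks(neighbors, num_workers):
    """Scatter one adjacency list across workers in near-equal chunks."""
    size = (len(neighbors) + num_workers - 1) // num_workers
    return [neighbors[i:i + size] for i in range(0, len(neighbors), size)]

def degree_partial(chunk):
    # Per-worker partial computation; here just a count for illustration.
    return len(chunk)

neighbors = list(range(1_000_003))  # a "super vertex"
chunks = split_chunks(neighbors, num_workers=4)
total = sum(degree_partial(c) for c in chunks)  # merge partial results
print(len(chunks), total)  # 4 1000003
```

Without the split, the single worker holding the super-vertex becomes a straggler; with it, work scales with the number of workers as long as the merge step is cheap.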

Dynamic adjacency‑list compression avoids costly shuffle operations when building adjacency lists for trillion‑edge graphs, pushing raw edges directly to PS and letting the PS compress and store them.
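One common way to compress a sorted neighbor list on the server is delta encoding, sketched below. This is an illustrative technique, not a claim about Angel's exact codec: consecutive sorted IDs are stored as small gaps, which compress far better than raw 64-bit IDs.

```python
def delta_encode(sorted_ids):
    """Replace sorted neighbor IDs with gaps between consecutive IDs."""
    out, prev = [], 0
    for v in sorted_ids:
        out.append(v - prev)
        prev = v
    return out

def delta_decode(gaps):
    """Recover the original sorted IDs from the stored gaps."""
    out, acc = [], 0
    for g in gaps:
        acc += g
        out.append(acc)
    return out

neighbors = [1000000, 1000002, 1000007, 1000009]
gaps = delta_encode(neighbors)
print(gaps)  # [1000000, 2, 5, 2]
assert delta_decode(gaps) == neighbors
```

Because the PS receives raw edges directly and compresses as it stores them, the expensive Spark shuffle that would otherwise build the adjacency lists is avoided entirely.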

Applications

Angel Graph is deployed internally at Tencent for recommendation, risk control, social networking, and gaming, supporting homogeneous/heterogeneous, directed/undirected graphs, with or without attributes, and both supervised and unsupervised tasks.

Thank you for listening.

Tags: distributed systems, optimization, Big Data, PyTorch, Spark, parameter server, graph computing
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
