Applying Graph Neural Networks for Financial Risk Control: A Case Study by Shuhe Technology
This article details how Shuhe Technology leveraged large‑scale graph neural networks, built with DGL and PyTorch, to improve financial fraud detection by preparing massive relationship graphs, pruning sparse nodes, extracting rich features, addressing class imbalance, and achieving a stable AUC gain of about four points.
Business Background – Knowledge graphs have long been used in finance, but traditional manual feature extraction suffers from high cost and low efficiency. Shuhe Technology partnered with a fraud‑detection team to explore graph neural networks (GNNs) for deeper relationship mining.
The internal graph contains over 70 billion edges and more than 10 billion nodes, with only a tiny fraction of nodes both labeled and feature‑rich, making analysis costly and the supervision signal sparse.
Traditional one‑hop manual analysis cannot capture two‑hop relationships, which often reveal higher risk; GNN can automatically aggregate multi‑hop information.
Data Preparation – Four open‑source GNN frameworks were evaluated (DGL, PyG, PGL, AliGraph); DGL was chosen for its industrial support and community activity. A time‑window sample (Sept–Dec 2020) yielded roughly 7 billion nodes and 20 billion edges. Sparse, feature‑less nodes were then pruned, raising the share of feature‑rich nodes from under 1 % to over 5 %.
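The pruning step described above can be sketched as follows. This is a minimal illustration in plain Python, not Shuhe's pipeline code; the function name, the degree threshold, and the toy graph are all illustrative assumptions (at their scale this would run as a distributed job, not in memory).

```python
# Sketch: prune feature-less nodes that fall below a degree threshold,
# keeping the subgraph induced by the surviving nodes.
from collections import defaultdict

def prune_sparse_nodes(edges, featured_nodes, min_degree=2):
    """Keep feature-rich nodes, plus feature-less nodes whose degree is
    at least `min_degree`; drop every edge that touches a pruned node."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    keep = {n for n in degree
            if n in featured_nodes or degree[n] >= min_degree}
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges

# Toy graph: node 4 is feature-less with degree 1, so it is pruned.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 0)]
nodes, kept = prune_sparse_nodes(edges, featured_nodes={0, 1})
```

Pruning before feature attachment shrinks the graph that the sampler must traverse while concentrating training on nodes that actually carry signal.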
Over 80 node features (user attributes, loan history, repayment records, etc.) were attached to graph nodes. Sample imbalance (positive rate <0.5 %) was handled by weighting the loss instead of resampling to preserve graph structure.
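Loss re-weighting, as opposed to resampling, leaves the graph topology untouched while still forcing the model to pay attention to the rare positive class. A minimal sketch of the idea, assuming a weight equal to the negative/positive ratio (the article does not give the actual weight; in PyTorch this corresponds to `BCEWithLogitsLoss(pos_weight=...)`):

```python
# Sketch of loss re-weighting for a ~0.5% positive rate.
import math

def weighted_bce(prob, label, pos_weight):
    """Binary cross-entropy with the positive class up-weighted."""
    eps = 1e-12  # numerical guard against log(0)
    return -(pos_weight * label * math.log(prob + eps)
             + (1 - label) * math.log(1 - prob + eps))

# With a 0.5% positive rate, a common starting point is the
# negative/positive ratio, here 199:1 (an illustrative choice).
pos_weight = 0.995 / 0.005
loss_pos = weighted_bce(0.1, 1, pos_weight)  # missed fraud case: heavy penalty
loss_neg = weighted_bce(0.1, 0, pos_weight)  # same score on a normal case
```

A missed fraud case now costs roughly 199 times more than an equally confident score on a normal case, so the gradient is not swamped by the 99.5 % of negative samples.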
Model Introduction – The architecture combines GraphSAGE‑style multi‑layer neighbor sampling (2–3 layers, to avoid over‑smoothing) with Graph Attention Network (GAT) aggregation, followed by a feed‑forward network for final prediction. Sampling uses DGL's MultiLayerNeighborSampler, and aggregation employs multi‑head attention.
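The attention aggregation can be illustrated with a single head over one node's sampled neighbors. This is a toy NumPy sketch of the GAT mechanism, not Shuhe's model: the dimensions, random weights, and function name are assumptions, and in DGL the equivalent would be `GATConv` applied to the blocks produced by `MultiLayerNeighborSampler`.

```python
# Single-head GAT-style aggregation over a node's sampled neighbors.
import numpy as np

rng = np.random.default_rng(0)

def gat_aggregate(h_center, h_neighbors, W, a):
    """Attention-weighted sum of linearly projected neighbor features."""
    z_c = W @ h_center                  # project center node: (out_dim,)
    z_n = h_neighbors @ W.T             # project neighbors: (k, out_dim)
    # Attention logits a . [z_c || z_j] with LeakyReLU, as in GAT.
    logits = np.array([a @ np.concatenate([z_c, z_j]) for z_j in z_n])
    logits = np.where(logits > 0, logits, 0.2 * logits)
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()                # softmax over the neighbors
    return alpha @ z_n                  # aggregated message: (out_dim,)

h_center = rng.normal(size=4)           # toy 4-dim input features
h_neighbors = rng.normal(size=(5, 4))   # 5 sampled neighbors
W = rng.normal(size=(8, 4))             # projection to 8 dims
a = rng.normal(size=16)                 # attention vector over [z_c || z_j]
out = gat_aggregate(h_center, h_neighbors, W, a)
```

Multi-head attention runs several such heads in parallel and concatenates (or averages) their outputs; stacking 2–3 of these layers lets the prediction draw on two‑hop neighborhoods without over‑smoothing.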
Project Summary – On two out‑of‑time test sets, the GNN model achieved stable risk discrimination, and stacking it with a traditional model increased AUC by roughly four points. Deployment challenges include online graph sampling and engineering integration. Future work aims to enrich the graph with device and location data, explore heterogeneous graph models (e.g., R‑GCN, HARP, SEAL), and expand applications to marketing, recommendation, and lost‑user recovery.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.