Databases · 19 min read

Building and Optimizing a Large‑Scale Graph Platform for Financial Risk Control at Du Xiaoman Financial

This article describes how Du Xiaoman Financial designed, built, and continuously optimized a massive graph platform—including data governance, graph learning, query performance, data import, and online deployment—to improve credit risk assessment using billions of nodes and edges, and shares practical lessons on graph databases, distributed training, and real‑time inference.

DataFunTalk

Du Xiaoman Financial’s risk‑control team created a large‑scale graph platform to enrich credit‑risk models with relational information, constructing a heterogeneous graph of over 30 billion user nodes and 100 billion edges.

The platform provides three core capabilities: graph learning (including graph representation and graph neural networks), graph mining (e.g., emergency‑contact linkage), and graph analysis (e.g., fraud‑ring detection).

To support these capabilities, the team wrapped a graph database with open APIs for one‑hop and two‑hop neighbor queries, built ETL pipelines for data import/export, and integrated a graph‑computing framework for visualization and analytics.
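As an illustrative sketch of what such an open query API might look like, the following uses an in-memory adjacency map in place of the actual graph-database backend (a real deployment would issue traversals against JanusGraph); the class and method names are hypothetical:

```python
from collections import defaultdict

class GraphQueryAPI:
    """Illustrative wrapper exposing one-hop and two-hop neighbor queries.

    A real deployment would issue Gremlin traversals against JanusGraph;
    here an in-memory adjacency map stands in for the storage backend.
    """

    def __init__(self):
        self._adj = defaultdict(set)

    def add_edge(self, src, dst):
        # Undirected edge for simplicity; the production graph is heterogeneous.
        self._adj[src].add(dst)
        self._adj[dst].add(src)

    def one_hop(self, node):
        return set(self._adj[node])

    def two_hop(self, node):
        # Union of neighbors-of-neighbors, excluding the start node itself.
        result = set()
        for n in self.one_hop(node):
            result |= self.one_hop(n)
        result.discard(node)
        return result
```

The naive `two_hop` above issues one lookup per first-hop neighbor, which is exactly the I/O fan-out problem the optimizations below address.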

After evaluating Neo4j and JanusGraph, JanusGraph was chosen for its distributed storage, scalability to trillions of edges, and better high‑availability options.

Query performance was improved by parallelizing neighbor lookups with a process pool and by embedding lightweight neighbor information directly in edges, reducing I/O from many calls to a single call for two‑hop queries.
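A minimal sketch of both optimizations, with toy data and hypothetical names: the article describes a process pool, but a thread pool is shown here for brevity, since the per-call cost is storage round-trips (I/O) rather than CPU. The second function shows the edge-embedding idea, where each edge carries a lightweight summary of the far endpoint's neighbors so a two-hop query collapses into one fetch:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy adjacency; each one_hop call models one round-trip to the graph store.
ADJ = {"u": ["a", "b"], "a": ["x"], "b": ["y", "z"], "x": [], "y": [], "z": []}

def one_hop(node):
    return ADJ.get(node, [])

def two_hop_parallel(node, workers=4):
    """Fan out the second-hop lookups concurrently instead of serially."""
    first = one_hop(node)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        second_lists = list(pool.map(one_hop, first))
    return set(first), {n for lst in second_lists for n in lst}

# Alternative: at write time, store each neighbor's own (lightweight)
# neighbor list on the edge itself, so a two-hop query is a single fetch.
EDGES_WITH_EMBEDDED = {
    ("u", "a"): {"neighbor_summary": ["x"]},
    ("u", "b"): {"neighbor_summary": ["y", "z"]},
}

def two_hop_single_call(node):
    second = set()
    for (src, _dst), attrs in EDGES_WITH_EMBEDDED.items():
        if src == node:
            second.update(attrs["neighbor_summary"])
    return second
```

The trade-off of the embedded variant is write amplification: neighbor summaries must be refreshed when the far endpoint's neighborhood changes, which is why only lightweight information is embedded.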

Data‑import speed was boosted by replacing HBase‑based ID generation with local hash generation, allowing dangling edges, and switching from three‑replica to two‑replica storage, achieving over 200 k writes per second.
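The local ID-generation idea can be sketched as follows, assuming a stable digest of each record's business key replaces the round-trip to an HBase allocation service; the function name is hypothetical, and a real import at this scale would still need a collision check or a wider digest:

```python
import hashlib

def local_vertex_id(business_key: str) -> int:
    """Derive a stable 64-bit vertex ID from a business key locally,
    avoiding a network round-trip to a central ID-allocation service
    (HBase in the original pipeline).
    """
    digest = hashlib.sha1(business_key.encode("utf-8")).digest()
    # Keep the ID positive (63 bits), matching typical long-ID constraints.
    return int.from_bytes(digest[:8], "big") & 0x7FFFFFFFFFFFFFFF
```

Because the ID is a pure function of the key, every importer process computes the same ID independently, which is what makes the bulk load embarrassingly parallel.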

For graph model training, the team moved from Euler to DGL because DGL supports distributed large‑scale learning; they optimized DGL’s preprocessing and sampling stages with memory‑efficient storage, multi‑process parallelism, and caching, cutting training time by 85 %.
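The caching optimization can be illustrated in isolation: memoize neighbor-list fetches during fan-out sampling so high-degree hub nodes, revisited across many minibatches, are read from storage only once. This is a stand-in for the storage layer, not DGL's actual sampler API:

```python
import random

class CachedNeighborSampler:
    """Cache neighbor lists fetched during GNN fan-out sampling so hot
    nodes are read once; `fetch_fn` models an expensive storage lookup.
    """

    def __init__(self, fetch_fn):
        self._fetch = fetch_fn
        self._cache = {}
        self.misses = 0          # storage reads actually performed

    def neighbors(self, node):
        if node not in self._cache:
            self.misses += 1
            self._cache[node] = self._fetch(node)
        return self._cache[node]

    def sample(self, node, fanout, rng=random):
        nbrs = self.neighbors(node)
        if len(nbrs) <= fanout:
            return list(nbrs)
        return rng.sample(nbrs, fanout)
```

Repeated samples of the same node hit the cache, so the miss counter stays at one per distinct node regardless of how many epochs revisit it.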

Online deployment of the risk‑graph model (a two‑layer GraphSAGE + GAT architecture) required consistency between offline and online scoring, so the same sub‑graph is used for both training and inference, and validation steps ensure score parity.
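The shape of such a stack can be sketched in plain Python, in the spirit of the architecture described: layer one mean-aggregates neighbor features (SAGE-style), layer two weights neighbors by a softmax attention score (GAT-style). Real models also apply learned linear transforms and nonlinearities, which are omitted here for clarity:

```python
import math

def mean_aggregate(h, adj, node):
    # SAGE-style layer: average the node's own vector with its neighbors'.
    vecs = [h[n] for n in adj[node]] + [h[node]]
    return [sum(x) / len(vecs) for x in zip(*vecs)]

def attention_aggregate(h, adj, node):
    # GAT-style layer: dot-product score against the node's own vector,
    # softmax over neighbors (plus self), then a weighted average.
    nbrs = adj[node] + [node]
    scores = [sum(a * b for a, b in zip(h[node], h[n])) for n in nbrs]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(w[i] * h[n][d] for i, n in enumerate(nbrs)) / z
            for d in range(len(h[node]))]

def two_layer_forward(h0, adj, node):
    # Layer 1 consumes the two-hop frontier; layer 2 consumes one hop.
    h1 = {n: mean_aggregate(h0, adj, n) for n in adj[node] + [node]}
    return attention_aggregate(h1, adj, node)
```

Because the forward pass is a deterministic function of the sub-graph, feeding the identical sub-graph offline and online is what makes the score-parity validation meaningful.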

To meet latency requirements for two‑hop neighbor aggregation, the platform stores pre‑aggregated representations at the penultimate layer, enabling one‑hop lookups (<10 ms) while keeping storage growth manageable.
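The serving pattern can be sketched as follows, with hypothetical names: the penultimate-layer representation of every node is materialized offline into a store, so scoring a node online needs only its one-hop neighbors' cached vectors (one lookup batch) rather than a recursive two-hop traversal over raw features:

```python
# Offline-materialized penultimate-layer representations, keyed by node.
H1_STORE = {
    "u": [0.2, 0.8],
    "a": [0.5, 0.5],
    "b": [0.9, 0.1],
}
ADJ = {"u": ["a", "b"]}

def final_layer(vectors):
    # Stand-in for the model's last layer: a simple mean pool.
    return [sum(x) / len(vectors) for x in zip(*vectors)]

def score_online(node):
    # One-hop lookup of cached representations, then the final layer.
    neighbor_reps = [H1_STORE[n] for n in ADJ[node]] + [H1_STORE[node]]
    return final_layer(neighbor_reps)
```

Storage grows linearly in the node count times the hidden dimension, which is what keeps the footprint manageable compared with caching full two-hop sub-graphs.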

Dynamic graph updates are handled by inductive learning: new nodes are scored using first‑ and second‑order neighbor information, and edge additions trigger only first‑order aggregation, preserving real‑time accuracy.
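A minimal sketch of the edge-addition path, with hypothetical helper names: when a new edge arrives, only the cached first-order aggregates of its two endpoints are refreshed, and downstream scoring picks up the change on the next one-hop lookup:

```python
ADJ = {"u": ["a"], "v": [], "a": ["u"]}
FEATS = {"u": [1.0], "v": [3.0], "a": [2.0]}
H1_STORE = {}   # cached first-order (penultimate-layer) aggregates

def first_order_aggregate(node):
    # Mean of the node's own features and its neighbors' features.
    vecs = [FEATS[n] for n in ADJ[node]] + [FEATS[node]]
    return [sum(x) / len(vecs) for x in zip(*vecs)]

def add_edge(u, v):
    ADJ[u].append(v)
    ADJ[v].append(u)
    # Refresh only the two endpoints; no full-graph recomputation.
    H1_STORE[u] = first_order_aggregate(u)
    H1_STORE[v] = first_order_aggregate(v)
```

Limiting the update to first-order aggregation bounds the per-edge cost by the endpoints' degrees, which is what makes real-time maintenance feasible at this scale.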

The article concludes with reflections on the challenges of graph data scale, the importance of dimensionality reduction, parallelism, and caching, and outlines future goals of reaching trillion‑scale graphs and lowering the barrier to graph‑learning applications.

Tags: graph database · risk control · large scale · graph · JanusGraph · DGL · financial analytics
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
