
Association Graph for Fraud Detection: Theory, Architecture, and Applications

This article explains the concept of association graphs, their foundation in graph theory, storage architectures, noise‑reduction techniques, and practical applications such as feature mining, coloring, backend visualization, data analysis, and monitoring for fraud detection in risk control systems.

Zhuanzhuan Tech

Nov 15, 2023

Background

Most fraudulent activity on the platform comes from a small group of malicious actors (the black market) who reuse the same resources for fraud, coupon abuse, money laundering, and similar schemes. Linking these actors through an association graph lets risk control detect organized groups rather than isolated users.

Association Graph Introduction

Definition: An association graph represents entities such as users, devices, and phone numbers as vertices and their shared attributes (e.g., the same phone number) as edges, capturing relational information that single-node attributes cannot.

Relation to Graph Theory: Association graphs build on graph theory, which models relationships as vertices and edges. The relevant concepts include undirected and directed graphs, homogeneous and heterogeneous graphs, adjacency matrices and lists, paths, and connectivity.

Graph Basics

Vertices represent entities such as users A, B, C. Edges represent association factors (e.g., shared phone numbers). Undirected graphs have edges without direction, while directed graphs have ordered edges. Heterogeneous graphs contain multiple vertex/edge types (users, phones, devices), whereas homogeneous graphs contain a single type.

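An adjacency matrix stores vertex information in a one-dimensional array and edge information in a two-dimensional array; an adjacency list stores vertices in an array and each vertex's edges in a linked list. A path is a sequence of edges connecting two vertices, and the shortest path is the one of minimal length. BFS finds it in unweighted graphs, Dijkstra handles non-negative edge weights, and Floyd computes shortest paths between all pairs of vertices; plain DFS can enumerate paths but does not by itself guarantee the shortest one. An undirected graph is connected if a path exists between any two vertices; the maximal connected subgraphs of a graph are called its connected components.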
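The representations and the BFS shortest path described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation; the function names and the sample vertices are made up for the example.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency list: vertex -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def to_adjacency_matrix(adj):
    """Return (vertices, matrix): a 1-D vertex array and a 2-D 0/1 edge array."""
    vertices = sorted(adj)
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for u, neighbours in adj.items():
        for v in neighbours:
            matrix[index[u]][index[v]] = 1
    return vertices, matrix

def bfs_shortest_path(adj, start, goal):
    """Return one path with the fewest edges from start to goal, or None."""
    if start not in adj or goal not in adj:
        return None
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through the parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

adj = build_adjacency([("A", "B"), ("B", "C"), ("D", "E")])
print(bfs_shortest_path(adj, "A", "C"))  # ['A', 'B', 'C']
print(bfs_shortest_path(adj, "A", "D"))  # None: different components
```

The adjacency list is usually the better fit for sparse association data, since the matrix grows with the square of the vertex count regardless of how many edges exist.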

Association Graph Architecture

Association Scheme: Compared with a heterogeneous graph, a homogeneous graph has lower storage cost, smaller size, and lower computational complexity, which makes it suitable for many applications.
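One way to obtain a homogeneous user graph is to project (user, factor) records onto user pairs that share a factor. The sketch below assumes that input shape; the function name and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def project_to_user_graph(records):
    """Turn (user, factor) records into {(user_a, user_b): shared factors}.

    Each pair is stored once with user_a < user_b, so the result is an
    undirected, users-only (homogeneous) edge list.
    """
    users_by_factor = defaultdict(set)
    for user, factor in records:
        users_by_factor[factor].add(user)

    shared = defaultdict(set)
    for factor, users in users_by_factor.items():
        for a, b in combinations(sorted(users), 2):
            shared[(a, b)].add(factor)
    return dict(shared)

records = [
    ("u1", "phone:1"), ("u2", "phone:1"),
    ("u1", "dev:9"), ("u2", "dev:9"), ("u3", "dev:9"),
]
for pair, factors in sorted(project_to_user_graph(records).items()):
    print(pair, sorted(factors))
```

A factor shared by n users produces n(n-1)/2 pairs, so very popular factors are worth capping or filtering before projection.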

Graph‑to‑Tree Transformation: Converting a complex graph into a tree rooted at a user simplifies the visualization of a single user’s associations.
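A breadth-first traversal is one straightforward way to do this conversion: each vertex is attached to the tree the first time it is reached, so cycles disappear. The sketch below assumes an adjacency-list dict as input; the nested-dict output format and the `max_depth` parameter are illustrative choices, not the article's actual design.

```python
from collections import deque

def association_tree(adj, root, max_depth=2):
    """Build a BFS tree rooted at `root`, at most `max_depth` levels deep.

    Each vertex appears once, under the first parent that reaches it.
    Returns nested dicts of the form {"id": vertex, "children": [...]}.
    """
    tree = {"id": root, "children": []}
    seen = {root}
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand below the depth limit
        for neighbour in sorted(adj.get(node["id"], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                child = {"id": neighbour, "children": []}
                node["children"].append(child)
                queue.append((child, depth + 1))
    return tree

adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
print(association_tree(adj, "A"))
```

In this example the B-C edge is dropped from the tree because C is already attached to A, which is the simplification the transformation is after.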

Relation Storage:

• MySQL – Separate tables for devices and phone numbers store (user, factor, timestamp) triples.

• Redis – Uses a key‑set structure similar to an adjacency list for fast lookup of associated entities.

• Graph Database – Provides native graph queries but may lack clustering and horizontal scaling (e.g., Neo4j limitations).
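The key-set layout can be illustrated without a Redis server. The class below is an in-memory stand-in whose two methods mirror the Redis `SADD` and `SMEMBERS` commands; the `user:` and `factor:` key prefixes and the helper functions are assumptions made for this sketch.

```python
from collections import defaultdict

class KeySetStore:
    """In-memory stand-in for a key -> set store (mirrors SADD / SMEMBERS)."""

    def __init__(self):
        self._sets = defaultdict(set)

    def sadd(self, key, member):
        self._sets[key].add(member)

    def smembers(self, key):
        return set(self._sets.get(key, ()))

def add_relation(store, user, factor):
    """Index the relation in both directions, like an adjacency list."""
    store.sadd(f"user:{user}", factor)
    store.sadd(f"factor:{factor}", user)

def associated_users(store, user):
    """Users sharing at least one factor with `user` (one hop away)."""
    result = set()
    for factor in store.smembers(f"user:{user}"):
        result |= store.smembers(f"factor:{factor}")
    result.discard(user)
    return result

store = KeySetStore()
add_relation(store, "u1", "phone:1")
add_relation(store, "u2", "phone:1")
add_relation(store, "u3", "dev:9")
print(associated_users(store, "u1"))  # {'u2'}
```

Keeping both directions means a lookup costs one set read per factor of the user, at the price of writing every relation twice.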

Noise Reduction – Apply expiration times, filter invalid factors, combine weak factors, limit association counts, and limit the association depth (number of hops) to improve precision.
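Three of these rules (expiration, invalid-factor filtering, and an association-count cap) can be sketched as a single filtering pass. The record shape matches the (user, factor, timestamp) triples mentioned for storage; the parameter names and the interpretation of the cap as "users per factor" are assumptions.

```python
from collections import defaultdict

def denoise(records, now, ttl_seconds, invalid_factors, max_users_per_factor):
    """Filter (user, factor, timestamp) records and return kept (user, factor) pairs.

    1. Drop records older than ttl_seconds (expiration).
    2. Drop factors on the invalid list (e.g. placeholder values).
    3. Drop factors shared by more users than the cap (likely public resources).
    """
    fresh = [
        (user, factor)
        for user, factor, ts in records
        if now - ts <= ttl_seconds and factor not in invalid_factors
    ]
    users_by_factor = defaultdict(set)
    for user, factor in fresh:
        users_by_factor[factor].add(user)
    return sorted(
        {(u, f) for u, f in fresh if len(users_by_factor[f]) <= max_users_per_factor}
    )

records = [
    ("u1", "p1", 950), ("u2", "p1", 990),
    ("u3", "p1", 100),                      # expired
    ("u4", "000000", 990),                  # invalid placeholder factor
    ("u5", "wifi", 990), ("u6", "wifi", 990), ("u7", "wifi", 990),  # over the cap
]
print(denoise(records, now=1000, ttl_seconds=100,
              invalid_factors={"000000"}, max_users_per_factor=2))
# [('u1', 'p1'), ('u2', 'p1')]
```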

Association Expansion – Weak factors (same IP, address) and behavior similarity can be leveraged to increase recall while controlling noise.
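One hedged way to use weak factors without flooding the graph is to require several of them to agree before creating a link. The weights and threshold below are made-up values for illustration only; real values would need tuning against labeled cases.

```python
# Hypothetical weights: strong factors link on their own, weak ones must combine.
FACTOR_WEIGHTS = {"phone": 1.0, "device": 1.0, "ip": 0.5, "address": 0.5}

def link_score(shared_factor_types, weights=FACTOR_WEIGHTS):
    """Sum the weights of the distinct factor types two users share."""
    return sum(weights.get(t, 0.0) for t in set(shared_factor_types))

def is_linked(shared_factor_types, weights=FACTOR_WEIGHTS, threshold=1.0):
    """Create an association only when the combined evidence reaches the threshold."""
    return link_score(shared_factor_types, weights) >= threshold

print(is_linked(["ip"]))             # False: one weak factor is not enough
print(is_linked(["ip", "address"]))  # True: two weak factors combine
print(is_linked(["phone"]))          # True: one strong factor suffices
```

Lowering the threshold raises recall and noise together, so it acts as the dial between the two.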

Association Graph Applications

Feature Mining – Centrality metrics (degree, closeness, betweenness, eigenvector) quantify a node’s importance; derived features include the number of associated factors per user and the number of users per factor.
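The two derived counts and degree centrality are simple enough to compute directly. The sketch below assumes (user, factor) records and an adjacency-list dict as inputs, and normalizes degree by n - 1, the usual convention; the function names are illustrative.

```python
from collections import defaultdict

def count_features(records):
    """Return (factors per user, users per factor) from (user, factor) records."""
    factors_by_user = defaultdict(set)
    users_by_factor = defaultdict(set)
    for user, factor in records:
        factors_by_user[user].add(factor)
        users_by_factor[factor].add(user)
    return (
        {u: len(fs) for u, fs in factors_by_user.items()},
        {f: len(us) for f, us in users_by_factor.items()},
    )

def degree_centrality(adj):
    """Degree of each vertex divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}

adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
print(degree_centrality(adj))  # {'A': 1.0, 'B': 0.5, 'C': 0.5}
```

Closeness, betweenness, and eigenvector centrality need shortest-path or iterative computations and are usually taken from a graph library rather than written by hand.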

Association Coloring – Customizable factor levels and coloring strategies allow automatic or manual labeling of suspicious accounts, with mechanisms for coloring removal and source tracking.
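Coloring with source tracking can be sketched as a depth-limited BFS that records which seed account each mark came from, so a mark can later be withdrawn per seed. The data layout (vertex mapped to a set of seeds) and the function names are assumptions for this example.

```python
from collections import deque

def color_from_seed(adj, seed, max_depth, colors):
    """Mark every vertex within max_depth hops of `seed`.

    `colors` maps vertex -> set of seeds that colored it (source tracking).
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        colors.setdefault(node, set()).add(seed)
        if depth < max_depth:
            for neighbour in adj.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, depth + 1))
    return colors

def remove_color(colors, seed):
    """Withdraw everything colored by `seed`; vertices with no source left are cleared."""
    for node in list(colors):
        colors[node].discard(seed)
        if not colors[node]:
            del colors[node]
    return colors

adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
colors = {}
color_from_seed(adj, "A", 1, colors)   # colors A and B
color_from_seed(adj, "D", 1, colors)   # colors D and C
remove_color(colors, "A")              # e.g. A turned out to be a false positive
print(colors)
```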

Backend Display – Visualize associated account information and paths (e.g., BFS‑derived user‑to‑user paths) in the operations console.

Data Analysis – Simple Hive joins can extract association pairs; GraphX’s connectedComponents can group connected subgraphs for further analysis.
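For small extracts, the grouping step can be reproduced locally with a union-find structure. This is a Python stand-in for the idea behind connected components, not the GraphX API; the function name and sample pairs are illustrative.

```python
def connected_groups(pairs):
    """Group users into connected components from (user_a, user_b) pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a       # merge the two groups

    groups = {}
    for vertex in list(parent):
        groups.setdefault(find(vertex), set()).add(vertex)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

pairs = [("u1", "u2"), ("u2", "u3"), ("u4", "u5")]
print(connected_groups(pairs))  # [['u1', 'u2', 'u3'], ['u4', 'u5']]
```

The input pairs are exactly what a pair-extraction join produces, so the two steps chain naturally.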

Monitoring – Track large‑scale groups, association ratios, and request QPS contributions to detect abnormal spikes indicative of coordinated fraud.
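A simple baseline comparison is one way to flag such spikes. The sketch below compares the current value of a metric (for example, the size of the largest group) with the mean of recent observations; the ratio and the minimum baseline are hypothetical defaults, not values from the article.

```python
def is_spike(history, current, ratio=3.0, min_baseline=1.0):
    """Flag `current` when it is at least `ratio` times the recent average.

    `history` holds recent values of the same metric; `min_baseline`
    prevents near-zero averages from triggering on tiny values.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = max(sum(history) / len(history), min_baseline)
    return current >= ratio * baseline

recent_group_sizes = [10, 12, 11, 9]
print(is_spike(recent_group_sizes, 50))  # True
print(is_spike(recent_group_sizes, 20))  # False
```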

Conclusion

Association graphs provide a strong foundation for risk control; their effectiveness depends on matching factor strength and architecture to the specific business scenario, balancing precision and recall.

SQL Example

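-- Extract pairs of users that share the same association factor (Hive).
-- 表1 (table 1), 用户uid (user uid), and 关联因子a (association factor a)
-- are the source's placeholder names.
select
  distinct zzt2.用户uid as 用户A,
  zzt3.用户uid as 用户B,
  zzt1.关联因子a as 关联因子
from
  -- factors that appear in at least two rows
  (select 关联因子a from 表1 group by 关联因子a having count(1)>=2) zzt1
JOIN
  (select 用户uid,关联因子a from 表1) zzt2
ON
  zzt1.关联因子a=zzt2.关联因子a
JOIN
  (select 用户uid,关联因子a from 表1) zzt3
ON
  zzt1.关联因子a=zzt3.关联因子a
-- keep each unordered pair once and drop self-pairs (A, A)
WHERE
  zzt2.用户uid < zzt3.用户uid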
