Artificial Intelligence 13 min read

Applying Graph Algorithms and Graph Convolutional Networks for Advertising Anti‑Fraud at 58.com

This article explains how various graph algorithms—including connected components, label propagation, Louvain community detection, and Graph Convolutional Networks—are built on large‑scale user‑behavior graphs using Spark GraphX to detect and mitigate advertising fraud, detailing methodology, implementation, and experimental results.

DataFunTalk
DataFunTalk
DataFunTalk
Applying Graph Algorithms and Graph Convolutional Networks for Advertising Anti‑Fraud at 58.com

Graphs model relationships between entities such as users, IPs, and devices; vertices represent objects while edges capture their interactions. Originating from Euler's Königsberg problem, graph theory has evolved into a foundation for many algorithms.

In advertising anti‑fraud, real‑world graphs (e.g., road networks, social networks) are constructed from behavior data, enabling detection of coordinated cheating groups.

Four key graph algorithms are introduced:

Connected subgraph (component) detection to find clusters of tightly linked nodes.

Label propagation for fast semi‑supervised community detection.

Louvain algorithm, which optimizes modularity for hierarchical clustering.

Graph Convolutional Networks (GCN) that combine structural information with node features via neural message passing.

The anti‑fraud workflow at 58.com first builds a bipartite graph from user actions (e.g., shared IPs). Unsupervised algorithms (connected components, label propagation, Louvain) generate clusters, which are filtered and manually verified to produce high‑quality black/white labels.

These labels serve as training data for a two‑layer GCN model (with dropout, ReLU activation, ADAM optimizer, and cross‑entropy loss). The GCN ingests the adjacency matrix and node feature matrix (53‑dimensional) for ~403k vertices and 3.12M edges, achieving 98.9% training accuracy and 98.2% test accuracy.

Experimental comparison shows that GCN outperforms a baseline XGBoost model and other unsupervised methods, delivering over 20% relative recall improvement while maintaining >95% precision.

Future work includes incorporating richer behavioral dimensions, edge weighting, and further tuning of the underlying graph algorithms to enhance detection robustness.

anti-fraudGCNlabel propagationgraph algorithmsLouvainSpark GraphXconnected components
DataFunTalk
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.