How Graph Algorithms Power Anti‑Fraud in Marketing and E‑Commerce
This article explores how black‑market cheating in marketing campaigns and e‑commerce is detected using graph‑based techniques such as same‑person mining, label propagation, Fraudar, and GCN models, and discusses future directions like multimodal data fusion and real‑time dynamic graph computation.
Business Background
In marketing campaigns, black‑market groups automate large‑scale reward harvesting, waste marketing funds, and pollute data metrics, leading to inaccurate operational decisions.
Typical cheating stages include resource preparation (mass‑purchasing accounts), task execution (automated scripts simulating user actions), and fund cash‑out (using dispersed real WeChat accounts).
Same‑Person Mining
We define a "same‑person" group as accounts and devices sharing the same underlying entity. By constructing a graph where user accounts are nodes and device IDs, phone numbers, withdrawal IDs, and encrypted ID cards are edges, we extract maximal connected subgraphs to identify same‑person relationships and apply stability processing across multiple days, achieving 96.8% overall ID stability and 99.3% after filtering unstable merges.
Label Propagation Algorithm and Application
The Label Propagation Algorithm (LPA) is a semi‑supervised graph‑based method for community detection and node classification. It iteratively spreads the most frequent neighbor label to each unlabeled node until convergence.
Advantages: computationally efficient (O(|E|) per iteration), no parameter tuning, naturally parallelizable.
Limitations: possible instability due to random initialization, ignores node features, performs poorly on sparse graphs.
Fraudar Algorithm and Application
Fraudar targets bipartite graphs (e.g., user‑shop). It defines a global suspiciousness score g(S)=f(S)/|S|, where f(S) sums node and edge suspiciousness. The algorithm iteratively removes the node with the lowest contribution, tracking g(S) to select the subgraph with maximal suspiciousness.
Key properties: higher node/edge suspiciousness increases g(S); larger dense subgraphs are more suspicious; smaller subgraphs with the same total suspiciousness are more suspicious.
Optimized with a priority heap, the total complexity becomes O(|E| log|V|). Multiple suspicious subgraphs are obtained by repeatedly removing the identified subgraph and re‑running Fraudar.
GCN Model and Application
GCN (Graph Convolutional Network) aggregates neighbor features to update node representations, enabling end‑to‑end learning on graph data. The core propagation formula is H^{(l+1)} = σ( H^{(l)} W^{(l)}), where  is the normalized adjacency with self‑loops.
Advantages: integrates graph structure with node features, efficient sparse matrix computation, suitable for node classification, graph classification, and link prediction.
Limitations: over‑smoothing in deep layers, assumes static graphs.
Summary and Outlook
We have deployed LPA, Fraudar, and GCN for risk control and anti‑fraud, achieving solid results. Future work includes multimodal data fusion (graph, time‑series, text), real‑time dynamic graph computation to counter rapid black‑market evolution, and enhancing interpretability and adversarial robustness.
Graph algorithms are evolving from auxiliary tools to the core engine of risk control systems, with ongoing challenges in balancing technical performance, compliance, and user experience.
Baidu Geek Talk
Follow us to discover more Baidu tech insights.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.