
Graph Algorithm Design and Optimization for Detecting Black‑Market Users in Virtual Networks

This article presents a comprehensive study on using graph representation learning, particularly GraphSAGE and its optimizations, to identify and mitigate black‑market accounts in virtual networks, covering background, algorithm design, handling isolated nodes and heterogeneity, and evaluation results.

Sohu Tech Products

Virtual networks contain a subset of black‑market users who engage in illegal activities such as prostitution advertising, gambling, and even drug or weapon trafficking. Controlling these accounts is essential to the healthy operation of Tencent services. Traditional tree‑based models and community‑detection methods both have limitations, which motivates combining graph representation learning with clustering to discover malicious accounts more effectively.

Background and Goals – Detecting malicious activity at the account level offers three main benefits: (1) it targets the source of black‑market activity, which allows precise intervention; (2) an account's behavior is hard to change quickly, whereas malicious content can evade content‑level detection through simple adversarial tactics; (3) early identification enables pre‑emptive control before illegal actions occur. There are two detection tasks: identifying accounts that belong to the target malicious categories, and expanding from seed black‑market users via diffusion or similarity retrieval.

Graph Algorithm Design – Conventional tree models (e.g., XGBoost, GBDT) ignore relational information between accounts, while community‑detection algorithms (FastUnfolding, COPRA) suffer from small community sizes and limited recall. To overcome these shortcomings, graph representation learning maps node attributes and structural features into a low‑dimensional space, and the resulting embeddings feed downstream tasks such as node classification for account labeling.
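Once accounts are embedded, the seed‑expansion task described in the goals can be served by nearest‑neighbor retrieval in the embedding space. The following is a minimal sketch, not the production method: the function names, the use of cosine similarity, and the "best match against any seed" scoring rule are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_from_seeds(
    embeddings: Dict[str, List[float]], seeds: Set[str], top_n: int
) -> List[Tuple[str, float]]:
    """Rank non-seed accounts by their best similarity to any seed account.

    Assumption: an account is a good candidate if it is close to at
    least one known black-market seed in the embedding space.
    """
    scored = []
    for account, vec in embeddings.items():
        if account in seeds:
            continue
        best = max(cosine(vec, embeddings[s]) for s in seeds)
        scored.append((account, best))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy embeddings for three accounts.
toy_embeddings = {"seed": [1.0, 0.0], "x": [0.9, 0.1], "y": [0.0, 1.0]}
candidates = expand_from_seeds(toy_embeddings, {"seed"}, top_n=1)
```

At real scale the linear scan would normally be replaced by an approximate nearest‑neighbor index; the ranking logic stays the same.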

GraphSAGE Deployment and Optimization – GraphSAGE's core ideas are neighbor sampling and feature aggregation: a node's own features and the features of its sampled neighbors each pass through a linear transformation, are concatenated, and then pass through a final linear layer to produce the node embedding. The model can be trained without labels (e.g., with an unsupervised NCE loss), and the embeddings are then used for downstream tasks. Its advantages include inductive learning and support for incrementally added features. However, GraphSAGE also has drawbacks: it cannot handle weighted graphs, random sampling makes embeddings unstable, limited sampling loses local information, and deep GCNs suffer from over‑smoothing. The following optimizations were applied:

Weighted aggregation – normalize edge weights and fuse neighbor features proportionally.

Pruning – keep only the top‑K weighted edges per node to stabilize embeddings and reduce memory.

Sampling improvement – use DGL’s sampling to retain more local information and avoid over‑smoothing when increasing model depth.
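The first two optimizations above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the deployed code: the graph is a hypothetical in‑memory adjacency dictionary, edge weights are normalized to sum to one per node, and the learned linear layers of GraphSAGE are left out so that only the pruning and the weighted aggregation are visible.

```python
from typing import Dict, List, Tuple

# Hypothetical toy representation: node -> list of (neighbor, edge_weight).
Graph = Dict[str, List[Tuple[str, float]]]


def prune_top_k(graph: Graph, k: int) -> Graph:
    """Keep only the k heaviest edges of every node (deterministic neighbors)."""
    return {
        node: sorted(edges, key=lambda e: e[1], reverse=True)[:k]
        for node, edges in graph.items()
    }


def weighted_neighbor_mean(
    graph: Graph, features: Dict[str, List[float]], node: str
) -> List[float]:
    """Fuse neighbor features in proportion to normalized edge weights."""
    edges = graph.get(node, [])
    dim = len(features[node])
    total = sum(weight for _, weight in edges)
    if total == 0:  # isolated node: nothing to aggregate
        return [0.0] * dim
    agg = [0.0] * dim
    for neighbor, weight in edges:
        for i, value in enumerate(features[neighbor]):
            agg[i] += (weight / total) * value
    return agg


def sage_input(
    graph: Graph, features: Dict[str, List[float]], node: str
) -> List[float]:
    """Concatenate self features with the weighted neighbor aggregate.

    A real GraphSAGE layer would feed this vector through learned
    linear transformations; those are omitted in this sketch.
    """
    return features[node] + weighted_neighbor_mean(graph, features, node)


toy_graph = {
    "a": [("b", 3.0), ("c", 1.0), ("d", 0.1)],
    "b": [("a", 3.0)],
    "c": [("a", 1.0)],
    "d": [("a", 0.1)],
}
toy_features = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [4.0, 0.0], "d": [9.0, 9.0]}
pruned = prune_top_k(toy_graph, k=2)           # drops the weak a-d edge
vector = sage_input(pruned, toy_features, "a")  # [1.0, 0.0, 1.0, 1.5]
```

Because the kept neighbors are fixed by weight rather than drawn at random, repeated runs give the same aggregate for a node, which is the stabilizing effect the pruning step aims for.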

Isolated Points & Heterogeneity – Isolated nodes (the cold‑start case) hinder detection because they carry no relational information. EGES maps node attributes to embeddings through an attention layer, so a new node can obtain a meaningful embedding without relational data. Combining GraphSAGE with EGES (GraphSAGE‑EGES) improves clustering accuracy by about 2%. Heterogeneity arises because black‑market accounts connect to both malicious and normal users, which confuses predictions. To address this, the graph structure is refined: noisy edges between malicious and normal accounts are removed and connections among malicious accounts are strengthened, using the LDS algorithm to jointly learn a more reasonable adjacency matrix and the GCN parameters.
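The attention step that lets an isolated node get an embedding from attributes alone can be illustrated as a softmax‑weighted sum of per‑attribute embeddings. This is a simplified sketch: the function names are hypothetical, and the attention scores are passed in as plain numbers, whereas in a trained model they would be learned parameters.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_embedding(
    attr_embeddings: List[List[float]], attn_scores: List[float]
) -> List[float]:
    """Combine per-attribute embeddings with attention weights.

    attr_embeddings[j] is the embedding looked up for the node's j-th
    attribute; attn_scores[j] is the (assumed, normally learned) score
    for that attribute. No edges are needed, so this also works for
    isolated nodes.
    """
    weights = softmax(attn_scores)
    dim = len(attr_embeddings[0])
    return [
        sum(w * emb[i] for w, emb in zip(weights, attr_embeddings))
        for i in range(dim)
    ]


# Two hypothetical attribute embeddings with equal attention scores.
cold_start_vector = attribute_embedding([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal scores the result is the plain average of the attribute embeddings; raising one score shifts the embedding toward that attribute.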

Effect Evaluation – Two metrics are used: clustering (community) accuracy and malicious recall rate. Compared with FastUnfolding and node2vec, the proposed GraphSAGE‑EGES approach achieves higher clustering accuracy, higher recall of malicious accounts, larger average community size, and reduced runtime.
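The article does not define the metrics precisely, so the sketch below fixes one plausible reading purely for illustration: accuracy is the share of accounts in flagged communities that are known to be malicious, recall is the share of known malicious accounts that fall inside a flagged community, and average community size is the mean number of members. All names are hypothetical.

```python
from typing import List, Set, Tuple


def evaluate(
    flagged_communities: List[Set[str]], known_malicious: Set[str]
) -> Tuple[float, float, float]:
    """Return (accuracy, recall, average community size).

    Assumed definitions, not taken from the article:
      accuracy = |flagged AND malicious| / |flagged|
      recall   = |flagged AND malicious| / |malicious|
    """
    flagged: Set[str] = set().union(*flagged_communities)
    hits = flagged & known_malicious
    accuracy = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(known_malicious) if known_malicious else 0.0
    avg_size = (
        sum(len(c) for c in flagged_communities) / len(flagged_communities)
        if flagged_communities
        else 0.0
    )
    return accuracy, recall, avg_size


# Hypothetical example: two flagged communities, four labeled accounts.
metrics = evaluate([{"a", "b", "c"}, {"d"}], {"a", "b", "d", "e"})
```

Reporting average community size next to accuracy and recall matters here because a method can score well on accuracy while only finding tiny communities, which is the weakness attributed to the community‑division baselines.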

Summary Thoughts – Key takeaways: feature engineering and graph construction play a critical role; algorithms must be chosen for the specific business scenario, since there is no one‑size‑fits‑all solution; and industry deployments favor straightforward, production‑ready algorithms.

Thank you for your attention.

Tags: network security, representation learning, graph algorithms, GraphSAGE, black market detection