Advanced Rule Learning, Constraint‑Adaptive Frameworks, and Semi‑Supervised Data Augmentation for Fraud Detection and Imbalanced Ranking

This article surveys recent Ant Group research on explainable fraud detection, including constraint‑adaptive rule‑set learning (CRSL), meta‑path guided rule generation (MetaRule), biased sampling for imbalanced ranking, and a semi‑supervised data‑augmentation framework (SDAT) for tabular data, highlighting their motivations, methodologies, deployments, and experimental results.

AntTech

The rapid growth of the digital economy has increased both transaction volume and the prevalence of illicit activities such as fraud, cash‑out, and gambling. Detecting these activities reliably requires interpretable models, constraint‑aware objectives, and methods that cope with severely imbalanced data.

To address these challenges, Ant Group’s Machine Intelligence team contributed four papers accepted at CIKM 2022:

Constraint‑Adaptive Rule‑Set Learning (CRSL): A framework that jointly optimizes rule mining, rule ranking, and rule subset selection under confidence and coverage constraints, using a constraint‑aware rule generation algorithm (CARM) and a Bayesian rule combination method (CBRS). Deployed in Alipay’s fraud decision center, it improves coverage by 3‑15% across 60+ risk‑analysis scenarios.
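To make the idea of selecting a rule subset under a confidence constraint concrete, here is a minimal sketch. It is not CARM or CBRS: it is a plain greedy heuristic, and the names Rule, confidence, and select_rules, as well as the toy data, are assumptions made for illustration only. Each rule is represented by the set of sample indices it fires on, and a rule is added only if the combined rule set stays at or above the required confidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    covered: frozenset  # indices of the samples this rule fires on


def confidence(covered, labels):
    """Fraction of covered samples that are labelled fraud (1)."""
    if not covered:
        return 0.0
    return sum(labels[i] for i in covered) / len(covered)


def select_rules(rules, labels, min_confidence, max_rules):
    """Greedy heuristic (an assumption, not the paper's algorithm):
    repeatedly add the rule that captures the most new fraud cases
    while the combined rule set keeps confidence >= min_confidence."""
    chosen, covered = [], set()
    positives = {i for i, y in enumerate(labels) if y == 1}
    while len(chosen) < max_rules:
        best, best_gain = None, 0
        for rule in rules:
            if rule in chosen:
                continue
            union = covered | rule.covered
            if confidence(union, labels) < min_confidence:
                continue  # adding this rule would break the constraint
            gain = len(union & positives) - len(covered & positives)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break  # no remaining rule adds fraud coverage within the constraint
        chosen.append(best)
        covered |= best.covered
    recall = len(covered & positives) / len(positives) if positives else 0.0
    return chosen, confidence(covered, labels), recall


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0, 1, 0]  # toy data: 1 = fraud
    rules = [
        Rule("r1", frozenset({0, 1})),
        Rule("r2", frozenset({2, 3})),
        Rule("r3", frozenset({3, 4, 5, 6})),
        Rule("r4", frozenset({2, 6, 7})),
    ]
    chosen, conf, recall = select_rules(rules, labels, 0.75, 3)
    print([r.name for r in chosen], conf, recall)  # ['r1', 'r4'] 0.8 1.0
```

In the toy run, r4 alone has a confidence of only 2/3, yet it becomes admissible once r1 is selected, because the constraint is checked on the combined rule set rather than on each rule in isolation.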

MetaRule: A meta‑path guided ensemble rule‑set learning approach that transforms rule generation into a path‑extraction problem on a decision‑condition graph, capturing high‑order semantic relations via neural representation learning.

Biased Sampling for Imbalanced Personalized Ranking: A two‑stage negative‑sample selection method that leverages node degree and popularity to perform biased random walks, dynamically adjusting sample weights to mitigate the adverse effects of graph sparsity on ranking models.
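A degree‑biased random walk for picking negative samples might look roughly like the sketch below. This is an illustration under stated assumptions, not the paper's method: the function names, the exponent alpha, and the use of visit counts as sample weights are all hypothetical choices, and the weights here are computed once rather than adjusted dynamically during training.

```python
import random
from collections import Counter, defaultdict


def build_graph(interactions):
    """Undirected bipartite graph from (user, item) pairs."""
    adj = defaultdict(set)
    for user, item in interactions:
        adj[("u", user)].add(("i", item))
        adj[("i", item)].add(("u", user))
    return adj


def biased_walk(adj, start, length, alpha, rng):
    """Random walk whose next hop is drawn with probability
    proportional to degree(neighbor) ** alpha."""
    node, path = start, []
    for _ in range(length):
        neighbors = sorted(adj[node])
        if not neighbors:
            break
        weights = [len(adj[n]) ** alpha for n in neighbors]
        node = rng.choices(neighbors, weights=weights, k=1)[0]
        path.append(node)
    return path


def sample_negatives(adj, user, n_walks=50, length=4, alpha=0.75, seed=0):
    """Stage 1: collect items the user has not interacted with
    that are reached by biased walks starting at the user.
    Stage 2: normalize visit counts into sample weights."""
    rng = random.Random(seed)
    start = ("u", user)
    seen = set(adj[start])
    visits = Counter()
    for _ in range(n_walks):
        for node in biased_walk(adj, start, length, alpha, rng):
            if node[0] == "i" and node not in seen:
                visits[node[1]] += 1
    total = sum(visits.values())
    if total == 0:
        return {}
    return {item: count / total for item, count in visits.items()}
```

With alpha above zero the walk favors popular, high‑degree nodes; alpha = 0 reduces it to a uniform random walk. Candidates that sit close to the user in the graph are visited more often and therefore receive larger weights.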

Semi‑Supervised Learning with Data Augmentation for Tabular Data (SDAT): A VAE‑based augmentation pipeline combined with consistency regularization, designed specifically for tabular features where traditional image/text augmentations are unsuitable.
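The way consistency regularization combines with a supervised loss can be sketched as follows. This is only an outline of the objective, under loud assumptions: the augmenter below is simple Gaussian jitter standing in for SDAT's VAE‑based augmentation, the model is a fixed logistic function, and every name and parameter (predict, augment, lam, scale) is hypothetical.

```python
import math
import random


def predict(weights, bias, x):
    """Logistic model: probability that row x is positive."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def augment(x, scale, rng):
    """Stand-in augmenter: small Gaussian jitter per feature.
    SDAT uses a VAE here; this is NOT that component."""
    return [v + rng.gauss(0.0, scale) for v in x]


def semi_supervised_loss(weights, bias, labeled, unlabeled,
                         lam=1.0, scale=0.1, seed=0):
    """Cross-entropy on labeled rows plus lam * consistency, where
    consistency is the mean squared gap between the prediction on an
    unlabeled row and the prediction on its augmented copy."""
    rng = random.Random(seed)
    eps = 1e-12  # guards against log(0)

    supervised = 0.0
    for x, y in labeled:
        p = predict(weights, bias, x)
        supervised -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    supervised /= max(len(labeled), 1)

    consistency = 0.0
    for x in unlabeled:
        p_orig = predict(weights, bias, x)
        p_aug = predict(weights, bias, augment(x, scale, rng))
        consistency += (p_orig - p_aug) ** 2
    consistency /= max(len(unlabeled), 1)

    return supervised + lam * consistency, supervised, consistency


if __name__ == "__main__":
    labeled = [([0.2, 1.0], 1), ([1.5, -0.3], 0)]    # toy rows
    unlabeled = [[0.1, 0.9], [1.2, 0.0], [0.7, 0.4]]  # toy rows
    total, sup, cons = semi_supervised_loss([0.5, -0.5], 0.0, labeled, unlabeled)
    print(total, sup, cons)
```

The consistency term needs no labels, which is why unlabeled rows can contribute to training; the quality of the augmenter determines whether the augmented rows remain realistic for tabular features.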

Each method is described in detail, including theoretical analysis, algorithmic design, and extensive experiments on benchmark and large‑scale industrial datasets that demonstrate superior performance and competitive interpretability compared with existing techniques.

The CRSL framework has been integrated into an end‑to‑end fraud decision workflow, providing an interactive interface for risk analysts to refine and deploy rule sets, while the other three methods contribute to the broader AI toolbox for handling data imbalance, explainability, and semi‑supervised learning in high‑risk financial applications.
