Multi‑Task Learning for Sample Selection Bias in Financial Risk Control
This article studies sample selection bias in credit risk modeling and shows how multi-task learning techniques (MoE/MMoE, ESMM, hierarchical attention, and a semi-supervised loss) can address it, demonstrating their effectiveness through two real-world application cases and experimental results.
The presentation introduces the challenges of sample selection bias in financial risk control, where traditional models train on a small, biased subset of applicants while being deployed on the full applicant pool, leading to performance degradation.
It defines sample selection bias, illustrates its impact with visual examples, and explains why conventional tree‑based models (XGBoost, LightGBM) and even GNNs struggle to overcome this issue.
To mitigate bias without requiring explicit overdue labels, the authors propose leveraging multi‑task learning (MTL), treating auxiliary tasks such as approval/rejection and disbursement labels as additional supervision for the main overdue prediction task.
The article reviews common MTL approaches, including hard sharing, soft sharing with Mixture‑of‑Experts (MoE) and Multi‑Gate MoE (MMoE), and the ESMM method originally designed for conversion rate estimation, highlighting their relevance to bias correction.
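The core mechanism of MMoE is that all tasks share a pool of expert networks, but each task has its own gate that produces softmax weights over the experts, so different tasks can mix the shared representations differently. The article gives no implementation details; the following is a minimal plain-Python sketch of that gating idea, with all shapes, weights, and function names illustrative rather than taken from the presentation:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def linear(weights, bias, x):
    # dense layer: weights is a list of rows (out_dim x in_dim)
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def mmoe_forward(x, experts, gates):
    """Multi-Gate Mixture-of-Experts forward pass (illustrative).

    experts: list of (W, b) shared expert layers, all same output dim
    gates:   one (W, b) gating layer per task; each maps the input to
             one softmax weight per expert
    Returns one mixed representation per task.
    """
    expert_outs = [linear(W, b, x) for W, b in experts]
    task_reprs = []
    for Wg, bg in gates:
        weights = softmax(linear(Wg, bg, x))  # task-specific expert weights
        mixed = [sum(w * eo[d] for w, eo in zip(weights, expert_outs))
                 for d in range(len(expert_outs[0]))]
        task_reprs.append(mixed)
    return task_reprs
```

Because each gate's weights sum to one, every task representation is a convex combination of the expert outputs, which is what lets low-bias auxiliary tasks (approval, disbursement) and the biased overdue task share capacity without forcing identical representations.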
A novel sequential MTL framework (MSIS) is introduced, featuring:
Shared bottom layers for common feature extraction.
Stage‑specific towers for application, disbursement, and risk stages.
An "info bridge" employing hierarchical attention to model intra‑stage and inter‑stage dependencies.
A semi‑supervised loss that combines labeled cross‑entropy with an entropy‑regularization term for unlabeled samples.
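The semi-supervised loss in the last bullet can be sketched concretely: supervised cross-entropy on samples with overdue labels, plus an entropy penalty that pushes the model toward confident predictions on unlabeled samples (e.g. rejected or undisbursed applicants). The exact formulation and weighting in MSIS are not given in the article; this plain-Python version, with an assumed trade-off weight `lam`, only illustrates the structure:

```python
import math

def cross_entropy(p, y, eps=1e-12):
    # binary cross-entropy for one labeled sample (p = predicted prob, y in {0, 1})
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def entropy(p, eps=1e-12):
    # prediction entropy; low entropy means a confident prediction
    return -(p * math.log(p + eps) + (1 - p) * math.log(1 - p + eps))

def semi_supervised_loss(labeled, unlabeled, lam=0.1):
    """labeled: list of (p, y) pairs; unlabeled: list of predictions p.

    Combines mean supervised cross-entropy with mean entropy
    regularization over unlabeled samples, weighted by lam
    (lam is an assumed hyper-parameter, not from the article).
    """
    sup = sum(cross_entropy(p, y) for p, y in labeled) / max(len(labeled), 1)
    unsup = sum(entropy(p) for p in unlabeled) / max(len(unlabeled), 1)
    return sup + lam * unsup
```

Minimizing the entropy term discourages the model from sitting on the fence (p ≈ 0.5) for the rejected population, which is precisely the part of the applicant pool the labeled training set never covers.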
Two practical cases are described:
A recovery model that combines XGBoost leaf features with a neural network to jointly learn approval and overdue tasks, improving stability over a year of online deployment.
A transformer‑based text model that jointly predicts ten labels (including approval, disbursement intervals, and risk) and achieves ~2% AUC improvement per label.
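For the first case, a common way to fuse tree and neural models is to take the leaf index each boosted tree assigns to a sample, one-hot encode it, and concatenate the result with the raw dense features before the shared layers. The article does not specify how the recovery model does this, so the sketch below is an assumption: `leaf_indices` stands in for the per-tree leaf assignments a trained XGBoost model would produce, and all names are hypothetical:

```python
def one_hot_leaves(leaf_indices, leaves_per_tree):
    """Encode the leaf each tree routes a sample to as a
    concatenated one-hot vector (illustrative GBDT+NN fusion)."""
    vec = []
    for idx in leaf_indices:
        hot = [0.0] * leaves_per_tree
        hot[idx] = 1.0
        vec.extend(hot)
    return vec

def fuse_features(dense_features, leaf_indices, leaves_per_tree):
    # concatenate raw dense features with the tree-leaf one-hots;
    # the fused vector would feed the shared bottom of the
    # multi-task network that learns approval and overdue jointly
    return list(dense_features) + one_hot_leaves(leaf_indices, leaves_per_tree)
```

The appeal of this fusion is that the tree leaves capture sharp, tabular-style feature interactions while the neural head on top can be trained with the multi-task and semi-supervised objectives the article describes.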
Experimental results show the proposed MSIS method yields approximately 2% AUC gain on the risk stage compared to baseline models, with ablation studies confirming the contribution of each component and sensitivity analyses demonstrating robustness across hyper‑parameter settings.
The conclusion summarizes three key contributions: using low‑bias auxiliary tasks, designing a sequential multi‑task architecture with information bridges, and applying semi‑supervised regularization to further alleviate bias.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.