
Tail Traffic Modeling and Data‑Driven Risk Strategies at 360 Shuke

This article presents 360 Shuke's practical approach to modeling low‑volume (tail) credit traffic using accumulated data, covering the characteristics of tail traffic, sample expansion under low approval rates, timeliness‑based data clustering, and ranking optimization for high‑quality head customers.

DataFunTalk


1. Characteristics of Tail Traffic and Accumulated Data: Credit‑traffic growth in the current market has entered a saturation stage, making traffic expensive, so institutions are now trying to recover tail customers they previously abandoned. The challenges: tail customers carry high risk, mainstream multi‑source data is difficult to apply to them, their loan amounts are low, and data missingness is severe. The reasons to pursue them anyway: acquiring new customers is expensive (reactivating existing tail customers costs far less), serving riskier segments strengthens risk‑control capability, they offer low‑cost credit building for new customers, and dormant customers can be reactivated.

The platform's tail customers come from restricted accounts, low‑amount new accounts, rejected credit, dormant accounts, and high‑risk segments identified by funding partners.

2. Sample Expansion under Low Approval Rates: When Y‑samples are scarce, three methods are used to enlarge the dataset.

Co‑existence Fusion Labels: Leverage risk performance of the same user on other products within the same time window, merging labels to increase sample size by 3‑4×.
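A minimal sketch of the fusion idea in pandas. Column names and the "bad on any co‑existing product" rule are illustrative assumptions, not the platform's exact logic:

```python
import pandas as pd

# Hypothetical example: fuse risk labels observed on the same user's
# other products within the same time window to enlarge the Y-sample.
loans = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "product": ["A", "B", "A", "A", "C", "B"],
    "label":   [0,   1,   0,   1,   0,   None],  # 1 = bad, 0 = good, None = no performance
})

# Label a user "bad" if they went bad on ANY co-existing product;
# users with no observed performance anywhere stay unlabeled.
fused = (loans.dropna(subset=["label"])
              .groupby("user_id")["label"]
              .max()
              .rename("fused_label")
              .reset_index())
print(fused)
```

User 4, who has no observed performance on any product, remains unlabeled rather than being guessed.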

Relaxed Bad‑User Definition: Lower the delinquency threshold (e.g., from 30 days past due to 28‑29 days) to include more samples, provided the roll rate of the borderline accounts remains high enough that they behave like confirmed bads.
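The effect of relaxing the cutoff, in a toy sketch (the days‑past‑due values are illustrative):

```python
# Relaxing the bad cutoff from 30 to 28 days past due pulls in
# borderline accounts, enlarging the bad-sample set.
days_past_due = [0, 5, 28, 29, 31, 45]

strict = [1 if d >= 30 else 0 for d in days_past_due]
relaxed = [1 if d >= 28 else 0 for d in days_past_due]

print(sum(strict), sum(relaxed))  # relaxed definition yields more bad samples
```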

Long‑ and Short‑Term Indicators: Use multiple Y‑labels (1‑month, 3‑month, 6‑month) simultaneously; longer‑term labels often provide better discrimination.
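A sketch of building labels at several performance horizons from a (synthetic) days‑past‑due history; column names are assumptions:

```python
import pandas as pd

# Build Y-labels at 1m / 3m / 6m performance horizons and compare
# how many confirmed bads each horizon yields.
perf = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "max_dpd_1m": [0, 0, 10, 35],
    "max_dpd_3m": [0, 31, 40, 60],
    "max_dpd_6m": [5, 45, 70, 90],
})
for h in ("1m", "3m", "6m"):
    perf[f"bad_{h}"] = (perf[f"max_dpd_{h}"] >= 30).astype(int)

print(perf[["bad_1m", "bad_3m", "bad_6m"]].sum())
# Longer horizons accumulate more confirmed bads, which is one reason
# longer-term labels often discriminate better.
```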

3. Timeliness‑Based Clustering of Accumulated Data: Customers are grouped by the recency of their activity in other products relative to the credit‑application timestamp (T0). Trade 1: activity within 30 days, Trade 2: 30‑90 days, Trade 3: >90 days. This clustering reveals that newer data (Trade 1) does not always yield the best model performance; sometimes older data (Trade 2) performs better, justifying the need for timeliness‑driven segmentation.
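The Trade 1/2/3 bucketing can be sketched with `pd.cut`; the column name and sample values are illustrative:

```python
import pandas as pd

# Bucket users into Trade 1/2/3 by how recently they were active on
# other products relative to the application time T0.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "days_since_last_activity": [12, 45, 120, 75],
})
bins = [0, 30, 90, float("inf")]
labels = ["Trade 1", "Trade 2", "Trade 3"]
df["segment"] = pd.cut(df["days_since_last_activity"], bins=bins,
                       labels=labels, right=True, include_lowest=True)
print(df)
```

Each segment then gets its own model, rather than assuming Trade 1's fresher data will always win.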

Evaluation shows that models built on each trade segment improve performance compared with a benchmark built on the whole sample, and the improvement can be attributed to both model enhancement and effective segmentation.

4. Ranking Optimization for High‑Quality Head Customers under Low Approval Rates: The gap between modeling samples (often collected at high approval rates) and production samples (10‑20% approval) degrades performance. Methods to improve capture of the top‑ranked (head) samples include:

Stacked models: build a global model (Model 1), select top‑% samples, train a second model (Model 2) on this subset, then fuse both.
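A minimal sketch of the two‑stage stacking idea, using synthetic data and logistic regression as stand‑ins (the talk's model family, slice size, and fusion weights are not specified, so these are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stage 1: a global model ranks all applicants.
# Stage 2: a second model is trained only on the top slice.
# The two scores are then fused.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 1).astype(int)

model1 = LogisticRegression().fit(X, y)            # Model 1: global
score1 = model1.predict_proba(X)[:, 1]

top = score1 >= np.median(score1)                  # top slice by Model 1 score
model2 = LogisticRegression().fit(X[top], y[top])  # Model 2: head-focused
score2 = model2.predict_proba(X)[:, 1]

fused = 0.5 * score1 + 0.5 * score2                # simple average fusion
print(fused[:3])
```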

Weighting: duplicate or assign higher weights to top‑% samples before retraining.
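The weighting variant, sketched with `sample_weight` (the weight value 3.0 and the data are illustrative; duplicating the top rows achieves a similar effect):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Up-weight top-ranked samples before retraining so the model
# concentrates on ordering the head correctly.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0.8).astype(int)

base = LogisticRegression().fit(X, y)
score = base.predict_proba(X)[:, 1]

# Triple the weight of the top-20% slice, leave the rest at 1.
weights = np.where(score >= np.quantile(score, 0.8), 3.0, 1.0)
reweighted = LogisticRegression().fit(X, y, sample_weight=weights)
```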

Loss‑function adjustments: use α‑balanced cross‑entropy, focal loss, or a combination to penalize mis‑ranking of top bad samples.
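For reference, the α‑balanced focal loss (Lin et al., RetinaNet) down‑weights easy samples via γ and rebalances classes via α, so hard samples dominate the gradient. A pure‑NumPy sketch, not a production loss:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """alpha-balanced focal loss; gamma=0, alpha=0.5 recovers
    (half of) plain cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)          # prob of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-balance factor
    return -np.mean(at * (1 - pt) ** gamma * np.log(pt))

y = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.95, 0.05])  # easy samples
uncertain = np.array([0.6, 0.4, 0.55, 0.45])  # hard samples
print(focal_loss(confident, y), focal_loss(uncertain, y))
```

Confidently classified samples contribute far less loss than uncertain ones, which is the intended focusing effect.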

Q&A Session: Discussed alternative weighting methods (e.g., XGBoost's scale_pos_weight, sample‑matrix weighting, loss‑function modification), details of co‑existence label fusion, whether label fusion is still necessary after clustering, handling of customer drift, and the business impact of tail customers.

Thank you for attending; the session recording is available via the provided QR code.

Tags: model optimization, data clustering, risk modeling, sample expansion, tail traffic
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
