
Tail Traffic Modeling and Data‑Driven Risk Strategies at 360 Shuke

This article presents 360 Shuke's practical approach to modeling low‑volume (tail) credit traffic using accumulated data, covering the characteristics of tail traffic, sample expansion under low approval rates, timeliness‑based data clustering, and ranking optimization for high‑quality head customers.

DataFunTalk


1. Characteristics of Tail Traffic and Accumulated Data: Credit‑traffic growth in the current market has entered a saturation stage, making traffic expensive, so institutions are now trying to recover tail customers they previously abandoned. The challenges: tail customers carry high risk, mainstream multi‑source data is difficult to apply to them, their loan amounts are low, and data missingness is severe. The reasons to pursue them anyway: acquiring new customers is expensive (reactivating existing tail customers costs far less), serving riskier segments strengthens risk‑control capability, they offer low‑cost credit building for new customers, and dormant customers can be reactivated.

The platform's tail customers come from restricted accounts, low‑amount new accounts, rejected credit, dormant accounts, and high‑risk segments identified by funding partners.

2. Sample Expansion under Low Approval Rates: When Y‑samples are scarce, three methods are used to enlarge the dataset.

Co‑existence Fusion Labels: Leverage risk performance of the same user on other products within the same time window, merging labels to increase sample size by 3‑4×.
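A minimal sketch of the fusion idea in pandas. Column names and the "bad on any co‑existing product" rule are illustrative assumptions, not the platform's exact logic:

```python
import pandas as pd

# Hypothetical example: fuse risk labels observed on the same user's
# other products within the same time window to enlarge the Y-sample.
loans = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "product": ["A", "B", "A", "A", "C", "B"],
    "label":   [0,   1,   0,   1,   0,   None],  # 1 = bad, 0 = good, None = no performance
})

# Label a user "bad" if they went bad on ANY co-existing product;
# users with no observed performance anywhere stay unlabeled.
fused = (loans.dropna(subset=["label"])
              .groupby("user_id")["label"]
              .max()
              .rename("fused_label")
              .reset_index())
print(fused)
```

User 4, who has no observed performance on any product, remains unlabeled rather than being guessed.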

Relaxed Bad‑User Definition: Lower the delinquency threshold (e.g., from 30 days past due to 28‑29 days) to include more samples, provided the roll rate of the borderline accounts remains high enough that they behave like confirmed bads.
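The effect of relaxing the cutoff, in a toy sketch (the days‑past‑due values are illustrative):

```python
# Relaxing the bad cutoff from 30 to 28 days past due pulls in
# borderline accounts, enlarging the bad-sample set.
days_past_due = [0, 5, 28, 29, 31, 45]

strict = [1 if d >= 30 else 0 for d in days_past_due]
relaxed = [1 if d >= 28 else 0 for d in days_past_due]

print(sum(strict), sum(relaxed))  # relaxed definition yields more bad samples
```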

Long‑ and Short‑Term Indicators: Use multiple Y‑labels (1‑month, 3‑month, 6‑month) simultaneously; longer‑term labels often provide better discrimination.
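A sketch of building labels at several performance horizons from a (synthetic) days‑past‑due history; column names are assumptions:

```python
import pandas as pd

# Build Y-labels at 1m / 3m / 6m performance horizons and compare
# how many confirmed bads each horizon yields.
perf = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "max_dpd_1m": [0, 0, 10, 35],
    "max_dpd_3m": [0, 31, 40, 60],
    "max_dpd_6m": [5, 45, 70, 90],
})
for h in ("1m", "3m", "6m"):
    perf[f"bad_{h}"] = (perf[f"max_dpd_{h}"] >= 30).astype(int)

print(perf[["bad_1m", "bad_3m", "bad_6m"]].sum())
# Longer horizons accumulate more confirmed bads, which is one reason
# longer-term labels often discriminate better.
```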

3. Timeliness‑Based Clustering of Accumulated Data: Customers are grouped by the recency of their activity in other products relative to the credit‑application timestamp (T0). Trade 1: activity within 30 days, Trade 2: 30‑90 days, Trade 3: >90 days. This clustering reveals that newer data (Trade 1) does not always yield the best model performance; sometimes older data (Trade 2) performs better, justifying the need for timeliness‑driven segmentation.
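The Trade 1/2/3 bucketing can be sketched with `pd.cut`; the column name and sample values are illustrative:

```python
import pandas as pd

# Bucket users into Trade 1/2/3 by how recently they were active on
# other products relative to the application time T0.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "days_since_last_activity": [12, 45, 120, 75],
})
bins = [0, 30, 90, float("inf")]
labels = ["Trade 1", "Trade 2", "Trade 3"]
df["segment"] = pd.cut(df["days_since_last_activity"], bins=bins,
                       labels=labels, right=True, include_lowest=True)
print(df)
```

Each segment then gets its own model, rather than assuming Trade 1's fresher data will always win.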

Evaluation shows that models built on each trade segment improve performance compared with a benchmark built on the whole sample, and the improvement can be attributed to both model enhancement and effective segmentation.

4. Ranking Optimization for High‑Quality Head Customers under Low Approval Rates: The gap between modeling samples (often collected at high approval rates) and production samples (10‑20% approval) degrades performance. Methods to improve capture of the top‑ranked (head) samples include:

Stacked models: build a global model (Model 1), select top‑% samples, train a second model (Model 2) on this subset, then fuse both.
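A minimal sketch of the two‑stage stacking idea, using synthetic data and logistic regression as stand‑ins (the talk's model family, slice size, and fusion weights are not specified, so these are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stage 1: a global model ranks all applicants.
# Stage 2: a second model is trained only on the top slice.
# The two scores are then fused.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 1).astype(int)

model1 = LogisticRegression().fit(X, y)            # Model 1: global
score1 = model1.predict_proba(X)[:, 1]

top = score1 >= np.median(score1)                  # top slice by Model 1 score
model2 = LogisticRegression().fit(X[top], y[top])  # Model 2: head-focused
score2 = model2.predict_proba(X)[:, 1]

fused = 0.5 * score1 + 0.5 * score2                # simple average fusion
print(fused[:3])
```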

Weighting: duplicate or assign higher weights to top‑% samples before retraining.
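The weighting variant, sketched with `sample_weight` (the weight value 3.0 and the data are illustrative; duplicating the top rows achieves a similar effect):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Up-weight top-ranked samples before retraining so the model
# concentrates on ordering the head correctly.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0.8).astype(int)

base = LogisticRegression().fit(X, y)
score = base.predict_proba(X)[:, 1]

# Triple the weight of the top-20% slice, leave the rest at 1.
weights = np.where(score >= np.quantile(score, 0.8), 3.0, 1.0)
reweighted = LogisticRegression().fit(X, y, sample_weight=weights)
```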

Loss‑function adjustments: use α‑balanced cross‑entropy, focal loss, or a combination to penalize mis‑ranking of top bad samples.
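For reference, the α‑balanced focal loss (Lin et al., RetinaNet) down‑weights easy samples via γ and rebalances classes via α, so hard samples dominate the gradient. A pure‑NumPy sketch, not a production loss:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """alpha-balanced focal loss; gamma=0, alpha=0.5 recovers
    (half of) plain cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)          # prob of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-balance factor
    return -np.mean(at * (1 - pt) ** gamma * np.log(pt))

y = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.95, 0.05])  # easy samples
uncertain = np.array([0.6, 0.4, 0.55, 0.45])  # hard samples
print(focal_loss(confident, y), focal_loss(uncertain, y))
```

Confidently classified samples contribute far less loss than uncertain ones, which is the intended focusing effect.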

Q&A Session: Discussed alternative weighting methods (e.g., XGBoost's scale_pos_weight, sample‑matrix weighting, loss‑function modification), details of co‑existence label fusion, whether label fusion is still necessary after clustering, handling of customer drift, and the business impact of tail customers.

Thank you for attending; the session recording is available via the provided QR code.

Tags: model optimization, data clustering, risk modeling, sample expansion, tail traffic
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
