
Managing the Full Lifecycle of Risk Features: Pitfalls, Solutions, and Future Directions

This talk by Tang Gengyang of Citic Baixin Bank details the challenges the team faced in risk feature engineering, presents two solution frameworks (1.0 and 2.0) for accelerating deployment, improving reuse, and keeping offline and online features consistent, and outlines future enhancements for a more efficient, automated feature pipeline.

DataFunSummit

Aug 25, 2022


Speaker and Context

Tang Gengyang, a senior data engineer at Citic Baixin Bank, shared his experience building and managing credit-risk features across the entire feature lifecycle, from design to deprecation.

1. Pitfalls Encountered

Early feature development relied on custom Java operators, which became a bottleneck as the number of third-party data sources grew. The problems identified were slow rollout, poor reusability, no decommissioning process, and insufficient monitoring.

2. Solution 1.0 – Rapid Fixes

Using existing resources, the team standardized offline and real-time data schemas, aligned requirements between product and engineering, reduced translation between languages by writing feature logic in SQL or Python instead of Java, and built a unified data source on a Lambda-style architecture with Flink. This cut feature rollout time by roughly 50% and streamlined testing.
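One way to picture the offline/online consistency goal in Solution 1.0 is a single feature definition shared by the batch path and the real-time path. The sketch below is a plain-Python stand-in, not the team's Flink implementation: `RepaymentEvent`, `overdue_count`, `batch_view`, and `SpeedView` are hypothetical names chosen for illustration, and the 30-day window is an assumed example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class RepaymentEvent:
    """Hypothetical standardized record shared by the batch and real-time paths."""
    user_id: str
    event_time: datetime
    overdue: bool


def overdue_count(events: Iterable[RepaymentEvent], user_id: str,
                  as_of: datetime, window_days: int = 30) -> int:
    """Single feature definition: overdue events in (as_of - window, as_of]."""
    start = as_of - timedelta(days=window_days)
    return sum(
        1 for e in events
        if e.user_id == user_id and e.overdue and start < e.event_time <= as_of
    )


def batch_view(history: List[RepaymentEvent], user_id: str, as_of: datetime) -> int:
    """Offline path: recompute from the full history (for example, a warehouse table)."""
    return overdue_count(history, user_id, as_of)


class SpeedView:
    """Online path: buffer events as they arrive, then apply the same definition."""

    def __init__(self) -> None:
        self._buffer: List[RepaymentEvent] = []

    def on_event(self, event: RepaymentEvent) -> None:
        self._buffer.append(event)

    def query(self, user_id: str, as_of: datetime) -> int:
        return overdue_count(self._buffer, user_id, as_of)


now = datetime(2022, 8, 25)
history = [
    RepaymentEvent("u1", now - timedelta(days=40), True),   # outside the window
    RepaymentEvent("u1", now - timedelta(days=10), True),
    RepaymentEvent("u1", now - timedelta(days=3), False),   # not overdue
    RepaymentEvent("u1", now - timedelta(days=1), True),
    RepaymentEvent("u2", now - timedelta(days=2), True),    # different user
]

speed = SpeedView()
for ev in history:
    speed.on_event(ev)

offline_value = batch_view(history, "u1", now)
online_value = speed.query("u1", now)
print(offline_value, online_value)
```

Because both paths call the same `overdue_count` function on the same record schema, a mismatch between offline training values and online serving values can only come from the data, not from two divergent implementations of the logic.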

3. Solution 2.0 – Systematic Lifecycle Management

A feature factory was introduced to manage feature registration, naming conventions, duplicate detection, and strict approval for de-registration. Feature computation was decoupled from consumption through asynchronous processing built on Kafka topics, an event-driven scheduler, and a notification center, which improved throughput and reliability.
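The registration rules described for the feature factory can be sketched as a small in-memory registry. This is a minimal illustration under stated assumptions, not the bank's system: the `Feature` and `FeatureRegistry` classes, the naming pattern, and the rule that every consumer must approve a de-registration are all hypothetical, and duplicate detection here is a simple comparison of normalized logic text.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Assumed naming convention: <domain>_<entity>_<metric>_<window>,
# for example credit_user_overdue_cnt_30d.
NAME_PATTERN = re.compile(r"^[a-z]+_[a-z]+_[a-z_]+_\d+[dhm]$")


@dataclass
class Feature:
    name: str
    logic: str                      # SQL or expression that defines the feature
    owner: str
    consumers: Set[str] = field(default_factory=set)


class FeatureRegistry:
    def __init__(self) -> None:
        self._features: Dict[str, Feature] = {}

    @staticmethod
    def _normalize(logic: str) -> str:
        # Ignore case and whitespace differences when comparing logic.
        return " ".join(logic.lower().split())

    def find_duplicate(self, logic: str) -> Optional[str]:
        """Return the name of a registered feature with the same logic, if any."""
        target = self._normalize(logic)
        for feature in self._features.values():
            if self._normalize(feature.logic) == target:
                return feature.name
        return None

    def register(self, feature: Feature) -> None:
        if not NAME_PATTERN.match(feature.name):
            raise ValueError(f"name violates convention: {feature.name}")
        if feature.name in self._features:
            raise ValueError(f"name already registered: {feature.name}")
        duplicate = self.find_duplicate(feature.logic)
        if duplicate is not None:
            raise ValueError(f"same logic already registered as {duplicate}")
        self._features[feature.name] = feature

    def deregister(self, name: str, approved_by: Set[str]) -> None:
        """Remove a feature only when every consumer has approved."""
        feature = self._features[name]
        missing = feature.consumers - approved_by
        if missing:
            raise PermissionError(f"approval missing from: {sorted(missing)}")
        del self._features[name]

    def __contains__(self, name: str) -> bool:
        return name in self._features


registry = FeatureRegistry()
registry.register(Feature(
    name="credit_user_overdue_cnt_30d",
    logic="SELECT COUNT(*) FROM repayments WHERE overdue = 1",
    owner="risk_team",
    consumers={"scorecard_v2"},
))

# The same logic with different spacing and case is reported as a duplicate.
duplicate_of = registry.find_duplicate(
    "select  count(*)  from repayments where overdue = 1"
)
print(duplicate_of)
```

Checking names and logic at registration time addresses the reuse problem early, while the approval check in `deregister` keeps a feature from disappearing while a model or strategy still depends on it.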

4. Future Improvements

The roadmap aims for "higher, faster, stronger" capabilities: self-service feature development with parameterized SQL, automated feature evaluation that uses models (e.g., XGBoost) to rank feature importance, and a closed-loop platform that integrates feature mining, deployment, and analysis to accelerate model and strategy iteration.
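Self-service development with parameterized SQL could look roughly like the sketch below, where one template is expanded into several time-window variants of a feature. The template text, the table and column names, the allowed aggregations, and the helper names `render_feature_sql` and `expand_windows` are all assumptions for illustration; the `DATE_SUB` form follows Hive/Spark-style SQL and would need adjusting for other dialects.

```python
from string import Template
from typing import Dict, List

# Hypothetical template; table and column names are placeholders, not from the talk.
FEATURE_SQL = Template(
    "SELECT user_id, ${agg}(${column}) AS ${feature_name}\n"
    "FROM ${table}\n"
    "WHERE event_date > DATE_SUB(CURRENT_DATE, ${window_days})\n"
    "GROUP BY user_id"
)

ALLOWED_AGGS = {"COUNT", "SUM", "AVG", "MAX", "MIN"}


def render_feature_sql(table: str, column: str, agg: str,
                       window_days: int) -> Dict[str, str]:
    """Validate the parameters and render one feature definition."""
    agg = agg.upper()
    if agg not in ALLOWED_AGGS:
        raise ValueError(f"unsupported aggregation: {agg}")
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    if window_days <= 0:
        raise ValueError("window_days must be positive")

    feature_name = f"{column}_{agg.lower()}_{window_days}d"
    sql = FEATURE_SQL.substitute(
        agg=agg,
        column=column,
        feature_name=feature_name,
        table=table,
        window_days=window_days,
    )
    return {"name": feature_name, "sql": sql}


def expand_windows(table: str, column: str, agg: str,
                   windows: List[int]) -> List[Dict[str, str]]:
    """Generate one feature per time window from the same template."""
    return [render_feature_sql(table, column, agg, w) for w in windows]


features = expand_windows("repayment_events", "overdue_amount", "sum", [7, 30, 90])
for f in features:
    print(f["name"])
    print(f["sql"])
    print()
```

Restricting the parameters to a whitelist of aggregations and plain identifiers keeps the self-service path from turning into free-form SQL, which is one way to let analysts add features without an engineering review of every query.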

Overall, the session highlighted practical lessons, concrete engineering solutions, and a vision for more automated, scalable risk feature management.


