
Managing the Full Lifecycle of Risk Features: Pitfalls, Solutions, and Future Directions

The talk by Tang Gengyang of Citic Baixin Bank details the challenges of risk feature engineering, presents two solution frameworks (1.0 and 2.0) for accelerating deployment, improving reuse, and handling offline/online consistency, and outlines future enhancements for a more efficient, automated feature pipeline.

DataFunSummit

Speaker and Context

Tang Gengyang, a senior data engineer at Citic Baixin Bank, shared his experience in building and managing credit‑risk features, covering the entire feature lifecycle from design to deprecation.

1. Pitfalls Encountered

Early feature development relied on custom Java operators, which became a bottleneck as third‑party data sources grew. Problems identified include slow rollout, poor reusability, lack of decommissioning processes, and insufficient monitoring.

2. Solution 1.0 – Rapid Fixes

Using existing resources, the team standardized offline and real‑time data schemas, aligned requirements between product and engineering, reduced cross‑language translation by writing features in SQL or Python rather than Java, and built a unified data source using a Lambda‑style architecture with Flink. This reduced feature rollout time by roughly 50% and streamlined testing.
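To make the Lambda-style idea concrete, here is a minimal sketch of how a serving layer might merge an offline (batch) value with real-time events under one shared schema. It is plain Python rather than Flink, and every name in it (`FeatureValue`, `serve_count_feature`, `apply_cnt_total`, the field names) is a hypothetical illustration, not something described in the talk.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeatureValue:
    """One feature record in a shared offline/real-time schema (hypothetical fields)."""
    entity_id: str       # e.g. a customer ID
    feature_name: str
    value: float
    as_of: datetime      # source events up to this time are already counted


def serve_count_feature(batch: FeatureValue, speed_events: list[datetime]) -> FeatureValue:
    """Merge a batch-layer count with events seen by the real-time layer.

    Only events strictly after the batch cutoff are added, so an event that
    both layers have seen is not counted twice.
    """
    fresh = [ts for ts in speed_events if ts > batch.as_of]
    as_of = max(fresh, default=batch.as_of)
    return FeatureValue(batch.entity_id, batch.feature_name,
                        batch.value + len(fresh), as_of)


# The batch job counted 3 applications up to midnight on Jan 10.
batch_view = FeatureValue("cust_001", "apply_cnt_total", 3.0, datetime(2024, 1, 10))
# The stream saw one event before the cutoff and two after it.
stream_events = [
    datetime(2024, 1, 9, 23),
    datetime(2024, 1, 10, 8),
    datetime(2024, 1, 10, 9),
]
merged = serve_count_feature(batch_view, stream_events)
print(merged.value)  # 5.0
```

Because both layers emit the same record shape, the same consumer code and the same tests can be used for offline and online values, which is one plausible reading of why a shared schema streamlines testing.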

3. Solution 2.0 – Systematic Lifecycle Management

A comprehensive feature factory was introduced to manage registration, naming conventions, duplication detection, and strict de‑registration approvals. Asynchronous processing was adopted via Kafka topics, an event‑driven scheduler, and a notification center to decouple feature computation from consumption, improving throughput and reliability.
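The registration side of such a feature factory can be sketched in a few lines. The example below is an assumption-heavy illustration: the naming pattern, the whitespace-and-case normalization used for duplicate detection, and the rule that every downstream consumer must approve a de-registration are all hypothetical choices, since the talk summary does not specify them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical convention: <domain>_<entity>_<metric>_<window>,
# e.g. credit_user_overdue_cnt_90d
NAME_PATTERN = re.compile(r"^[a-z]+_[a-z]+_[a-z_]+_\d+[dhm]$")


def normalize_logic(sql: str) -> str:
    """Collapse whitespace and case so trivially different SQL compares equal."""
    return " ".join(sql.lower().split())


@dataclass
class FeatureRegistry:
    features: dict[str, str] = field(default_factory=dict)        # name -> normalized logic
    consumers: dict[str, set[str]] = field(default_factory=dict)  # name -> downstream users

    def register(self, name: str, sql: str) -> None:
        if not NAME_PATTERN.match(name):
            raise ValueError(f"name violates convention: {name}")
        if name in self.features:
            raise ValueError(f"name already registered: {name}")
        logic = normalize_logic(sql)
        for existing, existing_logic in self.features.items():
            if existing_logic == logic:
                raise ValueError(f"duplicate of existing feature: {existing}")
        self.features[name] = logic
        self.consumers[name] = set()

    def subscribe(self, name: str, consumer: str) -> None:
        self.consumers[name].add(consumer)

    def deregister(self, name: str, approvals: set[str]) -> None:
        """Remove a feature only if every downstream consumer has approved."""
        missing = self.consumers[name] - approvals
        if missing:
            raise PermissionError(f"missing approvals from: {sorted(missing)}")
        del self.features[name]
        del self.consumers[name]


registry = FeatureRegistry()
registry.register("credit_user_overdue_cnt_90d",
                  "SELECT COUNT(*) FROM overdue WHERE days <= 90")
registry.subscribe("credit_user_overdue_cnt_90d", "scorecard_v3")

try:
    # Same logic under a different name and formatting is rejected.
    registry.register("credit_user_late_cnt_90d",
                      "select count(*)  from overdue where days <= 90")
except ValueError as err:
    print(err)  # duplicate of existing feature: credit_user_overdue_cnt_90d
```

Text normalization only catches near-identical definitions; a production system would likely need a stronger notion of equivalence, but the gatekeeping flow (validate name, check duplicates, require approvals to retire) is the point of the sketch.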

4. Future Improvements

The roadmap aims for higher, faster, stronger capabilities: self‑service feature development with parameterized SQL, automated feature evaluation using models (e.g., XGBoost) to rank importance, and a closed‑loop platform that integrates mining, deployment, and analysis to accelerate model and strategy iteration.
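As an illustration of what self-service development with parameterized SQL could look like, the sketch below builds one aggregate template and reuses it for different time windows. It uses Python's built-in `sqlite3` purely so the example runs anywhere; the table `loan_applications`, its columns, and the helper `build_feature_sql` are hypothetical and not taken from the talk.

```python
import sqlite3

# Structural parts (table, column, aggregate) come from allow-lists because
# they cannot be bound as SQL parameters; the window is a bound parameter.
ALLOWED_TABLES = {"loan_applications"}
ALLOWED_COLUMNS = {"amount"}
ALLOWED_AGGS = {"COUNT", "SUM", "MAX"}


def build_feature_sql(table: str, agg: str, column: str) -> str:
    """Return a per-customer aggregate query with a '?' placeholder for the window."""
    if table not in ALLOWED_TABLES or column not in ALLOWED_COLUMNS or agg not in ALLOWED_AGGS:
        raise ValueError("parameter not in allow-list")
    return (
        f"SELECT customer_id, {agg}({column}) AS value "
        f"FROM {table} WHERE days_ago <= ? "
        "GROUP BY customer_id ORDER BY customer_id"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loan_applications (customer_id TEXT, amount REAL, days_ago INTEGER)"
)
conn.executemany(
    "INSERT INTO loan_applications VALUES (?, ?, ?)",
    [("c1", 1000.0, 3), ("c1", 2500.0, 20), ("c1", 400.0, 45), ("c2", 800.0, 10)],
)

# One template, two windows: the analyst changes only the parameter.
sql = build_feature_sql("loan_applications", "SUM", "amount")
rows_30d = conn.execute(sql, (30,)).fetchall()
rows_7d = conn.execute(sql, (7,)).fetchall()
print(rows_30d)  # [('c1', 3500.0), ('c2', 800.0)]
print(rows_7d)   # [('c1', 1000.0)]
```

Keeping the template fixed and varying only vetted parameters is one way a platform could let analysts create feature variants without engineering involvement.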

Overall, the session highlighted practical lessons, concrete engineering solutions, and a vision for more automated, scalable risk feature management.

Big Data, Flink, Feature Engineering, asynchronous processing, data pipelines, risk features
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
