Managing the Full Lifecycle of Risk Features: Pitfalls, Solutions, and Future Directions
In this talk, Tang Gengyang of Citic Baixin Bank describes the challenges of risk feature engineering, presents two solution frameworks (1.0 and 2.0) for accelerating deployment, improving reuse, and keeping offline and online features consistent, and outlines future enhancements toward a more efficient, automated feature pipeline.
Speaker and Context
Tang Gengyang, a senior data engineer at Citic Baixin Bank, shared his experience in building and managing credit‑risk features, covering the entire feature lifecycle from design to deprecation.
1. Pitfalls Encountered
Early feature development relied on custom Java operators, which became a bottleneck as the number of third‑party data sources grew. The problems identified were slow rollout, poor reusability, no decommissioning process, and insufficient monitoring.
2. Solution 1.0 – Rapid Fixes
Working with existing resources, the team standardized offline and real‑time data schemas, aligned requirements between product and engineering, reduced translation between languages by writing features in SQL or Python instead of Java, and built a unified data source on a Lambda‑style architecture with Flink. This cut feature rollout time by roughly 50% and streamlined testing.
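As a rough illustration of the Lambda-style idea described above, the sketch below serves a feature by combining a batch view (up to a cutoff) with a real-time view (after the cutoff), using one shared record schema and one feature definition for both paths. It is a plain-Python stand-in, not the team's Flink implementation; `LoanEvent`, `count_applications`, and `serve_feature` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LoanEvent:
    """Hypothetical unified record schema shared by the batch and real-time paths."""
    user_id: str
    event_time: datetime
    amount: float


def count_applications(events, user_id, start, end):
    """Single feature definition: number of events for user_id in [start, end)."""
    return sum(1 for e in events if e.user_id == user_id and start <= e.event_time < end)


def serve_feature(batch_events, stream_events, user_id, window_start, batch_cutoff, now):
    """Lambda-style serving: batch view up to the cutoff plus speed view after it."""
    batch_part = count_applications(batch_events, user_id, window_start, batch_cutoff)
    speed_part = count_applications(stream_events, user_id, batch_cutoff, now)
    return batch_part + speed_part


# Illustrative data only (not from the talk).
cutoff = datetime(2024, 1, 2)
history = [
    LoanEvent("u1", datetime(2024, 1, 1, 9), 1000.0),
    LoanEvent("u1", datetime(2024, 1, 1, 18), 500.0),
    LoanEvent("u2", datetime(2024, 1, 1, 12), 800.0),
]
recent = [LoanEvent("u1", datetime(2024, 1, 2, 10), 300.0)]

window_start, now = datetime(2024, 1, 1), datetime(2024, 1, 3)
online_value = serve_feature(history, recent, "u1", window_start, cutoff, now)

# Offline recomputation over the full log should agree with the served value.
offline_value = count_applications(history + recent, "u1", window_start, now)
print(online_value, offline_value)
```

Because both paths call the same `count_applications` definition on the same schema, an offline recomputation can be compared directly against the served value, which is one simple way to test offline/online consistency.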
3. Solution 2.0 – Systematic Lifecycle Management
A comprehensive feature factory was introduced to manage registration, naming conventions, duplication detection, and strict de‑registration approvals.
Asynchronous processing was adopted via Kafka topics, an event‑driven scheduler, and a notification center to decouple feature computation from consumption, improving throughput and reliability.
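The registration rules of such a feature factory might look roughly like the sketch below: a name check against a convention, duplicate detection on a normalized definition, and de-registration that requires both an approver and no remaining consumers. The naming pattern, the `FeatureRegistry` class, and all feature and consumer names are assumptions made for illustration; the talk does not specify them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical convention: <domain>_<entity>_<metric>_<window>, e.g. credit_user_apply_cnt_30d
NAME_PATTERN = re.compile(r"^[a-z]+_[a-z]+_[a-z_]+_\d+[dhm]$")


def normalize(definition):
    """Lowercase and collapse whitespace so trivially different definitions compare equal."""
    return " ".join(definition.lower().split())


@dataclass
class FeatureRegistry:
    features: dict = field(default_factory=dict)   # name -> normalized definition
    consumers: dict = field(default_factory=dict)  # name -> set of consumer names

    def register(self, name, definition):
        if not NAME_PATTERN.match(name):
            raise ValueError(f"name violates convention: {name}")
        if name in self.features:
            raise ValueError(f"name already registered: {name}")
        norm = normalize(definition)
        for existing, existing_def in self.features.items():
            if existing_def == norm:
                raise ValueError(f"{name} duplicates existing feature {existing}")
        self.features[name] = norm
        self.consumers[name] = set()

    def add_consumer(self, name, consumer):
        self.consumers[name].add(consumer)

    def remove_consumer(self, name, consumer):
        self.consumers[name].discard(consumer)

    def deregister(self, name, approved_by):
        if not approved_by:
            raise PermissionError("de-registration requires an approver")
        if self.consumers[name]:
            raise PermissionError(f"{name} is still used by {sorted(self.consumers[name])}")
        del self.features[name]
        del self.consumers[name]


registry = FeatureRegistry()
registry.register("credit_user_apply_cnt_30d",
                  "SELECT COUNT(*) FROM applications WHERE age_days <= 30")
registry.add_consumer("credit_user_apply_cnt_30d", "scorecard_v3")

try:
    # Same logic under a different name is rejected as a duplicate.
    registry.register("credit_user_request_cnt_30d",
                      "select count(*)  from applications where age_days <= 30")
except ValueError as err:
    print(err)
```

Comparing normalized definitions only catches textual duplicates; detecting features that are semantically identical but written differently would need a stronger check than this sketch provides.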
4. Future Improvements
The roadmap aims for higher, faster, stronger capabilities: self‑service feature development with parameterized SQL, automated feature evaluation using models (e.g., XGBoost) to rank importance, and a closed‑loop platform that integrates mining, deployment, and analysis to accelerate model and strategy iteration.
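One way to picture self-service development with parameterized SQL is a single template from which many features are derived by changing the aggregate and the window. The sketch below uses Python's built-in `sqlite3` with an in-memory table purely for illustration; the table `loan_events`, its columns, and the helper names are hypothetical, and a production system would run against its own warehouse or stream engine.

```python
import sqlite3

# Hypothetical template: the aggregate comes from an allow-list,
# while all values are passed as bound parameters.
TEMPLATE = (
    "SELECT {agg}(amount) FROM loan_events "
    "WHERE user_id = :user_id AND event_day >= :start_day AND event_day < :end_day"
)
ALLOWED_AGGS = {"COUNT", "SUM", "MAX"}


def build_feature_sql(agg):
    """Fill the template's aggregate slot, rejecting anything outside the allow-list."""
    if agg not in ALLOWED_AGGS:
        raise ValueError(f"unsupported aggregate: {agg}")
    return TEMPLATE.format(agg=agg)


def compute_feature(conn, agg, user_id, start_day, end_day):
    """Run one parameterized feature query and return its single value."""
    params = {"user_id": user_id, "start_day": start_day, "end_day": end_day}
    return conn.execute(build_feature_sql(agg), params).fetchone()[0]


# Illustrative data only (not from the talk).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan_events (user_id TEXT, event_day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO loan_events VALUES (?, ?, ?)",
    [
        ("u1", "2024-01-01", 1000.0),
        ("u1", "2024-01-15", 500.0),
        ("u1", "2024-02-10", 300.0),
        ("u2", "2024-01-05", 800.0),
    ],
)

apply_cnt = compute_feature(conn, "COUNT", "u1", "2024-01-01", "2024-02-01")
apply_sum = compute_feature(conn, "SUM", "u1", "2024-01-01", "2024-02-01")
print(apply_cnt, apply_sum)
```

Restricting the template slot to an allow-list while binding every value keeps the self-service path from turning into arbitrary SQL execution.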
Overall, the session highlighted practical lessons, concrete engineering solutions, and a vision for more automated, scalable risk feature management.
DataFunSummit