ZhongAn Financial Real‑Time Feature Platform: MLOps Practices, Architecture and Anti‑Fraud Applications
This article presents ZhongAn Financial’s end‑to‑end MLOps workflow and real‑time feature platform architecture, detailing team roles, data pipelines, Flink‑based processing, TableStore storage, anti‑fraud feature design, and answers to common implementation questions, offering a comprehensive guide for building scalable, low‑latency ML services in finance.
Introduction – With digital transformation, enterprises face multi‑scenario, multi‑channel, and high‑frequency data demands; real‑time data processing becomes a competitive edge. The article introduces ZhongAn Financial’s real‑time feature platform as a case study.
1. ZhongAn Financial MLOps Overview
MLOps integrates machine learning, data engineering and DevOps to enable rapid model iteration and stable production deployment. Four collaborative teams are defined: data product, data engineering, data science, and data application.
The MLOps workflow includes sample preparation, data cleaning, feature engineering (using methods such as Weight of Evidence (WoE) encoding, one‑hot encoding, and binning), model development, training, deployment, and continuous monitoring.
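To make the feature-engineering step concrete, the sketch below bins a numeric variable and computes a WoE value per bin. It assumes the common convention WoE = ln(share of goods / share of bads) with additive smoothing; the function name, bin edges, and smoothing constant are hypothetical, and the talk does not describe ZhongAn's actual implementation.

```python
import math
from bisect import bisect_right


def woe_by_bin(values, labels, edges, smoothing=0.5):
    """Return one WoE value per bin (hypothetical helper).

    labels: 1 = bad (default/fraud), 0 = good.
    edges:  sorted cut points; bin i holds values v with
            edges[i-1] <= v < edges[i].
    smoothing: added to every cell so empty bins do not cause log(0).
    """
    n_bins = len(edges) + 1
    good = [0] * n_bins
    bad = [0] * n_bins
    for v, y in zip(values, labels):
        b = bisect_right(edges, v)  # index of the bin that contains v
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1
    total_good = sum(good) + smoothing * n_bins
    total_bad = sum(bad) + smoothing * n_bins
    return [
        math.log(((good[i] + smoothing) / total_good)
                 / ((bad[i] + smoothing) / total_bad))
        for i in range(n_bins)
    ]


# Toy data: low values are mostly good, high values mostly bad.
woe = woe_by_bin([1, 2, 3, 10, 11, 12], [0, 0, 1, 1, 1, 0], edges=[5])
```

A positive WoE marks a bin where good samples are over-represented, and a negative WoE marks one where bad samples are.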
Reasons for building MLOps: ZhongAn provides credit‑insurance for online consumer loans, requiring precise risk identification, large‑scale model usage, and real‑time feature availability.
2. System Architecture of the Real‑Time Feature Platform
Key components: a big‑data platform for offline data, a feature‑engineering layer, a machine‑learning platform, and a real‑time feature service that registers features and serves them via APIs.
Data sources are categorized into transaction behavior, third‑party credit data, device capture data, and user‑behavior logs. Core capabilities include data ingestion, high‑throughput feature computation, and low‑latency response, realized with Flink as the streaming engine and Alibaba Cloud TableStore as the storage backend.
The business architecture consists of a feature‑gateway layer, a core processing layer, and an application layer. Four data sources feed the gateway: credit‑data gateway, third‑party data platform, real‑time computation platform, and offline call platform. Core functions cover feature gateway, configurable feature registration, feature computation (including third‑party, real‑time, anti‑fraud, and model features), feature management, and monitoring.
3. Real‑Time Business Feature Computation Details
Two processing models were evaluated: ETL, which computes features as part of the synchronization pipeline, ahead of any request, and ELT, which synchronizes raw data first and computes features on demand. The ELT approach was chosen for flexibility and resource efficiency.
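The ELT idea can be sketched as follows: raw detail rows are loaded unchanged, and a feature is derived only when a request asks for it. This is a minimal in-memory illustration; `DetailStore`, the row fields, and `loan_amount_last_n_days` are hypothetical names and not part of the platform described in the talk.

```python
from collections import defaultdict


class DetailStore:
    """In-memory stand-in for a detail store keyed by user ID (assumption)."""

    def __init__(self):
        self._rows = defaultdict(list)

    def load(self, user_id, row):
        # "L" step: store the raw row as-is, with no feature logic applied.
        self._rows[user_id].append(row)

    def rows(self, user_id):
        return list(self._rows.get(user_id, []))


def loan_amount_last_n_days(store, user_id, now_ts, days):
    """"T" step at request time: total loan amount in the trailing window."""
    cutoff = now_ts - days * 86400
    return sum(
        r["amount"]
        for r in store.rows(user_id)
        if r["type"] == "loan" and cutoff <= r["ts"] <= now_ts
    )


store = DetailStore()
store.load("u1", {"type": "loan", "amount": 1000, "ts": 0})
store.load("u1", {"type": "loan", "amount": 500, "ts": 8 * 86400})
store.load("u1", {"type": "repay", "amount": 200, "ts": 9 * 86400})
recent = loan_amount_last_n_days(store, "u1", now_ts=10 * 86400, days=7)
```

Because only raw rows are stored, a new window length or filter becomes a new function over the same data and needs no re-synchronization.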
Data flow uses Kafka + Flink for streaming and Spark for historical back‑fill, with TableStore storing business detail data and Redis storing anti‑fraud features. An ID‑Mapping table unifies user identifiers (ID card, phone) for feature lookup.
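One way to picture the ID‑Mapping table is as a structure that resolves any known identifier to a single canonical key. The sketch below uses a union-find structure for this; the class name, the `(type, value)` identifier tuples, and the merge-on-bind behavior are assumptions made for illustration, since the talk does not describe how the table is implemented.

```python
class IdMapping:
    """Hypothetical ID-mapping: identifiers bound together share one key."""

    def __init__(self):
        self._parent = {}

    def _find(self, key):
        # Unknown identifiers start as their own canonical key.
        self._parent.setdefault(key, key)
        while self._parent[key] != key:
            # Path halving keeps later lookups short.
            self._parent[key] = self._parent[self._parent[key]]
            key = self._parent[key]
        return key

    def bind(self, *identifiers):
        """Record that all given identifiers belong to the same user."""
        roots = [self._find(i) for i in identifiers]
        for root in roots[1:]:
            self._parent[root] = roots[0]

    def resolve(self, identifier):
        """Return the canonical key used for feature lookup."""
        return self._find(identifier)


mapping = IdMapping()
mapping.bind(("id_card", "A123"), ("phone", "13800000001"))
mapping.bind(("phone", "13800000001"), ("phone", "13800000002"))
same_user = (mapping.resolve(("phone", "13800000002"))
             == mapping.resolve(("id_card", "A123")))
```

With such a mapping, a request that arrives with only a phone number can still reach features stored under the user's ID card.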
The feature computation engine is built on Groovy expressions and supports configurable feature pipelines, so new features can be rolled out without code changes.
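The configuration-driven approach can be illustrated with a small Python stand-in. The platform itself uses Groovy expressions; here each feature is instead described by a plain dictionary that names a filter, a field, and an aggregation. The config keys (`where`, `field`, `agg`) and the aggregator names are hypothetical.

```python
# Aggregations the engine knows about; adding a feature only needs new config.
AGGREGATORS = {
    "count": lambda xs: len(xs),
    "sum": lambda xs: sum(xs),
    "max": lambda xs: max(xs) if xs else None,
}


def compute_feature(config, rows):
    """Evaluate one configured feature over a user's detail rows."""
    where = config.get("where", {})
    selected = [r for r in rows
                if all(r.get(k) == v for k, v in where.items())]
    field = config.get("field")
    # "count" style features need no field; they aggregate the rows themselves.
    values = [r[field] for r in selected] if field else selected
    return AGGREGATORS[config["agg"]](values)


FEATURES = {
    "loan_count": {"where": {"type": "loan"}, "agg": "count"},
    "loan_amount_sum": {"where": {"type": "loan"}, "field": "amount", "agg": "sum"},
    "max_repay": {"where": {"type": "repay"}, "field": "amount", "agg": "max"},
}

rows = [
    {"type": "loan", "amount": 1000},
    {"type": "loan", "amount": 500},
    {"type": "repay", "amount": 200},
]
result = {name: compute_feature(cfg, rows) for name, cfg in FEATURES.items()}
```

Restricting configs to a fixed set of operations is a deliberate simplification here; a real expression language such as Groovy is more expressive, which also makes sandboxing a concern.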
4. Anti‑Fraud Feature Application
Feature categories include user‑behavior, location‑identification, device‑association, user‑graph relationships, and community features. Real‑time anti‑fraud features are computed in Flink, stored in Redis or a graph database, and served via HTTP APIs.
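A device‑association feature of this kind might be "number of distinct users seen on a device in the last hour". The sketch below is a plain-Python stand-in for what a keyed sliding window would maintain in a streaming job; it is not Flink code, and the class name, window length, and the assumption that events arrive in timestamp order are all illustrative.

```python
from collections import defaultdict, deque


class DeviceUserWindow:
    """Distinct users per device over a trailing time window (sketch)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        # device_id -> deque of (timestamp, user_id), oldest first
        self._events = defaultdict(deque)

    def add(self, device_id, user_id, ts):
        """Record an event and return the current distinct-user count.

        Assumes events for a device arrive in non-decreasing ts order.
        """
        q = self._events[device_id]
        q.append((ts, user_id))
        # Drop events that have fallen out of the window.
        while q and q[0][0] <= ts - self.window:
            q.popleft()
        return len({user for _, user in q})


w = DeviceUserWindow(window_seconds=3600)
counts = [
    w.add("d1", "u1", 0),
    w.add("d1", "u2", 10),
    w.add("d1", "u1", 20),
    w.add("d1", "u3", 3700),
]
```

In a production pipeline the returned count would be written to a key-value store such as Redis under the device ID, so the serving API can read it with a single lookup.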
The user‑relationship graph is built from identity, device, and contact data; NebulaGraph was selected for its shared‑nothing distributed architecture. Graph‑based fraud signals such as first‑party fraud, intermediary fraud, information leakage, and organized fraud are extracted from it.
5. Q&A Highlights
Answers cover topics such as multi‑Kafka message correlation, performance guarantees for high‑dimensional feature queries, unsupervised anomaly detection on app‑tracking data, graph community detection (using connected‑component algorithm via Spark GraphX), latency bottlenecks, consistency between online and offline pipelines, and fallback strategies when real‑time features are missing or slow.
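The connected‑component approach to community detection mentioned in the Q&A can be sketched on a single machine as follows. This is a breadth-first-search stand-in, not Spark GraphX code; the edge list and node naming are hypothetical, and each component is labeled with its smallest node ID purely as a convention for this example.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Map every node to a component label (the smallest node ID in it)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    labels = {}
    for start in list(adj):
        if start in labels:
            continue
        # Breadth-first search collects everything reachable from start.
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        component_id = min(seen)
        for node in seen:
            labels[node] = component_id
    return labels


# Users u1 and u2 share device d1; u3 is alone on device d2.
labels = connected_components([("u1", "d1"), ("u2", "d1"), ("u3", "d2")])
```

Users that end up with the same label are linked through shared devices or contacts, and an unusually large component is a candidate for organized-fraud review.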
Overall, the presentation demonstrates how a financial institution can design, implement, and operate a scalable MLOps‑driven real‑time feature platform that supports both risk control and anti‑fraud use cases.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.