
Evolution and Architecture of a Financial Risk Control System: From Monolith to Microservices and Commercialization

This article details the design, refactoring, performance optimization, reliability monitoring, and commercialization of a financial risk control system, covering rule abstraction, decision workflows, feature engineering, model integration, and the trade‑offs between latency and accuracy in large‑scale production environments.

DataFunTalk

01 Phase 1: Risk Service Evolution

The early risk control system was one large monolithic application that bundled rule decisions, workflow configuration, model calculation, data integration, and feature processing. This design led to low change efficiency, duplicated code, a high failure rate, and difficulty supporting multiple business lines.

1. Early System Problems

Low change efficiency and long delivery cycles

Fragmented, repetitive requirements

Severe code coupling

High production fault rate

Inability to meet multi‑business, multi‑scenario needs

2. Refactoring Abstraction

We abstracted the risk engine into six elements: rule, decision, workflow, model, feature, and data. Expressing rule logic in a domain‑specific language (DSL) separates it from application code and makes visual configuration possible.

Rule Abstraction

Each rule is expressed as a feature‑operator‑threshold triple (for example, age < 18) and compiled into executable DSL text that an engine such as Drools, Groovy, or QlExpress can parse and run.
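To make the triple concrete, here is a minimal Python sketch. It assumes numeric thresholds and a small home-grown expression syntax; it does not reproduce Drools, Groovy, or QlExpress syntax, and `Rule`, `to_dsl`, and `evaluate` are hypothetical names.

```python
import operator
import re
from dataclasses import dataclass

# Supported comparison operators (an assumed subset; real engines support far more).
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}

@dataclass(frozen=True)
class Rule:
    feature: str      # e.g. "age"
    op: str           # one of the keys in OPS
    threshold: float  # numeric threshold

    def to_dsl(self) -> str:
        """Compile the triple into a one-line, engine-neutral DSL expression."""
        if self.op not in OPS:
            raise ValueError(f"unsupported operator: {self.op}")
        return f"{self.feature} {self.op} {self.threshold}"

# Matches "<identifier> <operator> <number>", e.g. "age < 18" or "overdue_days >= 30.5".
# Two-character operators are tried first so ">=" is not read as ">".
_DSL = re.compile(r"^\s*([A-Za-z_]\w*)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")

def evaluate(dsl: str, features: dict) -> bool:
    """Parse one compiled rule and evaluate it against a feature map."""
    m = _DSL.match(dsl)
    if m is None:
        raise ValueError(f"cannot parse rule: {dsl!r}")
    name, op, value = m.groups()
    return OPS[op](features[name], float(value))

rule = Rule("age", "<", 18)
print(rule.to_dsl())                          # age < 18
print(evaluate(rule.to_dsl(), {"age": 16}))   # True
```

Keeping the compiled form as plain text is what lets a rule be stored, versioned, and handed to whichever engine executes it, independently of the code that authored it.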

Combined Decision

Multiple rules can be combined into rule sets, decision trees, decision matrices, or decision tables, which provide conflict resolution among rules and a hierarchical decision flow.
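One simple conflict-resolution strategy for a rule set is "the most severe action wins". The sketch below assumes three actions (PASS, REVIEW, REJECT) ranked by severity; the rule names, features, and the `decide` function are hypothetical, and other strategies, such as first match or explicit priorities, are equally plausible.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable

class Action(IntEnum):
    # Higher value = more severe; used to resolve conflicts between hit rules.
    PASS = 0
    REVIEW = 1
    REJECT = 2

@dataclass(frozen=True)
class RuleItem:
    name: str
    condition: Callable[[dict], bool]  # predicate over the feature map
    action: Action                     # outcome when the rule hits

def decide(rule_set: list[RuleItem], features: dict) -> tuple[Action, list[str]]:
    """Evaluate every rule; return the most severe action and the names of hit rules."""
    hits = [r for r in rule_set if r.condition(features)]
    if not hits:
        return Action.PASS, []
    final = max(r.action for r in hits)
    return final, [r.name for r in hits]

rules = [
    RuleItem("young_applicant", lambda f: f["age"] < 18, Action.REJECT),
    RuleItem("many_devices", lambda f: f["devices_7d"] > 3, Action.REVIEW),
    RuleItem("blacklisted", lambda f: f["on_blacklist"], Action.REJECT),
]

print(decide(rules, {"age": 30, "devices_7d": 5, "on_blacklist": False}))
```

Returning the hit-rule names alongside the action keeps the decision explainable, which also feeds the rule hit-rate monitoring discussed later.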

Workflow Orchestration

A decision flow is composed of sequential nodes and branch nodes (e.g., an A/B test split). Flows can be executed with a pipeline model or with the Rete algorithm; we adopt the pipeline model for its simplicity and performance.
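The pipeline model simply runs nodes in a fixed order over a shared context. This minimal Python sketch assumes that an A/B branch routes a fixed percentage of users by hashing the user ID; `run_pipeline`, `ab_branch`, the policy nodes, and the context keys are all hypothetical.

```python
import hashlib
from typing import Callable

# A node takes the shared decision context and returns it, possibly enriched.
Node = Callable[[dict], dict]

def run_pipeline(nodes: list[Node], ctx: dict) -> dict:
    """Execute nodes strictly in order; each node sees the previous node's output."""
    for node in nodes:
        ctx = node(ctx)
    return ctx

def ab_branch(variant_a: list[Node], variant_b: list[Node], pct_b: int) -> Node:
    """Branch node that sends roughly pct_b percent of users to variant B."""
    def node(ctx: dict) -> dict:
        # Hash the user ID into one of 100 buckets so a user always lands in the same variant.
        bucket = int(hashlib.sha256(ctx["user_id"].encode()).hexdigest(), 16) % 100
        ctx["variant"] = "B" if bucket < pct_b else "A"
        return run_pipeline(variant_b if ctx["variant"] == "B" else variant_a, ctx)
    return node

def age_rule(ctx: dict) -> dict:
    ctx["rule_hit"] = ctx["features"]["age"] < 18
    return ctx

def strict_policy(ctx: dict) -> dict:
    ctx["decision"] = "REJECT" if ctx["rule_hit"] else "PASS"
    return ctx

def lenient_policy(ctx: dict) -> dict:
    ctx["decision"] = "REVIEW" if ctx["rule_hit"] else "PASS"
    return ctx

flow = [age_rule, ab_branch([strict_policy], [lenient_policy], pct_b=10)]
result = run_pipeline(flow, {"user_id": "u-1001", "features": {"age": 16}})
print(result["variant"], result["decision"])
```

Hashing the user ID, rather than drawing a random number per request, keeps each user in the same variant across repeated decisions.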

Visualization

A visual console lets risk experts adjust rules and flows without developer intervention, storing configurations in relational databases and converting them to DSL at publish time.
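Publishing can be pictured as reading the console's rows from the relational database and rendering them as DSL text. The sketch below uses an in-memory SQLite database; the `rule_config` table, its columns, and `publish_rules` are hypothetical.

```python
import sqlite3

def publish_rules(conn: sqlite3.Connection, rule_set: str) -> str:
    """Read enabled rule rows for one rule set and emit DSL text, one rule per line."""
    rows = conn.execute(
        "SELECT feature, op, threshold FROM rule_config "
        "WHERE rule_set = ? AND enabled = 1 ORDER BY priority",
        (rule_set,),
    ).fetchall()
    # ":g" drops a trailing ".0", so a stored 18.0 is rendered as "18".
    return "\n".join(f"{feature} {op} {threshold:g}" for feature, op, threshold in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rule_config (rule_set TEXT, priority INTEGER, feature TEXT, "
    "op TEXT, threshold REAL, enabled INTEGER)"
)
conn.executemany(
    "INSERT INTO rule_config VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("loan_apply", 2, "overdue_days", ">=", 30, 1),
        ("loan_apply", 1, "age", "<", 18, 1),
        ("loan_apply", 3, "devices_7d", ">", 3, 0),  # disabled in the console
    ],
)
print(publish_rules(conn, "loan_apply"))
```

Converting at publish time, rather than on every request, means the online engine only ever loads an already-validated DSL snapshot.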

Model Integration

Machine‑learning models (online prediction and offline training) complement rule‑based decisions, forming a closed‑loop model lifecycle.
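As a rough illustration of how a model score might sit alongside rule decisions, the sketch below assumes a logistic-regression model with made-up coefficients, lets rules keep veto power, and appends every decision to a feedback log that an offline training job could consume. All names and numbers are hypothetical.

```python
import math

# Hypothetical coefficients standing in for a model trained offline and
# deployed for online prediction; a real system would load them from a model store.
WEIGHTS = {"overdue_days": 0.08, "devices_7d": 0.4, "income_k": -0.02}
BIAS = -2.0

def predict_default_prob(features: dict) -> float:
    """Logistic-regression score: sigmoid(bias + sum(w_i * x_i)); missing inputs count as 0."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def final_decision(rule_action: str, prob: float, cutoff: float = 0.5) -> str:
    """Rules keep veto power; the model can only tighten a PASS into a REVIEW."""
    if rule_action != "PASS":
        return rule_action
    return "REVIEW" if prob >= cutoff else "PASS"

# Feedback log of (features, score, decision); an offline job could join it with
# repayment outcomes to retrain the model, closing the loop.
feedback_log: list[tuple[dict, float, str]] = []

def decide(features: dict, rule_action: str) -> str:
    prob = predict_default_prob(features)
    decision = final_decision(rule_action, prob)
    feedback_log.append((features, prob, decision))
    return decision

print(decide({"overdue_days": 40, "devices_7d": 2, "income_k": 10}, "PASS"))
```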

Data Features

Features are derived from internal (first‑party) and external (third‑party) data sources and processed by a feature engine, which resolves dependencies between features by executing them as a directed acyclic graph (DAG).
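Python's standard `graphlib.TopologicalSorter` is enough to show DAG-ordered feature computation. In this sketch the feature names, their dependencies, and `compute_features` are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Each derived feature declares its inputs and a function that computes it from them.
# "loan_amounts" and "monthly_income" are raw inputs fetched from data sources.
FEATURES: dict[str, tuple[list[str], Callable[..., float]]] = {
    "loan_total": (["loan_amounts"], lambda amounts: float(sum(amounts))),
    "loan_count": (["loan_amounts"], lambda amounts: float(len(amounts))),
    "avg_loan": (["loan_total", "loan_count"], lambda t, c: t / c if c else 0.0),
    "debt_to_income": (["loan_total", "monthly_income"], lambda t, inc: t / inc if inc else 0.0),
}

def compute_features(raw: dict) -> dict:
    """Compute every derived feature only after all of its dependencies are available."""
    graph = {name: set(deps) for name, (deps, _) in FEATURES.items()}
    values = dict(raw)
    for name in TopologicalSorter(graph).static_order():
        if name in values:
            continue  # raw input supplied by the caller
        if name not in FEATURES:
            raise KeyError(f"missing raw input: {name}")
        deps, fn = FEATURES[name]
        values[name] = fn(*(values[d] for d in deps))
    return values

print(compute_features({"loan_amounts": [1000, 3000], "monthly_income": 8000}))
```

`TopologicalSorter` also offers `prepare()`, `get_ready()`, and `done()`, which an engine could use to compute independent features in parallel instead of one at a time.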

02 Phase 2: Performance and Reliability

Growth in decision volume and scenario diversity introduced challenges in latency, accuracy, and system stability.

1. New Issues

The growing number of decision calls and data dimensions demands higher performance, and different scenarios require different trade‑offs between speed (responses within seconds) and precision (results within minutes).

2. Decision Timeliness vs Accuracy

Real‑time decisions prioritize speed and tolerate failures when fetching data; near‑real‑time decisions allow retries and longer latency in exchange for higher accuracy.
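The two policies can be contrasted in a small sketch. It assumes a data source is a callable that raises `DataSourceError` on failure or timeout; `fetch`, the mode names, and the retry count are hypothetical, and a production client would also enforce per-call timeouts.

```python
from typing import Callable, Optional

class DataSourceError(Exception):
    """Raised when a data source fails or times out."""

def fetch(source: Callable[[], float], mode: str, max_retries: int = 3) -> Optional[float]:
    """Fetch one data point under the policy of the given decision mode.

    realtime:      one attempt; on failure return None so the feature is treated
                   as missing and the decision proceeds without waiting.
    near_realtime: up to max_retries extra attempts; re-raise if all of them fail.
    """
    if mode not in ("realtime", "near_realtime"):
        raise ValueError(f"unknown mode: {mode}")
    attempts = 1 if mode == "realtime" else 1 + max_retries
    last_error = None
    for _ in range(attempts):
        try:
            return source()
        except DataSourceError as err:
            last_error = err
    if mode == "realtime":
        return None      # tolerate the failure
    raise last_error     # accuracy matters more than latency here

def flaky(failures: int) -> Callable[[], float]:
    """Build a fake source that fails `failures` times before returning 0.8."""
    state = {"left": failures}
    def call() -> float:
        if state["left"] > 0:
            state["left"] -= 1
            raise DataSourceError("timeout")
        return 0.8
    return call

print(fetch(flaky(1), "realtime"), fetch(flaky(1), "near_realtime"))
```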

3. Feature Computation Strategies

We combine real‑time computation, pre‑computation, batch computation, and hybrid approaches to balance latency and correctness, using change data capture (CDC) with Kafka and Flink for efficient pre‑computation.
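The core of CDC-based pre-computation is applying row-level change events incrementally instead of re-querying the source database. The plain-Python sketch below stands in for a Flink job consuming from Kafka; it does not use the Flink or Kafka APIs, and the event shape (op codes `c`/`u`/`d` with `before` and `after` row images, loosely modeled on Debezium-style events) and field names are assumptions.

```python
from collections import defaultdict

# Pre-computed state: user_id -> outstanding loan count and amount.
# In production this would live in Flink keyed state and be written to a low-latency store.
state: dict[str, dict] = defaultdict(lambda: {"count": 0, "amount": 0.0})

def apply_change(event: dict) -> None:
    """Incrementally update aggregates from one row-level change event.

    op "c" = insert, "u" = update, "d" = delete. An update is applied as
    retract(before) + add(after), so totals stay correct when an amount changes.
    """
    before, after = event.get("before"), event.get("after")
    if event["op"] in ("u", "d") and before:
        s = state[before["user_id"]]
        s["count"] -= 1
        s["amount"] -= before["amount"]
    if event["op"] in ("c", "u") and after:
        s = state[after["user_id"]]
        s["count"] += 1
        s["amount"] += after["amount"]

events = [
    {"op": "c", "before": None, "after": {"user_id": "u1", "amount": 500.0}},
    {"op": "c", "before": None, "after": {"user_id": "u1", "amount": 300.0}},
    {"op": "u", "before": {"user_id": "u1", "amount": 300.0},
     "after": {"user_id": "u1", "amount": 200.0}},
    {"op": "d", "before": {"user_id": "u1", "amount": 500.0}, "after": None},
]
for e in events:
    apply_change(e)

print(state["u1"])  # {'count': 1, 'amount': 200.0}
```

At decision time the online path only reads `state[user_id]`, which is why pre-computation keeps latency low for aggregates over a user's full history.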

4. Reliability Monitoring

A multi‑layer monitoring system covering the business, application, system, and tracing layers provides real‑time dashboards and alerts on rule hit rates, feature missing rates, and decision pass rates. It also supports replaying production traffic for offline testing.
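Rule hit rate, feature missing rate, and pass rate are simple ratios over decision logs, so an alert can compare them with a baseline. The sketch assumes a hypothetical `DecisionLog` record and a fixed absolute tolerance; a production setup would more likely compute these over sliding time windows.

```python
from dataclasses import dataclass

@dataclass
class DecisionLog:
    rule_hits: set[str]          # names of rules that fired
    missing_features: set[str]   # features that could not be computed
    passed: bool                 # final decision was PASS

def metrics(logs: list[DecisionLog], rule: str, feature: str) -> dict[str, float]:
    """Compute hit, missing, and pass rates over a batch of decision logs."""
    n = len(logs)
    if n == 0:
        return {"rule_hit_rate": 0.0, "feature_missing_rate": 0.0, "pass_rate": 0.0}
    return {
        "rule_hit_rate": sum(rule in log.rule_hits for log in logs) / n,
        "feature_missing_rate": sum(feature in log.missing_features for log in logs) / n,
        "pass_rate": sum(log.passed for log in logs) / n,
    }

def alerts(current: dict[str, float], baseline: dict[str, float], tolerance: float = 0.2) -> list[str]:
    """Return metric names whose absolute deviation from the baseline exceeds tolerance."""
    return [name for name, value in current.items() if abs(value - baseline[name]) > tolerance]

BASELINE = {"rule_hit_rate": 0.2, "feature_missing_rate": 0.05, "pass_rate": 0.8}
logs = [
    DecisionLog({"young_applicant"}, set(), False),
    DecisionLog(set(), {"credit_score"}, True),
    DecisionLog(set(), set(), True),
    DecisionLog(set(), {"credit_score"}, True),
]
current = metrics(logs, "young_applicant", "credit_score")
print(current, alerts(current, BASELINE))
```

A spike in a feature's missing rate, as in this toy batch, often points to a failing upstream data source before it shows up as a shift in pass rates.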

03 Phase 3: Commercialization

After two years of refinement in production, the platform was packaged as a SaaS offering (“Magic Cube”) that supports localized deployment and extension, delivering risk control capabilities to external customers.

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
