Interpretability of Deep Learning and Low‑Frequency Event Learning in Financial Applications
The article reviews the limitations of mainstream deep‑learning models in finance, proposes hybrid architectures that combine tree‑based models, Wide&Deep networks, and attention mechanisms, applies sensitivity and variance analysis to improve interpretability and low‑frequency event detection, and validates the approach with a large‑scale insurance recommendation case study.
This article, based on a talk by a data scientist from Ping An Life at the DataFunTalk algorithm salon, outlines the challenges of applying mainstream deep‑learning models—CNNs, RNNs, and DNNs—to financial problems, where scarce prior knowledge, low‑frequency events, and feature sparsity hinder performance and interpretability.
To address these challenges, the authors focus on two key research directions: model interpretability (through local feature exploration and sensitivity analysis) and low‑frequency event learning. They discuss how tree‑based models, despite limited fitting capacity, provide valuable explanations and can be combined with deep networks in hybrid Wide&Deep architectures.
The proposed solution leverages tree models for local feature extraction, Wide&Deep (or Conditional Multi‑Fields DNN) for joint training of sparse and dense data, and attention mechanisms to capture rare but important samples. Feature selection is performed using rule‑based scoring and Auto‑Encoder compression to reduce dimensionality before feeding into the deep network.
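The joint wide-and-deep idea can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: it assumes the tree model has already been reduced to sparse one-hot leaf indicators (the "wide" input), while dense numeric features go through a small MLP (the "deep" input); all sizes and weight initializations here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: sparse one-hot leaf indicators extracted from a tree
# model (wide side) and dense numeric features (deep side). Sizes are made up.
n, n_leaves, n_dense = 8, 16, 5
x_wide = np.zeros((n, n_leaves))
x_wide[np.arange(n), rng.integers(0, n_leaves, n)] = 1.0  # one active leaf per sample
x_deep = rng.normal(size=(n, n_dense))

# Wide side: a single linear layer over the sparse indicators.
w_wide = rng.normal(scale=0.1, size=n_leaves)

# Deep side: a two-layer MLP over the dense features.
w1 = rng.normal(scale=0.1, size=(n_dense, 8))
w2 = rng.normal(scale=0.1, size=8)

hidden = np.maximum(x_deep @ w1, 0.0)    # ReLU
logits = x_wide @ w_wide + hidden @ w2   # joint wide + deep logit
p = 1.0 / (1.0 + np.exp(-logits))        # sigmoid for binary output
print(p.shape)
```

In a real system both sides would be trained jointly by backpropagating one shared loss, which is what lets the memorization-oriented wide part and the generalization-oriented deep part complement each other.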
Sensitivity analysis methods—including variance analysis, Sobol indices, and Gaussian Process modeling—are employed to quantify how input perturbations affect outputs, offering a probabilistic view of model behavior suitable for financial risk assessment.
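The variance-based view can be made concrete with a first-order Sobol index, estimated by the standard pick-freeze Monte Carlo scheme. The sketch below substitutes a toy linear model whose true indices are known in closed form; the trained financial model and the input distribution are the assumptions being stood in for.

```python
import numpy as np

rng = np.random.default_rng(1)

def model(x):
    # Stand-in for the trained model: a weighted sum, so the true
    # first-order Sobol indices are a_i^2 / sum(a_j^2) = [0.9, 0.1].
    return 3.0 * x[:, 0] + 1.0 * x[:, 1]

n = 200_000
a = rng.uniform(size=(n, 2))  # sample A of independent uniform inputs
b = rng.uniform(size=(n, 2))  # independent sample B

y = model(a)
var_y = y.var()

# Pick-freeze estimator of S_i = Var(E[Y|X_i]) / Var(Y):
# keep column i from sample A, take the remaining columns from sample B.
s = []
for i in range(2):
    ab = b.copy()
    ab[:, i] = a[:, i]
    y_i = model(ab)
    s.append(np.mean(y * y_i) - y.mean() * y_i.mean())  # Cov(Y, Y_i)
s = np.array(s) / var_y
print(s)  # close to the analytic values [0.9, 0.1]
```

An index near 1 means that input alone drives most of the output variance; for risk assessment this gives a probabilistic, model-agnostic answer to "which features does the model actually depend on."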
For low‑frequency event detection, the authors design a custom attention‑based algorithm (“Low Frequent Events Detection with attention mechanism”) that emphasizes rare samples without relying on extensive prior knowledge, integrating scaled dot‑product and multi‑head attention into the MLP pipeline.
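The scaled dot-product attention that the pipeline builds on is standard and compact enough to show directly. The sketch below is a generic numpy version, with the interpretation (queries as current samples, keys/values as stored rare-sample representations) being our assumption about how it slots into the authors' MLP pipeline.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ v, weights

rng = np.random.default_rng(2)
q = rng.normal(size=(4, 8))  # queries, e.g. the current mini-batch samples
k = rng.normal(size=(6, 8))  # keys, e.g. memorized rare-sample representations
v = rng.normal(size=(6, 8))  # values paired with the keys
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 8); each row of w sums to 1
```

Multi-head attention repeats this computation over several learned projections of Q, K, and V and concatenates the results, letting different heads specialize, for instance, in different rare-event patterns.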
The training pipeline consists of three asynchronous modules: the base MLP, a discriminator that flags misclassified samples, and a memory‑core (MC) module that refines those samples using the attention‑enhanced representations. Multiple loss functions guide the joint optimization, and careful threshold selection ensures stable convergence.
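The control flow of the three modules can be sketched as below. This is a toy skeleton showing only the routing logic; the module internals, interfaces, and the 0.5 threshold are placeholders, not the authors' exact design.

```python
import numpy as np

rng = np.random.default_rng(3)

def base_mlp(x):
    # Placeholder base model: probability of class 1 from a fixed linear map.
    return 1.0 / (1.0 + np.exp(-(x @ np.full(x.shape[1], 0.5))))

def discriminator(p, y, threshold=0.5):
    # Flag samples the base model misclassifies at the decision threshold.
    return (p >= threshold).astype(int) != y

def memory_core(x_flagged):
    # Placeholder refinement: in the described pipeline this re-scores the
    # flagged samples using attention over memorized rare-sample features.
    return np.ones(len(x_flagged), dtype=int)

x = rng.normal(size=(10, 4))
y = rng.integers(0, 2, size=10)

p = base_mlp(x)                      # 1) base MLP predictions
flags = discriminator(p, y)          # 2) discriminator flags errors
pred = (p >= 0.5).astype(int)
pred[flags] = memory_core(x[flags])  # 3) MC refines the flagged samples
print(int(flags.sum()), "samples routed to the memory-core module")
```

Running the modules asynchronously, each with its own loss, is what the authors credit for stable convergence: the base model is never destabilized by the corrective signal aimed only at the rare, misclassified samples.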
A case study on an internal life‑insurance recommendation task (1.7 million samples, binary classification) demonstrates that the hybrid model improves overall accuracy from 77.06 % to 82.35 %, with the discriminator achieving 91.04 % correctness on error detection and the MC module correcting approximately 4.1 % of the total samples.
The authors conclude that low‑frequency event mining must be context‑aware, requiring sufficient event frequency, distinct feature patterns, and a strong base model; nevertheless, the proposed architecture offers advantages over pure attention models, including targeted learning, asynchronous training stability, and enhanced generalization through memory‑based feature propagation.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.