Artificial Intelligence · 14 min read

Machine Learning Platform and Risk‑Control Applications at DianRong Net

The article presents a comprehensive overview of DianRong Net's in‑house machine‑learning platform built on Spark: its workflow, the pain points it addresses, risk‑control case studies using graph mining, and practical tips for improving model performance through better data, stronger algorithms, hyper‑parameter tuning, and ensemble methods.


The talk introduces DianRong Net's machine‑learning platform and is organized in three parts: the platform and framework itself, risk‑control business case analyses, and techniques for improving model performance.

Machine‑Learning Platform – A typical ML pipeline starts with dataset splitting into training and test sets, preprocessing (handling missing values, correlation analysis, distribution checks), feature importance analysis, model selection, hyper‑parameter tuning, and finally obtaining the best model. Existing commercial solutions suffer from high licensing costs, data‑security concerns, limited visualization, lack of distributed processing, and cumbersome model deployment.
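The splitting and preprocessing steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the toy dataset, the 80/20 split ratio, and mean imputation for missing values are assumptions for the example, not details from the talk (the platform itself runs these steps on Spark).

```python
# Minimal sketch of the first pipeline stages: train/test split
# and missing-value imputation. Pure stdlib; ratios and data are
# illustrative assumptions.
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle rows deterministically and split into train/test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

def impute_missing(rows):
    """Replace None in each column with that column's mean."""
    cols = list(zip(*rows))
    means = [sum(v for v in col if v is not None) /
             max(1, sum(v is not None for v in col)) for col in cols]
    return [[means[i] if v is None else v for i, v in enumerate(row)]
            for row in rows]

data = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0], [7.0, 8.0]]
train, test = train_test_split(impute_missing(data))
```

Feature‑importance analysis, model selection, and tuning would follow on the cleaned, split data.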

To overcome these issues, DianRong built a custom platform on a Spark cluster, adding capabilities such as HDFS data access, visual data exploration, feature‑importance ranking, collinearity analysis, and a model library that includes both classic ML algorithms and deep‑learning models running on a dedicated GPU server. Models can be published with a one‑click button that generates a RESTful prediction service.
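The talk does not show the generated service, but a one‑click REST publish plausibly wraps a trained model behind an HTTP endpoint. The sketch below is an assumption‑laden stand‑in using only the standard library: the `/predict` route, the JSON shape, and the toy logistic‑regression weights are all invented for illustration.

```python
# Hypothetical shape of a generated RESTful prediction service.
# Route name, payload format, and model weights are assumptions.
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

WEIGHTS = [0.8, -0.3]   # toy coefficients, stand-in for a trained model
BIAS = 0.1

def predict(features):
    """Score a feature vector with a toy logistic-regression model."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8000), PredictHandler).serve_forever()
```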

Risk‑Control Business Cases – The platform is applied to credit‑risk assessment, converting heterogeneous user data (bank cards, employment, emails, loans) into a graph database. Graph‑based classification, sub‑graph analysis, and semi‑supervised learning (relying on the smoothness, cluster, and manifold assumptions) are used to predict whether applicants are good or bad. Community‑detection algorithms help identify clusters with a high concentration of risky users.
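The talk does not name a specific community‑detection algorithm, but the idea of flagging clusters dense in known‑bad users can be sketched with the simplest possible version: connected components over an adjacency list, scored by the share of bad users. The toy graph, labels, and 0.5 threshold below are illustrative assumptions.

```python
# Sketch: find communities (here, connected components via BFS) and
# flag those with a high fraction of known-bad users. Graph, labels,
# and threshold are illustrative assumptions.
from collections import deque

def communities(adjacency):
    """Return connected components of an undirected adjacency-list graph."""
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            queue.extend(adjacency[node])
        groups.append(group)
    return groups

def risky_communities(adjacency, bad_users, threshold=0.5):
    """Flag components where the share of known-bad users meets the threshold."""
    return [g for g in communities(adjacency)
            if len(g & bad_users) / len(g) >= threshold]

graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"],
         "x": ["y"], "y": ["x"]}
flagged = risky_communities(graph, bad_users={"x"})
```

A production system would use a proper community‑detection method (e.g. label propagation or Louvain) over the graph database, but the flagging logic is the same.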

Improving Model Performance – Four practical ways are suggested: (1) enrich and clean data to obtain better features; (2) try more powerful algorithms such as non‑linear or boosting models; (3) conduct thorough hyper‑parameter tuning, especially for deep‑learning models; (4) employ ensemble techniques (e.g., random forest, GBDT) and stack multiple models.
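Point (4) can be illustrated with the simplest ensemble: a majority vote over several base models. The three one‑line "models" below are trivial threshold rules on assumed feature indices, standing in for real learners such as a random forest, a GBDT, and a logistic regression.

```python
# Sketch of ensembling by majority vote. The base "models" are toy
# threshold rules on assumed feature positions.
from collections import Counter

def model_a(x): return int(x[0] > 0.5)
def model_b(x): return int(x[1] > 0.5)
def model_c(x): return int(x[0] + x[1] > 1.0)

def ensemble_predict(x, models=(model_a, model_b, model_c)):
    """Return the majority vote of the individual model predictions."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```

Stacking replaces the vote with a second‑level model trained on the base models' outputs, but the structure is the same.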

The session ends with a Q&A covering graph‑to‑table conversion, distance metrics for clustering, and the relevance of machine learning for small‑to‑medium enterprises.

Tags: Big Data, Machine Learning, model optimization, risk control, Spark, graph mining
Written by

High Availability Architecture

Official account for High Availability Architecture.
