Machine Learning Platform and Risk‑Control Applications at DianRong Net
The article presents a comprehensive overview of DianRong Net's in‑house machine‑learning platform built on Spark, its workflow, pain points it addresses, risk‑control case studies using graph mining, and practical tips for improving model performance through data, algorithms, hyper‑parameter tuning and ensemble methods.
The talk introduces DianRong Net's machine‑learning platform, describing its three‑part structure: the platform/framework itself, risk‑control business case analyses, and model‑performance improvement techniques.
Machine‑Learning Platform – A typical ML pipeline starts with dataset splitting into training and test sets, preprocessing (handling missing values, correlation analysis, distribution checks), feature importance analysis, model selection, hyper‑parameter tuning, and finally obtaining the best model. Existing commercial solutions suffer from high licensing costs, data‑security concerns, limited visualization, lack of distributed processing, and cumbersome model deployment.
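The pipeline described above can be sketched in miniature. This is a pure-Python toy, assuming a single-feature dataset; the data, the threshold "model", and the accuracy metric are illustrative stand-ins, not the platform's actual components:

```python
# Minimal sketch of the pipeline: split into train/test sets, impute
# missing values, then grid-search a hyper-parameter on the training set.
import random

def train_test_split(rows, labels, test_ratio=0.3, seed=42):
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    tr, te = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [labels[i] for i in tr],
            [rows[i] for i in te], [labels[i] for i in te])

def impute_mean(values):
    # Preprocessing step: replace None (missing) with the observed mean.
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def accuracy(threshold, xs, ys):
    preds = [1 if x >= threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

# Toy single-feature dataset: scores above ~0.5 tend to be label 1.
xs = [0.1, 0.2, None, 0.4, 0.6, 0.7, None, 0.9, 0.3, 0.8]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
xs = impute_mean(xs)

x_tr, y_tr, x_te, y_te = train_test_split(xs, ys)

# Hyper-parameter tuning: pick the decision threshold that maximizes
# training accuracy, then evaluate once on the held-out test set.
best_t = max((t / 10 for t in range(1, 10)),
             key=lambda t: accuracy(t, x_tr, y_tr))
print(best_t, accuracy(best_t, x_te, y_te))
```

The same shape (split, preprocess, tune, evaluate) scales up directly to the Spark-based setting the talk describes, with distributed data frames in place of Python lists.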
To overcome these issues, DianRong built a custom platform on a Spark cluster, adding capabilities such as HDFS data access, visual data exploration, feature‑importance ranking, collinearity analysis, and a model library that includes both classic ML algorithms and deep‑learning models running on a dedicated GPU server. Models can be published with a one‑click button that generates a RESTful prediction service.
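One of the capabilities listed above, collinearity analysis, can be sketched as follows. The feature names and the 0.9 cutoff are illustrative assumptions, not taken from the platform:

```python
# Flag pairs of feature columns whose pairwise Pearson correlation
# exceeds a cutoff -- a simple form of collinearity analysis.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def collinear_pairs(features, cutoff=0.9):
    names = list(features)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(features[names[i]], features[names[j]])
            if abs(r) >= cutoff:
                flagged.append((names[i], names[j], round(r, 3)))
    return flagged

features = {
    "loan_amount":    [10, 20, 30, 40, 50],
    "monthly_repay":  [11, 21, 29, 41, 52],   # nearly identical to loan_amount
    "email_age_days": [5, 300, 80, 200, 40],
}
print(collinear_pairs(features))  # flags the loan_amount / monthly_repay pair
```

Highly correlated pairs are candidates for dropping one member, which keeps downstream models (especially linear ones) stable.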
Risk‑Control Business Cases – The platform is applied to credit‑risk assessment, converting heterogeneous user data (bank cards, employment, emails, loans) into a graph database. Graph‑based classification, sub‑graph analysis, and semi‑supervised learning (resting on the smoothness, cluster, and manifold assumptions) are used to predict good or bad applicants. Community‑detection algorithms help identify clusters with a high concentration of risky users.
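The community-detection idea can be illustrated with a toy version: treat applicants as nodes, connect those sharing identifiers (bank cards, emails), find connected components, and flag components with a high share of known-bad users. The graph, labels, and 0.5 cutoff below are illustrative, not from the talk:

```python
# Toy community detection on an applicant graph via connected components.
from collections import defaultdict

def connected_components(edges, nodes):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            comp.add(cur)
            stack.extend(adj[cur] - seen)
        comps.append(comp)
    return comps

def risky_communities(comps, bad_users, cutoff=0.5):
    # Flag communities where known-bad users exceed the cutoff share.
    flagged = []
    for comp in comps:
        ratio = len(comp & bad_users) / len(comp)
        if ratio >= cutoff:
            flagged.append((comp, ratio))
    return flagged

nodes = {"u1", "u2", "u3", "u4", "u5", "u6"}
edges = [("u1", "u2"), ("u2", "u3"), ("u4", "u5")]  # u6 is isolated
bad = {"u1", "u2"}                                  # known-bad applicants

comps = connected_components(edges, nodes)
print(risky_communities(comps, bad))
```

In production one would use a proper community-detection algorithm (e.g. label propagation or Louvain) rather than plain connected components, but the scoring logic — flag the community, not just the individual — is the same.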
Improving Model Performance – Four practical ways are suggested: (1) enrich and clean data to obtain better features; (2) try more powerful algorithms such as non‑linear or boosting models; (3) conduct thorough hyper‑parameter tuning, especially for deep‑learning models; (4) employ ensemble techniques (e.g., random forest, GBDT) and stack multiple models.
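Point (4) above, combining several models, can be sketched as a simple majority vote. The three "models" here are illustrative threshold rules on a single score, standing in for trained ensemble members such as random-forest or GBDT outputs:

```python
# Majority-vote ensemble: each model casts a 0/1 vote; the ensemble
# predicts 1 when more than half the models vote 1.
def majority_vote(models, x):
    votes = [m(x) for m in models]
    return 1 if sum(votes) > len(votes) / 2 else 0

models = [
    lambda x: 1 if x > 0.4 else 0,
    lambda x: 1 if x > 0.5 else 0,
    lambda x: 1 if x > 0.6 else 0,
]

print(majority_vote(models, 0.55))  # two of three models vote 1 -> 1
print(majority_vote(models, 0.30))  # no model votes 1 -> 0
```

Stacking goes one step further: instead of a fixed vote, a second-level model is trained on the first-level models' outputs to learn how to weight them.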
The session ends with a Q&A covering graph‑to‑table conversion, distance metrics for clustering, and the relevance of machine learning for small‑to‑medium enterprises.
High Availability Architecture
Official account for High Availability Architecture.