Building Yanxuan Machine Learning Platform: Architecture and Implementation
Yanxuan built a Kubeflow-based machine learning platform that unifies data preprocessing, feature engineering, model training, validation, and deployment. It combines Smart-jobs, Smart-Infer, Smart-backend, Airflow pipelines, Jupyter notebooks, and Istio-enhanced inference services to make algorithm engineers more efficient, and it integrates with Kubernetes, HDFS, and Hive.
This article introduces the construction of Yanxuan's machine learning platform, which was developed as deep learning algorithms grew more important to the business and algorithm engineers needed to develop more efficiently. The platform is built on top of Kubeflow and supports the entire machine learning workflow.
The article covers the business background, explaining how Yanxuan's algorithm development was previously in a "primitive" state with manual scheduling and execution on physical machines or VMs. It then details the algorithm development workflow, which includes data preprocessing, feature engineering, model training, model validation, and model deployment.
The platform's overall architecture is presented, showing how it integrates with Kubernetes clusters and separates offline training from online inference. Key components include Smart-jobs for managing training tasks, Smart-Infer for managing inference services, and Smart-backend as the unified backend interface.
The article explains each module in detail: the development environment (Jupyter Notebook), the training environment (built on Kubeflow's training operators), model management, and inference services. The inference service architecture evolved from native KFServing to a microservices approach using Istio for better performance.
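To make the training environment more concrete, here is a minimal sketch of the kind of resource a Kubeflow training operator consumes. It assumes a TFJob (one of several Kubeflow training operators; the summary does not say which ones Yanxuan uses), and the job name, image, and command are hypothetical placeholders. How Smart-jobs actually builds and submits such resources is not described here.

```python
import json


def build_tfjob(name, image, command, workers=2, namespace="default"):
    """Return a TFJob manifest (kubeflow.org/v1) as a plain dict.

    Illustrative helper only: the function name and arguments are
    assumptions, not part of the Yanxuan platform.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    container = {
        # The TFJob operator expects the training container to be
        # named "tensorflow".
        "name": "tensorflow",
        "image": image,
        "command": list(command),
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical image and entry point, for illustration only.
    manifest = build_tfjob(
        name="demo-train",
        image="registry.example.com/demo/train:latest",
        command=["python", "train.py"],
        workers=2,
    )
    print(json.dumps(manifest, indent=2))
```

The resulting dict can be serialized to YAML or JSON and applied to a cluster where the training operator is installed; the operator then creates and supervises the worker pods.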
Pipeline orchestration is handled by Airflow, with custom operators for different tasks. The platform also provides feature engineering integration, monitoring and visualization, and file system integration with HDFS and Hive.
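The core idea behind pipeline orchestration is that each stage runs only after the stages it depends on. The sketch below illustrates that idea with Python's standard library alone; it is not Airflow's API and not the platform's custom operators. The stage names come from the workflow described above, while `PIPELINE` and `run_pipeline` are hypothetical names introduced for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the set of stages that must finish before it starts.
PIPELINE = {
    "data_preprocessing": set(),
    "feature_engineering": {"data_preprocessing"},
    "model_training": {"feature_engineering"},
    "model_validation": {"model_training"},
    "model_deployment": {"model_validation"},
}


def run_pipeline(dependencies, tasks):
    """Run every task once, after all of its upstream tasks.

    dependencies: dict of stage name -> set of upstream stage names
    tasks: dict of stage name -> zero-argument callable
    Returns the list of stage names in the order they were executed.
    """
    order = TopologicalSorter(dependencies).static_order()
    executed = []
    for stage in order:
        tasks[stage]()  # an exception here stops all downstream stages
        executed.append(stage)
    return executed


if __name__ == "__main__":
    # Placeholder tasks that only announce themselves.
    demo_tasks = {s: (lambda s=s: print(f"running {s}")) for s in PIPELINE}
    run_pipeline(PIPELINE, demo_tasks)
```

In a real scheduler such as Airflow, each stage would be an operator instance and the scheduler would add retries, scheduling intervals, and parallel execution of independent branches, none of which this sketch attempts.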
The article concludes with future plans to further integrate the feature platform with the algorithm platform and improve real-time capabilities.
NetEase Yanxuan Technology Product Team
The NetEase Yanxuan Technology Product Team shares practical tech insights for the e‑commerce ecosystem. This official channel periodically publishes technical articles, team events, recruitment information, and more.