Overview of the WPAI AI Platform Architecture and Implementation
The article presents a comprehensive overview of the WPAI (Wuba Platform of AI) architecture, detailing its machine‑learning and deep‑learning components, feature‑engineering framework, distributed training pipelines, online prediction services, and deployment on Kubernetes‑managed GPU/CPU resources to accelerate AI applications across 58.com business lines.
Introduction
AI is driving industry transformation. To accelerate AI adoption across 58.com's services, a unified AI algorithm platform (WPAI) was built to improve algorithm development efficiency; it supports both deep learning and traditional machine learning workflows.
Overall Architecture
The platform abstracts feature engineering, model training, and online prediction into a web system, offering both Spark MLlib/DMLC‑based machine‑learning and TensorFlow/Caffe‑based deep‑learning capabilities, and serves recommendation, search, advertising, image/text recognition, intelligent customer service, and outbound call applications.
Machine Learning Platform
The ML platform provides three core functions—feature engineering, offline training, and online prediction—built on Hadoop, Spark, Yarn, Kubernetes, Docker, HDFS, and MySQL. Users can create tasks via a visual web UI, and models are packaged as Docker images and scheduled by Kubernetes pods.
The workflow moves from raw samples through data preprocessing, feature extraction, offline model training (XGBoost, FM, LR, etc.) on Spark MLlib/DMLC, and model evaluation, to online serving via SCF services that load the model and feature mappings from HDFS.
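For intuition, the early stages of this flow can be sketched as a composition of functions; the function names and the toy token-count extraction below are illustrative assumptions, not the platform's actual API:

```python
def preprocess(raw):
    """Clean raw samples: drop empty rows, normalize case and whitespace."""
    return [line.strip().lower() for line in raw if line.strip()]

def extract_features(samples):
    """Turn each sample into a sparse {token: count} mapping."""
    feats = []
    for line in samples:
        counts = {}
        for tok in line.split():
            counts[tok] = counts.get(tok, 0) + 1
        feats.append(counts)
    return feats

def pipeline(raw):
    """Preprocessing then feature extraction; training and serving would follow."""
    return extract_features(preprocess(raw))

features = pipeline(["  Hello hello world ", "", "foo bar"])
```

In the real platform these stages run as distributed jobs rather than local functions, but the data dependency between them is the same.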
Feature Engineering
A unified framework abstracts seven common feature‑extraction functions (one‑hot hashing, one‑hot int, one‑hot enum, discretization, cross features, no discretization, and bag‑of‑words sentence embedding) and generates data in id:local:value format that is later converted to libsvm format.
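As an illustration of the one‑hot hashing style of extraction (this is a sketch, not the platform's code; the hash function, feature dimension, and output layout are assumptions), a (feature, value) pair can be hashed to an index and emitted as a libsvm line:

```python
import hashlib

def one_hot_hash(feature_name, value, dim=2**20):
    """Map a (feature, value) pair to a single index in [0, dim)."""
    key = f"{feature_name}={value}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    return int(digest, 16) % dim

def to_libsvm(label, features, dim=2**20):
    """Render one sample as a libsvm line: '<label> idx:val ...' with sorted, deduplicated indices."""
    indices = sorted({one_hot_hash(name, value, dim) for name, value in features})
    return " ".join([str(label)] + [f"{i}:1" for i in indices])

line = to_libsvm(1, [("city", "beijing"), ("category", "job")])
```

Hashing avoids maintaining an explicit vocabulary at the cost of occasional index collisions, which is why the dimension is kept large.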
Model Training
Large‑scale distributed training is built on Spark MLlib/DMLC and supports XGBoost, FM, LR, and other models. Users configure models via the web UI, Docker‑encapsulated training jobs are scheduled by Kubernetes, and training logs are collected for real‑time monitoring.
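For intuition about what an LR job computes, the training the platform runs at scale on Spark MLlib can be shown in miniature as plain stochastic gradient descent on log loss; this toy single‑machine loop is an assumption‑laden stand‑in for the distributed job:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(samples, dim, lr=0.1, epochs=50):
    """Train logistic regression on sparse samples given as (label, {index: value}) pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for label, feats in samples:
            z = sum(w[i] * v for i, v in feats.items())
            grad = sigmoid(z) - label          # dLoss/dz for log loss
            for i, v in feats.items():
                w[i] -= lr * grad * v          # SGD update on the touched indices only
    return w

# Tiny linearly separable example: index 0 indicates the positive class, index 1 the negative.
data = [(1, {0: 1.0}), (0, {1: 1.0}), (1, {0: 1.0, 2: 1.0}), (0, {1: 1.0, 2: 1.0})]
weights = train_lr(data, dim=3)
```

The distributed version partitions the samples and aggregates gradients across workers, but the per‑sample update is the same.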
Online Prediction (ML)
The service stores model and feature index files in HDFS and configuration in MySQL, and exposes SCF‑based APIs for scoring. It supports both feature‑only and full‑sample prediction paths.
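A minimal sketch of the full‑sample scoring path for an LR model (the in‑memory dictionaries below stand in for the model and feature‑index files the real service loads from HDFS, and all names and weight values are assumptions):

```python
import math

# Assumed stand-ins for the HDFS artifacts: feature index and model weights.
feature_index = {"city=beijing": 0, "category=job": 1}
model_weights = [0.8, -0.3]
bias = 0.1

def score(raw_features):
    """Full-sample path: map raw features through the index, then apply the LR model."""
    z = bias
    for name, value in raw_features:
        idx = feature_index.get(f"{name}={value}")
        if idx is not None:                 # features unseen at training time are skipped
            z += model_weights[idx]
    return 1.0 / (1.0 + math.exp(-z))

p = score([("city", "beijing"), ("category", "job")])
```

The feature‑only path would skip the index lookup and accept already‑mapped index:value pairs directly.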
Deep Learning Platform
The DL platform unifies GPU/CPU resource management via Kubernetes, supports TensorFlow (single‑node and distributed) and Caffe, and offers online serving through TensorFlow‑Serving, gRPC, and a custom SCF framework.
Offline Training (DL)
Users submit TensorFlow jobs via the web UI; Docker images with specific TensorFlow versions run on Kubernetes pods, pulling dependencies from a private Ubuntu apt mirror and a private PyPI repository.
Online Prediction (DL)
A generic SCF service receives requests, dynamically loads user‑provided JAR parsers, forwards the parsed data to TensorFlow‑Serving or custom gRPC services running in Docker containers, and re‑parses the results before returning them, achieving model‑agnostic serving.
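The model‑agnostic pattern here is a dynamically loaded parser wrapped around a generic forwarder. In WPAI the parsers are user JARs loaded by the Java SCF service, so everything below is a simplified Python stand‑in with invented names:

```python
def serve(request_bytes, parser, backend):
    """Generic serving: parse the request, forward to a backend
    (standing in for TensorFlow-Serving over gRPC), re-parse the result."""
    tensor_input = parser.parse(request_bytes)
    raw_output = backend(tensor_input)
    return parser.unparse(raw_output)

class CsvParser:
    """One user-supplied parser; the serving layer never needs to know the model's schema."""
    def parse(self, data):
        return [float(x) for x in data.decode().split(",")]
    def unparse(self, output):
        return ",".join(f"{x:.2f}" for x in output)

# Stand-in backend: doubles each value, as a placeholder for a remote model call.
backend = lambda xs: [2 * x for x in xs]
reply = serve(b"1.0,2.5", CsvParser(), backend)
```

Swapping in a different parser and backend serves a different model without changing the service itself, which is the point of the design.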
Summary
The WPAI platform integrates large‑scale model building, unified GPU/CPU scheduling, and end‑to‑end AI workflows. It supports over 300 offline models and 80 online services with daily request volumes exceeding 2.5 billion, and continues to evolve toward broader algorithmic support.
References
58同城人工智能平台架构实践 (Architecture Practice of the 58.com AI Platform)