
Design and Implementation of Qunar's Algorithm Service Platform for Machine Learning

The article describes the background, design, key components, and current status of Qunar's algorithm service platform, which provides a unified, scalable, and automated environment for feature engineering, model training, deployment, monitoring, and management of machine‑learning projects within the company's large‑accommodation division.

Qunar Tech Salon

Rapid advances in machine learning in recent years have driven its adoption across the internet industry, which in turn has exposed challenges in development standards, the definition of business boundaries, feature data management, model training, and deployment. To address these issues and improve development efficiency, Qunar's large-accommodation division built the Algorithm Service Platform.

The platform, started in August 2016, aims to provide a visual management interface, standardized development guidelines, and a unified workflow for machine-learning projects, so that algorithm engineers can focus on feature engineering and model training while business engineers integrate services through simple APIs.

Since its inception, the platform has undergone three major iterations: the first version solved large‑scale model deployment, the second added feature pipeline management and sharing, and the current version emphasizes automated training, model deployment, and rapid iteration.

Architecturally, the platform consists of two main parts: the Algorithm Service Platform and the Model Training Platform. A typical workflow starts with feature engineering and model training (often in Jupyter notebooks); the code is then submitted automatically to the training platform, model and feature metadata are stored in a database, and the resulting service is exposed through RPC or client APIs.
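To make the metadata step concrete, here is a minimal Java sketch, assuming an in-memory map in place of the metadata database. The `ModelMeta` record, its fields, and the `ModelRegistry` class are hypothetical names chosen for illustration, not Qunar's schema or API. The sketch keeps the newest registered version of each model and reports which required features a request is missing, the kind of check a service layer could run before evaluating a model.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical metadata record: illustrative fields, not Qunar's actual schema.
record ModelMeta(String name, int version, String filePath, List<String> featureNames) {}

// Hypothetical in-memory stand-in for the metadata database described in the article.
class ModelRegistry {
    private final Map<String, ModelMeta> byName = new HashMap<>();

    // Store a model version, keeping whichever of the old and new entries is newer.
    void register(ModelMeta meta) {
        byName.merge(meta.name(), meta,
                (old, incoming) -> incoming.version() > old.version() ? incoming : old);
    }

    ModelMeta latest(String name) {
        ModelMeta meta = byName.get(name);
        if (meta == null) {
            throw new IllegalArgumentException("unknown model: " + name);
        }
        return meta;
    }

    // Sorted names of features the latest version needs that the request does not supply.
    Set<String> missingFeatures(String name, Map<String, Double> request) {
        Set<String> missing = new TreeSet<>(latest(name).featureNames());
        missing.removeAll(request.keySet());
        return missing;
    }
}
```

Keeping the feature list next to the model version lets a caller detect a mismatch between a newly deployed model and the features it is being fed before any scoring happens.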

Key functional modules include:

Model File Management: An automated pipeline takes each trained model through database registration, validation, and upload to Swift-based object storage, and then cleans up obsolete versions.
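The validate, upload, and prune steps can be sketched as follows. This is a minimal illustration under stated assumptions: `ObjectStore` is a hypothetical stand-in for a Swift client, database registration is modelled by an in-memory version list, validation is reduced to a non-empty check plus a SHA-256 comparison, and the retention policy simply keeps the newest `keepVersions` versions. The article does not specify Qunar's actual validation or retention rules.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a Swift object-storage client.
interface ObjectStore {
    void put(String key, byte[] data);
    void delete(String key);
}

// In-memory implementation, convenient for tests.
class InMemoryStore implements ObjectStore {
    final Map<String, byte[]> objects = new TreeMap<>();
    public void put(String key, byte[] data) { objects.put(key, data.clone()); }
    public void delete(String key) { objects.remove(key); }
}

class ModelFilePipeline {
    private final ObjectStore store;
    private final int keepVersions;
    // Stand-in for DB registration: model name -> uploaded versions, oldest first.
    private final Map<String, List<Integer>> registered = new TreeMap<>();

    ModelFilePipeline(ObjectStore store, int keepVersions) {
        if (keepVersions < 1) throw new IllegalArgumentException("keepVersions must be >= 1");
        this.store = store;
        this.keepVersions = keepVersions;
    }

    static String sha256Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    static String key(String model, int version) {
        return model + "/" + version;
    }

    // Validate, upload, register, then delete versions beyond the retention limit.
    void publish(String model, int version, byte[] data, String expectedSha256) {
        List<Integer> versions = registered.computeIfAbsent(model, m -> new ArrayList<>());
        if (versions.contains(version)) throw new IllegalArgumentException("version already published");
        if (data.length == 0) throw new IllegalArgumentException("empty model file");
        if (!sha256Hex(data).equals(expectedSha256)) throw new IllegalArgumentException("checksum mismatch");

        store.put(key(model, version), data);
        versions.add(version);
        Collections.sort(versions);
        while (versions.size() > keepVersions) {
            int oldest = versions.remove(0);
            store.delete(key(model, oldest));
        }
    }
}
```

Validation runs before anything is uploaded, so a corrupt file never replaces a good version in storage.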

Model Computation: A unified Evaluator interface supports multiple model types, whose scoring logic is reimplemented in Java for compatibility, and can be extended with custom evaluators.
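A unified evaluator interface might look like the following sketch. The names `Evaluator`, `LogisticRegressionEvaluator`, and `EvaluatorRegistry` are hypothetical, and logistic regression is only an example of a built-in model type. Because the interface has a single method, a custom extension can be registered as a lambda.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical unified interface: every model type scores a map of named features.
interface Evaluator {
    double evaluate(Map<String, Double> features);
}

// Example built-in model type: logistic regression over named weights.
class LogisticRegressionEvaluator implements Evaluator {
    private final Map<String, Double> weights;
    private final double bias;

    LogisticRegressionEvaluator(Map<String, Double> weights, double bias) {
        this.weights = Map.copyOf(weights);
        this.bias = bias;
    }

    @Override
    public double evaluate(Map<String, Double> features) {
        double z = bias;
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            // Features absent from the request contribute 0.
            z += w.getValue() * features.getOrDefault(w.getKey(), 0.0);
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }
}

// Looks up the evaluator registered for a model name; custom types plug in here.
class EvaluatorRegistry {
    private final Map<String, Evaluator> byModel = new HashMap<>();

    void register(String modelName, Evaluator evaluator) {
        byModel.put(modelName, evaluator);
    }

    double score(String modelName, Map<String, Double> features) {
        Evaluator evaluator = byModel.get(modelName);
        if (evaluator == null) {
            throw new IllegalArgumentException("no evaluator for " + modelName);
        }
        return evaluator.evaluate(features);
    }
}
```

For example, `registry.register("custom", f -> 2 * f.getOrDefault("score", 0.0))` adds a custom model type without touching the serving code.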

Feature Engineering: Feature transformation, built by extending Airbnb's Aerosolve in Java, is handled separately from the feature collection pipelines, which cover both offline batch and online low-latency scenarios.
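The transformation side can be illustrated with a chain of small transforms. This sketch is only loosely modelled on Aerosolve's idea of grouping features into families (float features and string features); `FeatureVector`, `Transform`, `QuantizeTransform`, `CrossTransform`, and `TransformChain` are hypothetical classes written for illustration and are not Aerosolve's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical feature vector, loosely inspired by Aerosolve's feature families:
// family -> (feature name -> value) for floats, family -> tokens for strings.
class FeatureVector {
    final Map<String, Map<String, Double>> floatFeatures = new HashMap<>();
    final Map<String, Set<String>> stringFeatures = new HashMap<>();
}

interface Transform {
    void apply(FeatureVector fv);
}

// Turns each float feature of a family into a bucket token, e.g. price=123.4, bucket 50 -> "price_b2".
class QuantizeTransform implements Transform {
    private final String inputFamily;
    private final String outputFamily;
    private final double bucketSize;

    QuantizeTransform(String inputFamily, String outputFamily, double bucketSize) {
        if (bucketSize <= 0) throw new IllegalArgumentException("bucketSize must be positive");
        this.inputFamily = inputFamily;
        this.outputFamily = outputFamily;
        this.bucketSize = bucketSize;
    }

    @Override
    public void apply(FeatureVector fv) {
        Map<String, Double> in = fv.floatFeatures.get(inputFamily);
        if (in == null) return;
        Set<String> out = fv.stringFeatures.computeIfAbsent(outputFamily, k -> new HashSet<>());
        for (Map.Entry<String, Double> e : in.entrySet()) {
            long bucket = (long) Math.floor(e.getValue() / bucketSize);
            out.add(e.getKey() + "_b" + bucket);
        }
    }
}

// Crosses every token of one string family with every token of another.
class CrossTransform implements Transform {
    private final String left;
    private final String right;
    private final String outputFamily;

    CrossTransform(String left, String right, String outputFamily) {
        this.left = left;
        this.right = right;
        this.outputFamily = outputFamily;
    }

    @Override
    public void apply(FeatureVector fv) {
        Set<String> a = fv.stringFeatures.getOrDefault(left, Set.of());
        Set<String> b = fv.stringFeatures.getOrDefault(right, Set.of());
        // Buffer results so no set is modified while it is being iterated.
        List<String> crossed = new ArrayList<>();
        for (String x : a) {
            for (String y : b) {
                crossed.add(x + "^" + y);
            }
        }
        if (!crossed.isEmpty()) {
            fv.stringFeatures.computeIfAbsent(outputFamily, k -> new HashSet<>()).addAll(crossed);
        }
    }
}

// Applies transforms in order, so later steps can consume earlier outputs.
class TransformChain implements Transform {
    private final Transform[] steps;

    TransformChain(Transform... steps) {
        this.steps = steps.clone();
    }

    @Override
    public void apply(FeatureVector fv) {
        for (Transform step : steps) {
            step.apply(fv);
        }
    }
}
```

With a bucket size of 50, a price of 123.4 becomes the token `price_b2`, which a following cross step can combine with a city token such as `beijing`.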

Algorithm Service: Two deployment modes, proxy and RPC, let teams balance performance, resource usage, and monitoring needs.

Monitoring: Metrics covering service performance, feature quality, and data collection are stored in InfluxDB, visualized with Chronograf, and checked by Kapacitor, with alerts sent via QT and email.
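Metrics like these reach InfluxDB as points in its line protocol (`measurement,tags fields timestamp`). The sketch below formats one point in Java, assuming only float fields; the measurement, tag, and field names in the usage are hypothetical examples, and only the escaping rules for commas, spaces, and equals signs are covered. A client library can usually take care of this formatting in practice.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Formats one InfluxDB line-protocol point: measurement,tag=v,... field=v,... timestamp
final class LineProtocol {
    private LineProtocol() {}

    // Measurement names: escape commas and spaces.
    static String escapeMeasurement(String s) {
        return s.replace(",", "\\,").replace(" ", "\\ ");
    }

    // Tag keys, tag values and field keys: escape commas, equals signs and spaces.
    static String escapeTagOrKey(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    static String point(String measurement, Map<String, String> tags,
                        Map<String, Double> fields, long timestampNanos) {
        if (fields.isEmpty()) throw new IllegalArgumentException("a point needs at least one field");
        StringBuilder sb = new StringBuilder(escapeMeasurement(measurement));
        // Tags sorted by key, as InfluxDB recommends for write performance.
        for (Map.Entry<String, String> t : new TreeMap<>(tags).entrySet()) {
            sb.append(',').append(escapeTagOrKey(t.getKey()))
              .append('=').append(escapeTagOrKey(t.getValue()));
        }
        // The field set is separated from the tags and from the timestamp by single spaces.
        StringJoiner fieldPart = new StringJoiner(",", " ", " ");
        for (Map.Entry<String, Double> f : new TreeMap<>(fields).entrySet()) {
            double v = f.getValue();
            if (!Double.isFinite(v)) throw new IllegalArgumentException("non-finite field: " + f.getKey());
            fieldPart.add(escapeTagOrKey(f.getKey()) + "=" + v);
        }
        return sb.append(fieldPart).append(timestampNanos).toString();
    }
}
```

For instance, a point with tags `app=search` and `model=hotel rank` and fields `latency_ms=12.5` and `qps=450.0` at timestamp 1000 is written as `algo_service,app=search,model=hotel\ rank latency_ms=12.5,qps=450.0 1000`.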

The automated training platform enables engineers to submit notebook code to a Git repository, configure jobs through a unified UI, and trigger data‑driven training pipelines, eliminating manual scheduling and supporting nested model training.
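Data-driven triggering and nested training can be sketched as an event-driven scheduler: a job runs once all of its inputs are ready, and the model it trains becomes an input that can trigger downstream jobs. `TrainingJob` and `DataDrivenScheduler` are hypothetical; in this simplified sketch each job fires only once, and recording a job name stands in for actually submitting the notebook code. A real platform would track runs per data partition or date.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical job: runs once every input (a data partition or another model's output) is ready.
record TrainingJob(String name, Set<String> inputs, String output) {}

class DataDrivenScheduler {
    private final Map<String, TrainingJob> jobs = new LinkedHashMap<>();
    private final Set<String> ready = new HashSet<>();
    private final Set<String> triggered = new HashSet<>();
    private final List<String> runLog = new ArrayList<>();

    void addJob(TrainingJob job) {
        jobs.put(job.name(), job);
    }

    // Called when an artifact lands; cascades through nested (model-on-model) dependencies.
    void onReady(String artifact) {
        Deque<String> pending = new ArrayDeque<>();
        pending.add(artifact);
        while (!pending.isEmpty()) {
            ready.add(pending.poll());
            for (TrainingJob job : jobs.values()) {
                if (!triggered.contains(job.name()) && ready.containsAll(job.inputs())) {
                    triggered.add(job.name());
                    runLog.add(job.name());    // stand-in for submitting the training job
                    pending.add(job.output()); // the trained model may unblock downstream jobs
                }
            }
        }
    }

    List<String> runLog() {
        return List.copyOf(runLog);
    }
}
```

In this sketch, a ranking job that depends on both an orders partition and an embedding model stays idle until the embedding job has run, and then fires in the same cascade without any manual scheduling.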

At the time of writing, the platform hosts 60 online applications, 4,522 model files (807 independent models), 170 GB of public features, and 260 GB of algorithm-specific features, and serves roughly 450 QPS of algorithm requests for the division.

Written by Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
