Design and Implementation of the 58 Car Price Estimation System Using Machine Learning
The article describes the end‑to‑end architecture, data collection, preprocessing, feature engineering, model selection, training, and hyper‑parameter tuning of 58’s car price estimation platform, which leverages Spark, XGBoost, LightGBM and custom business rules to predict vehicle resale values.
The background explains the rapid growth of China's used‑car market and the need for an accurate online valuation system, leading to the development of 58 估车价 ("58 Car Valuation"), a proprietary model that predicts a car's residual‑value ratio (between 0 and 1) for tasks such as post‑audit checks, price transparency, and listing ranking.
Overall Architecture – The system consists of four stages: (a) data ingestion from 58’s internal transaction and audit records plus third‑party sources, mapped to a unified vehicle catalog and stored in HDFS; (b) Spark‑based data completion and denoising to produce training‑ready datasets; (c) feature processing and iterative model training with extensive validation of features, hyper‑parameters, and model accuracy; (d) deployment of the final model via a custom RPC framework combined with business rules to serve stable valuation services.
Data Processing – Large, noisy samples are cleaned through rule‑based filtering (e.g., registration vs. launch dates), statistical outlier removal using box‑plots, duplicate elimination, cost‑based sorting, and smoothing filters applied to the residual‑rate series. Feature engineering includes extracting vehicle configuration, mileage, age, launch time, and categorical attributes (country, class, powertrain) and transforming them (e.g., log‑mileage, adjusted age) to capture non‑linear effects.
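The box‑plot outlier filter and the mileage/age transforms described above can be sketched in Python. This is a minimal sketch; the column names (`price`, `mileage_km`, `reg_date`, `listed_date`) and the 1.5×IQR whisker rule are illustrative assumptions, not the production schema:

```python
import numpy as np
import pandas as pd

def remove_price_outliers(df: pd.DataFrame, col: str = "price") -> pd.DataFrame:
    """Drop rows outside the box-plot (1.5 * IQR) whiskers for `col`."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return df[(df[col] >= lo) & (df[col] <= hi)]

def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative non-linear transforms: log-mileage and vehicle age."""
    out = df.copy()
    # log1p compresses the long tail of high-mileage vehicles
    out["log_mileage"] = np.log1p(out["mileage_km"])
    # age in years from registration date to listing date
    out["age_years"] = (out["listed_date"] - out["reg_date"]).dt.days / 365.25
    return out
```

The log transform matters because depreciation per kilometre is much steeper for the first 50,000 km than for the next 50,000; a linear mileage feature would miss that curvature.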
Model Training – The valuation problem is treated as a regression task. After evaluating several algorithms, tree‑based gradient boosting models (XGBoost and LightGBM) were selected for their ability to handle missing values, regularization, and efficient parallel computation. LightGBM was preferred for lower memory usage and faster training due to histogram‑based decision trees.
Hyper‑Parameter Tuning – Key parameters tuned include learning_rate, max_depth / num_leaves, reg_alpha, reg_lambda, and n_estimators. GridSearchCV was used for systematic exploration, balancing model complexity against over‑fitting and inference latency. Example tuning code:
from lightgbm import LGBMRegressor

# Final LightGBM parameters selected after tuning
lgb_big_params = {
    'boosting_type': 'gbdt',     # gradient-boosted decision trees
    'learning_rate': 0.3,
    'n_estimators': 300,
    'max_depth': 11,
    'num_leaves': 255,           # kept well below 2**max_depth to limit complexity
    'max_bin': 127,              # coarser histograms: less memory, faster training
    'min_child_samples': 20,     # minimum samples per leaf
    'colsample_bytree': 0.8,     # feature subsampling per tree
    'reg_alpha': 1.0,            # L1 regularization
    'reg_lambda': 10.0,          # L2 regularization
}
lgb_model = LGBMRegressor(**lgb_big_params)

Summary and Future Work – The current pipeline handles most cases well, but rare vehicle segments still suffer from limited data. Ongoing efforts include model ensembling, segment‑specific models (e.g., sedan vs. EV), age‑stage models, and price‑range models, as well as expanding sample size and feature dimensions. The team also plans to explore deep learning for semantic and image‑based valuation to further improve accuracy.
Authors – Guān Péng, senior R&D engineer at 58.com, and Shǎng Yǔ, senior development engineer, both responsible for the 58 估车价 project and related deep‑learning applications.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.