Algorithm Optimization for Hotel Recommendation and Large‑Scale Discrete DNN Training at Ctrip
This article describes how Ctrip improved hotel recommendation by iterating from logistic regression to GBDT and deep neural networks, designing continuous and discrete features, adopting multi‑task learning with click and conversion signals, and building a large‑scale distributed DNN training and unified feature‑processing framework to boost model accuracy and engineering efficiency.
When users browse hotels on Ctrip, the platform must select appropriate hotel recommendations to reduce user effort; three typical scenarios are welcome ranking, intelligent ranking, and search‑compensation ranking, as illustrated in Figure 1.
The recommendation features are divided into user‑side, item‑side, and interaction features, further classified as continuous or discrete. Continuous features have good generalization but weak memorization, while discrete features provide strong memorization and high discrimination at the cost of weaker generalization.
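The split between the two feature families can be illustrated with a minimal Python sketch. The feature names, bucket size, and statistics below are illustrative assumptions, not Ctrip's actual schema: discrete features are hashed to a stable index and looked up in an embedding table (memorization), while continuous features are standardized and fed to the model directly (generalization).

```python
import hashlib

# Hypothetical feature-handling sketch; names and constants are assumptions.

NUM_BUCKETS = 1_000_003  # assumed hash space for discrete-feature signatures

def hash_discrete(name: str, value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a discrete feature value to a stable embedding-row index (memorization)."""
    signature = f"{name}={value}".encode("utf-8")
    return int(hashlib.md5(signature).hexdigest(), 16) % num_buckets

def normalize_continuous(value: float, mean: float, std: float) -> float:
    """Standardize a continuous feature (generalization); fed to the model as a dense input."""
    return (value - mean) / std if std > 0 else 0.0

city_idx = hash_discrete("user_city", "SHA")        # looked up in an embedding table
price_x = normalize_continuous(320.0, 280.0, 40.0)  # -> 1.0, a dense input
```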
The model pipeline evolved from Logistic Regression (LR) to Gradient Boosting Decision Tree (GBDT) and finally to Deep Neural Network (DNN). LR offers linear decision boundaries, interpretability, and incremental updates but limited accuracy; GBDT provides non‑linear boundaries and higher precision but cannot handle massive discrete features; DNN delivers highly non‑linear boundaries, supports large‑scale discrete embeddings, and achieves the highest accuracy, though it is less interpretable and more engineering‑intensive. A comparative table summarizes these characteristics.
In the hotel recommendation task, both click (CTR) and conversion (CVR) signals are modeled. Multi‑task learning is implemented via an ESMM‑style architecture: (1) discrete features are hashed into unique signatures and embedded; (2) pooling (sum pooling for multi‑value features) and concatenation combine single‑value and multi‑value embeddings; (3) a multilayer perceptron predicts click and conversion using cross‑entropy loss, yielding significant gains over pairwise GBDT models.
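The three steps above can be sketched in a minimal NumPy forward pass. This is a hedged illustration of the ESMM idea, not Ctrip's model: dimensions, initialization, and the single multi‑value feature are assumptions. The key ESMM property is that the conversion head is trained through pCTCVR = pCTR × pCVR, so both losses are defined over the whole exposure space.

```python
import numpy as np

# Minimal ESMM-style sketch (shapes and parameters are illustrative assumptions).

rng = np.random.default_rng(0)
VOCAB, EMB_DIM, HIDDEN = 1000, 8, 16

emb_table = rng.normal(scale=0.01, size=(VOCAB, EMB_DIM))  # shared embeddings

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tower(x, w1, w2):
    """Small MLP tower producing a probability."""
    h = np.maximum(x @ w1, 0.0)  # ReLU hidden layer
    return sigmoid(h @ w2)

def esmm_forward(feature_ids, params):
    """Sum-pool a multi-value feature's embeddings, then run CTR and CVR towers."""
    x = emb_table[feature_ids].sum(axis=0)  # sum pooling over multi-value feature
    p_ctr = tower(x, *params["ctr"])
    p_cvr = tower(x, *params["cvr"])
    return p_ctr, p_ctr * p_cvr             # (pCTR, pCTCVR)

def loss(p_ctr, p_ctcvr, click, convert):
    """Cross-entropy on the click head plus the click-and-convert head."""
    bce = lambda p, y: -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return bce(p_ctr, click) + bce(p_ctcvr, click * convert)

params = {
    "ctr": (rng.normal(size=(EMB_DIM, HIDDEN)), rng.normal(size=(HIDDEN,))),
    "cvr": (rng.normal(size=(EMB_DIM, HIDDEN)), rng.normal(size=(HIDDEN,))),
}
p_ctr, p_ctcvr = esmm_forward(np.array([3, 42, 7]), params)
```

Because pCVR lies in (0, 1), pCTCVR is always bounded by pCTR, which keeps the implicit CVR estimate well calibrated even though conversions are only observed after clicks.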
To support these advances, Ctrip built a large‑scale discrete DNN training framework and a unified feature‑processing framework. The training platform, based on TensorFlow, supports any model architecture and any scale through an asynchronous parameter‑server backend, default hyper‑parameters, and horizontal scalability. The feature framework uses a protobuf schema shared by online and offline pipelines, ensuring identical input data and operator logic, thus eliminating online‑offline inconsistencies.
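The online/offline consistency guarantee can be sketched as follows. The schema fields, names, and hash function here are assumptions (the article only says the schema is protobuf-based and shared); the point is that serving and training invoke the same operator with the same spec, so their inputs cannot diverge.

```python
import zlib
from dataclasses import dataclass

# Hypothetical shared feature schema and operator (names/fields are assumptions).

@dataclass(frozen=True)
class FeatureSpec:
    name: str
    kind: str            # "continuous" or "discrete"
    num_buckets: int = 0

def transform(spec: FeatureSpec, raw):
    """Single operator implementation used by both online and offline pipelines."""
    if spec.kind == "discrete":
        sig = f"{spec.name}={raw}".encode("utf-8")
        return zlib.crc32(sig) % spec.num_buckets  # deterministic signature -> bucket
    return float(raw)

spec = FeatureSpec(name="user_city", kind="discrete", num_buckets=1_000_003)
offline_value = transform(spec, "SHA")  # computed when generating training data
online_value = transform(spec, "SHA")   # computed at serving time: identical by construction
```

A deterministic hash such as CRC32 matters here: Python's built-in `hash()` is salted per process, so it would silently break the cross-pipeline guarantee this framework exists to provide.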
The combined algorithmic and engineering efforts have delivered noticeable improvements in hotel ranking performance, while future work will focus on model interpretability and deeper algorithmic exploration.
Ctrip Technology
The official Ctrip Technology account: sharing, exchanging ideas, and growing together.