
Applying XGBoost for Learning-to-Rank in Ctrip Search: Feature Engineering and Model Practice

This article details how Ctrip's search team leverages XGBoost for learning-to-rank, covering L2R concepts, feature engineering, data preparation, model training, hyper‑parameter tuning, offline evaluation, and deployment insights for large‑scale search ranking systems.

Ctrip Technology

About the Author: Cao Cheng, senior R&D engineer on Ctrip's Search team, focuses on personalized recommendation and search ranking.

1. Introduction With the rapid growth of the internet, traditional rule-based or manually tuned ranking can no longer meet the demands of increasingly diverse search scenarios, prompting a shift toward machine learning and deep learning. Ctrip's main-site search faces challenges such as varied search methods, scenarios, business lines, user intents, and user types, which motivated the team to explore traditional machine learning and deep learning solutions, and specifically XGBoost for ranking.

2. XGBoost Exploration and Practice Learning to Rank (L2R) is a supervised learning process that requires feature selection, training-data collection, and model training. L2R approaches fall into three categories: PointWise, PairWise, and ListWise. The article introduces the XGBoost workflow and illustrates its application in search ranking.
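As an illustration of the PointWise/PairWise distinction (a hypothetical sketch, not Ctrip's production code): PointWise treats each item's relevance label as an independent prediction target, while PairWise turns labels within one query into preference pairs, which is what "rank:pairwise" optimizes.

```python
from itertools import combinations

# Hypothetical click labels for the items returned by one query (1 = clicked).
labels = [1, 0, 0, 1, 0]

# PointWise: each (item, label) row is an independent prediction target.
pointwise = list(enumerate(labels))

# PairWise: within the same query, every pair with unequal labels becomes a
# preference (i, j) meaning "item i should rank above item j".
pairs = [(i, j) if li > lj else (j, i)
         for (i, li), (j, lj) in combinations(enumerate(labels), 2)
         if li != lj]

print(pointwise)  # 5 independent examples
print(pairs)      # 6 preference pairs
```

A ListWise method would instead score the whole result list at once, e.g. by optimizing a listwise surrogate of NDCG.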

XGBoost is an optimized distributed gradient‑boosting library that builds tree ensembles, but it has limited support for high‑dimensional sparse matrices and can be cumbersome to tune.

3. Feature Engineering Practice Effective feature engineering is crucial for traditional machine learning. The overall workflow includes data source identification, feature extraction, labeling rules, and data collection (online logging or offline extraction). Key preprocessing steps involve handling missing values, feature normalization, noise removal, continuous‑value analysis, and correlation analysis, illustrated by the following figures.
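A minimal sketch of two of those preprocessing steps, missing-value handling and normalization, on an illustrative numeric feature matrix (the values and column roles are made up for the example):

```python
import numpy as np

# Illustrative raw feature matrix: rows = items, columns = [price, score];
# NaN marks a missing value.
X = np.array([[120.0, 3.0],
              [np.nan, 5.0],
              [80.0, 1.0],
              [200.0, 5.0]])

# Missing values: replace each NaN with its column's median over observed values.
col_median = np.nanmedian(X, axis=0)
nan_mask = np.isnan(X)
X[nan_mask] = np.take(col_median, np.nonzero(nan_mask)[1])

# Min-max normalization so every feature shares a [0, 1] range.
lo, hi = X.min(axis=0), X.max(axis=0)
X = (X - lo) / (hi - lo)

print(X.round(3))
```

Tree ensembles like XGBoost are insensitive to monotone scaling, but consistent ranges still simplify correlation analysis and any downstream linear or neural models.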

4. Model Engineering Practice

4.1 Evaluation Metric Definition Two primary business metrics are considered: click‑through rate (whether a user clicks) and first‑screen exposure click rate. The XGBoost objective is set to "rank:pairwise" and the evaluation metric used is NDCG (Normalized Discounted Cumulative Gain).

import math
import random

import numpy as np


def _to_list(x):
    # Wrap a scalar in a list so single-item groups are handled uniformly.
    if isinstance(x, list):
        return x
    return [x]


def ndcg(y_true, y_pred, k=20, rel_threshold=0):
    # NDCG@k for one query group: DCG of the predicted order / ideal DCG.
    if k <= 0:
        return 0
    y_true = _to_list(np.squeeze(y_true).tolist())
    y_pred = _to_list(np.squeeze(y_pred).tolist())
    c = list(zip(y_true, y_pred))
    random.shuffle(c)  # randomize first so ties get no fixed advantage
    c_g = sorted(c, key=lambda x: x[0], reverse=True)  # ideal order (by true label)
    c_p = sorted(c, key=lambda x: x[1], reverse=True)  # predicted order (by score)
    idcg = 0
    dcg = 0
    for i, (g, p) in enumerate(c_g):
        if i >= k:
            break
        if g > rel_threshold:
            idcg += (math.pow(2, g) - 1) / math.log(2 + i)
    for i, (g, p) in enumerate(c_p):
        if i >= k:
            break
        if g > rel_threshold:
            dcg += (math.pow(2, g) - 1) / math.log(2 + i)
    if idcg == 0:
        return 0
    return dcg / idcg

4.2 Initial Model Training Using the prepared data, the initial XGBoost model is trained to obtain baseline parameters and feature importance.

import xgboost as xgb
from xgboost import DMatrix

train_dmatrix = DMatrix(x_train, y_train)
valid_dmatrix = DMatrix(x_valid, y_valid)
test_dmatrix = DMatrix(x_test)

# Group sizes tell XGBoost which consecutive rows belong to the same query,
# so the pairwise loss only compares items within one query.
train_dmatrix.set_group(group_train)
valid_dmatrix.set_group(group_valid)
test_dmatrix.set_group(group_test)

params = {'objective': 'rank:pairwise', 'eta': 0.5, 'gamma': 1.0,
          'min_child_weight': 0.5, 'max_depth': 8,
          'eval_metric': 'ndcg@10-', 'nthread': 16}

xgb_model = xgb.train(params, train_dmatrix, num_boost_round=1000,
                      evals=[(valid_dmatrix, 'validation')])

import pandas as pd

# Inspect feature importance from the trained booster.
print('feature name', 'feature weight')
feature_score = xgb_model.get_fscore()
print(pd.Series(feature_score).sort_values(ascending=False))
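The group_train/group_valid arrays passed to set_group above are simply per-query row counts. A minimal sketch of deriving them, assuming the training rows carry a hypothetical qid column and are already sorted so each query's rows are contiguous (which set_group requires):

```python
import numpy as np

# Hypothetical query ids, one per training row, sorted so that rows of the
# same query are contiguous.
qid = np.array([101, 101, 101, 205, 205, 309])

# Group sizes = run lengths of consecutive identical qids; np.unique returns
# them in sorted qid order, which matches the row order here.
_, group_sizes = np.unique(qid, return_counts=True)
print(group_sizes)  # -> [3 2 1]
```

The sum of group_sizes must equal the number of rows in the corresponding DMatrix, or XGBoost will reject the group specification.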

4.3 Model Tuning Five‑Step Process The tuning steps include: (1) selecting an initial parameter set, (2) adjusting max_depth and min_child_weight, (3) tuning gamma to reduce over‑fitting, (4) modifying subsample and colsample_bytree for data sampling, and (5) changing the learning rate eta.

Grid‑search results show that increasing max_depth improves scores, while a min_child_weight of around 6 yields better performance.
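The max_depth / min_child_weight stage of that tuning process can be sketched as a grid search skeleton. Everything here is illustrative: evaluate() is a placeholder that in practice would run a cross-validated NDCG pass (e.g. via xgb.cv), and its toy scoring merely mirrors the observation above that deeper trees and min_child_weight near 6 did best.

```python
from itertools import product

base_params = {'objective': 'rank:pairwise', 'eta': 0.5, 'gamma': 1.0,
               'eval_metric': 'ndcg@10-'}


def evaluate(params):
    # Placeholder for a cross-validated NDCG run; this toy score just prefers
    # deeper trees and min_child_weight close to 6.
    return params['max_depth'] - abs(params['min_child_weight'] - 6)


best_score, best_params = float('-inf'), None
for depth, mcw in product([4, 6, 8, 10], [1, 3, 6, 9]):
    params = dict(base_params, max_depth=depth, min_child_weight=mcw)
    score = evaluate(params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params['max_depth'], best_params['min_child_weight'])  # -> 10 6
```

Tuning one or two coupled parameters per pass, as in the five-step process, keeps the grid small enough to search exhaustively at each stage.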

4.4 Offline Model Evaluation After training and tuning, offline evaluation is performed by pulling production request logs, simulating real traffic, and comparing predicted rankings against actual click‑through and first‑screen exposure metrics.
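A minimal sketch of that offline comparison, with hypothetical logged data (names and values are illustrative): re-rank each logged request by model score and measure what share of the recorded clicks would have landed on the first screen under the model's ordering.

```python
# Each logged request: model scores and recorded clicks per displayed item.
logged_requests = [
    {'scores': [0.9, 0.2, 0.7], 'clicks': [1, 0, 1]},
    {'scores': [0.1, 0.8, 0.3], 'clicks': [1, 0, 0]},
]

FIRST_SCREEN = 2  # items visible without scrolling (illustrative)

captured, total_clicks = 0, 0
for req in logged_requests:
    # Re-rank the logged items by descending model score.
    order = sorted(range(len(req['scores'])),
                   key=lambda i: req['scores'][i], reverse=True)
    first_screen = set(order[:FIRST_SCREen if False else FIRST_SCREEN])
    captured += sum(req['clicks'][i] for i in first_screen)
    total_clicks += sum(req['clicks'])

print(round(captured / total_clicks, 3))
```

Comparing this replayed first-screen click capture against the production ranking's own figure gives an offline estimate before any online A/B exposure.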

4.5 Model Prediction Online A/B experiments are conducted to monitor the model's real‑time performance, facilitating iterative improvements.

5. Summary and Outlook Key takeaways: (1) Thorough requirement analysis is essential before applying machine‑learning algorithms; (2) Robust feature engineering and accurate labeling are critical for model success; (3) Leveraging open‑source tools and visual analytics accelerates decision‑making; (4) Single algorithms rarely satisfy complex ranking needs, so multi‑model fusion and deep‑learning exploration are necessary.

Recommended Reading

How Deep Learning Improves Ctrip Ticket Customer Service Efficiency

Ctrip Holiday Smart Customer Service Bot Behind the Scenes

Reinforcement Learning in Ctrip Hotel Recommendation Ranking

How Ctrip Search Recommendation System Works

Ctrip Technology Book Release with Easter Eggs

Tags: machine learning, feature engineering, search ranking, XGBoost, learning to rank