
Optimizing CVR in Sparse High‑Value Travel Recommendation Scenarios

This article presents a comprehensive overview of conversion‑rate (CVR) optimization for Alitrip’s travel recommendation platform, detailing the challenges of extremely sparse user feedback, the design of item, user, query and context features, and a series of model‑level and loss‑function techniques—including generic‑label modeling, global‑transaction modeling, ESMM, rank‑loss approximations, and multi‑task CTR auxiliary training—to improve both CTR and CVR performance in high‑ticket‑price scenarios.

DataFunSummit

In the travel recommendation domain, user feedback is highly sparse and each transaction carries a high monetary value, making even small CVR improvements critical for gross merchandise volume (GMV) growth.

Background: Alitrip's recommendation flow consists of three main scenes—Home "Guess You Like", Main Search, and Small Search—each with distinct user intents. Click‑to‑exposure ratios are roughly 1:50, while purchase‑to‑click ratios drop to about 1.4:100, highlighting the severe sparsity problem.

Feature Design:

Item features: static attributes (price, destination, category) and efficiency features (CTR/CTVR trends, rank‑based metrics across multiple dimensions).

User features: static demographics and real‑time preferences derived from recent behavior (e.g., next‑day destination preference, real‑time location, click/purchase sequences).

Query features: provided by an NLP team, including query type identification, vector embeddings, and normalization.

Context features: recall‑side signals, coarse‑ranking scores, user‑item distance, and other exposure‑related attributes.

All statistical features are computed using yesterday’s data to avoid feature‑leakage during online inference.
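This cutoff rule can be sketched as follows. A minimal illustration (function and log names are hypothetical, not from the talk) of restricting statistical features to data up to and including yesterday, so the online model never depends on same‑day logs:

```python
import datetime as dt

def feature_cutoff(serving_date: dt.date) -> dt.date:
    """Latest date whose logs may contribute to statistical features."""
    return serving_date - dt.timedelta(days=1)

def item_ctr(click_log: dict, expose_log: dict, item: str,
             serving_date: dt.date) -> float:
    """CTR computed only from events dated up to yesterday."""
    cutoff = feature_cutoff(serving_date)
    clicks = sum(n for d, n in click_log.get(item, {}).items() if d <= cutoff)
    exposures = sum(n for d, n in expose_log.get(item, {}).items() if d <= cutoff)
    return clicks / exposures if exposures else 0.0
```

Today's events are present in the logs but excluded by the cutoff, mirroring the leakage guard described above.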

Modeling Sparse CVR:

Two technical routes are considered: (1) directly applying a CTR model to CVR data, and (2) designing a dedicated CVR model to address sample‑selection bias (SSB), sparsity, and delayed feedback.

The industry‑standard ESMM shares embeddings between CTR and CTCVR to alleviate sparsity, while also modeling the CTR‑CTCVR relationship.
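The key structural idea—shared embeddings feeding two towers whose product gives pCTCVR—can be shown in a minimal sketch (shapes, pooling, and weights here are illustrative assumptions, not the production model):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

emb = rng.normal(size=(100, 8))            # shared feature embedding table
w_ctr = rng.normal(size=8)                 # CTR tower (exposure -> click)
w_cvr = rng.normal(size=8)                 # CVR tower (click -> conversion)

def esmm_scores(feature_ids):
    x = emb[feature_ids].mean(axis=0)      # pooled shared representation
    p_ctr = sigmoid(x @ w_ctr)
    p_cvr = sigmoid(x @ w_cvr)
    return p_ctr, p_ctr * p_cvr            # (pCTR, pCTCVR)
```

Because pCTCVR = pCTR × pCVR is supervised over the full exposure space, the CVR tower is never trained on clicked samples alone, which is how ESMM sidesteps sample‑selection bias.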

To further mitigate positive‑sample scarcity, generic‑label modeling introduces two auxiliary labels— label_dest and label_cate —derived from purchase behavior across destinations and categories. These labels are learned jointly with the main CVR task, sharing parameters and enriching the feature space.
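A hedged sketch of how such auxiliary labels might be constructed (field names are assumptions): a sample is positive for label_dest if the user purchased any item sharing its destination, and for label_cate if any purchase shares its category.

```python
def generic_labels(sample: dict, purchased_items: list) -> dict:
    """Derive the main CVR label plus two coarser generic labels."""
    bought_ids = {p["item_id"] for p in purchased_items}
    bought_dests = {p["dest"] for p in purchased_items}
    bought_cates = {p["cate"] for p in purchased_items}
    return {
        "label_cvr": int(sample["item_id"] in bought_ids),
        "label_dest": int(sample["dest"] in bought_dests),
        "label_cate": int(sample["cate"] in bought_cates),
    }
```

The coarser labels fire far more often than item‑level conversion, giving the shared parameters a denser training signal.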

A "global transaction" approach adds items that convert in other scenes (but not the current one) as additional positive samples, and unexposed, non‑purchased items as negatives, thereby expanding the effective training set.

Probability formulas for global‑transaction modeling are combined to estimate the final exposure‑to‑conversion probability for the target scene.

Loss Function Enhancements:

Standard cross‑entropy loss is effective for most tasks, but the team explored AUC‑direct optimization. Because AUC is non‑differentiable, a differentiable "rank loss" approximation was introduced.

Training with rank loss alone proved unstable, so a hybrid loss (cross‑entropy + rank loss) was adopted, either by summation or multiplication, yielding comparable results.
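A common differentiable surrogate of this kind penalizes positive–negative score pairs that are not well separated; the sketch below shows such a pairwise rank loss and the additive hybrid with cross‑entropy (the exact surrogate and mixing weight used by the team are not specified, so these are illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rank_loss(scores, labels):
    """Pairwise logistic surrogate of AUC: push every positive above every negative."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]        # all positive-vs-negative pairs
    return -np.log(sigmoid(diffs)).mean()

def cross_entropy(scores, labels, eps=1e-7):
    p = np.clip(sigmoid(scores), eps, 1 - eps)
    return -(labels * np.log(p) + (1 - labels) * np.log(1 - p)).mean()

def hybrid_loss(scores, labels, alpha=0.5):
    """Additive hybrid; alpha is an assumed mixing weight."""
    return cross_entropy(scores, labels) + alpha * rank_loss(scores, labels)
```

The cross‑entropy term anchors the absolute probabilities (which the pairwise term alone ignores), which is consistent with the reported instability of training on rank loss by itself.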

When CVR improvements plateaued, additional CTR auxiliary tasks (query‑item, user‑item, context‑item) were added, boosting click‑through rates without harming CTCVR performance.

Practical Tricks:

Dynamic routing selects a branch containing at least one positive sample within a batch, preventing the model from being dominated by negatives.
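The routing decision itself is simple to express. A minimal sketch (branch names and the fallback rule are assumptions) that picks the first branch whose labels contain at least one positive in the current batch:

```python
def route_batch(batch_labels: dict) -> str:
    """batch_labels maps branch name -> list of 0/1 labels for this batch.

    Returns the first branch with at least one positive sample; if no branch
    has a positive, falls back to the first branch (dicts preserve order).
    """
    for branch, labels in batch_labels.items():
        if any(labels):
            return branch
    return next(iter(batch_labels))
```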

Embedding‑based handling of statistical features (e.g., discretization followed by embedding) and retaining only top‑N ranking features simplifies model input.
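The discretize‑then‑embed step can be sketched as a bucket lookup (the bin edges and embedding width here are placeholders, not values from the talk):

```python
import numpy as np

# Assumed quantile-style buckets for a continuous CTR feature.
bin_edges = np.array([0.001, 0.005, 0.02, 0.05, 0.1])
embedding_table = np.random.default_rng(0).normal(size=(len(bin_edges) + 1, 4))

def embed_ctr(ctr_value: float) -> np.ndarray:
    """Map a raw CTR to its bucket, then look up that bucket's embedding."""
    bucket = int(np.searchsorted(bin_edges, ctr_value))   # 0..len(bin_edges)
    return embedding_table[bucket]
```

Treating the statistic as a categorical bucket lets the model learn a nonlinear response to it, rather than consuming the raw float directly.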

Feature‑cross‑period normalization (e.g., 7‑day, 15‑day windows) reduces volatility of rank‑based features.
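One plausible form of this normalization (the window lengths follow the text; the ratio‑to‑window‑mean formula is an assumption) expresses today's value relative to its recent averages:

```python
def period_normalize(history: list) -> dict:
    """Normalize the latest value by its 7- and 15-day window means."""
    today = history[-1]
    out = {}
    for win in (7, 15):
        window = history[-win:]
        mean = sum(window) / len(window)
        out[f"norm_{win}d"] = today / mean if mean else 0.0
    return out
```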

Q&A Highlights:

Item efficiency features are crucial and do not exacerbate the "Matthew effect" when properly handled.

Ranking features need no explicit normalization because they are compared within the same user cohort.

Real‑time features may diverge between offline and online pipelines due to missing or delayed event logging.

Generic‑label routing is effective when batch size is insufficient to guarantee a positive sample.

In summary, feature engineering accounts for the majority of performance gains in sparse CVR scenarios; a well‑designed MLP baseline, auxiliary labels, and carefully crafted loss functions together deliver robust online improvements.

Tags: e-commerce, machine learning, feature engineering, recommendation systems, sparse data, CVR optimization
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
