Embedding‑Based Item‑to‑Item Recommendation for Homestay Platforms
This article describes how Tujia applied embedding techniques, in particular a Skip‑Gram model, to build an item‑to‑item similarity recommender for low‑frequency, highly personalized homestay listings. It covers data preparation, model architecture, training, evaluation results, practical improvements, and future directions.
Homestay rentals have become a fast‑growing tourism segment, but traditional collaborative‑filtering and content‑based methods perform poorly there due to low purchase frequency and the difficulty of describing user interests. Tujia therefore explored embedding‑based item‑to‑item similarity recommendation to capture multi‑dimensional property features.
The solution leverages large‑scale user click logs, treating consecutive house views within a short time window as a contextual sequence analogous to word contexts in language models. A Skip‑Gram model is trained to map each house to a low‑dimensional vector (e.g., 64‑D), where the inner product reflects similarity.
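The session construction step above can be sketched as follows. This is a minimal, hypothetical example: the field names (`house_id`, timestamps) and the 30‑minute session gap are assumptions, not details from the article.

```python
from datetime import datetime, timedelta

def build_sessions(clicks, gap=timedelta(minutes=30)):
    """Split one user's time-sorted click log (house_id, datetime) into
    contextual sequences whenever the gap between clicks exceeds `gap`."""
    sessions, current = [], []
    last_ts = None
    for house_id, ts in clicks:
        if last_ts is not None and ts - last_ts > gap:
            sessions.append(current)   # gap too large: close this session
            current = []
        current.append(house_id)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions

clicks = [
    ("h1", datetime(2020, 1, 1, 10, 0)),
    ("h2", datetime(2020, 1, 1, 10, 5)),
    ("h3", datetime(2020, 1, 1, 14, 0)),  # > 30 min gap -> new session
]
print(build_sessions(clicks))  # [['h1', 'h2'], ['h3']]
```

Each resulting session then plays the role of a "sentence", with individual houses as "words", when generating Skip‑Gram training pairs.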
The model comprises an embedding matrix E (n × d) for input houses and a weight matrix W (n × d) for output predictions, with negative sampling to distinguish clicked (positive) from skipped (negative) houses. Training uses 1M houses, 8M click contexts, and 40M training samples, with batch size 1024, 100 epochs, and L2 regularization on W, running on a Tesla M40 GPU.
Evaluation combines loss curves (training vs. validation) and product‑level metrics such as conversion rate uplift in online A/B tests, showing a clear improvement after deployment. Additional offline analyses compare single‑dimension similarity and demonstrate that embedding captures richer relationships than pure geographic similarity.
Practical improvements include filtering short‑duration clicks, limiting context length, weighting ordered clicks, and careful negative sampling. Post‑processing normalizes the vectors and filters rarely seen houses. Cold‑start is addressed by averaging the vectors of a small set of similar houses.
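The normalization and cold‑start averaging steps can be sketched as below; the toy embedding matrix and the choice of similar houses are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
emb = rng.normal(size=(5, 64))          # toy learned house embeddings

# L2-normalize each vector so the inner product becomes cosine similarity.
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

def cold_start_vector(similar_ids):
    """A new house with no click history gets the (re-normalized) mean of
    the vectors of a small set of similar existing houses."""
    v = emb[similar_ids].mean(axis=0)
    return v / np.linalg.norm(v)

new_vec = cold_start_vector([0, 2, 3])
scores = emb @ new_vec                  # cosine similarity to all houses
top3 = np.argsort(-scores)[:3]          # most similar existing houses
```

Because all vectors are unit‑length, a single matrix‑vector product suffices to rank candidates for the similar‑item list.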
To keep the model up‑to‑date with rapid inventory growth, Tujia adopts a two‑stage update: pre‑load parameters from the previous model, append randomly initialized vectors for new houses, and fine‑tune on recent two‑month logs, preserving useful historical embeddings while incorporating fresh data.
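The warm‑start portion of this update can be sketched in a few lines; the dimensions and initialization scale are assumptions for illustration. Fine‑tuning on recent logs then proceeds with the same training loop as before.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 64
old_E = rng.normal(0, 0.1, (1000, d))   # embeddings from the previous model

def warm_start(old_E, n_new, scale=0.1, rng=rng):
    """Pre-load the previous model's vectors and append randomly
    initialized rows for houses added since the last training run."""
    new_rows = rng.normal(0, scale, (n_new, old_E.shape[1]))
    return np.vstack([old_E, new_rows])

E = warm_start(old_E, n_new=50)         # 50 new houses joined the inventory
```

This preserves the learned geometry for existing houses while the appended rows are free to move during fine‑tuning on the recent two‑month logs.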
Future work plans to enrich behavior signals (favorites, chats, reviews), introduce attention mechanisms for temporal ordering, and jointly learn auxiliary property embeddings alongside the main house vectors.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.