Hotel Search Relevance Modeling and Architecture at Fliggy (Alibaba)
This article presents a comprehensive overview of Fliggy's hotel search relevance system, covering the business background, multi‑scenario architecture, core factor estimation, entity recognition, text and spatial relevance modeling, multi‑scenario fusion, and future optimization directions.
As travel demand grows, hotel reservation has become a crucial part of the user experience. Fliggy aims to provide personalized hotel search by building a relevance system that quickly matches users with suitable hotels.
Hotel search operates across multiple platforms (the Fliggy app, Taobao, Alipay) and scenarios, and handles diverse query conditions such as text, price, star rating, and location. User behavior is sparse and decision cycles are long, which makes relevance modeling challenging.
The search pipeline consists of a Service Proxy (SP) that routes user requests, a Query Processor (QP) for parsing, an initial ranking stage, and a TPP service for final scoring. Both offline features (hotel name, location) and real‑time signals (inventory, recent transactions, dynamic pricing) are combined to produce the final ranking.
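The staged flow described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `Hotel` fields, the token-overlap filter, the 0.1 weight on recent orders, and the rule that sold-out hotels are dropped are hypothetical and are not taken from Fliggy's system.

```python
from dataclasses import dataclass


@dataclass
class Hotel:
    hotel_id: str
    name: str
    offline_score: float  # hypothetical score precomputed from offline features
    has_inventory: bool   # real-time signal
    recent_orders: int    # real-time signal


def query_processor(raw_query):
    """QP stage: parse the raw query into lowercase tokens."""
    return raw_query.lower().split()


def initial_rank(tokens, hotels, top_k=2):
    """Initial ranking: cheap token overlap plus the offline score."""
    def coarse(hotel):
        overlap = len(set(tokens) & set(hotel.name.lower().split()))
        return overlap + hotel.offline_score
    return sorted(hotels, key=coarse, reverse=True)[:top_k]


def final_score(hotels):
    """Final scoring: drop sold-out hotels, then add a real-time boost."""
    available = [h for h in hotels if h.has_inventory]
    return sorted(
        available,
        key=lambda h: h.offline_score + 0.1 * h.recent_orders,
        reverse=True,
    )


def search(raw_query, hotels):
    """Run the stages in order and return ranked hotel ids."""
    tokens = query_processor(raw_query)
    candidates = initial_rank(tokens, hotels)
    return [h.hotel_id for h in final_score(candidates)]


hotels = [
    Hotel("a", "West Lake Hotel", 0.5, True, 1),
    Hotel("b", "West Lake Inn", 0.4, True, 5),
    Hotel("c", "Airport Hotel", 0.9, True, 0),
]
ranked = search("west lake", hotels)
```

In this toy data the initial stage keeps the two hotels whose names match the query, and the final stage reorders them because the second has more recent orders.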
Relevance modeling is divided into three parts. Text relevance combines BM25 and Jaccard scores with a lightweight transformer network that generates similarity vectors. Spatial relevance captures geographic proximity through distance, geohash binary encoding, and geohash-based token lists. Multi-scenario relevance uses scenario-specific MLPs that fuse generic and scenario-focused features. Two supporting components feed these models: core factor estimation, which predicts price and distance preferences, and entity recognition, which identifies POIs and hotel names and resolves ambiguity across cities.
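A minimal sketch of the two lexical signals named for text relevance, Jaccard and BM25, is given here. It covers only those two scores and omits the transformer component. The whitespace-style token lists, the parameter values `k1=1.5` and `b=0.75` (common defaults), and the IDF variant with `+1` inside the logarithm are assumptions, not details from the source.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """BM25 score of each tokenized document against the query tokens."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each token.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["west", "lake", "hotel"],
    ["airport", "hotel"],
    ["west", "lake", "lakeside", "inn"],
]
query = ["west", "lake"]
scores = bm25_scores(query, docs)
overlap = jaccard(query, docs[0])
```

Both matching documents contain each query token once, so the shorter one scores higher under BM25's length normalization, and the document with no query tokens scores zero.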
Feature engineering includes estimating core factors (price and distance) with distribution‑aware adjustments, recognizing core entities with BERT‑CRF models, and converting geohash codes into binary sequences to preserve spatial relationships for modeling.
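The geohash-to-binary conversion mentioned above can be sketched as follows, assuming the standard geohash base32 alphabet in which each character carries 5 bits. The function names and the sample geohash strings are hypothetical, and the shared-prefix measure is one plausible proximity feature rather than the one the article's model necessarily uses.

```python
# Standard geohash base32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
CHAR_TO_INT = {c: i for i, c in enumerate(BASE32)}


def geohash_to_bits(geohash):
    """Expand a geohash string into a flat list of bits, 5 per character."""
    bits = []
    for ch in geohash.lower():
        if ch not in CHAR_TO_INT:
            raise ValueError(f"invalid geohash character: {ch!r}")
        value = CHAR_TO_INT[ch]
        # Most significant bit first, matching the geohash bit order.
        bits.extend((value >> shift) & 1 for shift in range(4, -1, -1))
    return bits


def shared_prefix_bits(gh_a, gh_b):
    """Count the leading bits two geohashes have in common."""
    count = 0
    for x, y in zip(geohash_to_bits(gh_a), geohash_to_bits(gh_b)):
        if x != y:
            break
        count += 1
    return count


user_bits = geohash_to_bits("u4")
common = shared_prefix_bits("wtmk1", "wtmk3")
```

Working at the bit level is finer-grained than comparing characters: "wtmk1" and "wtmk3" share four characters but 23 bits, because their last characters still agree on three leading bits. A longer shared prefix means both points fall in the same smaller cell, although two nearby points on opposite sides of a cell boundary can still share only a short prefix.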
Future work focuses on improving spatial‑price estimation, introducing two‑dimensional distance modeling, adjusting price predictions based on city tier consumption levels, upgrading spatial‑text relevance models with more complex online algorithms, and incorporating historical search sequences for contextual relevance.
DataFunTalk