Construction and Application of Meituan Hotel & Travel Knowledge Graph
This article describes Meituan's hotel and travel knowledge graph: its background, the challenge of recognizing scenario-style queries, the multi-layer graph architecture, and the technical modules for knowledge mining, relation discrimination, and supply tagging. It then covers practical applications in search, recommendation, ranking, and risk control, and closes with future directions and Q&A highlights.
Background and Motivation – In Meituan's hotel‑travel vertical search, user needs fall into two categories: precise store search and generic scenario search. Traditional text retrieval struggles with generic scenarios, prompting the construction of a large‑scale knowledge graph using reviews, logs, and other data sources.
Graph Structure – The graph consists of three layers: (1) a taxonomy layer that defines label categories, (2) a label layer that contains atomic and composite tags connected by synonym and hierarchical relations, and (3) a supply layer that links tags to POIs, rooms, tickets, and packages. The graph contains over 10 million nodes and 100 million edges, with roughly 95% accuracy on core relations.
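The three layers can be pictured with a small in-memory structure. This is only an illustrative sketch: the class names, field names, and the example tags and entity IDs are hypothetical and are not taken from Meituan's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str
    category: str       # taxonomy-layer category (hypothetical, e.g. "facility")
    parts: tuple = ()   # atomic tag names if this is a composite tag


class HotelTravelGraph:
    """Toy three-layer graph; assumes the hypernym relation has no cycles."""

    def __init__(self):
        self.categories = set()          # taxonomy layer
        self.tags = {}                   # label layer: name -> Tag
        self.synonyms = {}               # label layer: alias -> canonical name
        self.parents = {}                # label layer: child -> parent tag
        self.supply = defaultdict(set)   # supply layer: tag -> {(kind, id)}

    def add_tag(self, tag):
        self.categories.add(tag.category)
        self.tags[tag.name] = tag

    def add_synonym(self, alias, canonical):
        self.synonyms[alias] = canonical

    def add_hypernym(self, child, parent):
        self.parents[child] = parent

    def link(self, tag_name, kind, entity_id):
        self.supply[tag_name].add((kind, entity_id))

    def entities_for(self, name):
        """Resolve synonyms, then collect entities of the tag and its descendants."""
        canonical = self.synonyms.get(name, name)
        result = set(self.supply.get(canonical, ()))
        for child, parent in self.parents.items():
            if parent == canonical:
                result |= self.entities_for(child)
        return result


g = HotelTravelGraph()
g.add_tag(Tag("hot spring", "facility"))
g.add_tag(Tag("outdoor hot spring", "facility"))
g.add_synonym("thermal spring", "hot spring")
g.add_hypernym("outdoor hot spring", "hot spring")
g.link("hot spring", "poi", "poi_1")
g.link("outdoor hot spring", "room", "room_7")
print(sorted(g.entities_for("thermal spring")))
# [('poi', 'poi_1'), ('room', 'room_7')]
```

A query on a synonym reaches entities linked to the canonical tag and to its more specific child tags, which is the kind of lookup the label and supply layers are meant to support.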
Technical Modules
Knowledge Mining – Extracts tags from structured data, semi-structured data (e.g., Baidu Baike), and unstructured data (reviews, logs) using expanded vocabularies, short-text classification, multi-task learning, and deep + wide feature fusion.
Relation Discrimination – Identifies synonym and hypernym/hyponym relations via rule‑based similarity, vector indexing, active learning, and statistical co‑occurrence analysis.
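The article does not say how the statistical co-occurrence analysis is carried out. One common heuristic, shown here purely as an assumption, compares the sets of POIs on which two tags appear: if nearly every POI carrying tag A also carries tag B but not the reverse, A is probably the narrower term. The function name and the thresholds `high` and `low` are hypothetical.

```python
def relation_from_cooccurrence(pois_a, pois_b, high=0.8, low=0.5):
    """Guess the relation between tags A and B from the POI sets they occur on.

    Thresholds are illustrative assumptions, not values from the article.
    """
    if not pois_a or not pois_b:
        return "unknown"
    both = len(pois_a & pois_b)
    p_b_given_a = both / len(pois_a)   # share of A's POIs that also carry B
    p_a_given_b = both / len(pois_b)   # share of B's POIs that also carry A
    if p_b_given_a >= high and p_a_given_b >= high:
        return "synonym_candidate"
    if p_b_given_a >= high and p_a_given_b < low:
        return "a_is_hyponym_of_b"
    if p_a_given_b >= high and p_b_given_a < low:
        return "b_is_hyponym_of_a"
    return "unrelated_or_uncertain"


hot_spring = set(range(1, 11))     # POIs 1..10 carry "hot spring"
outdoor_hot_spring = {1, 2, 3}     # a subset carries the narrower tag
print(relation_from_cooccurrence(outdoor_hot_spring, hot_spring))
# a_is_hyponym_of_b
```

In practice a signal like this would be one input among the rule-based, vector, and active-learning signals the module combines, not a final decision on its own.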
Knowledge Association – Aligns mined tags with supply entities using evidence extraction from comments and product pages, followed by multi‑evidence fusion and tree‑model classification.
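The fusion step can be sketched as turning the evidence snippets for one (tag, entity) pair into a single feature record and passing it to a classifier. Everything below is assumed for illustration: the source names, the feature names, and especially `tree_decide`, which is a hand-written decision tree standing in for the trained tree model the article mentions.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str      # "review" or "product_page" (hypothetical source names)
    positive: bool   # True if the snippet supports the tag for this entity


def fuse(evidence):
    """Fuse all evidence for one (tag, entity) pair into a feature dict."""
    feats = {"review_pos": 0, "review_neg": 0, "page_pos": 0, "page_neg": 0}
    for ev in evidence:
        # Any source other than "review" is counted as product-page evidence.
        prefix = "review" if ev.source == "review" else "page"
        polarity = "pos" if ev.positive else "neg"
        feats[f"{prefix}_{polarity}"] += 1
    total_reviews = feats["review_pos"] + feats["review_neg"]
    feats["review_pos_ratio"] = (
        feats["review_pos"] / total_reviews if total_reviews else 0.0
    )
    return feats


def tree_decide(feats):
    """Hand-written stand-in for a trained tree model (illustrative rules only)."""
    if feats["page_pos"] > 0 and feats["page_neg"] == 0:
        return True
    if feats["review_pos"] >= 3:
        return feats["review_pos_ratio"] >= 0.7
    return False


evidence = [Evidence("review", True)] * 3 + [Evidence("review", False)]
print(tree_decide(fuse(evidence)))
# True
```

A real system would learn the split points from labeled pairs; the point of the sketch is only that evidence from different sources is reduced to one record before classification.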
Applications
Search – Improves recall for composite tag queries, delivering results with relevant scenario tags.
Time-Sensitive Tag Display – Shows strongly relevant tags while they are in their active period; outside that period, shows only tags related to the query.
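The display rule can be sketched as a simple filter. This is one reading of the rule, with assumptions: "query-related" is reduced to exact membership in a set of query tags, and the `DisplayTag` fields and example dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DisplayTag:
    name: str
    strong: bool                        # strongly relevant to the POI
    active_from: Optional[date] = None  # start of the active period, if any
    active_to: Optional[date] = None    # end of the active period, if any

    def is_active(self, today):
        if self.active_from is None or self.active_to is None:
            return False
        return self.active_from <= today <= self.active_to


def tags_to_show(tags, query_tags, today):
    """Show query-related tags always, and strong tags only while active."""
    shown = []
    for tag in tags:
        if tag.name in query_tags or (tag.strong and tag.is_active(today)):
            shown.append(tag.name)
    return shown


tags = [
    DisplayTag("cherry blossoms", True, date(2024, 3, 15), date(2024, 4, 15)),
    DisplayTag("parking", False),
]
print(tags_to_show(tags, {"parking"}, date(2024, 4, 1)))
# ['cherry blossoms', 'parking']
print(tags_to_show(tags, {"parking"}, date(2024, 7, 1)))
# ['parking']
```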
Ranking Lists – Generates popularity lists (e.g., hot springs) based on tag‑driven signals.
Suggestion (sug) – Enhances query suggestions with scenario‑aware tags, reducing irrelevant results.
Risk Control – Classifies tags into high, medium, and low risk, applying filtering and manual review to mitigate sensitive or misleading content.
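The article names three risk tiers and two handling mechanisms but does not say which tier receives which treatment. The sketch below assumes one plausible mapping (high risk is filtered, medium risk goes to manual review, low risk is published); the mapping and all names are assumptions.

```python
from enum import Enum


class Risk(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def route(tags_with_risk):
    """Split (tag, risk) pairs into handling queues under the assumed mapping."""
    queues = {"publish": [], "manual_review": [], "filtered": []}
    for name, risk in tags_with_risk:
        if risk is Risk.HIGH:
            queues["filtered"].append(name)        # assumed: never shown
        elif risk is Risk.MEDIUM:
            queues["manual_review"].append(name)   # assumed: human check first
        else:
            queues["publish"].append(name)         # assumed: shown directly
    return queues


print(route([("hot spring", Risk.LOW), ("cheapest in town", Risk.MEDIUM)]))
# {'publish': ['hot spring'], 'manual_review': ['cheapest in town'], 'filtered': []}
```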
Summary and Outlook – The graph is built across data, technology, application, and risk layers. Future work includes incorporating multimodal features, expanding graph usage to more filters, and improving risk‑mitigation algorithms to lower operational costs.
Q&A Highlights
Evaluation metrics: node/edge count, accuracy, and downstream coverage.
Tag system evolution: align the tag system with business needs and improve it iteratively.
Scalability: only high‑frequency tags are indexed for online search.
Multimodal value: combining images with text enhances entity recognition (e.g., identifying roller coasters).
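The scalability answer above, indexing only high-frequency tags for online search, can be sketched as a frequency cutoff applied before building the online index. The article does not say how frequency is measured; this sketch assumes it is counted from a query log, and the function name and `min_count` threshold are hypothetical.

```python
from collections import Counter


def build_online_index(tag_to_pois, query_log, min_count=2):
    """Keep only tags seen at least min_count times in the query log."""
    freq = Counter(tag for tag in query_log if tag in tag_to_pois)
    return {
        tag: sorted(tag_to_pois[tag])
        for tag, count in freq.items()
        if count >= min_count
    }


tag_to_pois = {"hot spring": {"p2", "p1"}, "igloo": {"p3"}}
query_log = ["hot spring", "hot spring", "igloo", "unknown"]
print(build_online_index(tag_to_pois, query_log))
# {'hot spring': ['p1', 'p2']}
```

Low-frequency tags stay in the offline graph and can be promoted to the online index if their query frequency rises.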
DataFunTalk