Artificial Intelligence 17 min read

Query Understanding in Meituan Search: Entity Recognition, Query Rewriting, and Intent Detection

This article presents a comprehensive overview of Meituan's query understanding system, detailing how entity recognition, entity linking, query rewriting, and intent detection are designed and deployed to improve O2O search relevance across diverse local services.

DataFunTalk

Over the past two decades, search has evolved from simple text matching to sophisticated semantic understanding that incorporates context, location, time, and user behavior. This article introduces the construction and key components of Meituan's query understanding (QU) system within its O2O search platform.

1. Meituan Search Overview

Meituan's search serves as the entry point for a wide range of local services—food delivery, hotel booking, travel, and more—requiring high‑precision, location‑aware retrieval. Unlike traditional web or e‑commerce search, Meituan must handle heterogeneous supply constraints ranging from a few‑kilometer delivery radius to nationwide hotel listings.

2. Query Understanding (QU)

QU extracts a series of signals from the user's query, which feed the recall and ranking stages. For example, in the query "杭州东附近的7天酒店" (a 7 Days Inn near Hangzhou East), the location term "杭州东" is expanded to "杭州东站" (Hangzhou East Railway Station) to broaden recall.

3. Distributed QU (DQU) Architecture

The DQU framework consists of three layers: a foundational signal layer (NLP modules), a business‑logic layer that adapts to different verticals, and an adaptation layer that provides unified interfaces for various services.
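The three layers can be pictured with a minimal sketch. Everything named here (the `Signals` container, the two signal modules, the per-vertical settings, and the `understand` entry point) is a hypothetical stand-in chosen for illustration, not Meituan's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Signals:
    """Container that accumulates signals for one query (hypothetical)."""
    query: str
    values: Dict[str, object] = field(default_factory=dict)


# --- Signal layer: each toy NLP module adds one signal. ---
def tokenize(sig: Signals) -> None:
    sig.values["tokens"] = sig.query.lower().split()


def detect_city(sig: Signals) -> None:
    known_cities = {"hangzhou", "beijing"}  # illustrative dictionary
    tokens = sig.values["tokens"]
    sig.values["city"] = next((t for t in tokens if t in known_cities), None)


SIGNAL_MODULES: List[Callable[[Signals], None]] = [tokenize, detect_city]


# --- Business-logic layer: each vertical adapts the shared signals. ---
def hotel_logic(sig: Signals) -> None:
    sig.values["max_distance_km"] = None  # assumed: no distance cap for hotels


def delivery_logic(sig: Signals) -> None:
    sig.values["max_distance_km"] = 3.0  # assumed: short delivery radius


BUSINESS_LOGIC: Dict[str, Callable[[Signals], None]] = {
    "hotel": hotel_logic,
    "delivery": delivery_logic,
}


# --- Adaptation layer: one unified entry point for every caller. ---
def understand(query: str, vertical: str) -> Dict[str, object]:
    sig = Signals(query)
    for module in SIGNAL_MODULES:
        module(sig)
    BUSINESS_LOGIC[vertical](sig)
    return sig.values
```

Calling `understand("Hangzhou hotel", "hotel")` runs the shared signal modules first and then the hotel-specific logic, so a new vertical only needs one more entry in `BUSINESS_LOGIC`.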

4. Entity Recognition

Entity recognition (NER) identifies structured components such as business names and locations, enabling precise, field‑level recall. The system combines dictionary matching, model prediction, and disambiguation rules to output multiple candidate entities.
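The dictionary-matching and disambiguation steps can be sketched as follows. The dictionary contents, the `Mention` type, and the longest-match rule are assumptions for illustration; the model-prediction step is omitted, and the article does not specify which disambiguation rules are used in production.

```python
from typing import List, NamedTuple


class Mention(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    text: str
    label: str


# Hypothetical entity dictionary: surface form -> entity type.
ENTITY_DICT = {
    "杭州": "location",
    "杭州东站": "location",
    "7天酒店": "merchant",
}


def dictionary_candidates(query: str) -> List[Mention]:
    """Return every dictionary entry that occurs in the query."""
    max_len = max(len(key) for key in ENTITY_DICT)
    found = []
    for start in range(len(query)):
        last_end = min(len(query), start + max_len)
        for end in range(start + 1, last_end + 1):
            text = query[start:end]
            if text in ENTITY_DICT:
                found.append(Mention(start, end, text, ENTITY_DICT[text]))
    return found


def resolve_overlaps(candidates: List[Mention]) -> List[Mention]:
    """Toy disambiguation rule: prefer longer mentions, drop overlaps."""
    chosen: List[Mention] = []
    by_length = sorted(candidates, key=lambda m: (-(m.end - m.start), m.start))
    for m in by_length:
        if all(m.end <= c.start or m.start >= c.end for c in chosen):
            chosen.append(m)
    return sorted(chosen)


candidates = dictionary_candidates("杭州东站附近的7天酒店")
entities = resolve_overlaps(candidates)
```

Here `candidates` keeps both "杭州" and "杭州东站", which mirrors the idea of outputting multiple candidate entities, while `entities` retains only the non-overlapping longest matches.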

Model iteration moved from CRF to BERT: pre-training uses Meituan UGC, and fine-tuning uses data generated semi-automatically from user behavior. Automatic data labeling follows a pipeline in which a base model predicts on large-scale entity libraries, the differences are corrected, and the corrected data are merged back for further training.

5. Entity Linking

After recognizing address entities, linking maps them to geographic coordinates, supporting radius‑based recall. The offline component mines mentions from merchant databases, click logs, and knowledge graphs, while the online component performs alias‑based recall followed by context‑aware disambiguation.
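A minimal sketch of the online path, assuming a hand-written alias table, a city-match rule standing in for context-aware disambiguation, and great-circle distance for radius-based recall. The coordinates are approximate, illustrative values, and the second alias entry is invented.

```python
import math
from typing import Dict, List, Optional

# Hypothetical alias table, as the offline mining step might produce it.
ALIAS_TABLE: Dict[str, List[dict]] = {
    "east station": [
        {"name": "Hangzhou East Railway Station", "city": "hangzhou",
         "lat": 30.29, "lon": 120.21},
        {"name": "Example East Station", "city": "othercity",
         "lat": 31.00, "lon": 121.00},
    ],
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def link(mention: str, user_city: str) -> Optional[dict]:
    """Alias-based recall, then a toy context rule: prefer the user's city."""
    candidates = ALIAS_TABLE.get(mention.lower(), [])
    for candidate in candidates:
        if candidate["city"] == user_city.lower():
            return candidate
    return candidates[0] if candidates else None


def recall_within(center: dict, merchants: List[dict], radius_km: float) -> List[str]:
    """Keep merchants whose distance to the linked entity is within the radius."""
    return [
        m["name"] for m in merchants
        if haversine_km(center["lat"], center["lon"], m["lat"], m["lon"]) <= radius_km
    ]
```

A production disambiguator would presumably use richer context than the user's city; the rule above only shows where that decision sits between alias recall and radius-based retrieval.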

6. Query Rewriting

To handle linguistic variability, Meituan employs a statistical machine translation (SMT) pipeline. Parallel corpora are constructed from session logs and query‑doc clicks, enabling sub‑sentence level rewrites. The process includes sentence‑pair mining, N‑gram based translation model training, and beam‑search decoding with business‑specific filters to avoid combinatorial explosion.
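The decoding step can be illustrated with a small beam search over a phrase table. This sketch skips corpus mining and model training entirely: the phrase table, its probabilities, the copy probability, and the filter are hand-written assumptions, and the decoder is monotone (it never reorders phrases).

```python
import heapq
import math
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

# Hypothetical phrase table: source n-gram -> [(target n-gram, log-probability)].
PHRASE_TABLE: Dict[Phrase, List[Tuple[Phrase, float]]] = {
    ("inn",): [(("hotel",), math.log(0.3))],
    ("budget", "inn"): [(("economy", "hotel"), math.log(0.2))],
}
COPY_LOGPROB = math.log(0.7)  # assumed score for keeping a token unchanged


def passes_filter(tokens: List[str], rewrite: Phrase) -> bool:
    """Toy business filter: drop rewrites identical to the original query."""
    return rewrite != tuple(tokens)


def rewrite_query(tokens: List[str], beam_size: int = 3, max_ngram: int = 2) -> List[str]:
    # Each hypothesis is (log score, next source position, output tokens).
    beams = [(0.0, 0, ())]
    finished = []
    while beams:
        expanded = []
        for score, pos, out in beams:
            if pos == len(tokens):
                finished.append((score, out))
                continue
            # Option 1: copy the current token unchanged.
            expanded.append((score + COPY_LOGPROB, pos + 1, out + (tokens[pos],)))
            # Option 2: translate an n-gram that starts at this position.
            for n in range(1, max_ngram + 1):
                src = tuple(tokens[pos:pos + n])
                if len(src) < n:
                    break
                for tgt, logprob in PHRASE_TABLE.get(src, []):
                    expanded.append((score + logprob, pos + n, out + tgt))
        # Pruning to the best few hypotheses bounds the search space.
        beams = heapq.nlargest(beam_size, expanded, key=lambda h: h[0])
    ranked = sorted(finished, key=lambda h: h[0], reverse=True)
    return [" ".join(out) for _, out in ranked if passes_filter(tokens, out)]


rewrites = rewrite_query(["budget", "inn"])
```

For the two-token query above, the decoder returns "budget hotel" ahead of "economy hotel" because 0.7 × 0.3 exceeds 0.2. The beam width and the filter are the two levers that keep the candidate set from growing combinatorially.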

7. Intent Detection

The intent module classifies queries into business intents (e.g., food, hotel, travel) and contextual intents (time, location, event). A hybrid approach—dictionary matching, rule‑based patterns, and BERT models—covers both head and tail queries. Intent scores are further refined by a ranking model that combines statistical features (CTR, CVR) and embedding features.
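One way to combine the three techniques is a cascade that tries the cheapest method first. The cascade order, the dictionary, the regular expressions, and the `model_predict` stub (a stand-in for a BERT classifier) are all assumptions for illustration; the article does not state how the components are ordered, and the score-refining ranking model is not shown.

```python
import re
from typing import Dict, Tuple

# Hypothetical exact-match dictionary, typically covering head queries.
INTENT_DICT: Dict[str, str] = {"pizza": "food", "hostel": "hotel"}

# Hypothetical rule patterns.
RULES = [
    (re.compile(r"\b(hotel|inn)\b"), "hotel"),
    (re.compile(r"\b(ticket|tour)\b"), "travel"),
]


def model_predict(query: str) -> Dict[str, float]:
    """Stub for a learned classifier; returns fixed scores for illustration."""
    return {"food": 0.4, "hotel": 0.3, "travel": 0.3}


def detect_intent(query: str) -> Tuple[str, str]:
    """Return (intent, source), trying dictionary, then rules, then the model."""
    normalized = query.lower()
    for token in normalized.split():
        if token in INTENT_DICT:
            return INTENT_DICT[token], "dictionary"
    for pattern, intent in RULES:
        if pattern.search(normalized):
            return intent, "rule"
    scores = model_predict(normalized)
    return max(scores, key=scores.get), "model"
```

Returning the source alongside the intent makes it easy to see which component handled a query, for example when checking how many tail queries fall through to the model.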

8. Conclusion & Outlook

The article summarizes Meituan's query understanding pipeline—entity recognition, query rewriting, and intent detection—and highlights future directions such as incorporating richer spatio‑temporal and user‑context signals to deepen semantic understanding in O2O search.

Search Engine, NLP, query understanding, Entity Recognition, Meituan, query rewriting, intent detection
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
