Exploring Search Matching Models and Their Applications in DiDi Food
This article introduces the background of search relevance, reviews three common matching model types—representation‑based, interaction‑based, and hybrid—describes their architectures such as DSSM, CDSSM, DRMM and DUET, and presents experimental results of these models on DiDi Food’s search system.
Search relevance is essentially a matching problem where a user submits a query and the system returns relevant items; in DiDi Food this translates to matching user queries with restaurants and dishes.
The article distinguishes matching (recall, i.e. retrieving a candidate set) from ranking (ordering those candidates) and explains that DiDi Food splits ranking into a coarse stage and a fine stage.
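To make the recall / coarse / fine split concrete, here is a minimal sketch of a three-stage funnel. Everything in it is hypothetical: `multi_stage_search`, the `coarse_k` and `fine_k` cut-offs, and the toy scoring lambdas are illustrative stand-ins, not DiDi Food's actual components. It only shows the control flow: a cheap stage shrinks the candidate set so that the expensive stage runs on few items.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def multi_stage_search(
    query: str,
    corpus: List[str],
    recall: Callable[[str, str], bool],
    coarse_score: Scorer,
    fine_score: Scorer,
    coarse_k: int = 100,
    fine_k: int = 10,
) -> List[Tuple[str, float]]:
    # Matching (recall): cheaply narrow the whole corpus to a candidate set.
    candidates = [doc for doc in corpus if recall(query, doc)]
    # Coarse ranking: a cheap score keeps only the best coarse_k candidates.
    coarse = sorted(candidates, key=lambda d: coarse_score(query, d), reverse=True)[:coarse_k]
    # Fine ranking: a more expensive model scores and orders the survivors.
    fine = sorted(((d, fine_score(query, d)) for d in coarse), key=lambda p: p[1], reverse=True)
    return fine[:fine_k]


# Toy stand-ins for real recall and ranking models (hypothetical).
corpus = ["tacos al pastor", "pizza margherita", "taco bell", "sushi roll"]
toy_recall = lambda q, d: any(tok in d for tok in q.split())
toy_coarse = lambda q, d: float(sum(tok in d for tok in q.split()))
toy_fine = lambda q, d: toy_coarse(q, d) - 0.01 * len(d)  # mild preference for shorter names

results = multi_stage_search("taco", corpus, toy_recall, toy_coarse, toy_fine, coarse_k=2, fine_k=1)
print(results[0][0])  # taco bell
```

In a production system each stage would be a separate service or model; the point of the design is that per-item cost grows from stage to stage while the number of items shrinks.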
Matching model categories include:
Representation‑based deep matching models
Interaction‑based deep matching models
Hybrid models that combine both representation and interaction
Representation‑based models such as DSSM encode the query and the document into semantic vectors independently and then compare the two vectors, for example with cosine similarity. The input layer applies word hashing: English words are split into letter‑trigrams and Chinese text into individual characters, which shrinks the input dimensionality and helps the model generalize to words it has not seen during training.
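The word-hashing step can be sketched as follows. This assumes the `#` word-boundary marker used in the original DSSM paper (so "good" becomes `#go`, `goo`, `ood`, `od#`); the function names are hypothetical, and a real model would further map each trigram to an index in a fixed-size input vector.

```python
from collections import Counter
from typing import Dict, List


def letter_trigrams(word: str) -> List[str]:
    # Pad the word with '#' boundary markers, then slide a window of length 3.
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def hash_text(text: str, chinese: bool = False) -> Dict[str, int]:
    # Bag of sub-word units: single characters for Chinese, letter-trigrams for English.
    if chinese:
        units = [ch for ch in text if not ch.isspace()]
    else:
        units = [tri for word in text.split() for tri in letter_trigrams(word)]
    return dict(Counter(units))


print(letter_trigrams("good"))  # ['#go', 'goo', 'ood', 'od#']
```

Because the trigram vocabulary is far smaller than a word vocabulary, a misspelled or unseen word still shares most of its trigrams with known words.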
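The DSSM architecture consists of an input layer that applies word hashing, a representation layer that treats the hashed text as a bag of words and passes it through several fully connected hidden layers with tanh activation, and an output layer that produces a 128‑dimensional semantic vector. Relevance is then the cosine similarity between the query vector and the document vector.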
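A forward pass through such a tower can be sketched in plain Python. The layer sizes here are toy values (the DSSM paper uses a 30k word-hashing input followed by 300, 300 and 128 units), the weights are random rather than trained, random inputs stand in for hashed trigram counts, and one tower is shared by query and documents only for brevity. The smoothing factor `gamma` in the softmax posterior is a hypothetical value.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weights with shape out_dim x in_dim, bias)


def init_layer(in_dim: int, out_dim: int, rng: random.Random) -> Layer:
    # Uniform Xavier-style init (a hypothetical choice for this sketch).
    limit = math.sqrt(6.0 / (in_dim + out_dim))
    weights = [[rng.uniform(-limit, limit) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def tower(x: Vector, layers: List[Layer]) -> Vector:
    # Stack of fully connected layers, each followed by tanh.
    for weights, bias in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, bias)]
    return x


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def posterior(scores: Vector, gamma: float = 10.0) -> Vector:
    # Smoothed softmax over candidate documents: P(D|Q) proportional to exp(gamma * cos(Q, D)).
    exps = [math.exp(gamma * s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
dims = [64, 32, 32, 128]  # toy sizes; the paper uses 30k -> 300 -> 300 -> 128
layers = [init_layer(i, o, rng) for i, o in zip(dims, dims[1:])]

# Random non-negative inputs stand in for hashed trigram count vectors.
query = [rng.random() for _ in range(dims[0])]
docs = [[rng.random() for _ in range(dims[0])] for _ in range(3)]

q_vec = tower(query, layers)
scores = [cosine(q_vec, tower(d, layers)) for d in docs]
probs = posterior(scores)
```

Since documents are encoded independently of the query, their vectors can be precomputed offline, which is what makes representation-based models cheap at query time.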
Interaction‑based models like DRMM first build a local interaction matrix of similarities between query terms and document terms. For each query term, these similarities are bucketed into a fixed‑length matching histogram, which a feed‑forward network maps to a per‑term score. A term gating network then weights the per‑term scores; its input can be either the query term's embedding or its inverse document frequency (IDF).
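The histogram and gating steps can be sketched as below. Assumptions: five bins of width 0.5 over [-1, 1) plus a dedicated exact-match bin, a log(1 + count) transform for the log-count variant, IDF as the gating input, and a hypothetical `toy_ffn` standing in for the trained feed-forward network.

```python
import math
from typing import Callable, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matching_histogram(q_vec: Vector, doc_vecs: List[Vector], num_bins: int = 5,
                       log_count: bool = True) -> Vector:
    # Equal-width bins over [-1, 1); the last bin holds exact matches (cosine == 1).
    hist = [0.0] * num_bins
    width = 2.0 / (num_bins - 1)
    for d in doc_vecs:
        s = cosine(q_vec, d)
        if s >= 1.0 - 1e-9:
            idx = num_bins - 1
        else:
            idx = min(int((s + 1.0) / width), num_bins - 2)
        hist[idx] += 1.0
    # Log-count variant; log(1 + count) avoids log(0) for empty bins.
    return [math.log1p(c) for c in hist] if log_count else hist


def term_gates(idfs: Vector, w_g: float) -> Vector:
    # IDF-based term gating: softmax over query terms of w_g * idf.
    exps = [math.exp(w_g * x) for x in idfs]
    total = sum(exps)
    return [e / total for e in exps]


def drmm_score(query_vecs: List[Vector], idfs: Vector, doc_vecs: List[Vector],
               ffn: Callable[[Vector], float], w_g: float = 1.0) -> float:
    # Every query term's histogram goes through the same feed-forward network;
    # the gates decide how much each term contributes to the final score.
    gates = term_gates(idfs, w_g)
    return sum(g * ffn(matching_histogram(q, doc_vecs)) for g, q in zip(gates, query_vecs))


def toy_ffn(hist: Vector) -> float:
    # Hypothetical stand-in for the trained matching network; favors high-similarity bins.
    weights = [-0.5, -0.2, 0.1, 0.6, 1.0]
    return math.tanh(sum(w * h for w, h in zip(weights, hist)))


query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]]
score = drmm_score(query_vecs, idfs=[2.0, 0.5], doc_vecs=doc_vecs, ffn=toy_ffn)
```

The histogram discards the positions of document terms and keeps only how strongly each query term is matched, which is why DRMM handles documents of any length with a fixed-size input.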
Hybrid models such as DUET add together the scores of a local component, which works on exact term‑matching signals, and a distributed component, which matches learned text representations, so that the model benefits from both approaches.
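The local component's input and the final combination can be sketched as follows. The binary exact-match matrix and the summed score follow the DUET design; `toy_local_score` is a hypothetical placeholder for the convolutional and fully connected layers that DUET actually applies to the matrix.

```python
from typing import List


def exact_match_matrix(query_terms: List[str], doc_terms: List[str]) -> List[List[int]]:
    # Local model input: cell (i, j) is 1 when query term i equals document term j.
    return [[1 if q == d else 0 for d in doc_terms] for q in query_terms]


def toy_local_score(matrix: List[List[int]]) -> float:
    # Hypothetical stand-in for DUET's local network:
    # the fraction of query terms that match at least one document term.
    return sum(1 for row in matrix if any(row)) / len(matrix)


def duet_score(local_score: float, distributed_score: float) -> float:
    # The final relevance score is the sum of the two sub-model scores.
    return local_score + distributed_score


m = exact_match_matrix(["taco", "bell"], ["taco", "de", "pollo"])
print(m)  # [[1, 0, 0], [0, 0, 0]]
print(duet_score(toy_local_score(m), 0.25))  # 0.75
```

Because the two scores are simply added, both sub-networks receive gradients from the same loss and are trained together.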
The article also walks through the processing steps of DUET's local and distributed parts, including convolution, max‑pooling, and the element‑wise product that combines the query and document representations, and discusses representing each term by its character trigrams.
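The distributed part can be sketched in simplified form. Each term becomes a vector of character-trigram counts, a window-3 convolution with tanh runs over the terms, the query side is max-pooled to a single vector, and that vector is multiplied element-wise with every document position. Simplifications to note: the document side skips its own pooling window, a mean replaces DUET's fully connected readout, the kernels are random, and the same kernels are reused for both sides only to keep the example short. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def letter_trigrams(word: str) -> List[str]:
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def trigram_vocab(texts: List[str]) -> Dict[str, int]:
    grams = sorted({t for text in texts for w in text.split() for t in letter_trigrams(w)})
    return {g: i for i, g in enumerate(grams)}


def term_matrix(text: str, vocab: Dict[str, int]) -> List[Vector]:
    # One row per term: counts of that term's character trigrams over the vocabulary.
    rows = []
    for word in text.split():
        row = [0.0] * len(vocab)
        for t in letter_trigrams(word):
            if t in vocab:
                row[vocab[t]] += 1.0
        rows.append(row)
    return rows


def conv1d(rows: List[Vector], kernels: List[Vector], window: int = 3) -> List[Vector]:
    # Slide over `window` consecutive terms; each kernel sees the flattened window.
    dim = len(rows[0])
    padded = rows + [[0.0] * dim for _ in range(window - 1)]  # zero-pad the tail
    out = []
    for start in range(len(rows)):
        flat = [x for row in padded[start:start + window] for x in row]
        out.append([math.tanh(sum(k * x for k, x in zip(kernel, flat))) for kernel in kernels])
    return out


def max_pool(rows: List[Vector]) -> Vector:
    # Max over positions, one value per filter.
    return [max(col) for col in zip(*rows)]


def distributed_score(query: str, doc: str, kernels_q: List[Vector],
                      kernels_d: List[Vector], vocab: Dict[str, int]) -> float:
    q_vec = max_pool(conv1d(term_matrix(query, vocab), kernels_q))
    d_rows = conv1d(term_matrix(doc, vocab), kernels_d)
    # Element-wise (Hadamard) product of the pooled query vector with each document position.
    products = [[a * b for a, b in zip(q_vec, row)] for row in d_rows]
    # Toy readout: the mean replaces DUET's fully connected layers.
    flat = [x for row in products for x in row]
    return sum(flat) / len(flat)


rng = random.Random(1)
query, doc = "taco bell", "tacos al pastor"
vocab = trigram_vocab([query, doc])
kernels = [[rng.uniform(-0.1, 0.1) for _ in range(3 * len(vocab))] for _ in range(8)]
score = distributed_score(query, doc, kernels, kernels, vocab)
```

Trigram inputs let "taco" and "tacos" share most of their features, which is the kind of soft matching the exact-match local component cannot capture.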
The model evaluation section presents offline metrics on public datasets and reports experimental results for four models (DSSM, CDSSM, DRMM, and DUET) on DiDi Food data from Guadalajara (February).
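The summary does not name the offline metrics used. As an illustration only, here is NDCG@k, a metric commonly used for search relevance. It assumes graded relevance labels, the exponential gain 2^g - 1 (one common variant), and an ideal ordering computed from the labels of the ranked list itself.

```python
import math
from typing import List


def dcg_at_k(gains: List[float], k: int) -> float:
    # Graded relevance discounted by log2(rank + 1), with ranks starting at 1.
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains: List[float], k: int) -> float:
    # Normalize by the DCG of the best possible ordering of the same labels.
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


print(round(ndcg_at_k([3, 2, 1, 0], 4), 4))  # 1.0
print(round(ndcg_at_k([0, 1], 2), 4))  # 0.6309
```

A perfectly ordered list scores 1.0, and pushing a relevant item down the list lowers the score logarithmically.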
Overall, the article provides a comprehensive overview of search matching techniques, their implementations, and performance in a real‑world food delivery scenario.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.