Semantic Search Recall Techniques at JD: Dual‑Tower Model, Graph Model, Synonym Recall, and Index Joint Training
The talk presents JD's end‑to‑end semantic search recall pipeline: multi‑stage retrieval, a dual‑tower embedding model with multi‑head attention, a heterogeneous graph neural network for low‑frequency items, automatic synonym generation with transformer models, and a joint training approach that folds product quantization directly into the model to improve both accuracy and serving efficiency.
JD's search system consists of four stages—recall, coarse ranking, fine ranking, and re‑ranking—where recall results come from both traditional inverted‑index retrieval and semantic recall. Conventional methods such as manual synonym tables are costly and have low coverage, prompting a shift toward deep‑learning‑based semantic techniques.
The dual‑tower semantic recall model embeds queries and items into a shared low‑dimensional space. Separate query and item towers process textual n‑grams and item attributes (title, brand, category, delivery method). Embedding matrices are shared, and relevance is measured by vector distance. The model is served as a unified service that outputs query embeddings and retrieves items from a pre‑built index in a single request. To handle ambiguous queries (e.g., "apple"), a multi‑head architecture with attention allows the query side to learn multiple representations, improving multi‑sense recall.
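The multi‑head idea can be sketched with a toy numpy model. All sizes, weight names, and the max‑over‑heads scoring rule below are assumptions for illustration (the talk does not spell out the exact aggregation); the point is that each query gets several normalized embeddings in the shared space, and an item matches if it is close to any of them.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_IN, DIM_OUT, N_HEADS = 64, 32, 3  # hypothetical sizes

# Hypothetical tower weights; in the real system these are learned jointly.
W_item = rng.normal(size=(DIM_IN, DIM_OUT))
W_heads = rng.normal(size=(N_HEADS, DIM_IN, DIM_OUT))  # one projection per query head

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def item_embed(item_feats):
    # Single embedding per item (title/brand/category features pooled upstream).
    return l2_normalize(item_feats @ W_item)

def query_embed_multihead(query_feats):
    # K embeddings per query, letting an ambiguous query ("apple")
    # occupy several regions of the shared space.
    return l2_normalize(np.stack([query_feats @ W for W in W_heads]))

def relevance(query_feats, item_feats):
    # Score = best match over heads; a common multi-interest scoring choice,
    # assumed here for illustration.
    q = query_embed_multihead(query_feats)  # (N_HEADS, DIM_OUT)
    i = item_embed(item_feats)              # (DIM_OUT,)
    return float(np.max(q @ i))

score = relevance(rng.normal(size=DIM_IN), rng.normal(size=DIM_IN))
print(score)  # cosine-style similarity in [-1, 1]
```

Because both sides are L2‑normalized, the per‑head scores are cosine similarities, so nearest‑neighbor retrieval over a pre‑built item index works head by head.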
A heterogeneous graph neural network (SearchGCN) augments the semantic model by constructing a click graph that includes queries, items, brands, and shops. Two‑side aggregation gathers first‑order (clicked items) and second‑order (item attributes) information for both query‑centered and item‑centered subgraphs. Attention‑based message passing and sum‑aggregation produce enriched node embeddings, which lead to more accurate recall, especially for low‑frequency items and short queries.
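The two‑side aggregation can be illustrated with a minimal numpy stand‑in. The dot‑product attention and the specific graph shapes below are assumptions; SearchGCN learns its attention parameters, whereas this sketch only shows the flow of second‑order (attribute) and first‑order (clicked‑item) information into a query embedding.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 16  # hypothetical embedding size

def attention_aggregate(center, neighbors):
    """Attention-weighted sum of neighbor embeddings (softmax of dot products).

    A minimal stand-in for SearchGCN's message passing; the real model
    learns attention parameters rather than using raw dot products."""
    scores = neighbors @ center
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ neighbors

# Query-centered subgraph: clicked items (first order), each item's
# brand/shop attribute nodes (second order).
query = rng.normal(size=DIM)
clicked_items = rng.normal(size=(5, DIM))
item_attrs = [rng.normal(size=(3, DIM)) for _ in range(5)]

# Second-order pass: enrich each item with its attribute neighbors.
enriched_items = np.stack([
    item + attention_aggregate(item, attrs)
    for item, attrs in zip(clicked_items, item_attrs)
])

# First-order pass: enrich the query with its (already enriched) items.
query_out = query + attention_aggregate(query, enriched_items)
print(query_out.shape)  # (16,)
```

An item‑centered subgraph would run the same two passes with the roles of query and item swapped, which is why the technique helps most where one side (a low‑frequency item or a short query) carries little signal on its own.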
For synonym recall, JD builds an automatic synonym generation pipeline using a two‑stage transformer: a forward model generates candidate titles from a query, and a backward model generates refined queries from those titles. A joint training variant adds a direct query‑to‑query generation loss, improving relevance. During inference, the model samples diverse candidates to increase recall diversity.
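One simple way to realize the "sample diverse candidates" step is top‑k sampling at each decoding step instead of greedy argmax. The talk does not specify the decoding strategy, so the function below is an illustrative sketch of the general idea, not JD's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_topk(logits, k=5, temperature=1.0):
    """One decoding step with top-k sampling.

    Sampling from the truncated distribution (rather than taking the argmax)
    yields different candidate queries on repeated runs, increasing
    recall diversity."""
    logits = np.asarray(logits, dtype=float) / temperature
    topk = np.argsort(logits)[-k:]                # indices of the k best tokens
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()
    return int(rng.choice(topk, p=probs))

vocab_logits = rng.normal(size=100)               # stand-in for model output
samples = {sample_topk(vocab_logits, k=5) for _ in range(50)}
print(samples)  # token ids drawn only from the top-5 set
```

Raising the temperature flattens the truncated distribution and produces more varied candidates, at the cost of occasionally less relevant ones.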
The index‑joint training model addresses the precision loss of product quantization (PQ) in approximate nearest‑neighbor search. PQ operations (rotation, coarse quantization, sub‑space quantization, and inverse rotation) are parameterized and learned inside the model, reducing quantization error. This eliminates the need for a separate index‑building step, allowing the model and its index to be exported together, which simplifies deployment and speeds up serving.
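The four PQ operations compose into a simple round trip, sketched below in numpy with untrained random parameters (the sizes, the QR‑based orthogonal rotation, and the residual‑quantization layout are assumptions). In the joint‑training setup these same parameters sit inside the model and are optimized against the task loss, which is what shrinks the reconstruction error shown here.

```python
import numpy as np

rng = np.random.default_rng(3)
DIM, N_SUB, N_CENTROIDS = 8, 2, 4   # hypothetical: 2 sub-spaces of dim 4

# Learned rotation; here a random orthogonal matrix via QR for illustration.
R, _ = np.linalg.qr(rng.normal(size=(DIM, DIM)))

# Coarse centroids over the full space and one codebook per sub-space.
coarse = rng.normal(size=(N_CENTROIDS, DIM))
codebooks = rng.normal(size=(N_SUB, N_CENTROIDS, DIM // N_SUB))

def pq_encode_decode(x):
    """PQ round trip: rotate, coarse-quantize, quantize the residual
    per sub-space, then apply the inverse rotation."""
    xr = R @ x
    c = coarse[np.argmin(((coarse - xr) ** 2).sum(axis=1))]   # coarse cell
    parts = (xr - c).reshape(N_SUB, -1)                        # split residual
    quantized = np.stack([
        cb[np.argmin(((cb - p) ** 2).sum(axis=1))]             # nearest code
        for cb, p in zip(codebooks, parts)
    ])
    return R.T @ (c + quantized.reshape(-1))                   # inverse rotation

x = rng.normal(size=DIM)
err = np.linalg.norm(x - pq_encode_decode(x))
print(err)  # reconstruction error that joint training would minimize
```

Since the rotation is orthogonal, its inverse is just its transpose, so the only lossy steps are the two nearest‑centroid assignments.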
Extensive experiments on JD's private dataset, MovieLens, and Amazon show consistent gains in precision@100 and recall@100. Visualizations (t‑SNE) demonstrate tighter clustering of same‑category items after applying the graph and PQ‑enhanced models. The joint training approach also brings engineering benefits such as reduced latency and simplified service architecture. The code has been open‑sourced as a Python package.
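The reported metrics follow the standard definitions, which can be stated in a few lines (the function and its argument names are illustrative, not from the talk):

```python
def precision_recall_at_k(retrieved, relevant, k=100):
    """precision@k and recall@k for a single query.

    retrieved: ranked list of item ids; relevant: set of ground-truth ids.
    These are the standard definitions behind precision@100 / recall@100."""
    topk = retrieved[:k]
    hits = sum(1 for item in topk if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 2 of the 3 relevant items appear in the top 100.
p, r = precision_recall_at_k(list(range(100)), {3, 7, 500}, k=100)
print(p, r)  # 0.02 0.6666...
```

Corpus‑level numbers are typically the average of these per‑query values over the evaluation query set.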
The session concluded with a Q&A covering orthogonal matrix initialization, loss functions for synonym models, the relationship between JD retail and its apps, and implementation details of the multi‑head projection matrices.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.