Evolution of Recall Models in Recommendation Systems: From Collaborative Filtering to Deep Learning and Tree‑Based Retrieval
This article surveys the development of recall modules in large‑scale recommendation systems, covering traditional item‑based collaborative filtering, single‑embedding DNN and dual‑tower approaches, multi‑interest capsule networks, graph‑based embeddings, long‑short term interest modeling, and the tree‑structured TDM framework for efficient deep matching.
Recommendation systems consist of three core stages: recall (candidate matching), ranking, and re-ranking. The recall stage filters a catalog of up to billions of candidate items down to a manageable set based on user and item features, and its design has evolved dramatically with the rise of deep learning.
Traditional methods rely on item‑based collaborative filtering, where similarity is computed from co‑click frequencies. While simple and fast, these methods are limited to the user's historical item space and cannot incorporate side information such as brand or category, leading to narrow and low‑diversity results.
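As a minimal sketch of this idea (assuming a binary user-item click matrix; the function names are illustrative), item-item similarity can be computed directly from co-click counts as a cosine over click vectors, and recommendations scored by summing similarities to the user's clicked items:

```python
import numpy as np

def item_similarity(clicks):
    """Co-click cosine similarity between items.

    clicks: (n_users, n_items) binary click matrix.
    sim(i, j) = co_clicks(i, j) / sqrt(clicks(i) * clicks(j))
    """
    co = clicks.T @ clicks                       # co-click counts
    counts = np.diag(co).astype(float)           # per-item click totals
    denom = np.sqrt(np.outer(counts, counts))
    sim = np.divide(co, denom, out=np.zeros_like(denom), where=denom > 0)
    np.fill_diagonal(sim, 0.0)                   # ignore self-similarity
    return sim

def recommend(clicks, user, k=2):
    """Score unseen items by summed similarity to the user's clicked items."""
    sim = item_similarity(clicks)
    clicked = clicks[user] > 0
    scores = sim[clicked].sum(axis=0)
    scores[clicked] = -np.inf                    # only recommend unseen items
    return np.argsort(-scores)[:k]
```

Note how the scoring never leaves the space of items similar to what the user already clicked, which is exactly the diversity limitation described above.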
Single‑embedding recall models map users and items into a shared low‑dimensional space using deep neural networks (DNN). The classic YouTube DNN approach concatenates user video‑watch embeddings, search‑term embeddings, and profile features, trains a three‑layer ReLU network with a softmax output, and uses the penultimate ReLU activations as user embeddings for K‑nearest‑neighbor (KNN) retrieval.
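The forward pass above can be sketched as follows. This is not the trained YouTube model: the dimensions are made up, the weights are random stand-ins for trained parameters, and averaging the history embeddings is the pooling step the paper uses before the ReLU stack:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: watch/search history embeddings plus profile features.
d_watch, d_search, d_profile, d_hidden, n_items = 16, 16, 4, 32, 100

# Random weights stand in for trained parameters (sketch only).
W = [rng.normal(size=(d_watch + d_search + d_profile, d_hidden)),
     rng.normal(size=(d_hidden, d_hidden)),
     rng.normal(size=(d_hidden, d_hidden))]
item_emb = rng.normal(size=(n_items, d_hidden))  # softmax output weights

def user_embedding(watch_vecs, search_vecs, profile):
    """Average each history, concatenate with profile features, and run
    three ReLU layers; the last ReLU activation is the user embedding."""
    x = np.concatenate([watch_vecs.mean(axis=0), search_vecs.mean(axis=0), profile])
    for W_layer in W:
        x = np.maximum(x @ W_layer, 0.0)
    return x

def topk_items(u, k=5):
    """Serving: KNN over item embeddings by inner product."""
    return np.argsort(-(item_emb @ u))[:k]
```

At serving time only `user_embedding` and the nearest-neighbor lookup run; the softmax is needed only during training.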
Dual‑tower architectures extend this idea by learning separate DNNs for users and items, then computing the inner product of the two output vectors. This structure, popularized by recent YouTube papers, enables efficient large‑scale retrieval via libraries such as Faiss.
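A toy version of the two-tower layout (random untrained weights, brute-force inner product standing in for a Faiss index) looks like this:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_tower(d_in, d_out, hidden=32):
    """A tiny two-layer ReLU MLP with random (untrained) weights."""
    W1 = rng.normal(size=(d_in, hidden))
    W2 = rng.normal(size=(hidden, d_out))
    def tower(x):
        h = np.maximum(x @ W1, 0.0)
        out = h @ W2
        # L2-normalize so the inner product is a cosine similarity.
        return out / np.linalg.norm(out, axis=-1, keepdims=True)
    return tower

user_tower = make_tower(10, 8)
item_tower = make_tower(6, 8)

# Offline: precompute all item vectors once. In production these would be
# loaded into an ANN library such as Faiss; brute force stands in here.
item_vectors = item_tower(rng.normal(size=(1000, 6)))

def retrieve(user_feats, k=10):
    """Online: one user-tower forward pass, then nearest-neighbor lookup."""
    u = user_tower(user_feats)
    return np.argsort(-(item_vectors @ u))[:k]
```

The key property is that user and item towers never interact until the final inner product, which is what makes the item side precomputable.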
To capture diverse user interests, multi‑embedding methods generate several user vectors. The Multi‑Interest Network with Dynamic Routing (MIND) introduces a capsule‑network‑style routing layer that outputs multiple interest embeddings, which are weighted by a label‑aware attention mechanism before matching with item embeddings.
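The label-aware attention step can be sketched on its own (the capsule routing that produces the interest vectors is omitted; the exponent `p` follows the MIND formulation, where a larger `p` approaches hard selection of the single closest interest):

```python
import numpy as np

def label_aware_attention(interests, item, p=2):
    """MIND-style label-aware attention: the candidate item attends over
    the user's K interest capsules, and the attended pooling of those
    capsules is the user vector used to score this particular item."""
    logits = np.power(interests @ item, p)   # (K,) sharpened similarities
    w = np.exp(logits - logits.max())
    w /= w.sum()                             # softmax attention weights
    return w @ interests                     # pooled user vector
```

Because the pooling is conditioned on the candidate item, a user with several distinct interests is matched against each item through the interest most relevant to it.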
Graph‑based embeddings enrich item representations with side information. A typical pipeline builds a graph from item co‑occurrence, performs random walks, and trains a word2vec model; side‑information embeddings are then aggregated with learned weights. GraphSAGE further enables inductive learning by aggregating neighbor features, allowing new items to obtain embeddings without retraining the entire graph.
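The graph-building and random-walk steps can be sketched as below (word2vec training on the resulting walks is omitted; consecutive items in a session become weighted directed edges, which is one common construction):

```python
import random
from collections import defaultdict

def build_graph(sessions):
    """Weighted directed item graph: each consecutive pair in a user
    session adds one unit of edge weight."""
    g = defaultdict(lambda: defaultdict(int))
    for s in sessions:
        for a, b in zip(s, s[1:]):
            g[a][b] += 1
    return g

def random_walks(g, num_walks=10, walk_len=5, seed=0):
    """Weight-proportional random walks. The resulting 'sentences' of item
    ids are the corpus fed to a word2vec-style skip-gram model."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in list(g):
            walk = [start]
            while len(walk) < walk_len:
                nbrs = g.get(walk[-1])
                if not nbrs:                     # dead end: stop the walk
                    break
                items, weights = zip(*nbrs.items())
                walk.append(rng.choices(items, weights=weights)[0])
            walks.append(walk)
    return walks
```

Side-information embeddings (brand, category) would then be aggregated with the learned item vectors, which is what lets a cold-start item inherit a usable representation from its attributes.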
Modeling both long‑term and short‑term user interests improves recall quality. The Sequential Deep Matching (SDM) model uses a GRU‑based gate to fuse a short‑term interest vector (derived from recent interactions via self‑attention) with a long‑term interest vector (learned from historical behavior). The combined representation is matched against item embeddings using the same scoring function as the YouTube DNN.
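The gated fusion can be sketched as follows (a simplified gate over the two interest vectors; SDM's full gate also conditions on the user profile embedding, and the parameters here are random stand-ins for trained weights):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
# Hypothetical gate parameters, trained jointly with the rest of the model.
Wg = rng.normal(size=(d, d))
Ug = rng.normal(size=(d, d))
bg = np.zeros(d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(short_term, long_term):
    """GRU-style gate: decide, per dimension, how much of the short-term
    vs. long-term interest vector to keep."""
    g = sigmoid(short_term @ Wg + long_term @ Ug + bg)
    return g * short_term + (1.0 - g) * long_term

def score(user_vec, item_vec):
    """Inner-product matching, as in the YouTube DNN scoring."""
    return float(user_vec @ item_vec)
```

Each fused dimension is a convex combination of the two inputs, so the output always stays between the short-term and long-term values.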
Self‑attention based next‑item recommendation treats the recent L items as a sequence, encodes them with a transformer block, and computes a short‑term interest vector. Long‑term interest is represented as a static user embedding; the final score combines both via a squared distance metric.
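A minimal sketch of this scoring scheme (projection weights omitted from the attention for brevity; `w` trades off short- and long-term terms, and lower squared distance means a better match, so the score negates it):

```python
import numpy as np

def short_term_interest(X):
    """Single-head self-attention over the last L item embeddings (rows of
    X), then mean-pool the attended outputs into one vector."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    a = np.exp(scores - scores.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)            # row-wise softmax
    return (a @ X).mean(axis=0)

def score(short, long_u, item, w=0.5):
    """Combine squared distances to both interest vectors; negate so that
    a closer item gets a higher score."""
    d_short = np.sum((short - item) ** 2)
    d_long = np.sum((long_u - item) ** 2)
    return -(w * d_short + (1.0 - w) * d_long)
```

Under a distance metric, retrieval becomes a nearest-neighbor search around the (short-term, long-term) interest pair rather than a maximum-inner-product search.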
The Tree‑based Deep Matching (TDM) framework indexes the entire item catalog in a hierarchical tree, trains a deep model to predict node‑level click‑through rates, and performs beam‑search retrieval by selecting top‑K nodes at each level. Interest modeling and tree structure are jointly optimized through alternating training of the deep model and label correction, yielding an efficient O(log N) recall mechanism.
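The retrieval half of TDM can be sketched independently of the deep model. Assuming a complete binary tree in heap order with a power-of-two number of leaves, and letting an arbitrary `score_fn` stand in for the model's node-level CTR estimate:

```python
def beam_search(score_fn, n_leaves, k=2):
    """Layer-wise top-K beam search over a complete binary tree in heap
    order (node i has children 2i+1 and 2i+2; leaves are ids
    n_leaves-1 .. 2*n_leaves-2, with n_leaves a power of two).

    score_fn(node_id) stands in for the deep model's CTR estimate for
    that node. Total model calls: O(k * log n_leaves), not O(n_leaves).
    """
    frontier = [0]                               # start at the root
    while frontier[0] < n_leaves - 1:            # frontier still internal
        children = [c for n in frontier for c in (2 * n + 1, 2 * n + 2)]
        children.sort(key=score_fn, reverse=True)
        frontier = children[:k]                  # keep top-k per level
    return frontier                              # top-k leaf (item) ids
```

Only the expanded children are ever scored, which is what makes an arbitrarily expressive deep model affordable at recall time; the alternating retraining described above then keeps the tree's internal-node labels consistent with that model.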
Overall, the surveyed techniques illustrate a clear trend: moving from shallow, similarity‑based recall toward deep, representation‑rich models that can handle multi‑interest, side‑information, and scalable tree‑based retrieval, thereby addressing the challenges of diversity, cold‑start, and latency in modern large‑scale recommender systems.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.