
How Embedding-Based Recall Boosted Interaction by 33% in a Live Feed

This article details how Jike's recommendation team upgraded its ranking stack from Spark + XGBoost to TensorFlow, introduced a twin‑tower embedding model for recall, deployed it with TensorFlow Serving and Elasticsearch, and achieved a 33.75% lift in user interaction on the dynamic square.

Jike Tech Team

Jike's dynamic square is a platform for discovering interesting circles and friends, and the recommendation team strives to show each user content they care about.

After previously sharing a technical article on using Spark MLlib for online ranking, the team upgraded from Spark+XGBoost to TensorFlow+DNN, and now explores deep learning in the recall layer.

The recall layer aims to filter millions of candidates down to a few hundred items based on user profiles and history. Because the candidate set is huge, items cannot be compared one‑by‑one; instead, various indexing methods are used to retrieve a batch of potentially relevant items.

Among these methods, vector‑embedding recall stands out for balancing precision and coverage. After training an embedding model, item vectors become index keys and user vectors become queries, enabling fast approximate nearest neighbor (ANN) search over massive pools.
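To make the idea concrete, here is a minimal sketch of vector recall over a toy item pool. It uses a brute‑force cosine‑similarity scan with NumPy purely for illustration; a production system like the one described here would replace the scan with an ANN index (e.g., HNSW) over millions of items. The feature values and pool are invented for the example.

```python
import numpy as np

def top_k_items(user_vec, item_matrix, k=3):
    """Return indices of the k items whose embeddings are most
    similar to the user embedding (cosine similarity)."""
    # Normalize so that a dot product equals cosine similarity.
    u = user_vec / np.linalg.norm(user_vec)
    items = item_matrix / np.linalg.norm(item_matrix, axis=1, keepdims=True)
    scores = items @ u
    return np.argsort(-scores)[:k]

# Toy pool of 5 item embeddings; a real pool holds millions of items
# and uses an approximate index instead of this exhaustive scan.
items = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.5, 0.5],
])
user = np.array([1.0, 0.05])
print(top_k_items(user, items, k=2))  # → [0 1]
```

The key property is that once items are indexed by their vectors, retrieving candidates for any user is a single nearest‑neighbor query rather than a per‑item comparison.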

Jike adopted an embedding‑based recall strategy for the dynamic square to better satisfy diverse user interests.

Model Structure

Many mature embedding models exist, from Matrix Factorization to item2vec, node2vec, YouTubeDNN, GCN, and GraphSAGE. Considering real‑time requirements, the team chose a simple DNN twin‑tower model inspired by DSSM, trained in a supervised fashion to optimize click‑through rate.

The twin‑tower consists of a user tower and an item tower. Each tower starts with an embedding layer that converts raw features into vectors; these are concatenated and passed through several fully‑connected layers, ending in a 64‑dimensional output embedding. The distance between the user and item embeddings is then mapped through a sigmoid to a score in [0, 1], and the model is trained with cross‑entropy loss.
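The forward pass described above can be sketched as follows. This is a NumPy stand‑in, not the team's actual model: the feature widths, hidden size, and random weights are all assumptions, and a real implementation would use trained TensorFlow layers. Only the shape of the computation — two independent towers, a 64‑dimensional embedding each, a similarity score squashed by a sigmoid, and a cross‑entropy loss — follows the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def tower(x, w1, w2):
    """One tower: two fully-connected layers with ReLU, ending in a
    64-dimensional embedding (weights here are random stand-ins)."""
    h = np.maximum(x @ w1, 0.0)  # hidden layer + ReLU
    return h @ w2                # 64-d output embedding

# Assumed feature widths; the real model concatenates embedded
# categorical and numeric features before the dense layers.
user_feats = rng.normal(size=(1, 128))
item_feats = rng.normal(size=(1, 96))

user_emb = tower(user_feats,
                 rng.normal(scale=0.05, size=(128, 256)),
                 rng.normal(scale=0.05, size=(256, 64)))
item_emb = tower(item_feats,
                 rng.normal(scale=0.05, size=(96, 256)),
                 rng.normal(scale=0.05, size=(256, 64)))

# Similarity -> sigmoid -> predicted click probability,
# trained against the observed click label with cross-entropy.
logit = float(user_emb @ item_emb.T)
prob = 1.0 / (1.0 + np.exp(-logit))
label = 1.0
loss = -(label * np.log(prob) + (1 - label) * np.log(1 - prob))
```

Because the two towers share no layers, each side's embedding can be computed without the other — the property the deployment section below relies on.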

Model Deployment

After training, the model is deployed as an online service using TensorFlow Serving. Because user and item embeddings are computed separately, the twin‑tower is split into two lightweight models: one serving item features to produce item embeddings, the other serving user features to produce user embeddings.
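A caller reaches such a split model over TensorFlow Serving's standard REST interface, which exposes `/v1/models/<model_name>:predict`. The sketch below only builds the request body; the host, model name, and feature names are hypothetical placeholders, and actually sending the request would require a running server.

```python
import json

# Hypothetical host and model name; TensorFlow Serving's REST API
# exposes endpoints of the form /v1/models/<model_name>:predict.
SERVING_URL = "http://tf-serving:8501/v1/models/user_tower:predict"

def build_predict_request(user_features):
    """Build the JSON body for a TensorFlow Serving predict call.
    The feature names here are illustrative placeholders."""
    return json.dumps({"instances": [user_features]})

body = build_predict_request({"age_bucket": 3, "recent_topics": [12, 40, 7]})
# Sending it would be e.g.:
#   requests.post(SERVING_URL, data=body)  # needs a running server
```

Splitting the towers this way keeps each served model lightweight: item embeddings are computed once at indexing time, while user embeddings are computed per request.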

Index Recall Architecture

The embedding recall is integrated into the existing Elasticsearch‑based pipeline. During dynamic indexing, MongoDB operation logs are consumed to update features in near real‑time. The indexing service fetches or computes features, calls TensorFlow Serving to obtain embeddings, and stores them in Elasticsearch.
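The indexing flow above can be summarized as a small pipeline. In this sketch every external system — the MongoDB oplog, the feature store, TensorFlow Serving, and Elasticsearch — is replaced by an in‑memory stand‑in, so the function names and return values are illustrative only.

```python
# Minimal sketch of the indexing flow, with all external systems
# replaced by in-memory stand-ins.
fake_es_index = {}

def fetch_features(item_id):
    return {"item_id": item_id, "topic": 12}   # stand-in feature store

def call_item_tower(features):
    return [0.1, 0.2, 0.3]                     # stand-in TF Serving call

def handle_oplog_entry(entry):
    """Consume one MongoDB operation-log entry: refresh the item's
    features, compute its embedding, and upsert it into the index."""
    feats = fetch_features(entry["item_id"])
    feats["embedding"] = call_item_tower(feats)
    fake_es_index[entry["item_id"]] = feats

handle_oplog_entry({"op": "insert", "item_id": "post_42"})
```

Driving the pipeline off the oplog is what keeps item embeddings fresh in near real‑time, rather than rebuilding the whole index in batch.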

During recall, the feature service computes real‑time user features, TensorFlow Serving generates the user embedding, and Elasticsearch performs a dense‑vector ANN query (using Alibaba Cloud's vector index plugin on ES 6.7) to retrieve the most relevant items. The P95 latency meets the recommendation service’s requirements.
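Illustrative only: Jike used Alibaba Cloud's vector index plugin on ES 6.7, whose query DSL differs and is not reproduced here. The sketch below shows the equivalent query shape in stock Elasticsearch 7.x, where `dense_vector` fields support `script_score` with the built‑in `cosineSimilarity` function; the field name `embedding` and result size are assumptions.

```python
def knn_query(user_embedding, size=200):
    """Build a vector-similarity query body (stock ES 7.x style,
    not the Alibaba Cloud plugin DSL used in the article)."""
    return {
        "size": size,  # recall returns a few hundred candidates
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps the score non-negative, as ES requires
                    "source": "cosineSimilarity(params.qv, 'embedding') + 1.0",
                    "params": {"qv": user_embedding},
                },
            }
        },
    }

q = knn_query([0.1] * 64)
```

Whatever the concrete DSL, the structure is the same: the freshly computed user embedding goes in as a query parameter, and the engine returns the nearest item embeddings.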

Effect and Iteration Direction

The first version of the embedding‑based recall increased the overall interaction rate on Jike's dynamic square by 33.75%, the highest uplift among all recall strategies and also the largest in terms of traffic.

While this initial model demonstrates the potential of deep learning in recall, many challenges remain: automatic synchronization of embedding version updates, handling the semantic drift of vector dimensions, and improving the twin‑tower architecture (e.g., incorporating behavior sequences). Future work will explore richer model structures and training methods.

Jike continues to research cutting‑edge machine‑learning algorithms and build flexible deployment pipelines to help users discover more fun circles and interesting friends.

Tags: Deep Learning · Recommendation System · Elasticsearch · Embedding · Vector Recall · TensorFlow Serving
Written by the Jike Tech Team