Divide‑and‑Conquer Embedding‑Based Retrieval with Prompt‑Based Multi‑Task Learning for Large‑Scale Recommendation
This paper identifies a core trade-off in embedding-based retrieval for recommendation: a single model struggles to distinguish both simple negatives and hard negatives at once. To address it, the authors propose a clustering-based divide-and-conquer retrieval framework combined with prompt-driven multi-task learning, improving relevance, diversity, and fairness, and they validate the approach through offline metrics, online A/B tests, and comparative experiments.
Embedding‑based retrieval (EBR) is widely used in recommendation systems due to its simplicity and effectiveness, but it faces two major challenges: (1) a trade‑off between distinguishing simple negatives and hard negatives during recall over the full candidate set, and (2) limited controllability of diversity and fairness because approximate nearest‑neighbor (ANN) search greedily returns the highest‑scoring items.
To address these issues, the authors introduce a divide‑and‑conquer solution that first clusters the entire candidate pool into K semantically related groups, builds independent ANN indexes for each cluster, and performs EBR within each cluster. This allows each EBR model to focus on separating positives from hard negatives, naturally framing the problem as K related retrieval tasks.
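The per-cluster retrieval flow can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are invented, the "indexes" are plain NumPy arrays standing in for real per-cluster ANN indexes (e.g. HNSW), and clustering itself is assumed to have already produced a cluster id for every item.

```python
import numpy as np

def build_cluster_indexes(cluster_ids, K):
    """Group item ids into K per-cluster 'indexes'. In production each group
    would back a separate ANN index; here it is just an array of item ids."""
    return {k: np.where(cluster_ids == k)[0] for k in range(K)}

def retrieve_per_cluster(user_emb, item_emb, indexes, top_n):
    """Run inner-product retrieval independently inside each cluster and
    return {cluster_id: top-n item ids}, mirroring divide-and-conquer recall."""
    results = {}
    for k, ids in indexes.items():
        scores = item_emb[ids] @ user_emb        # dot-product scores within cluster k
        order = np.argsort(-scores)[:top_n]      # highest-scoring items first
        results[k] = ids[order]
    return results

# Illustrative usage with random embeddings.
rng = np.random.default_rng(0)
item_emb = rng.normal(size=(100, 8))            # 100 items, 8-dim embeddings
cluster_ids = rng.integers(0, 4, size=100)      # assume K=4 clusters already assigned
indexes = build_cluster_indexes(cluster_ids, K=4)
top_items = retrieve_per_cluster(rng.normal(size=8), item_emb, indexes, top_n=5)
```

Because each cluster returns its own top-n, the final candidate set is guaranteed to cover all K clusters, which is what gives the framework its explicit handle on diversity and fairness compared with a single greedy ANN search.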
Inspired by prompt tuning, a prompt‑based task adaptation method is proposed to learn a unified multi‑task EBR model. Each task (cluster) receives a learnable prompt embedding that is combined with user behavior features (via Hadamard product) before feeding into a shared Transformer encoder (e.g., SASRec). This approach leverages recent advances in multi‑task learning for text generation while keeping the model architecture unchanged.
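The prompt-based adaptation above can be sketched in a few lines. This is an assumed simplification: the class and parameter names are hypothetical, and a mean-pooling stub stands in for the shared SASRec-style Transformer encoder; only the per-task prompt and the Hadamard product reflect the mechanism described in the paper.

```python
import numpy as np

class PromptedEBR:
    """Sketch of prompt-based task adaptation: one learnable prompt vector per
    cluster/task modulates the user behavior features via a Hadamard product
    before they enter a single shared encoder."""

    def __init__(self, K, dim, encoder):
        rng = np.random.default_rng(0)
        # One prompt embedding per task; in training these would be learnable.
        self.prompts = rng.normal(scale=0.02, size=(K, dim))
        self.encoder = encoder  # shared encoder (SASRec-style in the paper)

    def user_embedding(self, behavior_feats, task_id):
        # behavior_feats: (seq_len, dim) sequence of behavior feature vectors.
        modulated = behavior_feats * self.prompts[task_id]  # Hadamard product
        return self.encoder(modulated)

# Stand-in encoder: mean-pool the sequence (a real system uses a Transformer).
mean_pool = lambda x: x.mean(axis=0)
model = PromptedEBR(K=4, dim=8, encoder=mean_pool)
```

Note that the encoder weights are shared across all K tasks; only the K prompt vectors are task-specific, which is why the method adds negligible training cost relative to gating-based multi-task architectures such as MMoE.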
Experiments include offline evaluations comparing the proposed method against a simple hard‑negative mixing baseline (SASRec+), multi‑vector recall methods (MIND, ComiRec), and various multi‑task learning techniques (MMoE, PLE). The prompt‑based approach achieves superior recall and ranking metrics with negligible additional training cost, while MMoE reduces throughput by ~70%.
Online A/B tests on the Kuaishou platform demonstrate significant gains in user engagement metrics such as app usage duration, likes, and follows, confirming the practical effectiveness of the proposed framework.
The work concludes that the divide‑and‑conquer retrieval architecture together with prompt‑driven multi‑task learning provides a new direction for optimizing EBR in large‑scale recommendation, and future work will explore more systematic cluster/task partitioning and advanced multi‑task strategies.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.