Generative Recommender Systems for JD Affiliate Advertising: Architecture, Methods, and Experimental Evaluation
This article surveys how large language models can reshape recommendation systems, describes the four-stage generative pipeline, details item representation techniques such as semantic IDs, presents a JD affiliate advertising use case with offline and online experiments, and outlines future optimization directions.
Large language models (LLMs) are profoundly influencing the natural language processing (NLP) field, and their powerful capabilities open new research avenues for other domains. Recommendation systems (RS) mitigate information overload and are deeply embedded in daily life; leveraging LLMs to redesign RS is a promising research problem.
1. Background
Generative Recommender Systems
A generative recommender system directly generates recommendations or recommendation‑related content without calculating each candidate’s ranking score one by one [25].
Traditional RS handle massive item pools through multi‑stage filtering (recall, coarse ranking, fine ranking, re‑ranking). Simple rules reduce candidates from tens of millions to a few hundred, after which complex algorithms select the final items. Because of latency constraints, sophisticated algorithms cannot be applied to the entire item set.
LLMs can transform this paradigm. Compared with traditional RS, generative RS offer two main advantages: (1) Simplified recommendation flow – LLMs generate the target item directly, turning a multi‑stage discriminative pipeline into a single‑stage generative one; (2) Better generalization and stability – world knowledge and reasoning in LLMs improve cold‑start performance and cross‑domain transfer, while reducing feature‑engineering effort and model‑update frequency.
Figure 1. Comparison of traditional and LLM‑based generative recommendation pipelines [25]
JD Affiliate Advertising
JD Affiliate is a marketing platform for off‑site cost‑per‑sale (CPS) advertising. Partners share generated links to promote JD products; clicks that lead to purchases earn commissions. The platform aims to acquire new users and increase activity.
The affiliate‑ad recommendation scenario focuses on low‑activity users and faces four challenges: data sparsity, cold‑start, difficulty of scene understanding, and maintaining diversity/novelty.
LLM‑Enhanced Generative Recommendation for JD Affiliate
LLMs extract high‑quality textual representations and embed world knowledge, enabling better user‑item understanding. By leveraging context, LLM‑based generative RS improve accuracy, mitigate sparsity and cold‑start issues, and generalize to unseen items and scenarios.
2. Four Stages of a Generative Recommender System
The pipeline consists of:
Item Representation – items are encoded as short textual identifiers (e.g., semantic IDs) rather than raw descriptions.
Model Input Representation – a prompt combines task description, user information, and contextual/external data.
Model Training – the model learns to predict the next item identifier given the input (Next Token Prediction).
Model Inference – the trained model generates item identifiers, which are then mapped back to real items.
Although the outline appears simple, each stage involves many design choices, which are reviewed below.
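As a running illustration, the four stages can be sketched as plain Python, with a memorizing ToyModel standing in for the LLM. All names here are hypothetical illustrations, not the production system:

```python
# Toy end-to-end sketch of the four stages (all names are hypothetical).

class ToyModel:
    """A memorizing stand-in for the generative LLM."""
    def __init__(self):
        self.memory = {}
    def fit(self, prompt, target_id):
        self.memory[prompt] = target_id
    def generate(self, prompt):
        return self.memory.get(prompt, "<unk>")

def represent_item(title, codebook):
    """Stage 1: map an item to a short identifier (here, a lookup table)."""
    return codebook[title]

def build_prompt(task, history, profile):
    """Stage 2: combine task description, user info, and history into one prompt."""
    return f"{task}\nProfile: {profile}\nHistory: {' '.join(history)}"

def train_step(model, prompt, target_id):
    """Stage 3: learn to predict the target item identifier from the prompt."""
    model.fit(prompt, target_id)

def infer(model, prompt, id_to_item):
    """Stage 4: generate an identifier, then map it back to a real item."""
    return id_to_item[model.generate(prompt)]
```

In the real pipeline each stage is far richer (quantized identifiers, prompt templates, next-token training, beam search), but the data flow is exactly this shape.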
Item Representation
An identifier in recommender systems is a sequence of tokens that can uniquely identify an entity such as a user or an item. It may be an embedding, a numeric token sequence, or a word token sequence (e.g., title, description) [25].
Items often contain multimodal signals (images, audio, titles). A good identifier must (1) be short enough for efficient generation and (2) embed prior knowledge so that similar items share many tokens while dissimilar items share few.
Three main families:
Numeric ID – traditional integer IDs are split into token sequences; easy to store but lack semantics, leading to cold‑start issues.
Textual Metadata – use titles, product names, etc.; benefits from LLM world knowledge but can be long and ambiguous.
Semantic‑based ID (SID) – discretize item vectors (e.g., via RQ‑VAE) into token sequences that capture semantic similarity.
Table 1. Comparison of item representation methods
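To make the trade-offs concrete, here is one hypothetical item expressed under each family; every concrete value below is illustrative:

```python
# The same (hypothetical) item under the three identifier families.
item = {"id": 48210397, "title": "ThinkBook 14+ 2024 轻薄本"}

# 1. Numeric ID: digits of the integer ID. Compact to store, but carries no
#    semantics, so unseen items (cold start) get meaningless tokens.
numeric_tokens = [int(d) for d in str(item["id"])]

# 2. Textual metadata: raw title tokens. Taps LLM world knowledge, but can be
#    long and ambiguous (two items may share a near-identical title).
text_tokens = item["title"].split()

# 3. Semantic ID: a short code sequence from a quantizer such as RQ-VAE,
#    constructed so that semantically similar items share many tokens.
semantic_tokens = ["<a_99>", "<b_225>", "<c_67>", "<d_242>"]
```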
Model Input Representation
Input consists of three parts: task description, user information, and context/external knowledge.
Task Description – a prompt that frames recommendation as a Next Item Prediction task.
User Interaction History – represented as sequences of item IDs, textual metadata, or hybrid ID+vector tokens.
User Profile – demographic or preference information, optionally combined with textual description.
Context & External Information – location, time, scene, or knowledge graph signals that influence decisions.
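A minimal sketch of how the three parts might be assembled into one prompt; the template and field names are assumptions, not the production format:

```python
# Sketch: assembling task description, user information, and context into a
# single prompt (template and field names are hypothetical).

def build_input(task, profile, context, history):
    profile_txt = ", ".join(f"{k}: {v}" for k, v in profile.items())
    context_txt = ", ".join(f"{k}: {v}" for k, v in context.items())
    history_txt = " ".join(history)  # item identifiers in click order
    return (f"{task}\n"
            f"User profile: {profile_txt}\n"
            f"Context: {context_txt}\n"
            f"Click history: {history_txt}\n"
            f"Next item:")
```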
Model Training
Training involves two steps: constructing textual data (input‑output pairs) and optimizing the generative model to maximize conditional likelihood.
The primary task is user‑to‑item‑identifier prediction. For SID‑based methods, auxiliary tasks such as item‑text ↔ SID alignment and user‑to‑item‑text prediction are added to bridge the gap between semantic IDs and natural language.
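The objective itself is compact: training minimizes the negative log-likelihood of the response (identifier) tokens given the input, while prompt tokens are not scored. A toy sketch of that per-example loss:

```python
import math

# Toy sketch of the conditional-likelihood objective: sum the negative
# log-probability the model assigns to each gold response token. Prompt
# positions are excluded entirely, which is the masking this expresses.

def response_nll(token_probs):
    """token_probs: model probability assigned to each gold response token."""
    return -sum(math.log(p) for p in token_probs)
```

A perfectly confident model (probability 1.0 on every response token) reaches a loss of 0.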
Model Inference
During inference, the model generates an identifier token sequence via beam search. Two generation modes exist:
Free Generation – unrestricted token sampling, which may produce invalid identifiers.
Constrained Generation – use a Trie or FM‑index to restrict output to valid identifiers.
Post‑generation, the identifier is matched to a real item (e.g., via L2 distance between generated token embeddings and stored item embeddings).
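A minimal sketch of Trie-constrained decoding, using greedy selection for brevity where the pipeline above uses beam search; choose_fn is a hypothetical stand-in for the model's token scoring, and fixed-length identifiers are assumed:

```python
# Sketch: a Trie over valid identifier token sequences restricts, at every
# decoding step, which tokens the model may emit, so the output is always a
# valid identifier. Greedy selection for brevity; beam search works the same
# way with multiple prefixes.

def build_trie(identifiers):
    root = {}
    for seq in identifiers:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node["<end>"] = {}
    return root

def constrained_decode(choose_fn, trie):
    """choose_fn(prefix, allowed_tokens) stands in for the model's scoring."""
    prefix, node = [], trie
    while "<end>" not in node:
        allowed = list(node)          # only valid continuations are offered
        tok = choose_fn(prefix, allowed)
        prefix.append(tok)
        node = node[tok]
    return prefix
```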
3. Practical Solution for JD Affiliate
Overall Design
Our solution focuses on (1) SID‑based item representation and (2) joint training of collaborative and textual signals.
Figure 2. System architecture
Item Representation via SID
Item text (the product title) is encoded with bert-base-chinese (768‑dim) and Yi‑6B (4096‑dim). The resulting vectors are quantized by an RQ‑VAE model into SID token sequences such as <a_99><b_225><c_67><d_242>, optionally with an extra random token appended to guarantee uniqueness.
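The residual-quantization step at the heart of RQ-VAE can be sketched as follows; the autoencoder is omitted, and the tiny 2-D codebooks in the test are toy values, not the learned high-dimensional ones:

```python
# Sketch of residual quantization: each level picks the codebook entry nearest
# to the current residual, the chosen index becomes one SID token, and the
# entry is subtracted so the next level quantizes what remains.

def nearest(codebook, vec):
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))

def residual_quantize(vec, codebooks, levels="abcd"):
    residual = list(vec)
    tokens = []
    for level, cb in enumerate(codebooks):
        idx = nearest(cb, residual)
        tokens.append(f"<{levels[level]}_{idx}>")
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return tokens
```

Because nearby vectors fall into the same codebook cells at the early levels, similar items end up sharing SID prefix tokens, which is exactly the property a good identifier needs.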
Alignment of Collaborative and Textual Signals
We train on two tasks:
Next Item Prediction – given user profile + interaction history, predict the next SID.
Additional Alignment – bidirectional SID↔title tasks to align semantic IDs with textual descriptions.
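A sketch of how one interaction could be expanded into these training tasks; the prompt wording and SID tokens are illustrative, not the production templates:

```python
# Sketch: expanding one user interaction into the prediction task plus the two
# bidirectional SID <-> title alignment tasks (wording is hypothetical).

def make_samples(history_sids, next_sid, item_sid, item_title):
    return [
        # Next Item Prediction
        {"instruction": "The user clicked, in order: "
                        f"{' '.join(history_sids)}. Predict the next item.",
         "response": next_sid},
        # Alignment: SID -> title
        {"instruction": f"What is the title of item {item_sid}?",
         "response": item_title},
        # Alignment: title -> SID
        {"instruction": f'Which item has the title "{item_title}"?',
         "response": item_sid},
    ]
```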
4. Offline and Online Experiments
Training Data
Examples of the JSON‑style instruction‑response format used for fine‑tuning (SID token sequences are shown as <a_*><b_*><c_*><d_*> placeholders):

{
  "instruction": "This user is female, aged 46-55, married, and child status is unknown. The user has clicked the following items in chronological order: <a_*><b_*><c_*><d_*>, <a_*><b_*><c_*><d_*>, ... Can you predict the next item the user is likely to click?",
  "response": "<a_*><b_*><c_*><d_*>"
}
{
  "instruction": "What is the title of item <a_*><b_*><c_*><d_*>?",
  "response": "ThinkPad 联想ThinkBook 14+ 2024 14.5英寸轻薄本 ..."
}
{
  "instruction": "Which item has the title \"ss109威震天变形MP威震玩具天金刚飞机威男孩机器人战机模型合金 震天战机(战损涂装版)\"?",
  "response": "<a_*><b_*><c_*><d_*>"
}

Base Models and Training
We fine‑tune Qwen1.5 (0.5B/1.8B/4B) and Yi‑6B. SID tokens are added to the vocabulary, and beam search with size 20 is used for constrained decoding. Experiments include offline evaluation (HR@1,5,10; NDCG@1,5,10) and online small‑traffic A/B tests (UCTR).
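For reference, the offline metrics under next-item prediction, where each request has exactly one relevant (ground-truth) item, reduce to:

```python
import math

# Reference definitions of HR@K and NDCG@K for the single-relevant-item case.

def hit_rate_at_k(ranked_ids, target_id, k):
    """HR@K: 1 if the target appears among the top-k generated items, else 0."""
    return 1.0 if target_id in ranked_ids[:k] else 0.0

def ndcg_at_k(ranked_ids, target_id, k):
    """NDCG@K: with one relevant item the ideal DCG is 1, so the score is
    1 / log2(rank + 1) when the target is ranked within the top-k."""
    if target_id in ranked_ids[:k]:
        rank = ranked_ids.index(target_id) + 1
        return 1.0 / math.log2(rank + 1)
    return 0.0
```

Both are averaged over all evaluation requests; NDCG additionally rewards placing the target higher in the beam.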
Results
Larger base models achieve higher offline metrics; 0.5B struggles with multi‑task training.
Yi‑6B outperforms Qwen variants, especially without constrained decoding.
Compared with collaborative‑only baselines, Yi‑6B shows superior performance on the same data scale.
In online small‑traffic tests, the generative models achieve UCTR comparable to, or more than 5% above, the baseline on low‑activity pages.
5. Future Optimization Directions
Key areas include building higher‑quality datasets, combining semantic, multimodal, and collaborative signals, developing scalable SID training/inference frameworks, applying LoRA, multi‑task mixing, model distillation, pruning, and quantization, and exploring query‑driven search‑recommendation integration and explanation generation.
Our goal is to continuously innovate recommendation technology to deliver more efficient and personalized user experiences, and we welcome collaborators interested in generative recommender systems.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.