
CTR-Driven Advertising Image Generation Using Multimodal Large Language Models (CAIG)

The JD advertising team proposes a CTR‑driven advertising image generation framework (CAIG) that leverages multimodal large language models, a novel reward model, and product‑centric preference optimization to produce ad images with superior click‑through performance, validated by extensive offline and online experiments.

JD Tech

The JD advertising team explores using multimodal large language models (MLLMs) to generate advertising images optimized for click‑through rate (CTR), introducing a novel reward model and product‑centric preference optimization strategy that achieve state‑of‑the‑art performance on both offline and online metrics.

Existing ad image generation methods focus on aesthetic quality without considering online performance; this work addresses that gap by pre‑training MLLMs on a large e‑commerce multimodal dataset and fine‑tuning them with reinforcement learning guided by a reward model that reflects user click preferences.

The overall CAIG pipeline first injects e‑commerce domain knowledge into the MLLM, then trains a reward model (RM) on paired ad images with relative CTR comparisons, and finally applies a CTR‑driven preference optimization phase using product‑centric preference optimization (PCPO) to align generated backgrounds with product features.

For e‑commerce knowledge pre‑training, three tasks are used: image understanding, multimodal content understanding, and prompt generation, leveraging 1.2 million samples from JD’s platform.

The RM treats CTR prediction as a relative comparison between image pairs, concatenating visual and textual representations, and employs a classification head together with a point‑level loss for fine‑grained CTR regression.
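The pairwise-plus-point-level objective described above can be sketched in plain Python. The paper's exact architecture and loss weighting are not given in this summary, so the sigmoid squashing of scores and the `alpha` weight below are illustrative assumptions, not the authors' implementation:

```python
import math

def pairwise_ctr_loss(score_win: float, score_lose: float) -> float:
    """Relative-comparison loss: the image with higher observed CTR
    should receive a higher reward score than its counterpart."""
    # sigma(s_w - s_l): probability the RM ranks the pair correctly
    p_correct = 1.0 / (1.0 + math.exp(-(score_win - score_lose)))
    return -math.log(p_correct)

def point_level_loss(score: float, ctr_label: float) -> float:
    """Auxiliary fine-grained regression toward the observed CTR
    (squashing the raw score into (0, 1) is an assumption here)."""
    pred = 1.0 / (1.0 + math.exp(-score))
    return (pred - ctr_label) ** 2

def reward_model_loss(score_win, score_lose, ctr_win, ctr_lose, alpha=0.5):
    """Combined objective: pairwise ranking plus point-level regression.
    `alpha` is a hypothetical weighting, not a value from the paper."""
    pair = pairwise_ctr_loss(score_win, score_lose)
    point = point_level_loss(score_win, ctr_win) + point_level_loss(score_lose, ctr_lose)
    return pair + alpha * point
```

The key design choice mirrored here is that the RM never predicts an absolute CTR in isolation for ranking purposes; the pairwise term only cares about the score *difference* between the two images, while the point-level term anchors scores to real click rates.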

CTR‑driven optimization formalizes image generation as a preference selection problem, employing Direct Preference Optimization (DPO) and the proposed PCPO, which keeps product information as the sole variable to ensure background relevance while improving CTR.
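For one preference pair, the standard DPO objective that this phase builds on can be written as a short function. This is a generic DPO sketch under the usual formulation (frozen reference model, temperature `beta`); how PCPO constructs its pairs so that product information is the sole variable is not detailed enough here to reproduce, so only the shared loss form is shown:

```python
import math

def dpo_loss(logp_w_policy: float, logp_l_policy: float,
             logp_w_ref: float, logp_l_ref: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for a single preference pair.
    logp_*: log-likelihoods of the preferred (w) and rejected (l)
    generations under the current policy and the frozen reference model.
    beta = 0.1 is a common default, not a value from the paper."""
    margin = beta * ((logp_w_policy - logp_w_ref) - (logp_l_policy - logp_l_ref))
    # Maximizing sigma(margin) pushes the policy toward the preferred
    # generation and away from the rejected one, relative to the reference.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Under PCPO as described above, the preferred and rejected generations would differ only in how the product is treated, so minimizing this loss steers the model toward backgrounds consistent with the product rather than toward arbitrary high-CTR imagery.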

Extensive experiments show the proposed method outperforms both closed‑source (e.g., GLM4V, Claude 3.5 Sonnet, GPT‑4o) and open‑source (e.g., VAM, CG4CTR) baselines on reward model accuracy, product‑background relevance, and online CTR gains across 44 product categories.

An online A/B test over one week confirms that CAIG consistently raises CTR compared to baseline MLLM generation, demonstrating the effectiveness of the reward model and PCPO in real‑world e‑commerce advertising.

For further discussion, contact [email protected]; JD’s advertising creative team also invites talent in AIGC and large‑model research to join their efforts.

Tags: Reward Model, reinforcement learning, multimodal LLM, advertising image generation, CTR optimization, product-centric preference
Written by

JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
