
CTR-Driven Advertising Image Generation Using Multimodal Large Language Models (CAIG)

The JD advertising team proposes a CTR‑driven advertising image generation framework (CAIG) that leverages multimodal large language models, a novel reward model, and product‑centric preference optimization to produce ad images with superior click‑through performance, validated by extensive offline and online experiments.

JD Tech

The JD advertising team explores using multimodal large language models (MLLMs) to generate advertising images optimized for click‑through rate (CTR), introducing a novel reward model and product‑centric preference optimization strategy that achieve state‑of‑the‑art performance on both offline and online metrics.

Existing ad image generation methods focus on aesthetic quality without considering online performance; this work addresses that gap by pre‑training MLLMs on a large e‑commerce multimodal dataset and fine‑tuning them with reinforcement learning guided by a reward model that reflects user click preferences.

The overall CAIG pipeline first injects e‑commerce domain knowledge into the MLLM, then trains a reward model (RM) on paired ad images with relative CTR comparisons, and finally applies a CTR‑driven preference optimization phase using product‑centric preference optimization (PCPO) to align generated backgrounds with product features.

For e‑commerce knowledge pre‑training, three tasks are used: image understanding, multimodal content understanding, and prompt generation, leveraging 1.2 million samples from JD’s platform.

The RM treats CTR prediction as a relative comparison between image pairs, concatenating visual and textual representations, and employs a classification head together with a point‑level loss for fine‑grained CTR regression.
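The pairwise-plus-point-level objective described above can be sketched in plain Python. The paper's exact architecture and loss weighting are not given in this summary, so the sigmoid squashing of scores and the `alpha` weight below are illustrative assumptions, not the authors' implementation:

```python
import math

def pairwise_ctr_loss(score_win: float, score_lose: float) -> float:
    """Relative-comparison loss: the image with higher observed CTR
    should receive a higher reward score than its counterpart."""
    # sigma(s_w - s_l): probability the RM ranks the pair correctly
    p_correct = 1.0 / (1.0 + math.exp(-(score_win - score_lose)))
    return -math.log(p_correct)

def point_level_loss(score: float, ctr_label: float) -> float:
    """Auxiliary fine-grained regression toward the observed CTR
    (squashing the raw score into (0, 1) is an assumption here)."""
    pred = 1.0 / (1.0 + math.exp(-score))
    return (pred - ctr_label) ** 2

def reward_model_loss(score_win, score_lose, ctr_win, ctr_lose, alpha=0.5):
    """Combined objective: pairwise ranking plus point-level regression.
    `alpha` is a hypothetical weighting, not a value from the paper."""
    pair = pairwise_ctr_loss(score_win, score_lose)
    point = point_level_loss(score_win, ctr_win) + point_level_loss(score_lose, ctr_lose)
    return pair + alpha * point
```

The key design choice mirrored here is that the RM never predicts an absolute CTR in isolation for ranking purposes; the pairwise term only cares about the score *difference* between the two images, while the point-level term anchors scores to real click rates.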

CTR‑driven optimization formalizes image generation as a preference selection problem, employing Direct Preference Optimization (DPO) and the proposed PCPO, which keeps product information as the sole variable to ensure background relevance while improving CTR.
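For one preference pair, the standard DPO objective that this phase builds on can be written as a short function. This is a generic DPO sketch under the usual formulation (frozen reference model, temperature `beta`); how PCPO constructs its pairs so that product information is the sole variable is not detailed enough here to reproduce, so only the shared loss form is shown:

```python
import math

def dpo_loss(logp_w_policy: float, logp_l_policy: float,
             logp_w_ref: float, logp_l_ref: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for a single preference pair.
    logp_*: log-likelihoods of the preferred (w) and rejected (l)
    generations under the current policy and the frozen reference model.
    beta = 0.1 is a common default, not a value from the paper."""
    margin = beta * ((logp_w_policy - logp_w_ref) - (logp_l_policy - logp_l_ref))
    # Maximizing sigma(margin) pushes the policy toward the preferred
    # generation and away from the rejected one, relative to the reference.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Under PCPO as described above, the preferred and rejected generations would differ only in how the product is treated, so minimizing this loss steers the model toward backgrounds consistent with the product rather than toward arbitrary high-CTR imagery.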

Extensive experiments show the proposed method outperforms both closed‑source (e.g., GLM4V, Claude 3.5 Sonnet, GPT‑4o) and open‑source (e.g., VAM, CG4CTR) baselines on reward model accuracy, product‑background relevance, and online CTR gains across 44 product categories.

An online A/B test over one week confirms that CAIG consistently raises CTR compared to baseline MLLM generation, demonstrating the effectiveness of the reward model and PCPO in real‑world e‑commerce advertising.

For further discussion, contact [email protected]; JD’s advertising creative team also invites talent in AIGC and large‑model research to join their efforts.

Tags: Reward Model, reinforcement learning, multimodal LLM, advertising image generation, CTR optimization, product-centric preference
Written by

JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
