
Reliable Advertising Image Generation and Creative Selection Using Multimodal Feedback and MLLM Representations

In 2024, the advertising team introduced a suite of AI‑driven techniques (a reliable feedback network, a large‑scale human‑annotated dataset, multimodal large language model representations, and online ranking architecture upgrades) to improve the quality, coverage, and personalization of generated ad creatives.

JD Tech Talk

1. Introduction

High‑quality ad creatives boost information delivery, click‑through, and conversion rates. In 2023 the team leveraged AIGC to increase creative diversity, but low‑quality assets limited coverage. In 2024 they achieved breakthroughs in both generation and selection, enabling automatic high‑quality creative creation and personalized recommendation.

2. Reliable Creative Generation

To address the low usable‑rate of AIGC images, the team built a Reliable Feedback Network (RFNet) that simulates human reviewers by integrating multiple auxiliary modalities. RFNet evaluates generated images, providing feedback that is back‑propagated to the diffusion model, substantially raising the proportion of acceptable images while preserving visual appeal. They also released the RF1M dataset, containing over one million human‑annotated generated ad images.

The diffusion model is further fine‑tuned in the spirit of reinforcement learning from human feedback (RLHF): RFNet's scores stand in for human evaluations and are back‑propagated to the generator, shortening generation time while improving image quality.
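The feedback step described above can be illustrated with a toy sketch. It is not the team's implementation: the one‑parameter "generator", the `reward` function standing in for RFNet, and all constants are hypothetical. It only shows the mechanism of pushing a differentiable feedback score back into generator parameters by gradient ascent.

```python
import math
from typing import Sequence


def reward(x: float) -> float:
    """Hypothetical differentiable feedback score in (0, 1]; it peaks at x = 2.0."""
    return math.exp(-((x - 2.0) ** 2))


def reward_grad(x: float) -> float:
    """Analytic derivative of reward with respect to the generator output x."""
    return -2.0 * (x - 2.0) * reward(x)


def generate(theta: float, noise: float) -> float:
    """Toy generator (assumption): one learnable parameter plus a noise sample."""
    return theta + noise


def mean_reward(theta: float, noises: Sequence[float]) -> float:
    return sum(reward(generate(theta, n)) for n in noises) / len(noises)


def finetune(theta: float, noises: Sequence[float],
             lr: float = 0.5, steps: int = 200) -> float:
    """Gradient ascent on the mean feedback score of the generated samples."""
    for _ in range(steps):
        # Chain rule: d reward / d theta = reward'(x) * dx/dtheta, and dx/dtheta = 1.
        grad = sum(reward_grad(generate(theta, n)) for n in noises) / len(noises)
        theta += lr * grad
    return theta


noises = [-0.1, 0.0, 0.1]
theta_before = 0.5
theta_after = finetune(theta_before, noises)
score_before = mean_reward(theta_before, noises)
score_after = mean_reward(theta_after, noises)
print(f"mean feedback score: {score_before:.3f} -> {score_after:.3f}")
```

In a real system the generator would be a diffusion model and the reward a learned network, so gradients would come from automatic differentiation rather than a hand‑written derivative.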

3. Offline Representation Construction and Integration

Using multimodal large language models (MLLM), the team extracted both explicit features (e.g., named entities, background color, logo) and implicit features (e.g., promotion status, target audience) from creative images and texts. They built a contrastive learning pipeline based on MoCo v3 to produce representations that discriminate between creatives across SKUs, and evaluated retrieval quality with Faiss similarity search.
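As a rough illustration of the two ideas in this section, the sketch below computes an InfoNCE‑style contrastive loss (the loss family MoCo v3 builds on) and runs a brute‑force cosine nearest‑neighbour lookup as a stand‑in for a Faiss index. The vectors, the SKU names, and the temperature are made up, and the momentum encoder that MoCo v3 uses is omitted.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def info_nce(queries: Sequence[Sequence[float]],
             keys: Sequence[Sequence[float]],
             temperature: float = 0.2) -> float:
    """Mean InfoNCE loss: keys[i] is the positive for queries[i], the rest are negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Brute-force cosine retrieval; an approximate index would replace this at scale."""
    ranked = sorted(index.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Two views of the same creative should score as a positive pair.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(views_a, views_b)
misaligned_loss = info_nce(views_a, list(reversed(views_b)))

# Hypothetical creative embeddings keyed by SKU and creative variant.
index = {"sku_a_v1": [1.0, 0.1], "sku_a_v2": [0.9, 0.2], "sku_b_v1": [0.0, 1.0]}
neighbours = top_k([1.0, 0.0], index, k=2)
print(aligned_loss < misaligned_loss, neighbours)
```

Retrieval quality can then be judged by whether the nearest neighbours of a creative belong to the expected SKU, which is what the toy lookup checks here.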

4. Online Ranking Architecture Optimization

The online ranking model was upgraded to consider candidate‑creative features and list‑wise objectives, aligning offline CTR prediction with online list ordering. A joint training scheme splits the problem into a <user, sku> prediction and a subsequent creative ranking, reducing serving pressure while handling the combinatorial explosion of creative candidates.
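A minimal sketch of how such a split might look, under stated assumptions: one score per <user, sku> pair is computed once, and the creatives of that SKU are then ordered by a separate list‑wise score. The function names, the logits, and the softmax cross‑entropy objective (a ListNet‑style loss) are illustrative choices, not details confirmed by the source.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores: Sequence[float], clicks: Sequence[float]) -> float:
    """Softmax cross-entropy over one candidate list; assumes at least one click."""
    probs = softmax(scores)
    total_clicks = sum(clicks)
    return -sum(c / total_clicks * math.log(p) for c, p in zip(clicks, probs))


def rank_creatives(user_sku_logit: float,
                   creative_scores: Dict[str, float]) -> Tuple[float, List[str]]:
    """Stage 1: one CTR estimate per <user, sku>. Stage 2: order that SKU's creatives."""
    base_ctr = sigmoid(user_sku_logit)
    ranked = sorted(creative_scores, key=creative_scores.get, reverse=True)
    return base_ctr, ranked


# The loss is lower when the clicked creative receives the highest score.
good = listwise_loss([2.0, 0.0, 0.0], [1, 0, 0])
bad = listwise_loss([0.0, 2.0, 0.0], [1, 0, 0])

# Hypothetical logits for one <user, sku> pair and its three creatives.
base_ctr, order = rank_creatives(0.0, {"c1": 0.1, "c2": 1.5, "c3": -0.3})
print(good < bad, base_ctr, order)
```

Because the <user, sku> score is computed once per SKU rather than once per creative, the number of expensive model calls grows with the number of SKUs instead of the number of SKU and creative combinations.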

5. Conclusion and Outlook

The presented techniques substantially improve the usable‑rate of AIGC images, mitigate data sparsity and combinatorial explosion, and achieve precise online creative recommendation. Future work will focus on deeper multimodal fusion, stronger personalization, and further scaling of reliable feedback mechanisms.

References

1. Parallel Ranking of Ads and Creatives in Real‑Time Advertising Systems, AAAI 2024.
2. Towards Reliable Advertising Image Generation Using Human Feedback, ECCV 2024.
3. CBNet: A Plug‑and‑Play Network for Segmentation‑Based Scene Text Detection, IJCV 2024.
4. Generate E‑commerce Product Background by Integrating Category Commonality and Personalized Style, ICASSP 2025.
