Intelligent Creative System: Types, Quality Evaluation, Generation Models, and Optimization

The Intelligent Creative System defines advertising creatives across formats, evaluates image and text quality using reference‑based metrics and models like DeepBIQ, generates multimodal ads via GANs and Transformers, and selects optimal variants through bandit‑based CTR prediction and multimodal fusion, enabling scalable, data‑driven creative production.

HelloTech

What is a creative? A creative refers to the visual or textual content used in advertising, such as product ads, video ads, UGC graphics, and marketing activity banners. Different creative styles (e.g., banner vs. pop‑up) have distinct elements and attribute combinations, leading to a combinatorial explosion of possible designs.

Creative Types and Composition

Creative formats include product-ad creatives, video creatives, UGC image-text creatives, and marketing-activity creatives. The number of possible combinations can be expressed as style × template × element count × attribute count, so even modest counts per dimension yield a practically unlimited number of variations.
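As a rough illustration of the multiplicative model above, the sketch below multiplies the four dimensions. The function name and the specific counts are hypothetical and chosen only to show how quickly the design space grows.

```python
from math import prod


def count_variants(styles: int, templates: int, elements: int, attributes: int) -> int:
    """Number of distinct creatives under the simple multiplicative model."""
    return prod((styles, templates, elements, attributes))


# Hypothetical counts, not taken from any real system.
total = count_variants(styles=5, templates=20, elements=30, attributes=40)
print(total)  # 120000
```

Even these small, made-up numbers give six figures of candidate designs, which is why exhaustive manual production or testing is impractical.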

How to Evaluate Creative Quality

From an algorithmic perspective, image quality assessment can be modeled under three reference conditions:

Full‑reference: both the pristine (reference) image and the distorted image are available.

Reduced‑reference: only partial information from the reference image is available.

No‑reference (blind): only the distorted image is given, which is the most challenging.

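Quality models are typically judged by how well their predicted scores agree with human ratings. Common metrics are the Pearson correlation coefficient, which measures linear correlation, and the Spearman rank-order correlation coefficient, which is non-parametric and depends only on rank order. Pearson is most appropriate when the data are approximately normally distributed; otherwise, Spearman is used.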
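The difference between the two coefficients can be seen in a small sketch. The implementation below uses only the standard library; the score lists are invented for illustration, with the "human" scores being a monotonic but non-linear function of the predicted ones.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented scores: human ratings grow monotonically but not linearly.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
human = [1.0, 4.0, 9.0, 16.0, 25.0]
print(round(pearson(predicted, human), 3))
print(round(spearman(predicted, human), 3))
```

Spearman is 1 here because the ordering agrees perfectly, while Pearson falls slightly below 1 because the relationship is not a straight line.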

One notable no-reference model is DeepBIQ (2016). It splits an image into patches, predicts a quality score for each patch using pre-trained CNNs adapted through transfer learning, and aggregates the patch scores into an image-level score. Several feature-fusion strategies are compared (e.g., pooling+SVR, COMC+SVR, SVR+pooling), and the best-performing one is selected.
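The patch-then-aggregate idea can be sketched as follows. This is not the DeepBIQ implementation: `toy_patch_score` is a hypothetical stand-in for the CNN and SVR stage, and average pooling of per-patch predictions is only one of the aggregation strategies mentioned above.

```python
from statistics import mean


def split_into_patches(image, size):
    """Yield non-overlapping size x size patches from a 2D list of pixels."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            yield [row[c:c + size] for row in image[r:r + size]]


def toy_patch_score(patch):
    """Hypothetical scorer standing in for CNN features + SVR: mean intensity."""
    return mean(pixel for row in patch for pixel in row)


def image_quality(image, size, scorer=toy_patch_score):
    """Score each patch independently, then average-pool the predictions."""
    return mean(scorer(patch) for patch in split_into_patches(image, size))


# A tiny 4x4 "image" made of four uniform 2x2 patches.
image = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [8, 8, 2, 2],
    [8, 8, 2, 2],
]
print(image_quality(image, size=2))  # patch scores 0, 4, 8, 2 average to 3.5
```

Swapping `scorer` for a learned model changes the quality estimate without touching the patching or pooling logic.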

Textual quality is measured by language-model perplexity. Lower perplexity indicates higher fluency, because it means the language model assigns the sentence a higher probability.
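Perplexity is the exponential of the average negative log-probability per token. The sketch below computes it from a list of per-token probabilities; the probability values are made up and would come from a language model in practice.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability of the tokens."""
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Made-up per-token probabilities for two candidate ad copies.
fluent_copy = [0.5, 0.5, 0.5, 0.5]
awkward_copy = [0.5, 0.1, 0.5, 0.1]

print(perplexity(fluent_copy))   # close to 2.0
print(perplexity(awkward_copy))  # higher, so judged less fluent
```

Averaging per token keeps long and short copies comparable, which raw sentence probability does not.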

Intelligent Creative System Architecture

The system consists of four major modules:

Content Understanding: entity recognition, classification, tag extraction, embedding, OCR.

Creative Generation: programmatic stitching, material generation, layout generation, element rendering.

Quality Evaluation: the image and text assessment methods described above.

Creative Selection: bandit models, CTR prediction, combinatorial search, multimodal feature fusion.

How Creative Generation Works

Generation models aim to learn a data distribution p_data and then sample from it. Two representative families are highlighted:

Generative Adversarial Networks (GANs): consist of a discriminator, trained with a cross-entropy loss to tell real samples from generated ones, and a generator, trained on the opposite objective so that its samples fool the discriminator. Adversarial training is a zero-sum game in which the two networks improve iteratively.
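The two opposing objectives can be written down without any deep learning framework. The sketch below computes both losses from discriminator outputs (probabilities that a sample is real); the function names are illustrative and the networks themselves are left out. It uses the original minimax form, whereas practical implementations often substitute a non-saturating generator loss.

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-12
    p = min(max(prediction, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def discriminator_loss(d_real, d_fake):
    """D wants outputs near 1 on real samples and near 0 on generated ones."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Minimax form: G minimizes mean log(1 - D(G(z))), the negation of D's fake term."""
    return -sum(bce(p, 0.0) for p in d_fake) / len(d_fake)


# Discriminator outputs on generated samples: unconvincing vs. convincing fakes.
print(generator_loss([0.1, 0.2]))  # close to 0: generator is losing
print(generator_loss([0.8, 0.9]))  # strongly negative: generator is winning
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
```

The generator's loss falls exactly when the discriminator's fake term rises, which is the zero-sum structure described above.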

Transformers: primarily used for text generation. The encoder encodes the input sequence (e.g., an article), and the decoder predicts a probability for each candidate next token. Example: if three candidate tokens receive scores of (0.5, 0.5, 0.8), the third token is the most likely and is the one picked under greedy decoding.
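The token-selection step can be sketched in a few lines. The example treats the values (0.5, 0.5, 0.8) as unnormalized scores, which is an assumption since they do not sum to 1, and normalizes them with a softmax before taking the most likely token. Greedy selection is used for simplicity; the source does not say which decoding strategy the system uses.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(scores):
    """Return the index of the most probable token and the full distribution."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


index, probs = greedy_pick([0.5, 0.5, 0.8])
print(index)  # 2, i.e. the third token
print([round(p, 3) for p in probs])
```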

Creative Selection (Optimization)

Selection addresses two challenges: (1) precisely matching multimodal creatives to users, which requires joint multimodal representations; and (2) long-tail diversity, which is tackled with a bandit model that maintains a Beta(win, lose) distribution for each creative (win and lose being its counts of clicked and unclicked impressions) and updates it in real time from click feedback.
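One common way to use per-creative Beta(win, lose) distributions is Thompson sampling: draw a plausible CTR for each creative from its distribution and show the creative with the highest draw. The sketch below assumes that approach; the class name, the Beta(1, 1) starting counts, and the simulated click-through rates are all illustrative and not taken from the production system.

```python
import random


class BetaBandit:
    """Thompson sampling with one Beta(win, lose) posterior per creative."""

    def __init__(self, creative_ids, seed=None):
        self.rng = random.Random(seed)
        # Start at Beta(1, 1), a uniform prior (an assumption; the source
        # does not say how the counts are initialized).
        self.stats = {cid: {"win": 1, "lose": 1} for cid in creative_ids}

    def select(self):
        """Sample a CTR for every creative and return the best-looking one."""
        draws = {
            cid: self.rng.betavariate(s["win"], s["lose"])
            for cid, s in self.stats.items()
        }
        return max(draws, key=draws.get)

    def update(self, creative_id, clicked):
        """Fold one impression's click feedback into the posterior."""
        self.stats[creative_id]["win" if clicked else "lose"] += 1


# Simulated traffic with made-up true click-through rates.
true_ctr = {"creative_a": 0.05, "creative_b": 0.50}
bandit = BetaBandit(true_ctr, seed=0)
traffic = random.Random(1)
shown = {cid: 0 for cid in true_ctr}
for _ in range(2000):
    cid = bandit.select()
    shown[cid] += 1
    bandit.update(cid, clicked=traffic.random() < true_ctr[cid])
print(shown)
```

Because new or rarely shown creatives have wide distributions, they still win the draw occasionally, which gives long-tail creatives exposure without a separate exploration rule.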

System Demo

The demo shows the current implementation at Hello (哈啰). Although multimodal signals are not yet fully exploited, the framework follows a CTR+EE (exploration-exploitation) paradigm. Ongoing work includes richer content understanding (multi-label extraction), image-text pairing, fine-grained element-level selection models, and more intelligent copy-assistant tools.
