
Common Video Advertising Algorithms: An Experience Sharing Session

Leveraging AI to analyze visual, audio, and textual cues, this guide explains how video ads (pre‑roll, mid‑roll, post‑roll, overlays, and product placements) are recognized, generated, and optimized for usefulness, naturalness, and prominence via multimodal understanding pipelines, algorithmic scene representations, and a scalable delivery architecture.

iQIYI Technical Product Team

This document presents a comprehensive overview of video advertising placement techniques, focusing on how artificial intelligence can be leveraged to identify, generate, and optimize ad slots within online video content.

It begins by contrasting physical-world advertising (e.g., billboards, in‑flight ads) with digital ad formats such as in‑feed banners, splash ads, and search ads. The author emphasizes that video‑based ads are closely tied to AI because the video’s visual and audio content must be analyzed to recommend suitable insertion points.

Three primary ad formats are described:

Pre‑roll/mid‑roll/post‑roll (贴片广告): full‑screen ads inserted at specific timestamps, often independent of video content.

Overlay ads (浮层): semi‑transparent graphics placed at particular spatial locations (e.g., corners, center) and times, requiring relevance to the surrounding scene.

Product placement (植入广告): objects or spoken mentions integrated into the video narrative, demanding realistic lighting, shadows, and contextual plausibility.

The author proposes three criteria for a good ad slot: usefulness (relevant to the viewer’s needs), naturalness (seamless integration with the video), and prominence (visible without obstructing key content).
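The three criteria lend themselves to a single ranking score for candidate slots. A minimal sketch, where the feature names, 0-to-1 scales, and weights are illustrative assumptions rather than values from the article:

```python
from dataclasses import dataclass

@dataclass
class SlotFeatures:
    usefulness: float   # relevance to the viewer's needs, 0..1
    naturalness: float  # how seamlessly the ad blends into the video, 0..1
    prominence: float   # visibility without obstructing key content, 0..1

def slot_score(f: SlotFeatures, w=(0.4, 0.3, 0.3)) -> float:
    """Weighted combination of the three criteria; weights are illustrative."""
    return w[0] * f.usefulness + w[1] * f.naturalness + w[2] * f.prominence

# Rank candidate slots by score and keep the best one.
candidates = [
    SlotFeatures(usefulness=0.9, naturalness=0.6, prominence=0.7),
    SlotFeatures(usefulness=0.5, naturalness=0.9, prominence=0.4),
]
best = max(candidates, key=slot_score)
```

In practice each feature would itself come from a model (relevance matching, scene analysis, saliency maps); the linear combination is just the simplest way to trade the three criteria off against each other.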

From an AI perspective, two core capabilities are highlighted:

Recognition : extracting visual, audio, and textual cues to locate potential ad slots.

Generation : creating new slots, such as automatically generated previews (前情提要) that summarize previous episodes and can host ads.

The video understanding pipeline is broken down into three modalities—visual, audio, and text—and described as a hierarchy from coarse to fine‑grained analysis. Examples include object detection, action recognition, speech‑to‑text, and OCR for subtitles or on‑screen text.
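The fan-out-and-merge shape of such a pipeline can be sketched as below. The analyzer functions are stand-ins for real models (object/action detection, speech-to-text, OCR), and the input formats are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Cue:
    modality: str   # "visual", "audio", or "text"
    label: str      # detected object, transcribed phrase, or OCR'd text
    t_start: float  # seconds
    t_end: float

def analyze_visual(frames):
    # Stand-in for object detection / action recognition per frame.
    return [Cue("visual", label, t, t + 1.0) for t, label in frames]

def analyze_audio(segments):
    # Stand-in for speech-to-text over audio segments.
    return [Cue("audio", text, t0, t1) for t0, t1, text in segments]

def analyze_text(overlays):
    # Stand-in for OCR on subtitles and on-screen text.
    return [Cue("text", text, t0, t1) for t0, t1, text in overlays]

def understand(frames, segments, overlays):
    """Merge all modality cues into one coarse timeline for slot search."""
    cues = analyze_visual(frames) + analyze_audio(segments) + analyze_text(overlays)
    return sorted(cues, key=lambda c: c.t_start)
```

Downstream slot-finding logic can then scan the merged timeline from coarse (scene level) to fine (object level), as the hierarchy described above suggests.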

Several algorithmic concepts are introduced:

Triples: (object, scene, action) representations.

Highlights (看点): moments with high entertainment value.

Tone: sentiment or style of the scene.

Spatial attributes: presence of walls, tables, open space, etc.
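These concepts can be bundled into a per-segment annotation that a slot-finding rule consumes. A minimal sketch, where the field choices and the overlay-eligibility rule are illustrative assumptions, not the article's actual logic:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SceneTriple:
    obj: str     # detected object, e.g. "coffee cup"
    scene: str   # scene category, e.g. "office"
    action: str  # recognized action, e.g. "drinking"

@dataclass
class SegmentAnnotation:
    triple: SceneTriple
    highlight: bool   # 看点: moment with high entertainment value
    tone: str         # sentiment/style of the scene, e.g. "upbeat", "tense"
    spatial: set      # spatial attributes, e.g. {"wall", "table", "open_space"}

def eligible_for_overlay(a: SegmentAnnotation) -> bool:
    # Illustrative rule: place an overlay near a flat surface, but never
    # during a highlight moment where it would obstruct key content.
    return ("table" in a.spatial or "wall" in a.spatial) and not a.highlight
```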

Strategic considerations cover temporal continuity (using LSTM or 3‑D CNN), inter‑category relationships (e.g., clothing vs. vehicle detection), and retrieval‑based recall to supplement classification models.
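The article names LSTMs and 3‑D CNNs for temporal continuity; as a lightweight toy stand-in for the same idea, per-frame slot-suitability scores can be smoothed over a temporal window so adjacent frames agree rather than flicker:

```python
def smooth_scores(scores, window=3):
    """Enforce temporal continuity by averaging each frame's slot-suitability
    score with its neighbours. This moving average is a toy stand-in for the
    learned temporal models (LSTM / 3-D CNN) mentioned in the article."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out
```

A single-frame spike (e.g. a one-frame false detection) is damped, while a sustained run of high scores survives, which is exactly the property temporal models are meant to provide.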

A system architecture is outlined, showing how recognition results feed into a downstream ad‑delivery platform, with fast‑track pipelines for time‑critical ads and offline batch processing for less urgent placements.
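The fast-track / offline split can be sketched as a routing decision on job urgency. The threshold and data shapes below are illustrative assumptions, not details from the article:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class AdJob:
    deadline_s: float                    # seconds until the ad must be live
    video_id: str = field(compare=False)

FAST_TRACK_THRESHOLD_S = 3600  # illustrative: under an hour goes to the fast lane

def route(job: AdJob, fast_queue: list, batch_queue: list) -> str:
    """Send time-critical jobs to a priority queue processed immediately;
    everything else waits for the offline batch pipeline."""
    if job.deadline_s < FAST_TRACK_THRESHOLD_S:
        heapq.heappush(fast_queue, job)  # nearest deadline pops first
        return "fast"
    batch_queue.append(job)
    return "batch"
```

Using a heap for the fast lane means the most urgent placement is always processed next, while the batch queue can be drained in bulk off-peak.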

Finally, the document discusses generation of ad slots, using AI to automatically edit short preview clips, select salient scenes, and ensure logical transitions, as exemplified by the platform’s current implementation of automated pre‑episode summaries.
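Selecting salient scenes under a duration budget while preserving narrative order can be sketched as a greedy pass. This is a simplified assumption of how such a recap editor might work, not the platform's actual algorithm:

```python
def select_preview_scenes(scenes, budget_s=60.0):
    """Pick salient scenes for an auto-generated recap (前情提要).
    `scenes` is a list of (start_s, end_s, saliency) tuples. Scenes are chosen
    greedily by saliency, then re-sorted chronologically so transitions
    follow the original narrative order."""
    ranked = sorted(scenes, key=lambda s: s[2], reverse=True)
    chosen, used = [], 0.0
    for start, end, saliency in ranked:
        duration = end - start
        if used + duration <= budget_s:
            chosen.append((start, end, saliency))
            used += duration
    return sorted(chosen)  # chronological order keeps the recap coherent
```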

Tags: ad placement, AI algorithms, multimodal analysis, content understanding, video advertising