
Technical Insights and Solution Strategies from the Tencent Advertising Algorithm Competition – Video Ad Track

The article outlines the Tencent Advertising Algorithm Competition’s video ad challenge, details the paper submission guidelines, and shares a participant’s step‑by‑step technical approach—including baseline experiments, model re‑implementation with Paddle, multimodal feature extraction, optimizer choices, and future improvement directions—providing practical AI insights for multimedia video classification.

Tencent Advertising Technology

On June 16, 2021, the second round of the Tencent Advertising Algorithm Competition began, attracting 4,335 participants from thousands of universities and enterprises; the competition also partnered with the ACM Multimedia conference as a grand challenge on video advertising.

The competition also opened a paper submission track: papers must be submitted as PDFs using the ACM article template, limited to four pages plus references, and emailed to [email protected] by 23:59 on July 11.

A top contestant, nicknamed "Darwin", shares his solution process for the video ad track. In the first two weeks he focused on reproducing the baseline (0.735 AUC); swapping the optimizer from Adam to RMSProp gave a modest gain (≈0.768 AUC).
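The optimizer swap can be illustrated with a toy experiment. The sketch below is not the competition code: it implements the standard RMSProp update rule in numpy and minimizes a simple quadratic, just to show the moving-average gradient scaling that distinguishes RMSProp from plain SGD (all names and hyperparameters here are illustrative).

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.1, decay=0.9, eps=1e-8):
    """One RMSProp update: scale the step by a running RMS of past gradients."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

def minimize(steps=100):
    # Toy objective: f(w) = ||w||^2, whose gradient is 2w.
    w = np.array([3.0, -4.0])
    cache = np.zeros_like(w)
    for _ in range(steps):
        grad = 2 * w
        w, cache = rmsprop_step(w, grad, cache)
    return w

final_w = minimize()
```

In practice the contestant would have changed only the optimizer object in the baseline training script; this stand-alone version just makes the update rule concrete.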

Mid-competition, he switched from TensorFlow/Keras to Paddle for greater flexibility and concentrated on model reproduction and modality selection, paying careful attention to activations, regularization, and normalization; every modality except the cover image contributed positively.
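A modality-selection finding like "everything but the cover image helps" typically comes from a leave-one-modality-out ablation. The harness below is a hypothetical sketch, not the contestant's code: `evaluate` is a stub standing in for a full train-plus-validation run, and the modality names and toy AUC gains are invented for illustration.

```python
# Hypothetical leave-one-modality-out ablation harness (assumed names throughout).
MODALITIES = ["video", "audio", "title_text", "ocr_text", "cover_image"]

def ablate(evaluate, modalities):
    """Return each modality's contribution: full-set score minus score without it."""
    full = evaluate(modalities)
    return {m: full - evaluate([x for x in modalities if x != m])
            for m in modalities}

# Toy scorer: pretend each modality adds a fixed amount of AUC, and the cover
# image slightly hurts (mirroring the article's finding). Numbers are made up.
TOY_GAIN = {"video": 0.04, "audio": 0.01, "title_text": 0.02,
            "ocr_text": 0.015, "cover_image": -0.003}

def toy_evaluate(subset):
    return 0.70 + sum(TOY_GAIN[m] for m in subset)

contrib = ablate(toy_evaluate, MODALITIES)
```

A negative entry in `contrib` flags a modality that the model is better off without, which is how the cover image would have been dropped.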

In the later stage, he focused on improving the video and text features with pretrained models (e.g., ViT for video frames, OCR/ASR transcripts for text), which raised performance to 0.795–0.803 AUC. Ensembling models via multi-fold cross-validation added a further 0.7–0.8% improvement.
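The multi-fold ensembling step is a standard pattern: train one model per cross-validation fold and average their test-set predictions. The sketch below assumes a numpy workflow with a caller-supplied `train_fn`; the toy model that predicts the training-set mean is purely for demonstration and is not the article's model.

```python
import numpy as np

def kfold_ensemble(train_fn, X, y, X_test, k=5, seed=0):
    """Train one model per fold and average their test predictions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    preds = []
    for i in range(k):
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[trn], y[trn])   # train_fn returns a predict(X) callable
        preds.append(model(X_test))
    return np.mean(preds, axis=0)          # ensemble by simple averaging

# Toy train_fn: the "model" predicts the training-set mean label everywhere.
def toy_train(Xtr, ytr):
    mean = ytr.mean()
    return lambda X: np.full(len(X), mean)

X = np.arange(20, dtype=float).reshape(20, 1)
y = (X[:, 0] > 9.5).astype(float)
ens = kfold_ensemble(toy_train, X, y, X, k=5)
```

Averaging across folds reduces variance because each fold model sees a different training subset, which is the usual source of the fractional-AUC gains reported here.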

He notes that attempts with ResNet3D for video features yielded negligible gains and slowed inference, so they were abandoned.

The final architecture concatenates modality‑wise NextVLAD‑processed global features, applies SE fusion, and uses a single fully‑connected layer for classification; training uses Adam with cosine annealing, and dropout rates are set above 0.9 for each modality.
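The two fusion-and-training ingredients named above can be sketched compactly. Below is a minimal numpy illustration, not the competition model: a squeeze-and-excitation (SE) style gate over the concatenated per-modality descriptors, plus the cosine annealing learning-rate formula. The feature dimensions, bottleneck size, and weight initialization are all assumed for the demo.

```python
import numpy as np

def se_fuse(features, W1, W2):
    """SE-style gating over concatenated modality features.

    features: list of per-modality global descriptors (1-D arrays).
    W1, W2: weights of the two-layer bottleneck that produces the gate.
    """
    fused = np.concatenate(features)              # concat modality-wise descriptors
    hidden = np.maximum(0.0, W1 @ fused)          # squeeze: ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))   # excitation: sigmoid gate in (0, 1)
    return fused * gate                           # channel-wise reweighting

def cosine_annealed_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Cosine annealing: lr_max at step 0 decaying smoothly to lr_min."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * step / total_steps))

rng = np.random.default_rng(0)
feats = [rng.standard_normal(8) for _ in range(3)]  # e.g. video/audio/text descriptors
d, r = 24, 6                                        # fused dim, bottleneck dim (assumed)
W1 = rng.standard_normal((r, d)) * 0.1
W2 = rng.standard_normal((d, r)) * 0.1
out = se_fuse(feats, W1, W2)
```

Because the gate is a sigmoid, it can only attenuate channels, never amplify them, which lets the fusion layer softly down-weight a weak modality instead of cutting it off entirely.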

Future work will explore loss functions, data augmentation, hierarchical labels, and end‑to‑end training to break the current performance plateau.

Tags: Deep Learning, Feature Extraction, multimodal learning, video classification, Tencent competition
Written by

Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
