Short Video Tagging Using Neural Networks

The paper presents a gated-attention neural network that fuses audio, visual, and title-text features to automatically generate high-quality tags for short videos. The model achieves state-of-the-art performance on the YouTube-8M challenge and supports scalable tagging and recommendation services. Planned work includes broader tag coverage and temporal segment tagging.

iQIYI Technical Product Team

This article describes neural network models for short video tagging, that is, automatically generating relevant tags for short video content. The work leverages audio-visual features and text data to improve tagging accuracy and efficiency.

The authors propose a gated attention neural network architecture that combines aggregated audio and video features with text features. The model achieved state-of-the-art performance in the YouTube-8M video understanding challenge, outperforming existing single-model solutions by 0.3 percentage points in Global Average Precision (GAP). Deployed in practical applications, the system covers thousands of high-quality content tags and dozens of category tags.
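For readers unfamiliar with the metric, the following is a minimal sketch of how GAP can be computed. It assumes the YouTube-8M challenge convention of keeping the top 20 predictions per video, pooling them across all videos, and computing a single average precision over the pooled, confidence-sorted list. The function name and input format are illustrative choices, not taken from the article, and the official evaluation code may differ in details such as tie handling.

```python
def global_average_precision(predictions, labels, top_k=20):
    """Sketch of Global Average Precision (GAP).

    predictions: one dict per video, mapping tag -> confidence score.
    labels:      one set per video, holding the ground-truth tags.
    top_k:       predictions kept per video (20 is assumed here).
    """
    pooled = []            # (score, is_correct) pairs from all videos
    total_positives = 0    # number of ground-truth tags overall

    for scores, truth in zip(predictions, labels):
        total_positives += len(truth)
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        pooled.extend((score, tag in truth) for tag, score in top)

    if total_positives == 0:
        return 0.0

    # Rank every pooled prediction by confidence, highest first.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(pooled, start=1):
        if is_correct:
            hits += 1
            # Precision at this rank; each hit raises recall by 1/total_positives.
            precision_sum += hits / rank

    return precision_sum / total_positives
```

Because all videos share one ranked list, the metric rewards models whose confidence scores are comparable across videos, not only well ordered within each video.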

Key innovations include a gated attention mechanism for feature aggregation, which uses a bottleneck structure to learn the importance of each feature component. The model also incorporates text features that a neural network extracts from video titles. The model is currently deployed in internal business applications such as short video tagging and recommendation systems, where it provides a stable tagging service.
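The idea of attention-based aggregation followed by a bottleneck gate can be sketched in a few lines. This is an illustrative assumption about the general technique, not the authors' architecture: the helper names, the dot-product attention scoring, the ReLU and sigmoid choices, and the randomly initialized weights are all hypothetical stand-ins for parameters that would be learned in practice.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attention_pool(frames, query):
    """Aggregate frame vectors into one vector, weighted by softmax(frame . query)."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


def bottleneck_gate(vector, w_down, w_up):
    """Rescale each component by a gate in (0, 1) computed through a narrow layer."""
    hidden = [max(0.0, h) for h in matvec(w_down, vector)]  # project down, ReLU
    gates = [sigmoid(g) for g in matvec(w_up, hidden)]      # project up, sigmoid
    return [g * x for g, x in zip(gates, vector)]


def random_matrix(rows, cols, rng):
    """Placeholder weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def fuse_and_gate(video_frames, audio_frames, title_vector, params):
    """Pool each modality over time, concatenate with title features, then gate."""
    video = attention_pool(video_frames, params["video_query"])
    audio = attention_pool(audio_frames, params["audio_query"])
    fused = video + audio + title_vector
    return bottleneck_gate(fused, params["w_down"], params["w_up"])
```

The bottleneck keeps the gate cheap: with a fused vector of size d and a hidden size much smaller than d, the gate needs far fewer weights than a full d-by-d layer while still letting each output gate depend on every input component.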

Future directions include expanding tag coverage, improving feature extraction models, and developing specialized models for underrepresented tag types. The research also explores extending capabilities from video-level tagging to precise time-segment tagging for longer videos.

Machine Learning, AI, Neural Networks, attention mechanisms, video understanding, short video tagging, YouTube-8M dataset