PortrAIt: AI-Powered Vertical Video Editing and Multimodal Matching for QQ Music
This article explains why video integration has become essential for modern music products, introduces PortrAIt, QQ Music's AI-powered vertical video editing technology, describes its technical capabilities and business scenarios such as background videos and video playlists, and summarizes current results and future development plans.
Motivation: With the rise of short-form video platforms, music consumption has shifted from audio-only to video-enhanced experiences. This makes video integration a key strategy for music apps to increase user engagement and content discoverability.
Solution – PortrAIt: QQ Music developed PortrAIt, an AI-driven vertical video editing system. When converting horizontal videos to vertical format, it automatically selects the focus area; detects transitions, black borders, subtitles, and logos; locks onto the main subject (the "C-position," i.e., the lead performer at center stage); smooths camera motion; and reconstructs the optimal visual region.
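The subject-following crop described above can be illustrated with a minimal Python sketch. It assumes a subject detector (not shown, hypothetical) already supplies the singer's horizontal center for each frame, and that shot boundaries are known from transition detection. Exponential moving average smoothing is a swapped-in stand-in, since the article does not describe PortrAIt's actual smoothing method; `smooth_centers` and `vertical_crop_windows` are hypothetical names.

```python
from typing import Iterable, List, Tuple


def smooth_centers(centers: List[float], alpha: float = 0.2,
                   cuts: Iterable[int] = ()) -> List[float]:
    """Exponential moving average of per-frame subject x-centers.

    Assumption: the average restarts at frame 0 and at every index in
    `cuts` (shot boundaries), so the window jumps at a cut instead of
    sliding across it.
    """
    cut_set = set(cuts)
    smoothed: List[float] = []
    for i, c in enumerate(centers):
        if i == 0 or i in cut_set:
            smoothed.append(c)  # new shot: snap to the detected subject
        else:
            smoothed.append(alpha * c + (1 - alpha) * smoothed[-1])
    return smoothed


def vertical_crop_windows(centers: List[float], frame_w: int, frame_h: int,
                          alpha: float = 0.2,
                          cuts: Iterable[int] = ()) -> List[Tuple[int, int]]:
    """Return one crop window (x0, x1) per frame, x1 exclusive."""
    # Widest 9:16 strip at full frame height, capped at the frame width.
    crop_w = min(frame_w, frame_h * 9 // 16)
    windows = []
    for cx in smooth_centers(centers, alpha, cuts):
        x0 = int(cx - crop_w / 2)               # center the window on the subject
        x0 = max(0, min(x0, frame_w - crop_w))  # clamp so it stays inside the frame
        windows.append((x0, x0 + crop_w))
    return windows


# 1920x1080 source, subject drifting right, hard cut at frame 3.
print(vertical_crop_windows([900.0, 950.0, 1000.0, 300.0], 1920, 1080, cuts=[3]))
```

A larger `alpha` follows the subject more tightly but passes detector jitter through; a smaller one gives steadier motion at the cost of lag. One reason to detect transitions first is that the smoother can then be reset at each cut rather than panning across it.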
Technical Capabilities: The system provides precise recognition of transition types using neural networks; segment-level detection of black borders, subtitles, and logos; C-position locking that keeps the singer in frame; smooth motion interpolation; and adaptive reconstruction of the largest effective picture area to preserve resolution.
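As an illustration of segment-level black-border detection followed by a maximum-area vertical window, here is a minimal Python sketch. It covers black borders only (not subtitles or logos) and uses simple luma thresholding rather than whatever model PortrAIt uses. Frames are assumed to be grayscale lists of rows, and the threshold of 24 is an assumed value slightly above the limited-range video black level of 16. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # grayscale luma, frame[row][col], values 0-255
Box = Tuple[int, int, int, int]  # (top, bottom, left, right); bottom/right exclusive

BLACK_THRESHOLD = 24  # assumed: a little above the limited-range black level of 16


def content_box(frame: Frame, threshold: int = BLACK_THRESHOLD) -> Box:
    """Trim outer rows/columns whose pixels are all darker than `threshold`."""
    h, w = len(frame), len(frame[0])

    def row_dark(r: int) -> bool:
        return max(frame[r]) < threshold

    def col_dark(c: int) -> bool:
        return max(frame[r][c] for r in range(h)) < threshold

    top = 0
    while top < h and row_dark(top):
        top += 1
    if top == h:
        return (0, 0, 0, 0)  # fully black frame, e.g. during a fade
    bottom = h
    while bottom > top and row_dark(bottom - 1):
        bottom -= 1
    left = 0
    while left < w and col_dark(left):
        left += 1
    right = w
    while right > left and col_dark(right - 1):
        right -= 1
    return (top, bottom, left, right)


def segment_content_box(frames: List[Frame]) -> Box:
    """Union of per-frame boxes, ignoring fully black frames.

    A border is trimmed only if it is black in every non-black frame of the segment.
    """
    boxes = [b for b in map(content_box, frames) if b[1] > b[0]]
    if not boxes:
        return (0, 0, 0, 0)
    return (min(b[0] for b in boxes), max(b[1] for b in boxes),
            min(b[2] for b in boxes), max(b[3] for b in boxes))


def max_vertical_window(box: Box) -> Tuple[int, int]:
    """Largest (width, height) 9:16 window, in source pixels, that fits inside `box`."""
    top, bottom, left, right = box
    box_h, box_w = bottom - top, right - left
    win_h = min(box_h, box_w * 16 // 9)
    return (win_h * 9 // 16, win_h)
```

Working per segment rather than per frame avoids trimming a border that is only momentarily dark. Sizing the window inside the effective area, in source pixels, avoids spending crop width on borders and avoids upscaling.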
Business Scenarios: PortrAIt is applied in two scenarios: (1) background videos on the playback page, where 30-second vertical clips increase the time users spend with the app in the foreground, and (2) video playlists, where each song is paired with a short promotional video to increase exposure and conversion. It also supports video-song matching through multimodal audio-video pairing, which combines music feature extraction, visual embeddings, and training with a triplet margin loss.
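The triplet margin loss used for video-song matching can be sketched in plain Python. Here the anchor is assumed to be a song's audio embedding, the positive the video actually paired with it, and the negative an unrelated video. The article does not specify the encoders, the distance, or the margin, so the toy vectors, Euclidean distance, and margin of 1.0 (the default in PyTorch's `torch.nn.TripletMarginLoss`) are assumptions. `rank_videos` is a hypothetical helper showing how trained embeddings could then be used for matching.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d.

    The loss is zero once the matched video is closer to the song than the
    unmatched one by at least `margin`.
    """
    return max(0.0, l2_distance(anchor, positive)
               - l2_distance(anchor, negative) + margin)


def batch_loss(triplets: List[Tuple[Vector, Vector, Vector]],
               margin: float = 1.0) -> float:
    """Mean triplet loss over a batch of (anchor, positive, negative)."""
    return sum(triplet_margin_loss(a, p, n, margin)
               for a, p, n in triplets) / len(triplets)


def rank_videos(song_emb: Vector, video_embs: List[Vector]) -> List[int]:
    """Indices of candidate videos, closest to the song embedding first."""
    return sorted(range(len(video_embs)),
                  key=lambda i: l2_distance(song_emb, video_embs[i]))


song = [0.0, 0.0]
print(batch_loss([(song, [0.0, 1.0], [0.0, 3.0]),
                  (song, [0.0, 1.0], [0.0, 1.5])]))
print(rank_videos(song, [[3.0, 0.0], [1.0, 0.0], [2.0, 0.0]]))
```

The margin keeps the model from settling on embeddings where matched and unmatched videos sit at nearly the same distance. Only triplets that violate it contribute gradient during training.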
Results and Impact: After deployment, QQ Music observed significant increases in average foreground stay time, song play duration, and completion rate. The AI-driven workflow reduces manual editing costs, while a lightweight human review step maintains quality.
Future Outlook: Planned improvements include consolidating AI capabilities across scenarios, expanding the video material library with richer multimodal metadata, and building a fully automated pipeline for large-scale video production, quality assessment, and distribution.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.