PortrAIt: AI-Powered Vertical Video Editing and Multimodal Matching for QQ Music

This article explains why video integration is essential for modern music products, introduces QQ Music's PortrAIt AI vertical video clipping technology, details its technical capabilities and business scenarios such as background videos and video playlists, and outlines current results and future development plans.

DataFunSummit

Jan 27, 2022

Motivation : With the rise of short‑form video platforms, music consumption has shifted from audio‑only experiences to video‑enhanced ones, making video integration a critical strategy for music apps to increase user engagement and discoverability.

Solution – PortrAIt : QQ Music developed PortrAIt, an AI‑driven system that converts horizontal videos to vertical format. It automatically selects the focus area, detects shot transitions, black borders, subtitles, and logos, keeps the main subject locked in frame (the "C‑position", i.e. the center‑stage performer), smooths the resulting camera motion, and reconstructs the optimal visual region.
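To make the horizontal-to-vertical step concrete, here is a minimal sketch of how a crop window could be placed around a detected subject. The function name `vertical_crop_window`, the 9:16 target ratio, and the idea that the subject position arrives as a single x coordinate are all assumptions for illustration; the source does not describe PortrAIt's actual cropping logic.

```python
def vertical_crop_window(frame_w, frame_h, subject_cx, aspect_w=9, aspect_h=16):
    """Return (left, top, width, height) of a full-height vertical crop.

    Hypothetical helper: the crop keeps the full frame height, uses an
    aspect_w:aspect_h ratio, and is centered on subject_cx as far as the
    frame borders allow.
    """
    crop_h = frame_h
    # Width implied by the target aspect ratio, never wider than the frame.
    crop_w = min(frame_w, frame_h * aspect_w // aspect_h)
    # Center the window on the subject, then clamp it inside the frame.
    left = int(subject_cx) - crop_w // 2
    left = max(0, min(left, frame_w - crop_w))
    return left, 0, crop_w, crop_h


if __name__ == "__main__":
    # A 1920x1080 frame with the subject at the horizontal center.
    print(vertical_crop_window(1920, 1080, 960))
    # A subject near the left edge: the window is clamped to x = 0.
    print(vertical_crop_window(1920, 1080, 100))
```

Clamping rather than padding keeps every output pixel inside the source frame, which matches the article's stated goal of preserving resolution.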

Technical Capabilities : The system includes precise transition type recognition via neural networks, segment‑level black‑border/subtitle/logo detection, C‑position locking for singers, smooth motion interpolation, and adaptive reconstruction of the maximum effective area to preserve resolution.
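One way to picture how transition detection and motion smoothing interact is the sketch below. It applies an exponential moving average to per-frame subject positions and resets the average at every detected cut, so the crop window glides within a shot but jumps cleanly between shots. The function `smooth_centers`, the `alpha` parameter, and the choice of an exponential moving average are illustrative assumptions, not the method QQ Music describes.

```python
def smooth_centers(centers, cut_frames=(), alpha=0.2):
    """Smooth per-frame subject x positions with an exponential moving average.

    Hypothetical helper: `cut_frames` holds the indices of frames that start
    a new shot. Smoothing restarts there, so positions from the previous
    shot do not drag the crop window across the cut.
    """
    cuts = set(cut_frames)
    smoothed = []
    prev = None
    for i, c in enumerate(centers):
        if prev is None or i in cuts:
            # First frame of the video or of a new shot: take the raw value.
            prev = float(c)
        else:
            # Blend the new detection with the running average.
            prev = alpha * c + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed


if __name__ == "__main__":
    raw = [100, 200, 200, 900, 910]
    # Frame 3 starts a new shot, so the jump to 900 is kept as-is.
    print(smooth_centers(raw, cut_frames=[3], alpha=0.5))
```

A smaller `alpha` gives steadier but slower-reacting motion; in practice it would be tuned against how quickly the singer moves on screen.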

Business Scenarios : PortrAIt is applied to (1) background videos on the playback page, providing 30‑second vertical clips that boost foreground activity time, and (2) video playlists, where each song is paired with a short promotional video to increase exposure and conversion. It also supports video‑song matching through multimodal audio‑video pairing, leveraging music feature extraction, visual embeddings, and triplet‑margin loss training.
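The triplet-margin loss mentioned for video-song matching can be illustrated with a small dependency-free sketch. It assumes, purely for illustration, that the anchor is a song's audio embedding, the positive is the embedding of a well-matched video, and the negative is the embedding of a mismatched video; the source does not say how triplets are built or which distance is used, so Euclidean distance and the names below are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    song = [0.0, 0.0]          # assumed audio embedding (anchor)
    good_video = [0.0, 1.0]    # assumed matching video embedding
    bad_video = [3.0, 4.0]     # assumed mismatched video embedding
    # The matching video is already much closer, so the loss is zero.
    print(triplet_margin_loss(song, good_video, bad_video))
    # With the roles swapped, the triplet is violated and is penalized.
    print(triplet_margin_loss(song, bad_video, good_video))
```

Training on this objective pulls matched audio and video embeddings together in a shared space, so that at serving time a song can be paired with candidate clips by nearest-neighbor search.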

Results and Impact : After deployment, QQ Music observed significant increases in average foreground stay time, song play duration, and completion rates. The AI‑driven workflow reduces manual editing costs while maintaining quality through a lightweight human review step.

Future Outlook : Planned improvements include consolidating AI capabilities across scenarios, expanding the video material library with richer multimodal metadata, and building a fully automated pipeline for large‑scale video production, quality assessment, and distribution.
