MD-VQA: Multi-Dimensional No-Reference Video Quality Assessment for UGC Live Videos
MD‑VQA is a no‑reference video quality assessment model that combines semantic cues from EfficientNetV2, handcrafted distortion metrics, and motion information from ResNet3D‑18 to predict the absolute quality of user‑generated live videos. Trained on the large‑scale TaoLive dataset, it achieves state‑of‑the‑art SRCC and PLCC results and is already deployed for real‑time quality monitoring on Taobao's streaming platform.
MD-VQA is a no‑reference video quality assessment (VQA) model designed for user‑generated content (UGC) live videos, such as short videos and live streams on Taobao. The model integrates multi‑dimensional features—including semantic, distortion, and motion cues—to predict absolute video quality without requiring a pristine reference.
The authors constructed a large‑scale UGC video quality dataset called TaoLive, containing 3,762 videos across diverse content categories and resolutions (720p and 1080p). Each video was encoded with eight distortion levels, and 165,528 subjective quality scores were collected from 44 expert and consumer participants following ITU‑R BT.500‑13 guidelines.
MD-VQA extracts semantic features from the last four layers of a pre‑trained EfficientNetV2, hand‑crafted distortion features (blur, noise, blockiness, exposure, color), and motion features from a pre‑trained ResNet3D‑18. Frame‑level semantic and distortion features are fused temporally using absolute differences between adjacent frames. Spatial‑temporal fusion is performed via concatenation, multi‑layer perceptrons, and linear mappings, followed by three fully‑connected layers that regress the final quality score. Mean Squared Error (MSE) is used as the loss function.
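The temporal fusion step described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the feature dimensions, the average pooling, and the random inputs are all assumptions made for the example; in MD‑VQA the per‑frame features would come from EfficientNetV2 and the handcrafted distortion extractors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame features for one clip (shapes are illustrative only):
# 8 frames, 256-dim semantic features and 5 handcrafted distortion measures per frame,
# plus one clip-level motion feature (in the paper, from a pre-trained 3D CNN).
semantic = rng.standard_normal((8, 256))
distortion = rng.standard_normal((8, 5))
motion = rng.standard_normal(128)

def temporal_fuse(frame_feats: np.ndarray) -> np.ndarray:
    """Pair each frame's features with the absolute difference to the
    previous frame, then average-pool over time into a clip descriptor."""
    diffs = np.abs(np.diff(frame_feats, axis=0))              # (T-1, D) adjacent-frame differences
    paired = np.concatenate([frame_feats[1:], diffs], axis=1)  # (T-1, 2D)
    return paired.mean(axis=0)                                 # (2D,)

# Concatenate semantic, distortion, and motion branches into one clip-level
# vector; in the model this feeds MLPs and fully-connected regression layers.
clip_feat = np.concatenate([temporal_fuse(semantic),
                            temporal_fuse(distortion),
                            motion])
print(clip_feat.shape)  # -> (650,)
```

The absolute-difference term lets the regressor see how strongly features fluctuate between adjacent frames, which is a cheap proxy for temporal artifacts such as flicker and stutter.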
Extensive experiments on public benchmarks (LIVE‑WC, YouTube‑UGC+) and the proprietary TaoLive dataset show that MD‑VQA outperforms state‑of‑the‑art methods in both Spearman Rank‑Order Correlation Coefficient (SRCC) and Pearson Linear Correlation Coefficient (PLCC). Ablation studies confirm the contributions of the semantic, distortion, and motion features, as well as of the absolute‑difference and feature‑fusion modules.
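For readers unfamiliar with the two evaluation metrics: SRCC measures monotonic (rank) agreement between predicted scores and subjective mean opinion scores (MOS), while PLCC measures linear agreement. Both are standard in VQA and easy to compute with SciPy; the score values below are made up for illustration.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical predicted quality scores vs. subjective MOS for five videos.
predicted = [3.1, 2.4, 4.0, 1.8, 3.6]
mos       = [3.3, 2.1, 4.2, 2.0, 3.5]

srcc, _ = spearmanr(predicted, mos)  # rank correlation: monotonic agreement
plcc, _ = pearsonr(predicted, mos)   # linear correlation: prediction accuracy
print(f"SRCC={srcc:.3f}  PLCC={plcc:.3f}")
```

Because SRCC depends only on ranks, it is robust to any monotonic miscalibration of the predictor, which is why VQA papers typically report both metrics together.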
MD‑VQA has been deployed in Taobao’s live‑streaming and short‑video services, enabling real‑time quality monitoring, automatic quality‑level filtering, and integration with Taobao’s custom S265 encoder and video enhancement pipelines, thereby improving overall user experience.
DaTaobao Tech