MD-VQA: Multi-Dimensional No-Reference Video Quality Assessment for CVPR NTIRE 2023
Alibaba's Taobao VQA team won the CVPR NTIRE 2023 Quality Assessment of Video Enhancement Challenge with MD‑VQA, a multi‑dimensional no‑reference video quality model. MD‑VQA combines a Swin‑Transformer‑V2 spatial backbone, a pre‑trained SlowFast motion encoder, and a convolutional fusion module. The model is pre‑trained on LSVQ, fine‑tuned on the NTIRE data with spatio‑temporal augmentation, and reports state‑of‑the‑art SROCC and PLCC scores. It now powers quality monitoring on Alibaba's live‑streaming and short‑video services.
Alibaba's Taobao audio‑video team (TB‑VQA) won the CVPR NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which consisted of a single track.
The challenge focuses on no‑reference video quality assessment (VQA) for 1,211 real‑world videos that have undergone various enhancement operations.
To address the task, the team proposed MD‑VQA, a multi‑dimensional VQA model that extracts spatial semantics with a Swin‑Transformer‑V2 backbone, captures motion information with a pre‑trained SlowFast network, and fuses spatial and temporal features through a convolutional fusion module before regressing a quality score.
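The fusion step described above can be pictured with a small, dependency-free sketch. Everything specific here is an assumption made for illustration: the source only states that a convolutional fusion module combines spatial and temporal features before regression, so the channel concatenation, the shared 1-D temporal kernel, the average pooling, and all function names (`concat_features`, `temporal_conv1d`, `regress_quality`, `predict_score`) are hypothetical and not taken from MD‑VQA.

```python
# Illustrative sketch only. Kernel width, pooling, and weights are assumptions,
# not the MD-VQA implementation.

def concat_features(spatial, motion):
    """Join per-segment spatial and motion feature vectors along the channel axis."""
    if len(spatial) != len(motion):
        raise ValueError("spatial and motion features must cover the same segments")
    return [s + m for s, m in zip(spatial, motion)]


def temporal_conv1d(features, kernel):
    """Convolve over the time axis with zero padding so the length is preserved.

    features: list of T vectors, each with C channels.
    kernel:   list of K weights (K odd), shared across all channels.
    """
    half = len(kernel) // 2
    channels = len(features[0])
    out = []
    for t in range(len(features)):
        row = [0.0] * channels
        for k, w in enumerate(kernel):
            src = t + k - half
            if 0 <= src < len(features):  # positions outside the clip count as zero
                for c in range(channels):
                    row[c] += w * features[src][c]
        out.append(row)
    return out


def regress_quality(fused, weights, bias=0.0):
    """Average-pool over time, then map the pooled vector to a single score."""
    steps = len(fused)
    pooled = [sum(row[c] for row in fused) / steps for c in range(len(fused[0]))]
    return sum(w * p for w, p in zip(weights, pooled)) + bias


def predict_score(spatial, motion, kernel, weights, bias=0.0):
    """Full toy pipeline: concatenate, fuse over time, pool, regress."""
    fused = temporal_conv1d(concat_features(spatial, motion), kernel)
    return regress_quality(fused, weights, bias)
```

In a real model the two feature lists would come from the Swin‑Transformer‑V2 and SlowFast encoders and the kernel and regression weights would be learned; here they are plain lists so that the data flow is easy to follow.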
Data augmentation is performed in both spatial and temporal dimensions, and the model is first pre‑trained on the large LSVQ dataset (38,811 videos) and then fine‑tuned on the NTIRE training set.
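The source does not say which augmentations were used, so the following is only a sketch of what augmenting along both axes can look like. It assumes a random contiguous frame window for the temporal axis and a random crop plus optional horizontal flip for the spatial axis. The function names and parameters are hypothetical.

```python
import random


def sample_clip_indices(num_frames, clip_len, rng):
    """Temporal augmentation (assumed): pick a random contiguous window of frames."""
    if clip_len > num_frames:
        raise ValueError("clip_len cannot exceed num_frames")
    start = rng.randint(0, num_frames - clip_len)  # inclusive on both ends
    return list(range(start, start + clip_len))


def sample_crop_box(width, height, crop_w, crop_h, rng):
    """Spatial augmentation (assumed): pick a random crop as (left, top, right, bottom)."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop size cannot exceed frame size")
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return (left, top, left + crop_w, top + crop_h)


def augment(num_frames, width, height, clip_len, crop_w, crop_h, seed=None):
    """Draw one set of augmentation parameters for a training sample."""
    rng = random.Random(seed)
    return {
        "frames": sample_clip_indices(num_frames, clip_len, rng),
        "crop": sample_crop_box(width, height, crop_w, crop_h, rng),
        "flip": rng.random() < 0.5,  # horizontal flip with probability 0.5
    }
```

The same crop box and flip decision would be applied to every frame in the sampled window so that the clip stays temporally consistent. Because cropping and resampling can themselves change perceived quality, which transforms are safe for VQA training is a design decision this sketch does not settle.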
Experimental results on KoNViD‑1k and LIVE‑VQC show that MD‑VQA achieves higher SROCC (Spearman rank‑order correlation coefficient) and PLCC (Pearson linear correlation coefficient) than existing state‑of‑the‑art methods. Ablation studies confirm that the Swin backbone, the feature‑fusion design, spatio‑temporal augmentation, and large‑scale pre‑training each contribute to the result.
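For readers unfamiliar with the two metrics, here is a minimal reference sketch of how SROCC and PLCC are computed between predicted scores and subjective scores. The function names are chosen for this example. Published VQA results often fit a nonlinear mapping to the predictions before computing PLCC, and that step is omitted here.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def plcc(predicted, subjective):
    """PLCC: how linearly the predictions track the subjective scores."""
    return pearson(predicted, subjective)


def srocc(predicted, subjective):
    """SROCC: Pearson correlation of the ranks, i.e. monotonic agreement."""
    return pearson(average_ranks(predicted), average_ranks(subjective))
```

SROCC rewards getting the ordering of videos right regardless of scale, while PLCC also penalizes nonlinear distortion of the score scale, which is why the two are usually reported together.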
The model has been deployed on Alibaba's live‑streaming and short‑video platforms, including Taobao Live and Douyin‑like short‑video services, to monitor and improve video quality in real time. It is also used in other Alibaba products, such as DingTalk and Alipay live streams.
DaTaobao Tech
Official account of DaTaobao Technology