iQIYI-VID: A Large-Scale Multimodal Video Dataset for Person Recognition

iQIYI-VID is the world’s largest multimodal video dataset for person recognition. It contains 10,000 celebrity identities and 600,000 video clips drawn from 1.4 million videos, and it supports tasks such as detection, identification, attribute analysis, and audio analysis. The dataset served as the basis for the 2018 and 2019 challenges and includes a face‑recognition subset. The best reported results still leave room for improvement.

iQIYI Technical Product Team

Person recognition is one of the most important tasks in multimedia. In real‑world scenarios, variations in pose, expression, clothing, makeup, etc., make it highly challenging. With the development of deep learning, significant progress has been made in face recognition, person re‑identification, and speaker recognition, but a single modality is insufficient for massive video data, leaving considerable research challenges.

To promote multimodal person‑recognition research and drive technical innovation, iQIYI has built the world’s largest multimodal video dataset, iQIYI-VID. The dataset contains 10,000 celebrity identities and 600,000 video clips sourced from 400,000 long videos and 1,000,000 short videos.

iQIYI organized multimodal person‑recognition challenges in 2018 and 2019 (in conjunction with PRCV, ACM MM, and ICCV) and released iQIYI‑VID‑2018 and iQIYI‑VID‑2019, which have become new standards in the field. An additional face‑recognition dataset, iQIYI‑VID‑FACE, was provided for the Lightweight Face Recognition Challenge. All datasets have been consolidated into iQIYI‑VID. Download: http://challenge.ai.iqiyi.com/data-cluster

Compared with other person‑recognition datasets, iQIYI‑VID originates from massive video data and effectively addresses challenges such as diverse poses, expressions, ages, lighting conditions, resolutions, makeup, and occlusions. It supports research on detection, identification, attribute analysis, action analysis, subtitles, and audio.

The dataset is divided into four sub‑tasks (A, B, C, D):

Task B corresponds to the 2018 multimodal person‑recognition challenge (iQIYI‑VID‑2018). It contains 4,934 identities, with 219,677 training clips and 172,860 validation clips. Details: http://challenge.ai.iqiyi.com/detail?raceId=5b1129e42a360316a898ff4f

Task C corresponds to the 2019 challenge (iQIYI‑VID‑2019). It includes 10,034 identities, with 60,566 training clips and 76,013 validation clips. Details: http://challenge.ai.iqiyi.com/detail?raceId=5c767dc41a6fa0ccf53922e6

Task A aggregates the data of Tasks B and C, covering 10,034 identities, 240,129 training clips, and 197,329 validation clips.

Task D is the iQIYI‑VID‑FACE image dataset for the Lightweight Face Recognition Challenge, containing 9,998 identities and 6,311,490 images. Details: https://ibug.doc.ic.ac.uk/resources/lightweight-face-recognition-challenge-workshop

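Current best results: the top mean average precision (MAP) on the Task C test set is 91.14%. On Task D, the best large model reaches 0.72981 and the best small model 0.72226, measured as the true positive rate at a false positive rate of 1e‑4 (TPR@FPR=1e‑4). Both tasks leave room for further improvement.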
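
For readers unfamiliar with the two metrics, the following is a minimal sketch of how MAP and TPR@FPR are commonly computed. It is not the challenges' official scoring code: the function names, the input layout (a ranked clip list per identity, and separate genuine and impostor score lists), and the tie handling are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Set


def average_precision(ranked_clips: Sequence[str], relevant: Set[str]) -> float:
    """AP for one identity: precision at each rank where a relevant clip
    appears, summed and divided by the number of relevant clips."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, clip_id in enumerate(ranked_clips, start=1):
        if clip_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, Sequence[str]],   # hypothetical layout: identity -> ranked clip ids
    ground_truth: Dict[str, Set[str]],    # hypothetical layout: identity -> relevant clip ids
) -> float:
    """MAP: the mean of per-identity AP over all identities in the ground truth."""
    aps = [
        average_precision(rankings.get(identity, []), clips)
        for identity, clips in ground_truth.items()
    ]
    return sum(aps) / len(aps) if aps else 0.0


def tpr_at_fpr(
    genuine_scores: Sequence[float],
    impostor_scores: Sequence[float],
    target_fpr: float,
) -> float:
    """Fraction of genuine pairs accepted at a threshold chosen so that the
    fraction of accepted impostor pairs does not exceed target_fpr."""
    impostors = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs we are allowed to accept (rounded down).
    allowed = int(target_fpr * len(impostors))
    # Accept scores strictly above the (allowed+1)-th highest impostor score.
    threshold = impostors[allowed] if allowed < len(impostors) else float("-inf")
    accepted = sum(1 for score in genuine_scores if score > threshold)
    return accepted / len(genuine_scores)


# Toy data, invented purely to exercise the functions.
toy_rankings = {"id_a": ["c1", "c2", "c3", "c4"], "id_b": ["c5", "c6"]}
toy_truth = {"id_a": {"c1", "c3"}, "id_b": {"c6"}}
toy_map = mean_average_precision(toy_rankings, toy_truth)

toy_tpr = tpr_at_fpr(
    genuine_scores=[0.95, 0.8, 0.6, 0.4],
    impostor_scores=[0.9, 0.5, 0.3, 0.1],
    target_fpr=0.25,
)
```

In this sketch MAP rewards ranking an identity's clips near the top of its list, while TPR@FPR fixes how many impostor pairs may be accepted and reports how many genuine pairs still pass. At FPR=1e‑4, only one impostor pair in ten thousand may be accepted.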

If you use iQIYI‑VID in a paper, please cite the following work: Yuanliu Liu, Peipei Shi, Bo Peng, et al., “iQIYI Celebrity Video Identification Challenge”, ACM MM’19 Grand Challenges.

Tip: Before downloading iQIYI‑VID‑FACE, remember to register on the iQIYI competition website: http://challenge.ai.iqiyi.com/
