Artificial Intelligence 7 min read

Video Multi-Label Classification Using Graph Convolutional Networks

This paper introduces a method for video multi-label classification that incorporates label correlation features using graph convolutional networks, significantly improving classification performance.

HomeTech

Mar 4, 2020

Video Multi-Label Classification Using Graph Convolutional Networks

This paper presents work from the AI team at Autohome's intelligent data center on a video understanding paper from ICCV 2019. The paper addresses video multi-label classification by incorporating label correlation features into the network, demonstrating significant improvements in classification performance.

The paper begins with background on video content understanding, noting that video content is more complex than images, making single-label classification insufficient. It discusses the YouTube-8M dataset, which contains 6.1 million video-level annotations across 3,862 classes, with an average of 3 labels per video.

The paper explores label correlation, showing how certain labels frequently co-occur in videos. For example, when BMW and Engine appear, Car is likely to appear as well. The authors construct a weighted graph to represent label relationships, where edge weights indicate correlation strength between labels.

The paper then introduces Graph Convolutional Networks (GCN) as a solution for capturing label correlations. The GCN takes two inputs: a feature matrix H and a correlation coefficient matrix A. The correlation matrix is initialized using conditional probabilities calculated from label co-occurrence in the training data, with a threshold applied to remove weak correlations.

The complete network architecture combines InceptionV3 for feature extraction, NeXtVLAD for feature aggregation, and a two-layer GCN for final classification. Experimental results show that adding GCN improves MAP (Mean-Average-Precision) by nearly one percentage point, demonstrating the effectiveness of modeling label correlations for multi-label classification tasks.

The paper concludes that incorporating label correlation features through GCNs can significantly enhance video understanding capabilities in neural networks, providing a promising direction for multi-label classification research.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

computer vision deep learning video understanding GCN graph convolutional networks multi-label classification NeXtVLAD YouTube-8M feature aggregation InceptionV3 label correlation

Written by

HomeTech

HomeTech tech sharing

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.