Video Multi-Label Classification Using Graph Convolutional Networks
This paper introduces a method for video multi-label classification that incorporates label correlation features using graph convolutional networks, significantly improving classification performance.
This paper presents work from the AI team at Autohome's intelligent data center on a video understanding paper from ICCV 2019. The paper addresses video multi-label classification by incorporating label correlation features into the network, demonstrating significant improvements in classification performance.
The paper begins with background on video content understanding, noting that video content is more complex than images, making single-label classification insufficient. It discusses the YouTube-8M dataset, which contains 6.1 million video-level annotations across 3,862 classes, with an average of 3 labels per video.
The paper explores label correlation, showing how certain labels frequently co-occur in videos. For example, when BMW and Engine appear, Car is likely to appear as well. The authors construct a weighted graph to represent label relationships, where edge weights indicate correlation strength between labels.
The paper then introduces Graph Convolutional Networks (GCN) as a solution for capturing label correlations. The GCN takes two inputs: a feature matrix H and a correlation coefficient matrix A. The correlation matrix is initialized using conditional probabilities calculated from label co-occurrence in the training data, with a threshold applied to remove weak correlations.
The complete network architecture combines InceptionV3 for feature extraction, NeXtVLAD for feature aggregation, and a two-layer GCN for final classification. Experimental results show that adding GCN improves MAP (Mean-Average-Precision) by nearly one percentage point, demonstrating the effectiveness of modeling label correlations for multi-label classification tasks.
The paper concludes that incorporating label correlation features through GCNs can significantly enhance video understanding capabilities in neural networks, providing a promising direction for multi-label classification research.
HomeTech
HomeTech tech sharing
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.