TransVCL: Attention‑Enhanced Video Copy Localization Network with Flexible Supervision

TransVCL is an end‑to‑end, attention‑enhanced video copy localization network. It combines a customized Transformer, a correlation‑Softmax similarity matrix, and a temporal alignment module with a semi‑supervised learning framework, and it achieves state‑of‑the‑art performance on the VCSL and VCDB benchmarks.

AntTech

Every day, user‑generated content (UGC) platforms host a massive amount of new video. This brings significant economic value, but it also raises video copyright infringement issues that are hard to detect, because copies are often disguised with complex editing operations such as picture‑in‑picture, cropping, rotation, reversal, and splicing.

To address this, Ant Group’s Digital Technology AIoT team proposed TransVCL, an end‑to‑end video copy localization network that incorporates a semi‑supervised learning strategy. The work was accepted at AAAI 2023.

The task requires not only video‑level copy detection but also segment‑level localization of the copied fragments. TransVCL is optimized end to end directly on raw frame‑level features and has three core components: a customized Transformer‑based feature‑enhancement module, a correlation‑Softmax layer that generates the similarity matrix, and a temporal‑alignment module that precisely localizes copied segments.
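
The correlation‑Softmax step can be illustrated with a small sketch. It assumes a LoFTR‑style dual softmax: temperature‑scaled cosine similarities between the frames of two videos pass through a row‑wise softmax and a column‑wise softmax, and the two results are multiplied element‑wise. The function names and the `temperature` value are hypothetical, and TransVCL's actual layer may differ in detail.

```python
import math


def cosine_similarity_matrix(query, reference):
    """Frame-to-frame cosine similarity between two videos.

    query: list of N frame feature vectors; reference: list of M vectors.
    Returns an N x M nested list.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors
        return [x / norm for x in v]

    q = [normalize(v) for v in query]
    r = [normalize(v) for v in reference]
    return [[sum(a * b for a, b in zip(qi, rj)) for rj in r] for qi in q]


def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def correlation_softmax(sim, temperature=0.1):
    """Assumed dual softmax: row-wise softmax times column-wise softmax."""
    scaled = [[s / temperature for s in row] for row in sim]
    n_rows, n_cols = len(scaled), len(scaled[0])
    row_sm = [softmax(row) for row in scaled]  # row_sm[i][j]
    cols = [[scaled[i][j] for i in range(n_rows)] for j in range(n_cols)]
    col_sm = [softmax(col) for col in cols]  # col_sm[j][i]
    return [[row_sm[i][j] * col_sm[j][i] for j in range(n_cols)]
            for i in range(n_rows)]
```

Because an entry is large only when a query frame and a reference frame are each other's best match, copied frame pairs stand out while ambiguous matches are suppressed.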

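Unlike previous methods, which construct similarity matrices by hand, TransVCL learns the similarity matrix jointly with the rest of the network. Its self‑attention layers fuse long‑range temporal information within each video, and its cross‑attention layers exchange information between the two videos being compared. The result is cleaner copy patterns in the similarity matrix and better discrimination between copied and non‑copied frames.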
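
A minimal sketch of how self‑attention and cross‑attention can mix information within and across two videos follows. It is a deliberate simplification, not TransVCL's module: a single head, no learned query/key/value projections, no normalization or positional encoding, and a hypothetical `enhance_pair` helper that interleaves the two attention types with residual connections.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def add(a, b):
    """Element-wise sum of two equally shaped feature matrices (residual)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def enhance_pair(feats_a, feats_b, num_layers=1):
    """Hypothetical interleaved self-/cross-attention block, illustration only."""
    for _ in range(num_layers):
        # Self-attention: each frame attends to every frame of its own video,
        # giving it long-range temporal context.
        feats_a = add(feats_a, attention(feats_a, feats_a, feats_a))
        feats_b = add(feats_b, attention(feats_b, feats_b, feats_b))
        # Cross-attention: each frame attends to the frames of the other video.
        new_a = add(feats_a, attention(feats_a, feats_b, feats_b))
        new_b = add(feats_b, attention(feats_b, feats_a, feats_a))
        feats_a, feats_b = new_a, new_b
    return feats_a, feats_b
```

The two videos may have different lengths; each output keeps its own number of frames and feature dimension, so the enhanced features can feed directly into a similarity‑matrix step.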

A semi‑supervised framework is introduced to exploit abundant unlabeled or weakly labeled video data. A teacher model trained on the limited fully labeled data generates pseudo‑labels for the remaining data. Pseudo‑labels below a confidence threshold are discarded, and the loss on the retained pseudo‑labels is combined with the supervised loss during joint training.
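
The pseudo‑label filtering and joint loss might look roughly like the following sketch. The `Segment` record, the `teacher` callable, the 0.7 threshold, and the loss `weight` are all hypothetical placeholders; the article does not state the actual threshold or weighting.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical copied-segment pair predicted for a query/reference video pair."""
    query_start: float
    query_end: float
    ref_start: float
    ref_end: float
    confidence: float = 1.0


def filter_pseudo_labels(teacher_predictions, threshold=0.7):
    """Keep only teacher predictions whose confidence reaches the threshold."""
    return [seg for seg in teacher_predictions if seg.confidence >= threshold]


def build_pseudo_labeled_set(teacher, unlabeled_pairs, threshold=0.7):
    """Run the (hypothetical) teacher on each unlabeled pair and keep confident labels."""
    dataset = []
    for pair in unlabeled_pairs:
        kept = filter_pseudo_labels(teacher(pair), threshold)
        if kept:  # skip pairs with no confident pseudo-label
            dataset.append((pair, kept))
    return dataset


def joint_loss(supervised_loss, unsupervised_loss, weight=1.0):
    """Supervised loss plus weighted loss on pseudo-labeled data."""
    return supervised_loss + weight * unsupervised_loss
```

Raising the threshold trades pseudo‑label quantity for quality, while `weight` controls how strongly the pseudo‑labeled data influences training relative to the fully labeled data.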

Extensive experiments on the public VCSL and VCDB datasets show that TransVCL achieves state‑of‑the‑art F‑scores, outperforming prior methods by a large margin. Tables and figures illustrate the attention‑enhanced similarity maps, the overall architecture, and the quantitative gains from the semi‑supervised strategy.
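
To make a segment‑level F‑score concrete, here is a simplified one‑dimensional illustration: precision is the fraction of predicted copied time covered by ground truth, recall is the fraction of ground‑truth copied time that is recovered, and F is their harmonic mean. This is an assumption for illustration only; VCSL and VCDB each define their own official protocol for matching predicted and ground‑truth copied segment pairs.

```python
def merge(intervals):
    """Union of (start, end) intervals, returned sorted and non-overlapping."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    return sum(end - start for start, end in merge(intervals))


def overlap_length(a, b):
    """Total time covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    total = 0.0
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def segment_f_score(predicted, ground_truth):
    overlap = overlap_length(predicted, ground_truth)
    pred_len = total_length(predicted)
    gt_len = total_length(ground_truth)
    precision = overlap / pred_len if pred_len else 0.0
    recall = overlap / gt_len if gt_len else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction covering seconds 0 to 10 against a ground‑truth copy at seconds 5 to 15 gives precision 0.5, recall 0.5, and an F‑score of 0.5.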

The work represents a significant advance in AI‑driven copyright protection and was recognized as a leading contribution at AAAI 2023.
