Tag

video captioning


DataFunSummit
Sep 17, 2024 · Artificial Intelligence

Multimodal Video Understanding for Real-World Surveillance: Tasks, Dataset, Models, and Challenges

This article surveys multimodal video understanding for real-world surveillance: task definitions; the new UCA multimodal surveillance dataset; baseline models for video moment localization, captioning, and anomaly detection; experimental results; challenges; and future research directions.

AI models · large language models · multimodal video understanding
19 min read