Xiaohongshu Tech REDtech
Feb 17, 2025 · Artificial Intelligence
WorldSense: A New Benchmark for Evaluating Multimodal Large Models in Real‑World Scenarios
WorldSense, a new benchmark of 1,662 real‑world video‑audio clips and 3,172 QA pairs across 26 cognitive tasks, reveals that current multimodal large models achieve only 25%–48% accuracy, highlighting the crucial role of combined visual‑audio input and the difficulty of audio‑ and emotion‑related reasoning.
benchmark datasetlarge modelsmodel analysis
0 likes · 12 min read