Sohu Tech Products
Jul 31, 2024 · Artificial Intelligence
MMEvalPro: A Trustworthy Benchmark for Evaluating Multimodal Large Models
MMEvalPro is a new benchmark created by researchers from Peking University, the Chinese Academy of Medical Sciences, CUHK, and Alibaba. It augments existing multimodal datasets with perception and knowledge questions and introduces a Genuine Accuracy metric. The results show that top multimodal models still lag far behind humans and that performance on prior tests was partly driven by shortcuts.
MMEvalPro · benchmark · large language models