Tagged articles
3 articles
Data Party THU
Oct 9, 2025 · Artificial Intelligence

Can One Model Master All Audio‑Visual Tasks? Introducing Crab’s Unified Approach

This article presents Crab, a unified audio‑visual scene understanding model built on a novel explicit‑cooperation learning paradigm. It introduces the AV‑UIE dataset, which includes explicit reasoning steps, and reports superior performance across temporal, spatial, pixel‑level, and spatio‑temporal tasks, supported by extensive experiments and ablations.

LoRA · Multimodal · audio-visual
0 likes · 12 min read
AI Frontier Lectures
Jun 20, 2025 · Artificial Intelligence

Can One Model Master All Audio‑Visual Tasks? Introducing Crab’s Unified Approach

Researchers from RUC, Tsinghua, and Tencent present Crab, a unified audio‑visual scene understanding model that leverages explicit cooperation and a new AV‑UIE dataset with explicit reasoning steps, achieving state‑of‑the‑art performance across temporal, spatial, pixel‑level, and spatio‑temporal tasks.

LoRA · audio-visual · scene understanding
0 likes · 13 min read
Xiaohongshu Tech REDtech
Nov 11, 2022 · Artificial Intelligence

Media Experience Quality Assessment: Visual Perception and Objective Quality Metrics

Professor Zhai’s REDtech talk explained how the human visual system underlies full‑, reduced‑, and no‑reference media quality metrics. It introduced a free‑energy‑based perception model and a pseudo‑reference technique for accurate no‑reference UGC video assessment, and discussed audio‑visual integration, opinion‑score distributions, and the challenges of EEG‑based perceptual loss.

UGC · audio-visual · media quality assessment
0 likes · 15 min read