AntTech
Mar 14, 2025 · Artificial Intelligence
MP-GUI: Modality Perception with Multimodal Large Language Models for GUI Understanding
The CVPR 2025 paper "MP-GUI: Modality Perception with MLLMs for GUI Understanding" presents a method that improves how multimodal large language models (MLLMs) perceive and reason about graphical user interfaces. It integrates text, visual, and spatial signals through specialized perception modules and a dynamic fusion gate, and reports state‑of‑the‑art performance on multiple GUI benchmarks.
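To make the idea of a fusion gate concrete, here is a minimal sketch of gated fusion in general: per-modality feature vectors are combined with softmax-normalized weights. This is an illustration under stated assumptions, not the paper's architecture. The function names (`softmax`, `gated_fusion`), the scalar gate scores, and the toy feature vectors are all hypothetical; a real gate would typically compute its scores from the input with learned parameters.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, gate_scores):
    """Weighted sum of per-modality feature vectors.

    features:    dict of modality name -> list of floats (equal lengths)
    gate_scores: dict of modality name -> raw gate score (hypothetical;
                 in a trained model these would be input-dependent)
    Returns (fused_vector, weights_by_modality).
    """
    names = list(features)
    weights = softmax([gate_scores[n] for n in names])
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for name, w in zip(names, weights):
        vec = features[name]
        if len(vec) != dim:
            raise ValueError(f"feature size mismatch for {name!r}")
        for i, v in enumerate(vec):
            fused[i] += w * v
    return fused, dict(zip(names, weights))


# Toy inputs standing in for text, visual, and spatial perception outputs.
features = {
    "text": [1.0, 0.0],
    "visual": [0.0, 1.0],
    "spatial": [1.0, 1.0],
}
# Equal scores give equal weights, so the result is a plain average.
fused, weights = gated_fusion(features, {"text": 0.0, "visual": 0.0, "spatial": 0.0})
```

Raising one modality's gate score shifts the fused vector toward that modality's features, which is the sense in which such a gate is "dynamic": the mix can change per input rather than being fixed.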
CVPR2025 · GUI Understanding · MLLM