Tag

GUI Understanding

1 view collected around this technical thread.

AntTech
Mar 14, 2025 · Artificial Intelligence

MP-GUI: Modality Perception with Multimodal Large Language Models for GUI Understanding

The CVPR 2025 paper "MP-GUI: Modality Perception with MLLMs for GUI Understanding" proposes a method that improves how multimodal large language models (MLLMs) perceive and reason about graphical user interfaces. It extracts text, visual, and spatial signals with specialized perception modules, combines them through a dynamic fusion gate, and reports state‑of‑the‑art performance on multiple GUI benchmarks.
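
As a rough illustration of what a gate that fuses text, visual, and spatial features can look like, here is a minimal sketch of a generic softmax-gated weighted sum. It is not the paper's implementation: the function names (`softmax`, `gated_fusion`), the linear scoring vectors in `gate_weights`, and the toy feature values are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, gate_weights):
    """Fuse per-modality feature vectors with input-dependent gates.

    features:     dict mapping modality name -> feature vector (equal lengths)
    gate_weights: dict mapping modality name -> scoring vector (hypothetical
                  stand-in for learned gate parameters)
    Returns (gates, fused): the gate value per modality and the fused vector.
    """
    names = sorted(features)
    # One scalar score per modality: dot product of its features and weights.
    scores = [
        sum(w * x for w, x in zip(gate_weights[name], features[name]))
        for name in names
    ]
    gate_values = softmax(scores)  # gates are positive and sum to 1
    dim = len(features[names[0]])
    # Fused vector: gate-weighted sum of the modality vectors.
    fused = [
        sum(g * features[name][i] for g, name in zip(gate_values, names))
        for i in range(dim)
    ]
    return dict(zip(names, gate_values)), fused


# Toy inputs (assumed values, for illustration only).
features = {
    "text": [1.0, 0.0],
    "visual": [0.0, 1.0],
    "spatial": [1.0, 1.0],
}
gate_weights = {
    "text": [2.0, 0.0],
    "visual": [0.0, 0.5],
    "spatial": [0.1, 0.1],
}
gates, fused = gated_fusion(features, gate_weights)
print(gates)
print(fused)
```

Because the gates are computed from the inputs themselves, the mix of modalities changes from one screenshot to the next, which is what "dynamic" means in this sketch. A real model would learn the scoring parameters and operate on high-dimensional token features.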

CVPR2025 · GUI Understanding · MLLM
0 likes · 5 min read