The First Visual‑Language Parallel Thinking Framework: Unpacking Its Core Mechanisms
The paper introduces Visual Para-Thinker, a parallel‑thinking framework for large‑scale visual‑language models that uses visual‑centered block and scan path partitions, Path‑aware Attention and Learnable Parallel Rotary Position Embedding, and demonstrates consistent gains across counting, visual search, hallucination and grounding benchmarks.
