Inside the First Vision-Centric Parallel Thinking Framework for Vision-Language Models
The article introduces Visual Para-Thinker, the first parallel reasoning framework tailored for large‑scale vision‑language models, explains its block and scan visual path divisions, details the Path‑aware Attention and Learnable Parallel Rotary Position Embedding mechanisms, and presents experimental results showing significant gains on visual perception benchmarks.
