How DeepSeek’s New Paper Redefines Multimodal Reasoning with Visual Primitives
DeepSeek’s new paper "Thinking with Visual Primitives" tackles the reference gap in multimodal models by introducing points and boxes as reasoning units, achieving up to 8× token efficiency and leading benchmark scores in counting, spatial reasoning, and maze navigation compared with GPT‑5.4, Claude‑Sonnet‑4.6 and Gemini‑3‑Flash.
