How DeepSeek’s New Paper Redefines Multimodal Reasoning with Visual Primitives

DeepSeek’s new paper "Thinking with Visual Primitives" tackles the reference gap in multimodal models by introducing points and boxes as reasoning units, achieving up to 8× token efficiency and leading benchmark scores in counting, spatial reasoning, and maze navigation compared with GPT‑5.4, Claude‑Sonnet‑4.6 and Gemini‑3‑Flash.

DeepSeekMultimodalVisual Primitives

0 likes · 10 min read

How DeepSeek’s New Paper Redefines Multimodal Reasoning with Visual Primitives

Lao Guo's Learning Space

May 2, 2026 · Industry Insights

AI News Flash: DeepSeek Multimodal Breakthrough, Codex Major Update, Grok 4.3 Launch (May 1‑2)

The AI roundup covers OpenAI's Codex upgrade with Workspace Agents and 40% token efficiency, xAI's Grok 4.3 API offering 128K context and 60% lower pricing, Ant Group's open‑source Ling 2.6‑1T model, DeepSeek's multimodal Visual Primitives framework and its sudden removal, plus the ongoing GPT‑Plus account bans and their mitigation.

AI model benchmarksCodexDeepSeek

0 likes · 11 min read

AI News Flash: DeepSeek Multimodal Breakthrough, Codex Major Update, Grok 4.3 Launch (May 1‑2)

SuanNi

Apr 30, 2026 · Artificial Intelligence

DeepSeek’s New Multimodal Paradigm Compresses Images 7,056× and Outperforms GPT‑4/Claude in Visual Reasoning

DeepSeek’s multimodal model, built on the V4‑Flash architecture and a visual‑primitive reasoning approach, compresses a full‑resolution image by 7,056 times, achieves comparable or superior performance to GPT‑5.4 and Claude‑Sonnet‑4.6 on counting and spatial‑reasoning benchmarks, and does so with dramatically lower compute.

DeepSeekModel CompressionVisual Primitives

0 likes · 12 min read

DeepSeek’s New Multimodal Paradigm Compresses Images 7,056× and Outperforms GPT‑4/Claude in Visual Reasoning

PaperAgent

Apr 30, 2026 · Artificial Intelligence

DeepSeek Unveils Open‑Source Multimodal Model: “Thinking with Visual Primitives”

DeepSeek releases an open‑source multimodal LLM that introduces a visual‑primitive framework—elevating bounding boxes and points to token level—to close the reference gap, achieve extreme KV‑cache compression, and outperform GPT‑5.4, Claude‑Sonnet‑4.6 and Gemini‑3‑Flash on counting, spatial reasoning, maze navigation and path‑tracing benchmarks.

DeepSeekLLMMultimodal

0 likes · 13 min read

DeepSeek Unveils Open‑Source Multimodal Model: “Thinking with Visual Primitives”

Machine Heart

Apr 30, 2026 · Artificial Intelligence

How DeepSeek’s Visual‑Primitive Paradigm Redefines Multimodal Reasoning

DeepSeek has released a multimodal model built on a visual‑primitive reasoning paradigm that treats coordinates and bounding boxes as reasoning units, dramatically compresses visual tokens, and achieves state‑of‑the‑art performance on counting, spatial, and topological tasks, while exposing current limits of multimodal inference.

AI reasoningCompressed Sparse AttentionDeepSeek

0 likes · 12 min read

How DeepSeek’s Visual‑Primitive Paradigm Redefines Multimodal Reasoning

How DeepSeek’s New Paper Redefines Multimodal Reasoning with Visual Primitives

AI News Flash: DeepSeek Multimodal Breakthrough, Codex Major Update, Grok 4.3 Launch (May 1‑2)

DeepSeek’s New Multimodal Paradigm Compresses Images 7,056× and Outperforms GPT‑4/Claude in Visual Reasoning

DeepSeek Unveils Open‑Source Multimodal Model: “Thinking with Visual Primitives”

How DeepSeek’s Visual‑Primitive Paradigm Redefines Multimodal Reasoning

AI News Flash: DeepSeek Multimodal Breakthrough, Codex Major Update, Grok 4.3 Launch (May 1‑2)