Tagged articles
5 articles
Old Zhang's AI Learning
May 4, 2026 · Artificial Intelligence

How DeepSeek’s New Paper Redefines Multimodal Reasoning with Visual Primitives

DeepSeek’s new paper "Thinking with Visual Primitives" tackles the reference gap in multimodal models by introducing points and bounding boxes as reasoning units. It reports up to 8× better token efficiency and higher benchmark scores than GPT‑5.4, Claude‑Sonnet‑4.6 and Gemini‑3‑Flash on counting, spatial reasoning, and maze navigation.

DeepSeek · Multimodal · Visual Primitives
10 min read
Lao Guo's Learning Space
May 2, 2026 · Industry Insights

AI News Flash: DeepSeek Multimodal Breakthrough, Codex Major Update, Grok 4.3 Launch (May 1‑2)

The AI roundup covers OpenAI's Codex upgrade, which adds Workspace Agents and a 40% token‑efficiency improvement; xAI's Grok 4.3 API, which offers 128K context at 60% lower pricing; Ant Group's open‑source Ling 2.6‑1T model; DeepSeek's multimodal Visual Primitives framework and its sudden removal; and the ongoing GPT‑Plus account bans and how to mitigate them.

AI model benchmarks · Codex · DeepSeek
11 min read
SuanNi
Apr 30, 2026 · Artificial Intelligence

DeepSeek’s New Multimodal Paradigm Compresses Images 7,056× and Outperforms GPT‑5.4/Claude in Visual Reasoning

DeepSeek’s multimodal model, built on the V4‑Flash architecture and a visual‑primitive reasoning approach, compresses a full‑resolution image by a factor of 7,056, matches or exceeds GPT‑5.4 and Claude‑Sonnet‑4.6 on counting and spatial‑reasoning benchmarks, and does so with far less compute.

DeepSeek · Model Compression · Visual Primitives
12 min read
PaperAgent
Apr 30, 2026 · Artificial Intelligence

DeepSeek Unveils Open‑Source Multimodal Model: “Thinking with Visual Primitives”

DeepSeek releases an open‑source multimodal LLM that introduces a visual‑primitive framework, which elevates bounding boxes and points to the token level to close the reference gap. The model achieves extreme KV‑cache compression and outperforms GPT‑5.4, Claude‑Sonnet‑4.6 and Gemini‑3‑Flash on counting, spatial reasoning, maze navigation and path‑tracing benchmarks.

DeepSeek · LLM · Multimodal
13 min read
Machine Heart
Apr 30, 2026 · Artificial Intelligence

How DeepSeek’s Visual‑Primitive Paradigm Redefines Multimodal Reasoning

DeepSeek has released a multimodal model built on a visual‑primitive reasoning paradigm that treats coordinates and bounding boxes as reasoning units. The model dramatically compresses visual tokens and achieves state‑of‑the‑art performance on counting, spatial, and topological tasks, while also exposing the current limits of multimodal inference.

AI reasoning · Compressed Sparse Attention · DeepSeek
12 min read