Tagged articles
6 articles
Machine Heart
May 11, 2026 · Artificial Intelligence

How PRISM Enables Efficient Test‑Time Scaling for Discrete Diffusion Language Models

The article analyzes how the PRISM framework redesigns test‑time scaling for discrete diffusion language models by replacing costly Best‑of‑N sampling with a three‑stage hierarchical search, local branching via partial remasking, and self‑verified feedback, achieving large accuracy gains on math and code benchmarks while cutting inference compute by up to a factor of four.

Discrete Diffusion · Hierarchical Search · Inference Optimization
0 likes · 11 min read
Tech Minimalism
Mar 21, 2026 · Artificial Intelligence

Mastering Harness Engineering: The Key to AI Agent Programming

The article explains how Harness Engineering—comprising system prompts, tool integration, file systems, sandboxed execution, context management, and self‑verification loops—extends AI models into fully functional agents capable of memory, code execution, and long‑term autonomous tasks.

Context Management · Harness Engineering · Self-Verification
0 likes · 16 min read
PMTalk Product Manager Community
Dec 24, 2025 · Artificial Intelligence

Why AI Hallucinates and How Product Managers Can Tame It

The article explains the internal and external causes of AI hallucinations, examines how pre‑training data flaws and fine‑tuning choices amplify them, and presents a five‑pronged technical toolbox—including RAG, prompt engineering, chain‑of‑thought, self‑verification, and safety APIs—plus risk‑based product strategies for different industries.

AI Hallucination · RAG · Risk Assessment
0 likes · 12 min read
Old Meng AI Explorer
Dec 7, 2025 · Artificial Intelligence

Why DeepSeek-Math-V2 Is the New Benchmark for Rigorous AI Math Reasoning

DeepSeek-Math-V2, an open‑source math reasoning model from DeepSeek, introduces a self‑verification mechanism that checks step‑by‑step logical correctness. It achieves gold‑medal scores on IMO 2025 and CMO 2024 and near‑perfect results on the Putnam 2024 competition, while offering free, extensible deployment for research, training, and scientific computation.

AI Math · DeepSeek · Mathematical Reasoning
0 likes · 13 min read
Fun with Large Models
Dec 5, 2025 · Artificial Intelligence

DeepSeek Math V2 & V3.2: A Plain‑Language Deep Dive into Core Innovations

This article provides a detailed, easy‑to‑understand analysis of DeepSeek‑Math‑V2’s self‑verification training method and DeepSeek‑V3.2’s GRPO framework, sparse‑attention DSA mechanism, massive agent data pipeline, and benchmark results that place both models among the world’s top open‑source large language models.

DeepSeek · GRPO · LLM
0 likes · 19 min read
ShiZhen AI
Nov 28, 2025 · Artificial Intelligence

DeepSeekMath‑V2 Scores 118/120 on Putnam and Achieves Gold‑Level IMO Performance

DeepSeekMath‑V2, released as open source on 27 Nov 2025, attains gold‑level results on IMO 2025 and scores 118 out of 120 on the Putnam 2024 competition. It introduces a generator‑verifier self‑verification architecture, uses GRPO training, and outperforms leading closed‑source models on IMO‑ProofBench.

DeepSeekMath-V2 · GRPO · LLM
0 likes · 7 min read