Tagged articles
2 articles
Page 1 of 1
AI2ML AI to Machine Learning
AI2ML AI to Machine Learning
Apr 25, 2026 · Artificial Intelligence

How DeepSeek V4 Advances Structured Optimization in the Large‑Model Era

The article analyses DeepSeek V4’s architectural innovations—including Compressed Sparse Attention, Heavily Compressed Attention, a cross‑layer MoE design, and an Agent‑RL framework with Generative Reward Models and multi‑teacher distillation—while comparing its long‑context capabilities and efficiency to rival LLMs such as GLM, Kimi, Claude, GPT and Gemini.

Agent Reinforcement LearningCompressed Sparse AttentionDeepSeek V4
0 likes · 7 min read
How DeepSeek V4 Advances Structured Optimization in the Large‑Model Era
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
Apr 25, 2026 · Artificial Intelligence

Why DeepSeek‑V4 Took Twice as Long: Inside the Training‑Stability Challenges and Engineering Hacks

The DeepSeek‑V4 technical report reveals that the model’s doubled training time stems from massive token and parameter scaling, severe training‑stability issues in MoE layers, and a suite of engineering solutions—including Anticipatory Routing, SwiGLU Clamping, specialist expert training, and a custom sandbox cluster—while also exposing high hallucination rates despite impressive benchmark performance.

BenchmarkDeepSeek V4Generative Reward Model
0 likes · 12 min read
Why DeepSeek‑V4 Took Twice as Long: Inside the Training‑Stability Challenges and Engineering Hacks