Tagged articles
4 articles
Design Hub
Jan 2, 2026 · Artificial Intelligence

DeepSeek’s “Mathematical Tight‑Fit” Tames AI: Constraints Drive Performance Gains

DeepSeek’s new mHC architecture replaces unconstrained hyper‑connections with connection matrices constrained to be doubly stochastic. This stabilizes large‑scale training, cuts signal amplification from 3000× to 1.6×, and delivers consistent accuracy improvements on the BBH, DROP, GSM8K, and MMLU benchmarks while adding only 6.7% training overhead.

AI training stability · DeepSeek · hyper-connections
10 min read
Architect
Jan 1, 2026 · Artificial Intelligence

How Manifold-Constrained Hyper-Connections Boost Large Model Training Efficiency

DeepSeek’s new paper introduces mHC, a manifold‑constrained version of Hyper‑Connections that stabilizes gradient flow, adds only 6.7% training overhead, and enables reliable training of 27‑billion‑parameter models while improving benchmark performance by about 2%.

AI architecture · Deep Learning · Manifold-Constrained
7 min read
AI Insight Log
Jan 1, 2026 · Artificial Intelligence

Can DeepSeek’s mHC Architecture Break ResNet’s Decade-Long Dominance?

DeepSeek’s new paper, “mHC: Manifold‑Constrained Hyper‑Connections,” proposes replacing traditional residual connections with mathematically constrained hyper‑connections. On a 27B model, it reports a modest 6.7% increase in training time alongside significant stability gains and superior performance on the BBH, DROP, and GSM8K benchmarks.

DeepSeek · LLM training · ResNet
8 min read
PaperAgent
Jan 1, 2026 · Artificial Intelligence

How Manifold-Constrained Hyper-Connections Boost Large-Scale Model Training Efficiency

The article introduces mHC (Manifold‑Constrained Hyper‑Connections), a technique that replaces standard residual links with multiple learned pathways and uses doubly stochastic matrices to keep gradients stable. It achieves stable training of 27‑billion‑parameter models with only 6.7% extra compute and superior performance across eight downstream benchmarks.

AI architecture · Efficient Implementation · Manifold-Constrained
6 min read
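
Several of the summaries above attribute mHC's stability to doubly stochastic connection matrices. As a rough illustration of why that constraint bounds signal growth, the sketch below builds such matrices with Sinkhorn-style alternating row and column normalization and multiplies them across many layers. The summaries do not say how the paper enforces the constraint, so the normalization method, the function names (`sinkhorn`, `stack_gain`), the stream count of 4, and the unnormalized positive matrices used as a comparison are all assumptions made for this example. None of it is the paper's implementation or the actual Hyper-Connections parameterization.

```python
import math
import random


def sinkhorn(logits, iters=50):
    """Alternately rescale rows and columns of exp(logits) so both sum to 1."""
    n = len(logits)
    m = [[math.exp(x) for x in row] for row in logits]  # strictly positive entries
    for _ in range(iters):
        for row in m:  # row step: each row sums to 1
            s = sum(row)
            for j in range(n):
                row[j] /= s
        for j in range(n):  # column step: each column sums to 1
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def gain(m):
    """Largest absolute row sum: the worst-case amplification of a bounded input."""
    return max(sum(abs(x) for x in row) for row in m)


def stack_gain(depth, n, constrained, seed=0):
    """Gain of a product of `depth` random n-by-n mixing matrices (toy model)."""
    rng = random.Random(seed)
    prod = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(depth):
        logits = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
        if constrained:
            layer = sinkhorn(logits)  # doubly stochastic
        else:
            # Hypothetical baseline: same positive matrix, no normalization.
            layer = [[math.exp(x) for x in row] for row in logits]
        prod = matmul(layer, prod)
    return gain(prod)


if __name__ == "__main__":
    print("constrained gain:  ", stack_gain(32, 4, constrained=True))
    print("unconstrained gain:", stack_gain(32, 4, constrained=False))
```

Because every row of a doubly stochastic matrix sums to 1 and a product of such matrices is again doubly stochastic, the constrained gain stays at about 1 regardless of depth, while the unnormalized baseline grows geometrically with depth. The specific 3000× and 1.6× figures quoted in the summaries come from the paper's own measurements and are not reproduced by this toy.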