Tagged articles
6 articles
Architect
May 25, 2026 · Artificial Intelligence

From KV Cache to Harness: How DeepSeek Is Shifting Costs to the System Layer

DeepSeek’s recent V4 release suggests that as model inference gets cheaper, the dominant costs are shifting to system-level components such as the KV cache, memory, storage, compilers, scheduling, hardware adapters, and the emerging Agent Harness layer. This shift is reshaping the economics of AI infrastructure.

AI infrastructure · Agent Harness · DeepSeek
23 min read
Network Intelligence Research Center (NIRC)
Jan 31, 2026 · Artificial Intelligence

How Engram Lets Large Models Swap GPU Memory for Cheap RAM to ‘Look Up’ Knowledge

The article dissects DeepSeek’s new Engram architecture, which separates computation from memory. Factual knowledge is stored in a large lookup table held in cheap RAM, which leaves the transformer’s compute layers free to focus on reasoning. The result is a sharp drop in GPU memory demand alongside better performance on code, math, and long-context tasks.

Engram · GPU Memory · Large Language Model
7 min read
DataFunTalk
Jan 13, 2026 · Artificial Intelligence

How Conditional Memory (Engram) Boosts Large Language Models Beyond MoE

DeepSeek’s new paper introduces Engram, a conditional memory mechanism that complements Mixture-of-Experts (MoE). Engram provides O(1) lookup and improves knowledge retrieval, reasoning, and long-context performance, while scaling efficiently under the same FLOPs budget.

Engram · Sparse Models · conditional memory
18 min read
PaperAgent
Jan 13, 2026 · Artificial Intelligence

How Engram’s Conditional Memory Redefines Sparsity in Large Language Models

DeepSeek’s newly released Engram module introduces a conditional memory mechanism that uses O(1) N-gram lookup to open a new axis of sparsity for large language models. The module reduces early-layer compute and improves inference efficiency. Extensive experiments on 27-billion-parameter models show notable gains across reasoning and knowledge tasks.

Efficient Inference · Engram · LLM Sparsity
8 min read