Tagged articles
2 articles
Page 1 of 1
Architect
Architect
May 25, 2026 · Artificial Intelligence

From KV Cache to Harness: How DeepSeek Is Shifting Costs to the System Layer

DeepSeek’s recent V4 release shows that as model inference becomes cheaper, the dominant expenses are moving to system‑level components such as KV cache, memory, storage, compilers, scheduling, hardware adapters, and the emerging Agent Harness layer, reshaping AI infrastructure economics.

AI infrastructureAgent HarnessDeepSeek
0 likes · 23 min read
From KV Cache to Harness: How DeepSeek Is Shifting Costs to the System Layer
Old Zhang's AI Learning
Old Zhang's AI Learning
Apr 23, 2026 · Artificial Intelligence

DeepSeek Quietly Open‑Sources TileKernels to Push GPU Performance to Its Limits

DeepSeek has released TileKernels, a GPU kernel library written in the TileLang DSL, that targets H100/H200/B200 GPUs and claims to approach hardware limits in compute intensity and memory bandwidth, offering MoE routing, FP8/FP4 quantization, and dual‑language PyTorch references for deep‑learning engineers.

FP8 quantizationGPU optimizationLLM training
0 likes · 9 min read
DeepSeek Quietly Open‑Sources TileKernels to Push GPU Performance to Its Limits