Tagged articles

12 articles

Page 1 of 1

May 29, 2026 · Artificial Intelligence

What Makes DeepSeek V4 Different? A Deep Technical Dive into Its Innovations

DeepSeek V4 introduces a suite of architectural breakthroughs—including mixed‑expert MoE, manifold‑constrained hyper‑connections, CSA/HCA hybrid attention, and FP4 quantization—that slash inference cost by up to tenfold while delivering million‑token context, competitive benchmarks, dual model variants, and a disruptive pricing strategy.

AI Model BenchmarkDeepSeek V4Efficient Attention

0 likes · 41 min read

What Makes DeepSeek V4 Different? A Deep Technical Dive into Its Innovations

Machine Heart

Apr 29, 2026 · Artificial Intelligence

LCA Boosts Long-Context Inference: 2.5× Speedup and 90% KV Cache Reduction

The Latent‑Condensed Attention (LCA) method dramatically cuts KV‑cache memory by 90%, speeds up pre‑fill by 2.5× and reduces decode latency by 1.8× for 128K‑token contexts, while requiring no extra parameters and preserving model performance across diverse LLMs.

Efficient AttentionInference AccelerationKV cache reduction

0 likes · 10 min read

LCA Boosts Long-Context Inference: 2.5× Speedup and 90% KV Cache Reduction

Baobao Algorithm Notes

Apr 27, 2026 · Artificial Intelligence

DeepDive into DeepSeek‑V4: Efficient Million‑Token Context, Hybrid Attention, and Muon Optimizer

The article provides an in‑depth technical analysis of DeepSeek‑V4, detailing its novel hybrid attention architecture (CSA and HCA), the manifold‑constrained hyper‑connection (mHC), massive KV‑cache reductions, FLOPs savings across token lengths, and the Muon optimizer with Newton‑Schulz orthogonalization, all backed by concrete benchmark tables and code snippets.

DeepSeekEfficient AttentionKV cache reduction

0 likes · 61 min read

DeepDive into DeepSeek‑V4: Efficient Million‑Token Context, Hybrid Attention, and Muon Optimizer

AI Large-Model Wave and Transformation Guide

Mar 28, 2026 · Artificial Intelligence

From RNNs to Multimodal Agents: A Decade of Transformer Evolution

This article traces the evolution of sequence models from early RNN/LSTM designs through the breakthrough Transformer, its major branches, dense scaling, efficiency‑focused variants, next‑generation linear‑complexity SSMs, and finally multimodal agent architectures, highlighting each stage's strengths, weaknesses, and typical use cases.

AI ArchitectureEfficient AttentionLLM

0 likes · 12 min read

From RNNs to Multimodal Agents: A Decade of Transformer Evolution

Machine Learning Algorithms & Natural Language Processing

Mar 20, 2026 · Artificial Intelligence

Why Kimi Dropped Residual Connections: A First‑Person Deep Dive into Attention Residuals

This article explains how Attention Residuals (AttnRes) replace traditional residual shortcuts with layer‑wise attention, details the mathematical reformulation, design constraints, static‑Q trick, full and block variants, and presents experimental evidence of significant accuracy gains with modest overhead.

Efficient AttentionNLPRMSNorm

0 likes · 11 min read

Why Kimi Dropped Residual Connections: A First‑Person Deep Dive into Attention Residuals

AI Frontier Lectures

Mar 19, 2026 · Artificial Intelligence

Can Circulant Attention Reduce Vision Transformer Cost by 7×?

The article reviews the AAAI 2026 paper "Vision Transformers are Circulant Attention Learners", explaining how modeling self‑attention as a Block‑Circulant matrix enables FFT‑based multiplication that cuts the quadratic complexity of standard attention, achieving up to seven‑fold inference speed‑up while preserving accuracy across ImageNet, COCO and ADE20K benchmarks.

BCCB MatrixCirculant AttentionEfficient Attention

0 likes · 15 min read

Can Circulant Attention Reduce Vision Transformer Cost by 7×?

Huawei Cloud Developer Alliance

Oct 31, 2025 · Artificial Intelligence

Beyond Transformers: Exploring Post‑Transformer Architectures for Long‑Sequence Modeling

This article reviews the emerging post‑Transformer research landscape, covering linear state‑space models, efficient attention approximations, MLP/conv/RNN hybrids, sparse and causal attention mechanisms, and outlines future trends that may complement or replace the classic Transformer architecture for handling ultra‑long sequences.

AIEfficient AttentionHybrid Architecture

0 likes · 17 min read

Beyond Transformers: Exploring Post‑Transformer Architectures for Long‑Sequence Modeling

Data Party THU

Oct 16, 2025 · Artificial Intelligence

How Tensor Product Attention Redefines Long‑Context Transformers

The article analyzes the Tensor Product Attention (TPA) method presented at NeurIPS 2025, explaining how it factorizes Q, K, V tensors to drastically reduce KV cache size and attention complexity, and demonstrates superior convergence, lower perplexity, and faster inference on long‑sequence tasks compared with existing attention variants.

Efficient AttentionKV CacheRoPE

0 likes · 11 min read

How Tensor Product Attention Redefines Long‑Context Transformers

AIWalker

Jan 17, 2025 · Artificial Intelligence

How CLEAR Cuts Attention Compute by 99.5% and Enables Efficient On‑Device Text‑to‑Image Diffusion

The CLEAR method linearizes pretrained Diffusion Transformers by restricting attention to a local window, reducing attention FLOPs by 99.5%, accelerating 8K image generation 6.3× while preserving quality, and supporting multi‑GPU patch‑wise inference for high‑resolution text‑to‑image synthesis.

Diffusion TransformersEfficient AttentionHigh‑Resolution Image Generation

0 likes · 21 min read

How CLEAR Cuts Attention Compute by 99.5% and Enables Efficient On‑Device Text‑to‑Image Diffusion

NewBeeNLP

Aug 3, 2024 · Artificial Intelligence

Extending LLM Context to 1M Tokens: SAMBA, CoPE, RoPE, Retrieval Heads & Infini‑Attention

This article reviews recent research on extending large language model context windows to millions of tokens, covering SAMBA's hybrid architecture, Contextual Position Encoding (CoPE), RoPE base length theory, Retrieval Head analysis, and the memory‑efficient Infini‑Attention mechanism.

Efficient AttentionLLM researchlarge language models

0 likes · 10 min read

Extending LLM Context to 1M Tokens: SAMBA, CoPE, RoPE, Retrieval Heads & Infini‑Attention

DataFunSummit

Jul 18, 2022 · Artificial Intelligence

Advances in Natural Language Generation: ProphetNet, Knowledge‑Enhanced Generation, Non‑Autoregressive Pre‑training, Long‑Text Modeling, and Efficient Attention

This talk presents recent year’s research on natural language generation, covering the ProphetNet pre‑trained generation model, external‑knowledge integration for generation, non‑autoregressive pre‑training (BANG), the Poolingformer long‑text architecture, EL‑attention for faster decoding, and a new multi‑task generation benchmark.

Efficient Attentionknowledge integrationlong‑text modeling

0 likes · 22 min read

Advances in Natural Language Generation: ProphetNet, Knowledge‑Enhanced Generation, Non‑Autoregressive Pre‑training, Long‑Text Modeling, and Efficient Attention

Meituan Technology Team

Mar 24, 2022 · Artificial Intelligence

Twins: Efficient Visual Attention Models for Vision Transformers

The Twins series, a collaboration between Meituan and the University of Adelaide, introduces conditional positional encoding and spatially separable self‑attention to improve efficiency and performance of vision transformers, achieving state‑of‑the‑art results on ImageNet, ADE20K, COCO and high‑precision map segmentation.

ADE20KCOCOConditional Positional Encoding

0 likes · 20 min read

Twins: Efficient Visual Attention Models for Vision Transformers