Tagged articles
13 articles
Page 1 of 1
Data STUDIO
Data STUDIO
May 6, 2026 · Artificial Intelligence

DeepSeek V4 (Flash & Pro) Unveils Million‑Token Context and Trillion‑Parameter Inference

The April 24, 2026 release of DeepSeek V4 introduces Hybrid Attention (CSA/HCA), Manifold‑Constrained Hyper‑Connections, and the Muon optimizer, delivering 1 M‑token context windows, up to 1.6 T parameters, competitive benchmark scores against Claude and GPT, dramatically lower inference costs, and detailed deployment guidelines that expose both performance gains and practical challenges.

AI benchmarkingDeepSeek V4Hybrid Attention
0 likes · 17 min read
DeepSeek V4 (Flash & Pro) Unveils Million‑Token Context and Trillion‑Parameter Inference
CodeTrend
CodeTrend
Apr 26, 2026 · Artificial Intelligence

DeepSeek V4 Architecture: High‑Efficiency Long‑Context Model Design

DeepSeek V4, released in April 2026, introduces two versions—Pro and Flash—with up to 1.6 trillion parameters and a million‑token context window, leveraging hybrid attention, compressed KV cache, and specialized training techniques to dramatically cut hardware dependence and inference cost.

DeepSeekFP4Hybrid Attention
0 likes · 5 min read
DeepSeek V4 Architecture: High‑Efficiency Long‑Context Model Design
Shuge Unlimited
Shuge Unlimited
Apr 25, 2026 · Artificial Intelligence

DeepSeek V4: Comeback? 1.6 T Params, Million‑Token Context, Open‑Source Matches Closed‑Source

DeepSeek V4, released shortly after GPT‑5.5, offers two models—V4‑Pro (1.6 T parameters) and V4‑Flash (284 B parameters)—that introduce a hybrid CSA/HCA attention architecture to enable efficient million‑token context, achieve dramatic FLOPs and KV savings, deliver competitive programming and agent benchmarks, and adopt a disruptive pricing strategy, while also exposing training‑stability tricks and highlighting both strengths and remaining gaps.

DeepSeek V4Hybrid AttentionLLM
0 likes · 25 min read
DeepSeek V4: Comeback? 1.6 T Params, Million‑Token Context, Open‑Source Matches Closed‑Source
SuanNi
SuanNi
Apr 24, 2026 · Artificial Intelligence

DeepSeek-V4 Launches: Million-Token Context Becomes Affordable for All

DeepSeek-V4 introduces a hybrid attention architecture, manifold‑constrained hyper‑connections, and the Muon optimizer to cut inference FLOPs and KV cache dramatically, enabling open‑source models to handle million‑token contexts at a fraction of the cost of leading closed‑source services while matching their performance.

DeepSeek V4Hybrid AttentionLarge Language Model
0 likes · 7 min read
DeepSeek-V4 Launches: Million-Token Context Becomes Affordable for All
AI Large Model Application Practice
AI Large Model Application Practice
Apr 24, 2026 · Artificial Intelligence

DeepSeek V4 Preview: Key Technical Highlights, Benchmarks, and Pricing

The DeepSeek‑V4 preview details two model variants—Pro and Flash—with trillion‑scale parameters, outlines benchmark scores that surpass or match leading overseas models across code generation, real‑world fixes, engineering tasks, and world knowledge, and explains core innovations, pricing, API endpoints, and open‑source licensing.

APIDeepSeekHybrid Attention
0 likes · 7 min read
DeepSeek V4 Preview: Key Technical Highlights, Benchmarks, and Pricing
AI Explorer
AI Explorer
Apr 24, 2026 · Artificial Intelligence

DeepSeek-V4 Raises the Bar: 1.6T‑Parameter Open‑Source Model Challenges Closed‑Source Giants

DeepSeek-V4 introduces two open‑source LLMs—V4‑Pro with 1.6 trillion total parameters and V4‑Flash with 284 billion—offering a 1 million‑token context window, hybrid attention, multi‑head compression, and a new Muon optimizer, all under an MIT license that rivals top closed‑source models.

DeepSeek V4Hybrid AttentionLarge Language Model
0 likes · 6 min read
DeepSeek-V4 Raises the Bar: 1.6T‑Parameter Open‑Source Model Challenges Closed‑Source Giants
AI Engineering
AI Engineering
Apr 22, 2026 · Artificial Intelligence

Qwen3.6-27B Runs Locally on 18 GB RAM and Outperforms a 397 B‑Parameter Model

Alibaba’s open‑source Qwen3.6‑27B model can be run on consumer hardware with as little as 18 GB of RAM using 4‑bit quantization, and its hybrid attention architecture delivers higher accuracy on coding benchmarks such as Terminal‑Bench 2.0 and SWE‑bench Pro than the much larger 397‑B‑parameter Qwen3.5‑397B‑A17B MoE model.

4-bit quantizationHybrid AttentionLLM
0 likes · 5 min read
Qwen3.6-27B Runs Locally on 18 GB RAM and Outperforms a 397 B‑Parameter Model
Old Zhang's AI Learning
Old Zhang's AI Learning
Apr 21, 2026 · Artificial Intelligence

Prefill-as-a-Service Boosts LLM Inference Throughput by 54%

A joint Moonshot AI and Tsinghua study shows that the Prefill-as-a-Service (PrfaaS) architecture, enabled by hybrid‑attention models that shrink KVCache size, can offload long Prefill work to a remote cluster and, with dual‑timescale scheduling, achieve a 54% throughput gain over homogeneous PD deployment and 32% over naive heterogeneous setups.

Hybrid AttentionKVCache optimizationLLM inference
0 likes · 12 min read
Prefill-as-a-Service Boosts LLM Inference Throughput by 54%
SuanNi
SuanNi
Apr 3, 2026 · Artificial Intelligence

How Gemma 4 Packs Cloud‑Grade AI Into Your Pocket Devices

Google’s newly released Gemma 4 series delivers a range of open‑source LLMs—from 2.3 B to 31 B parameters—optimized for edge devices through per‑layer embeddings, mixed‑expert MoE, hybrid attention, and extensive hardware support, achieving top‑tier benchmark scores while running efficiently on phones and IoT.

Edge AIGemma 4Hybrid Attention
0 likes · 10 min read
How Gemma 4 Packs Cloud‑Grade AI Into Your Pocket Devices
Baobao Algorithm Notes
Baobao Algorithm Notes
Sep 10, 2025 · Artificial Intelligence

Qwen3-Next Unveiled: Sparse MoE, Hybrid Attention & Multi‑Token Prediction

A recent Hugging Face pull request reveals Alibaba’s upcoming Qwen3‑Next series, highlighting its extreme‑context, parameter‑efficient design that combines a 1:50 high‑sparsity MoE, a hybrid attention architecture mixing gated attention with Gated DeltaNet, and a Multi‑Token Prediction technique, promising ten‑fold throughput gains for 32K‑plus token contexts.

AI ArchitectureHybrid AttentionMulti‑token prediction
0 likes · 8 min read
Qwen3-Next Unveiled: Sparse MoE, Hybrid Attention & Multi‑Token Prediction
DataFunTalk
DataFunTalk
Jul 16, 2025 · Artificial Intelligence

MiniMax-M1 Revealed: Hybrid Attention, RL Training, and 1M Token Context

MiniMax’s latest M1 model, unveiled after a $300 million funding round, showcases a 4.56‑trillion‑parameter hybrid‑expert architecture with lightning attention, supporting up to one million tokens, and leverages reinforcement‑learning techniques to enhance long‑context handling, inference efficiency, and system‑2 reasoning capabilities.

AI scalingHybrid AttentionModel architecture
0 likes · 16 min read
MiniMax-M1 Revealed: Hybrid Attention, RL Training, and 1M Token Context