Artificial Intelligence

DeepSeek V Series: Technical Overview of Scaling Laws, Grouped Query Attention, and Mixture‑of‑Experts

The article reviews DeepSeek’s V‑series papers, explaining how scaling‑law insights, Grouped Query Attention, a depth‑first design, loss‑free load balancing, multi‑token prediction and Multi‑Head Latent Attention together enable economical mixture‑of‑experts LLMs that rival closed‑source models while cutting compute and hardware costs.

Tencent Cloud Developer

DeepSeek’s latest V‑series models have attracted wide attention. This article offers an accessible, non‑specialist overview of the four papers that introduce the series.

What you will gain

Quickly grasp the technical logic that runs through the four papers.

Understand how DeepSeek challenges the dominant “spend‑more‑to‑win” narrative of closed‑source LLMs.

See a critical view of the current AI research and industry practices.

The four papers are freely available for download:

2401 – DeepSeek LLM: Scaling Open‑Source Language Models with Longtermism

2405 – DeepSeek‑V2: A Strong, Economical, and Efficient Mixture‑of‑Experts Language Model

2408 – Fire‑Flyer AI‑HPC: A Cost‑Effective Software‑Hardware Co‑Design for Deep Learning

2412 – DeepSeek‑V3 Technical Report

Although written by a non‑specialist, the article distills the core ideas of each paper.

Why DeepSeek?

The author asks how DeepSeek achieves top‑tier performance at a fraction of the cost incurred by OpenAI, Google, or Microsoft. The answer lies in revisiting the scaling law – the empirical observation that larger models (more parameters and more data) tend to perform better – and questioning its universal applicability.
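To make the scaling law concrete, here is a minimal sketch of the Chinchilla‑style power‑law form of pre‑training loss. The constants below are illustrative placeholders, not values fitted in the DeepSeek papers (which use their own formulation):

```python
# Illustrative sketch of a power-law scaling curve: L(N, D) = E + A/N^alpha + B/D^beta.
# All constants are made-up placeholders, not numbers from the DeepSeek papers.

def predicted_loss(n_params: float, n_tokens: float,
                   e: float = 1.7, a: float = 400.0, alpha: float = 0.34,
                   b: float = 410.0, beta: float = 0.28) -> float:
    """Estimate pre-training loss from model size (params) and data size (tokens)."""
    return e + a / n_params**alpha + b / n_tokens**beta

# Scaling up helps, but with diminishing returns: each doubling of parameters
# or tokens shaves off a smaller slice of loss than the previous one.
gain_first_doubling = predicted_loss(7e9, 2e12) - predicted_loss(14e9, 2e12)
gain_second_doubling = predicted_loss(14e9, 2e12) - predicted_loss(28e9, 2e12)
assert gain_second_doubling < gain_first_doubling
```

The diminishing‑returns shape is exactly what motivates asking whether "spend more" is the only path, or whether better data and architecture can move the curve itself.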

The first paper (2401) introduces the concept of Longtermism (a long‑term perspective on model scaling) and discusses the limitations of closed‑source products that rely on massive compute and annotation budgets.

The second paper (2405) focuses on three technical improvements:

Higher‑quality data sets, even if not dramatically larger.

Grouped Query Attention (GQA), which lets several query heads share each key/value head, shrinking the KV cache and reducing attention compute.

Depth‑First Design (DFD) that increases model depth, improving reasoning and code‑generation tasks.

GQA works like a shared library catalog: instead of every query head maintaining its own index, groups of heads consult a common one, saving the compute and memory needed to locate information.

The paper also describes supervised fine‑tuning (SFT) and Direct Preference Optimization (DPO), which improve dialogue safety and alignment.
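DPO is compact enough to show directly. The sketch below computes the standard DPO loss for a single preference pair; it assumes log‑probabilities from the policy being trained and a frozen reference model, and is a generic illustration rather than DeepSeek’s training code:

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair (illustrative)."""
    # How much more the policy prefers the chosen answer than the reference does.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the policy clearly prefers the chosen answer.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Preferring the chosen answer (relative to the reference) yields a lower loss
# than preferring the rejected one:
good = dpo_loss(-1.0, -2.0, -1.5, -1.5)
bad = dpo_loss(-2.0, -1.0, -1.5, -1.5)
assert good < bad
```

No reward model is trained separately; the preference signal flows straight into the policy, which is what makes DPO cheaper than classic RLHF pipelines.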

The third paper (2408) is not about a new model architecture but about cost‑effective hardware‑software co‑design – including the HFReduce communication library – that lets large‑scale training run efficiently on mid‑range GPUs, cutting both cost and energy consumption by about 50%.

The fourth paper (2412) presents DeepSeek‑V3, which builds on the previous innovations and adds two new mechanisms:

Auxiliary‑loss‑free load balancing – dynamically adjusts each expert’s routing bias so that workload stays balanced without adding extra loss terms to the training objective.

Multi‑Token Prediction (MTP) – trains the model to predict several upcoming tokens at each position, densifying the training signal and improving overall reasoning.
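The first mechanism can be sketched in a few lines. This is a simplified assumption of how bias‑based balancing works, not DeepSeek’s code: the router adds a per‑expert bias when selecting top‑k experts, then nudges the bias against whichever experts are over‑ or under‑loaded, so no auxiliary loss (and no extra gradient) is needed.

```python
import numpy as np

def route_with_bias(scores, bias, top_k=2, lr=0.01):
    """
    Sketch of auxiliary-loss-free load balancing (simplified, assumed mechanics).
    scores: (n_tokens, n_experts) router affinities; bias: (n_experts,) running bias.
    """
    n_tokens, n_experts = scores.shape
    biased = scores + bias                         # bias only affects selection
    topk = np.argsort(-biased, axis=1)[:, :top_k]  # chosen experts per token
    load = np.bincount(topk.ravel(), minlength=n_experts)
    target = n_tokens * top_k / n_experts          # ideal uniform load per expert
    # Overloaded experts get their bias lowered, underloaded experts raised --
    # a plain update rule, not a gradient from an extra loss term.
    bias -= lr * np.sign(load - target)
    return topk, bias

# A router that always favors expert 0 gets pushed back toward balance:
scores = np.zeros((8, 4)); scores[:, 0] = 5.0
topk, bias = route_with_bias(scores, np.zeros(4))
assert bias[0] < 0  # the overloaded expert's bias was reduced
```

Over many steps this feedback loop evens out expert usage without distorting the training objective, which is the point of dropping the auxiliary loss.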

DeepSeek‑V3 also relies on Multi‑Head Latent Attention (MLA), introduced in V2 and described as a “memory palace” that stores compressed token representations in labeled “rooms,” allowing efficient retrieval and far smaller KV caches than traditional flat designs.
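The “memory palace” amounts to caching a small latent vector per token instead of full keys and values, and reconstructing K and V from it on demand. The sketch below is a simplified illustration of that compression idea (random weights, single token step), not DeepSeek’s implementation:

```python
import numpy as np

def mla_cache_step(h, W_down, W_up_k, W_up_v, cache):
    """
    Sketch of MLA-style KV compression (simplified, illustrative).
    Instead of caching full K and V per token, cache one small latent vector:
      c = h @ W_down                  (d_model -> d_latent; the only thing stored)
      K = latents @ W_up_k, V = latents @ W_up_v   (rebuilt when attention runs)
    """
    c = h @ W_down                # compressed latent "room" for this token
    cache.append(c)               # cache holds d_latent floats per token
    latents = np.stack(cache)     # (n_tokens, d_latent)
    return latents @ W_up_k, latents @ W_up_v

rng = np.random.default_rng(0)
d_model, d_latent = 64, 8
W_down = rng.normal(size=(d_model, d_latent))
W_up_k = rng.normal(size=(d_latent, d_model))
W_up_v = rng.normal(size=(d_latent, d_model))
cache = []
k, v = mla_cache_step(rng.normal(size=d_model), W_down, W_up_k, W_up_v, cache)
# The cache stores 8 floats per token instead of 2 * 64 for a flat K/V cache.
```

Because generation cost is dominated by reading the KV cache, shrinking each entry from two full vectors to one small latent directly cuts memory traffic per decoded token.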

Across the series, DeepSeek demonstrates that strong performance does not require the highest compute budget; instead, careful data curation, modular expert routing, and novel attention mechanisms can achieve economical yet powerful LLMs.

model optimization · Large Language Models · Mixture of Experts · DeepSeek · scaling laws · Grouped Query Attention
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
