Tag

FP8 Training


Tencent Cloud Developer
Mar 5, 2025 · Artificial Intelligence

DeepSeek Series Overview: Core Technologies, Model Innovations, and Product Highlights

The article is a slide‑deck‑style deep dive into the DeepSeek series, from the original LLM through DeepSeek‑MoE, Math, V2, V3, and R1. It highlights core innovations such as Multi‑Head Latent Attention, fine‑grained MoE, GRPO reinforcement learning, Multi‑Token Prediction, DualPipe parallelism, and FP8 training, which together achieve high performance at a fraction of traditional costs, and it notes their integration into Tencent’s OlaChat intelligent assistant.

AI · DeepSeek · FP8 Training
0 likes · 21 min read
Architect
Feb 16, 2025 · Artificial Intelligence

DeepSeek-V3, DeepSeek-R1, and Janus‑Pro: Architecture, Training Techniques, and Performance Insights

This article provides an in‑depth technical overview of the DeepSeek‑V3, DeepSeek‑R1, and Janus‑Pro models, covering their Mixture‑of‑Experts architecture, Multi‑Head Latent Attention (MLA), auxiliary‑loss‑free load balancing, multi‑token prediction, FP8 mixed‑precision training, efficient cross‑node communication, reinforcement‑learning pipelines, multimodal modeling strategies, performance comparisons, cost statistics, and current limitations.

AI architecture · DeepSeek-V3 · FP8 Training
0 likes · 18 min read
IT Architects Alliance
Feb 15, 2025 · Artificial Intelligence

DeepSeek: Architecture, Core Technologies, Training Strategies, and Comparative Analysis

The article provides an in‑depth overview of DeepSeek's transformer‑based foundation, Mixture‑of‑Experts architecture, novel attention mechanisms, multi‑token prediction, FP8 mixed‑precision training, knowledge distillation, and reinforcement‑learning approaches. It also compares DeepSeek's performance and cost advantages against leading models such as GPT and Gemini.

AI Model Architecture · DeepSeek · FP8 Training
0 likes · 29 min read