DataFunTalk
Feb 16, 2025 · Artificial Intelligence
Understanding Reasoning LLMs: DeepSeek R1 Variants, Inference‑Time Scaling, and Training Strategies
This article explains what reasoning language models are and outlines their strengths and weaknesses. It details DeepSeek R1's three variants and their training pipelines (pure reinforcement learning, SFT + RL, and distillation), and discusses inference‑time scaling techniques and related research such as Sky‑T1 and TinyZero.
DeepSeek · Model Distillation · Supervised Fine-tuning
16 min read