Tag

LLM reasoning

0 views collected around this technical thread.

Tencent Technical Engineering
Tencent Technical Engineering
Feb 21, 2025 · Artificial Intelligence

DeepSeek-R1: Enhancing Reasoning Capabilities in LLMs via Reinforcement Learning

DeepSeek‑R1 demonstrates that large‑scale reinforcement learning, especially with the novel Group Relative Policy Optimization and a rule‑based reward scheme, can markedly boost reasoning in LLMs without heavy supervised fine‑tuning, while a brief cold‑start SFT phase, two‑stage alignment, and knowledge distillation further improve performance and efficiency, despite remaining challenges such as language mixing.

Cold StartDeepSeek-R1GRPO
0 likes · 21 min read
DeepSeek-R1: Enhancing Reasoning Capabilities in LLMs via Reinforcement Learning
Architect
Architect
Feb 6, 2025 · Artificial Intelligence

DeepSeek‑R1: Reinforcement‑Learning‑Driven Long‑Chain Reasoning for Large Language Models

The article reviews DeepSeek‑R1, detailing its reinforcement‑learning‑based training pipeline that uses minimal supervised data, cold‑start fine‑tuning, multi‑stage RL, rejection‑sampling SFT, and distillation to achieve reasoning performance comparable to OpenAI‑o1‑1217, while also discussing successful contributions and failed experiments.

AI researchDeepSeek-R1LLM reasoning
0 likes · 11 min read
DeepSeek‑R1: Reinforcement‑Learning‑Driven Long‑Chain Reasoning for Large Language Models