Tag

Qwen2.5

1 view collected around this technical thread.

Big Data Technology Architecture
Feb 9, 2025 · Artificial Intelligence

Reproducing DeepSeek R1 Reasoning Ability with GRPO on Qwen2.5‑7B in Colab

This article explains how to replicate DeepSeek R1's slow‑thinking reasoning by training the Qwen2.5‑7B model with the GRPO reinforcement‑learning algorithm in a free Colab notebook. It covers the underlying chain‑of‑thought (CoT) concept, reward‑function design, data preparation, training configuration, and observed results.

DeepSeek · Fine-tuning · GRPO
0 likes · 14 min read