Tagged articles

8 articles

Apr 27, 2026 · Artificial Intelligence

ACL 2026: Unveiling a Predictive Scaling Law for Reinforcement Learning Fine‑Tuning of Large Models

The paper presents a systematic empirical study that derives a power‑law scaling formula for reinforcement‑learning post‑training of large language models, demonstrating accurate inter‑ and intra‑model performance prediction, learning‑efficiency saturation, data‑reuse benefits, and cross‑architecture validity.

Data Reuse · Llama 3 · Qwen2.5

0 likes · 11 min read

PaperAgent

Feb 7, 2026 · Artificial Intelligence

Can 13 Parameters Match Full‑Scale Fine‑Tuning? TinyLoRA’s RL Breakthrough

TinyLoRA, a Meta‑proposed method that fine‑tunes Qwen2.5‑7B with only 13 trainable parameters (26 bytes), achieves 91% accuracy on GSM8K under reinforcement learning, revealing that ultra‑low‑parameter RL can rival full‑scale supervised fine‑tuning.

GSM8K · Qwen2.5 · TinyLoRA

0 likes · 7 min read

Fun with Large Models

Jun 12, 2025 · Artificial Intelligence

Implement GRPO to Give LLMs Reasoning Ability with Qwen2.5‑0.5B

This article explains the GRPO reinforcement‑learning algorithm, shows its core idea of internal group competition without a separate evaluator model, and provides a complete, step‑by‑step code walkthrough—including environment setup, dataset preparation, reward‑function design, training configuration, and evaluation—using the Qwen2.5‑0.5B‑Instruct model on the GSM8K math dataset.

GRPO · GSM8K · Qwen2.5

0 likes · 23 min read

Alibaba Cloud Big Data AI Platform

Mar 12, 2025 · Artificial Intelligence

Deploy, Fine‑Tune, and Compress DistilQwen2.5 on Alibaba Cloud PAI – A Complete Guide

This article walks through the full workflow for using Alibaba Cloud's open‑source DistilQwen2.5 models on the PAI platform, covering environment setup, model deployment, fine‑tuning with SFT and DPO, evaluation, and model compression for resource‑constrained scenarios.

DistilQwen2.5 · Large Language Model · PAI

0 likes · 13 min read

Big Data Technology Architecture

Feb 9, 2025 · Artificial Intelligence

Reproducing DeepSeek‑R1 Reasoning Ability with GRPO on Qwen2.5‑7B in Colab

This article explains how to replicate DeepSeek‑R1's slow‑thinking reasoning using the GRPO reinforcement‑learning algorithm on the Qwen2.5‑7B model in a free Colab notebook, covering the underlying chain‑of‑thought (CoT) concept, reward‑function design, data preparation, training configuration, and observed results.

GRPO · LLM · Python

0 likes · 14 min read

Baobao Algorithm Notes

Jan 8, 2025 · Artificial Intelligence

Inside Llama 3.1, DeepSeek‑V3, TÜLU 3 & Qwen 2.5: A Deep Dive into Post‑Training Techniques

This article compiles and analyzes the post‑training pipelines of Llama 3.1, DeepSeek‑V3, TÜLU 3 and Qwen 2.5, detailing their data compositions, SFT, reward modeling, DPO, GRPO, RLVR methods, hyper‑parameters, and practical tricks for large‑language‑model alignment.

DPO · DeepSeek-V3 · Llama3.1

0 likes · 22 min read

Alibaba Cloud Native

Dec 26, 2024 · Cloud Computing

Deploy Qwen2.5 LLM on Alibaba Cloud Function Compute: A Step‑by‑Step Guide

This guide explains how to deploy the Qwen2.5 large language model on Alibaba Cloud Function Compute using Ollama and Open WebUI, covering model selection, resource configuration, deployment steps, interface setup, multilingual capabilities, and automatic scaling for high‑concurrency workloads.

AI model deployment · Cloud Computing · Function Compute

0 likes · 10 min read

NewBeeNLP

Dec 23, 2024 · Artificial Intelligence

What’s New in Qwen2.5? A Deep Dive into the Latest LLM Advances

The Qwen2.5 Technical Report introduces a new series of large language models with up to 72B parameters, pre‑training data expanded to 18 trillion tokens, and more advanced supervised fine‑tuning and reinforcement learning pipelines; the models perform strongly across comprehension, reasoning, coding, and long‑context tasks.

LLM · Large Language Model · Qwen2.5

0 likes · 5 min read
