Tagged articles
8 articles
Machine Heart
Apr 27, 2026 · Artificial Intelligence

ACL 2026: Unveiling a Predictive Scaling Law for Reinforcement Learning Fine‑Tuning of Large Models

The paper presents a systematic empirical study that derives a power-law scaling formula for reinforcement learning post-training of large language models. It reports accurate performance prediction both across models and within a single model's training run, saturation of learning efficiency, benefits from data reuse, and validity across architectures.

Data Reuse · Llama 3 · Qwen2.5
0 likes · 11 min read
Fun with Large Models
Jun 12, 2025 · Artificial Intelligence

Implement GRPO to Give LLMs Reasoning Ability with Qwen2.5‑0.5B

This article explains the GRPO reinforcement-learning algorithm and its core idea of scoring each sampled response against the others in its group, which removes the need for a separate critic model. It then gives a complete, step-by-step code walkthrough (environment setup, dataset preparation, reward-function design, training configuration, and evaluation) using the Qwen2.5-0.5B-Instruct model on the GSM8K math dataset.

GRPO · GSM8K · Qwen2.5
0 likes · 23 min read
Alibaba Cloud Native
Dec 26, 2024 · Cloud Computing

Deploy Qwen2.5 LLM on Alibaba Cloud Function Compute: A Step‑by‑Step Guide

This guide explains how to deploy the Qwen2.5 large language model on Alibaba Cloud Function Compute using Ollama and Open WebUI, covering model selection, resource configuration, deployment steps, interface setup, multilingual capabilities, and automatic scaling for high‑concurrency workloads.

AI model deployment · Cloud Computing · Function Compute
0 likes · 10 min read
NewBeeNLP
Dec 23, 2024 · Artificial Intelligence

What’s New in Qwen2.5? A Deep Dive into the Latest LLM Advances

The Qwen2.5 Technical Report introduces a new series of large language models with up to 72B parameters, pre-training data expanded to 18 trillion tokens, and more advanced supervised fine-tuning and reinforcement learning pipelines. The models show strong performance across comprehension, reasoning, coding, and long-context tasks.

LLM · Large Language Model · Qwen2.5
0 likes · 5 min read