Critique Fine-Tuning (CFT): Boosting Large Language Model Reasoning with Minimal Data
The paper introduces Critique Fine-Tuning (CFT), which replaces the response imitation of standard supervised fine-tuning with critique-based learning: instead of copying reference answers, the model learns to critique candidate responses. Trained on only 50K samples, CFT achieves superior reasoning performance on mathematical benchmarks, outperforming traditional reinforcement-learning approaches that require millions of examples.
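The core shift can be sketched as a change in the training target: SFT pairs a query with a gold response, while CFT pairs a (query, candidate response) prompt with a critique of that response. The sketch below is illustrative only; the function names and data format are assumptions, not the paper's actual pipeline.

```python
def build_sft_example(query: str, gold_response: str) -> dict:
    # Standard SFT: the model learns to imitate the gold response.
    return {"input": query, "target": gold_response}

def build_cft_example(query: str, candidate: str, critique: str) -> dict:
    # CFT: the model sees the query plus a (possibly noisy) candidate
    # response, and learns to generate a critique of that candidate.
    prompt = f"{query}\n\nCandidate response:\n{candidate}\n\nCritique:"
    return {"input": prompt, "target": critique}

sft = build_sft_example("What is 2 + 2?", "4")
cft = build_cft_example(
    "What is 2 + 2?",
    "2 + 2 = 5",
    "The arithmetic is wrong: 2 + 2 equals 4, not 5.",
)
print(sft["target"])  # the response itself
print(cft["target"])  # the critique of the candidate response
```

In both cases the model is trained with the same next-token objective; only the supervision signal changes, which is what lets critique data teach reasoning with far fewer examples.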