Baobao Algorithm Notes
Author

Author of the BaiMian large model, offering technology and industry insights.

294 articles · 0 likes · 144 views · 0 comments
Recent Articles

Jul 2, 2025 · Industry Insights

Why Baidu’s Open‑Source Ernie 4.5 Could Redefine the Global AI Race

Baidu has open‑sourced ten Ernie 4.5 models ranging from 0.3B to 424B parameters, featuring multimodal MoE pre‑training, advanced infrastructure, and post‑training tricks that deliver benchmark results surpassing DeepSeek‑V3 and OpenAI‑o1, sparking worldwide industry attention and reshaping AI competition.

AI competition · Baidu · Ernie
0 likes · 8 min read
Jun 30, 2025 · Artificial Intelligence

How End‑to‑End Reinforcement Learning Powers the Kimi‑Researcher AI Agent

The article examines Kimi‑Researcher, an AI research agent built with end‑to‑end reinforcement learning, detailing its technical motivations, advantages over traditional workflow‑based and SFT methods, performance breakthroughs on benchmark exams, and diverse real‑world use cases ranging from literature reviews to legal analysis.

AI agent · End-to-End RL · Kimi Researcher
0 likes · 10 min read
Jun 13, 2025 · Artificial Intelligence

How GVPO Improves LLM Fine‑Tuning: Stable, Sample‑Rich Policy Optimization

The article introduces GVPO (Group Variance Policy Optimization), a method for large language model post‑training that guarantees a unique optimal solution, namely that of KL‑constrained reward maximization, supports flexible sampling distributions, and addresses the instability and inefficiency of GRPO and traditional policy‑gradient approaches.

GVPO · KL constraint · Policy Optimization
0 likes · 9 min read
Jun 6, 2025 · Artificial Intelligence

What AI Programming Agents Reveal About RL, Feedback Loops, and Long‑Context Challenges

A deep dive into the Cursor team's podcast, in which core members dissect the current hurdles facing AI programming agents: feedback‑mechanism design, sparse rewards in reinforcement learning, tool‑chain integration, long‑context handling, and the emerging attention mechanisms shaping the future of code‑centric AI.

AI programming · attention mechanisms · long-context
0 likes · 35 min read
Jun 3, 2025 · Artificial Intelligence

Can 1K Fine‑Tuning Replace 100K RL Steps? Insights from Re‑distillation Research

An extensive analysis shows that a 1K‑sample fine‑tuning stage can replicate the generalization gains of thousands of reinforcement‑learning steps, explains the compressibility of RL, introduces a sample‑effect theory, and demonstrates that re‑distillation and small‑scale SFT dramatically improve LLM performance.

Re-distillation · Sample Effect · large language models
0 likes · 23 min read
Jun 3, 2025 · Artificial Intelligence

How to Train a 671B‑Scale Model with RL: Insights from a verl Internship

This article shares a detailed, first‑hand analysis of the technical challenges, framework choices, memory management, weight conversion, precision alignment, and efficiency optimizations encountered while building reinforcement‑learning pipelines for a 671‑billion‑parameter model using the verl ecosystem.

GPU Memory Management · Large Models · Megatron
0 likes · 16 min read