Baobao Algorithm Notes
Author

Author of the BaiMian large model, offering technology and industry insights.

294 articles · 0 likes · 144 views · 0 comments
Recent Articles

Nov 18, 2025 · Industry Insights

Google Gemini 3 Pro Beats GPT‑5.1 on Top AGI Benchmarks – What the Results Reveal

Google's newly launched Gemini 3 and Gemini 3 Pro topped major AGI benchmarks such as Humanity’s Last Exam and ARC‑AGI‑2, outperformed GPT‑5.1 on a complex 3‑D gear visualization task, and even generated a functional cloud‑OS prototype, results the article reads as a notable step toward artificial general intelligence.

AGI benchmarks · AI competition · Google Gemini
0 likes · 5 min read
Nov 13, 2025 · Artificial Intelligence

Introducing UNO‑Bench: The First Unified Omni‑Modal LLM Evaluation Suite

UNO‑Bench, an open‑source benchmark from Meituan’s LongCat team, provides the first high‑quality, low‑redundancy unified evaluation framework for omni‑modal large language models, featuring 1,250 manually annotated cross‑modal samples and 2,480 enhanced single‑modal samples covering 44 fine‑grained tasks and five modality combinations.

AI Scaling Law · benchmark · data pipeline
0 likes · 15 min read
Nov 11, 2025 · Artificial Intelligence

Why Redesign the Training Stack? Inside Olmo‑Thinking’s Open‑Source RL Journey

This article provides a detailed technical analysis of the Olmo‑Thinking project, covering why a new open‑source LLM was built, the challenges of reinforcement learning at scale, data‑mix optimization, architectural bottlenecks such as missing GQA and QK‑Norm, and the post‑training techniques used to improve reasoning and long‑context capabilities.

Open-source models · RLVR · data selection
0 likes · 20 min read
Nov 7, 2025 · Artificial Intelligence

Kimi K2-Thinking: 1T‑Parameter Agent Model Beats GPT‑5 on Humanity’s Last Exam

Kimi's open‑source K2‑Thinking model, a 1‑trillion‑parameter agent with native INT4 quantization and 256k context, achieves SOTA performance on benchmarks like Humanity’s Last Exam, BrowseComp and SEAL‑0, outperforms GPT‑5 and Grok‑4, and demonstrates complex tool‑driven reasoning through real‑world examples.

AI · Agent Model · K2-Thinking
0 likes · 6 min read
Oct 31, 2025 · Artificial Intelligence

How Risk‑Sensitive Reinforcement Learning Improves LLM Pass@K Performance

This article analyzes why standard reinforcement learning can degrade Pass@K metrics after fine‑tuning large language models, introduces a risk‑sensitive RL objective that reshapes the advantage estimator, and demonstrates through bandit and mathematical‑reasoning experiments that the RS‑GRPO method consistently boosts diversity and overall Pass@K scores across multiple LLMs.

Exploration-Exploitation · LLM fine-tuning · Policy Gradient
0 likes · 12 min read
Oct 31, 2025 · Artificial Intelligence

Unlocking LLM RL Scaling: The Best Practices from Meta’s New Study

Meta’s recent paper reveals a sigmoid‑shaped scaling law for LLM reinforcement learning, backs it with experiments totaling more than 400,000 GPU‑hours, compares RL designs such as PPO‑off‑policy‑k and Pipeline‑RL‑k, and distills the findings into a practical “ScaleRL” recipe that improves performance and efficiency.

LLM · RL Optimization · Scaling Law
0 likes · 10 min read
Oct 30, 2025 · Artificial Intelligence

Why LLM RL Training Crashes While SFT Stays Stable: Insights & Tricks

The article examines the fundamental similarity between SFT and RL loss functions for large language models, explains why RL training is prone to instability, discusses infrastructure and data quality challenges, and reviews practical tricks and reward‑model considerations for more reliable RL fine‑tuning.

AI · LLM · Reward Modeling
0 likes · 11 min read
Oct 28, 2025 · Artificial Intelligence

Why Entropy Collapse Limits LLM Reinforcement Learning and How to Fix It

The article explains how information entropy, cross‑entropy, and KL‑divergence shape reinforcement learning for large language models, describes the phenomenon of entropy collapse, compares token‑level and policy‑level entropy, and reviews recent methods like Clip‑Cov and KL‑Cov that mitigate this issue.

cross entropy · entropy · policy entropy
0 likes · 11 min read