Author

Baobao Algorithm Notes

Author of the BaiMian large model, offering technology and industry insights.

294

Articles

Likes

144

Views

Comments

Latest from Baobao Algorithm Notes

100 recent articles max

Baobao Algorithm Notes

Apr 20, 2025 · Artificial Intelligence

Can Agentic RL Transform LLM Training? A Deep Dive into VeRL and Search‑R1

This article explores the emerging concept of agentic reinforcement learning for large language models, analyzes ByteDance's VeRL and the Search‑R1 frameworks, identifies practical challenges in tool integration and environment parallelism, and proposes a unified, Ray‑based architecture to enable scalable, high‑quality RL environments.

Rayenvironment designsearch-r1

0 likes · 11 min read

Can Agentic RL Transform LLM Training? A Deep Dive into VeRL and Search‑R1

Baobao Algorithm Notes

Apr 16, 2025 · Artificial Intelligence

Why Reinforcement Learning Finally Works: The Second Half of AI

The article argues that AI has entered its second half, where reinforcement learning finally generalizes thanks to large‑scale language pretraining and reasoning, shifting focus from building ever better models to redefining problems, evaluation methods, and real‑world utility.

AI researchindustry trends

0 likes · 16 min read

Why Reinforcement Learning Finally Works: The Second Half of AI

Baobao Algorithm Notes

Apr 15, 2025 · Industry Insights

Why GLM‑Z1‑AirX Hits 150‑200 TPS: A Deep Dive into LLM Speed Benchmarking

The article examines the slowdown caused by long‑chain‑of‑thought LLMs, presents a Python benchmarking script, compares token‑per‑second performance of several models—including the ultra‑fast GLM‑Z1‑AirX—and demonstrates a real‑time anti‑fraud use case that benefits from sub‑second response times.

GLM-Z1-AirXLLMPerformance

0 likes · 13 min read

Why GLM‑Z1‑AirX Hits 150‑200 TPS: A Deep Dive into LLM Speed Benchmarking

Baobao Algorithm Notes

Apr 6, 2025 · Artificial Intelligence

Inside Llama 4: How Meta’s New Multimodal MoE Models Achieve 10M‑Token Contexts

Meta unveils Llama 4 Scout, Maverick, and the upcoming Behemoth, detailing their Mixture‑of‑Experts architecture, massive 10‑million‑token context windows, efficient FP8 training, safety mechanisms, and competitive benchmark results that surpass leading multimodal models.

AI safetyLlama 4Mixture of Experts

0 likes · 16 min read

Inside Llama 4: How Meta’s New Multimodal MoE Models Achieve 10M‑Token Contexts

Baobao Algorithm Notes

Apr 2, 2025 · Industry Insights

Building AI‑Native Teams: Turning AI Agents into Reliable Digital Employees

This article analyses why current AI agents fall short of being true digital employees, identifies four major obstacles—undocumented knowledge, GUI‑only tools, lack of isolated test environments, and limited memory and initiative—and proposes a comprehensive, six‑step technical and cultural roadmap for creating AI‑native teams that treat AI as a collaborative team member.

AI integrationDigital EmployeeOperations

0 likes · 61 min read

Building AI‑Native Teams: Turning AI Agents into Reliable Digital Employees

Baobao Algorithm Notes

Mar 30, 2025 · Artificial Intelligence

Why Scaling, Data, and Infra Matter More Than Reward Design in R1 Replication

The article analyses two months of community attempts to reproduce DeepSeek R1, highlighting that model scaling, high‑quality data, robust training infrastructure, and careful hyper‑parameter tuning outweigh pure reward‑based tricks, and it outlines common pitfalls and future research directions.

DeepSeekInfrastructureLLM

0 likes · 13 min read

Why Scaling, Data, and Infra Matter More Than Reward Design in R1 Replication

Baobao Algorithm Notes

Mar 28, 2025 · Artificial Intelligence

Can Small 7B Models Beat the State‑of‑the‑Art? A Critical Analysis of R1‑Zero Training and Unbiased GRPO

This article critically examines R1‑Zero‑style training by analyzing foundation models and reinforcement learning, uncovering pre‑training and optimization biases, proposing an unbiased Dr. GRPO method, and demonstrating a minimalist 7B‑model recipe that achieves new state‑of‑the‑art performance on AIME 2024.

GRPOLLM evaluationR1-Zero

0 likes · 20 min read

Can Small 7B Models Beat the State‑of‑the‑Art? A Critical Analysis of R1‑Zero Training and Unbiased GRPO

Baobao Algorithm Notes

Mar 27, 2025 · Artificial Intelligence

Why a Robust Training Pipeline Beats Fancy LLM Tricks – Lessons from DAPO

The article analyzes the DAPO technical report, showing how dynamic‑sampling pipelines and token‑level loss handling in SFT and RL training outperform ad‑hoc algorithm tricks, and compares the training dynamics of reinforce_baseline and GRPO with concrete code examples.

Dynamic SamplingGRPOLLM

0 likes · 8 min read

Why a Robust Training Pipeline Beats Fancy LLM Tricks – Lessons from DAPO

Baobao Algorithm Notes

Mar 23, 2025 · Artificial Intelligence

Why Future AI Agents Must Evolve Beyond Prompt‑Driven Workflows

The article argues that the next generation of AI agents should focus on improving the model itself through reinforcement learning and reasoning rather than relying on pre‑designed prompt‑driven workflows, highlighting industry trends, technical challenges, and the shift toward treating models as products.

DeepSearchLLMmodel as product

0 likes · 29 min read

Why Future AI Agents Must Evolve Beyond Prompt‑Driven Workflows

Baobao Algorithm Notes

Mar 21, 2025 · Artificial Intelligence

Unlocking LLM Reasoning: A Deep Dive into Post‑Training Techniques

This article provides a comprehensive technical overview of large language model post‑training, covering fine‑tuning methods (full, parameter‑efficient, LoRA families, prompt tuning), domain‑adaptive tuning, reinforcement‑learning reward modeling, process vs. outcome rewards, inference‑enhancement strategies, dynamic compute allocation, verifier‑augmented reasoning, current challenges, and emerging research directions such as meta‑cognition, physical reasoning, and swarm intelligence.

LLMmeta-cognitionpost-training

0 likes · 21 min read

Unlocking LLM Reasoning: A Deep Dive into Post‑Training Techniques