Author

Baobao Algorithm Notes

Author of the BaiMian large model, offering technology and industry insights.

294

Articles

Likes

144

Views

Comments

Latest from Baobao Algorithm Notes

Baobao Algorithm Notes

May 26, 2026 · Artificial Intelligence

How On-Policy Distillation (OPD) Solves Core Challenges in Large-Model Post-Training

The article explains how On-Policy Distillation (OPD) combines on‑policy sampling with dense teacher feedback via reverse KL to address low signal density, distribution shift, and capability interference in large‑model post‑training, and compares implementations by Qwen3, GLM‑5, MiMo‑V2 and DeepSeek‑V4.

Knowledge Distillation · Model Compression · OPD

0 likes · 20 min read

Baobao Algorithm Notes

May 22, 2026 · Artificial Intelligence

How LiteScale Cuts Wait Times in Large‑Model Post‑Training with Gradient Accumulation

The article examines the bottleneck of synchronous rollout in large‑model post‑training, proposes an asynchronous design using gradient accumulation and a global micro‑batch count to preserve loss equivalence, and introduces LogitsExpress for efficient top‑K knowledge‑distillation communication, all implemented in the lightweight LiteScale framework.

Knowledge Distillation · asynchronous rollout · distributed training

0 likes · 16 min read

Baobao Algorithm Notes

Apr 27, 2026 · Artificial Intelligence

DeepDive into DeepSeek‑V4: Efficient Million‑Token Context, Hybrid Attention, and Muon Optimizer

The article provides an in‑depth technical analysis of DeepSeek‑V4, detailing its novel hybrid attention architecture (CSA and HCA), the manifold‑constrained hyper‑connection (mHC), massive KV‑cache reductions, FLOPs savings across token lengths, and the Muon optimizer with Newton‑Schulz orthogonalization, all backed by concrete benchmark tables and code snippets.

DeepSeek · Efficient Attention · KV cache reduction

0 likes · 61 min read

Baobao Algorithm Notes

Apr 20, 2026 · Industry Insights

From Prompt Writer to Harness Architect: Redefining the Algorithm Engineer in the LLM Era

The article analyzes how the rise of foundation models shifts algorithm engineers from hand‑crafting models to building robust Harness environments, detailing OpenAI’s agent‑first experiments, the new "Model + Harness" formula, and practical steps for staying valuable in a prompt‑centric world.

AI Engineering · Harness architecture · LLM

0 likes · 9 min read

Baobao Algorithm Notes

Apr 14, 2026 · Industry Insights

Why Mastering AI Agents Is the Most Critical Skill Right Now

The article argues that leveraging AI agents like Claude Code is now the top priority for developers, explaining how agents boost productivity, the importance of their operating environment, and why embracing them is essential for future success in the AI-driven workplace.

Claude Code · Environment · LLM

0 likes · 10 min read

Baobao Algorithm Notes

Mar 20, 2026 · Artificial Intelligence

Can AI Self‑Iterate? Inside MiniMax M2.7’s Self‑Improving Magic

The article examines MiniMax M2.7’s claim of self‑iteration, its impressive Kaggle record, and a series of technical tests—including code refactoring, real‑time chart generation, futures backtesting, business analysis, PPT creation, and news tracking—to evaluate the model’s practical AI self‑evolution capabilities.

AI · AutoML · Kaggle

0 likes · 8 min read

Baobao Algorithm Notes

Mar 3, 2026 · Artificial Intelligence

Boosting LLM Post-Training with RL: Tips for Efficiency and Stability

This article shares practical insights and pitfalls from six months of applying reinforcement learning to fine‑tune large language models, covering exploration efficiency, training stability, model selection, and special considerations for thinking‑oriented agents.

AI · Efficiency · LLM

0 likes · 12 min read

Baobao Algorithm Notes

Mar 2, 2026 · Artificial Intelligence

How “Skills” Turn LLM Prompts into Portable, Engineered Workflows

This article dissects the evolution of LLM prompts into structured, version‑controlled skill packages, explains the AgentSkills specification, details OpenClaw’s implementation, compares prompts, memory, MCP and skills, and provides end‑to‑end examples with code, flowcharts and best‑practice recommendations.

Agent Skills · LLM · OpenClaw

0 likes · 40 min read

Baobao Algorithm Notes

Mar 2, 2026 · Artificial Intelligence

Why Agentic AI Is Winning Over Workflows: The 2025 Evolution of LLM Agents

The article reviews the rapid shift in 2025 from complex workflow‑based LLM orchestration to streamlined agentic systems that rely on simple prompt loops, sandboxed tool execution, file‑based memory, and modular skill files, culminating in the rise of Agent Harness runtimes.

AI trends · LLM agents · Sandbox execution

0 likes · 8 min read

Baobao Algorithm Notes

Feb 25, 2026 · Artificial Intelligence

Exploring Qwen 3.5: Small‑Scale MoE Models, Architecture, and Deployment Guides

This article reviews the three open‑source Qwen 3.5 models (a 35B MoE, a 122B MoE, and a 27B dense version), detailing their parameter layouts, core attention designs, context length, inference performance, and hardware requirements, and provides step‑by‑step code examples for loading them with Hugging Face Transformers and vLLM.

AI · Large Language Model · MoE

0 likes · 10 min read
