Baobao Algorithm Notes
Author

Author of the BaiMian large model, offering technology and industry insights.

294 articles · 0 likes · 147 views · 0 comments
Recent Articles

Feb 24, 2026 · Artificial Intelligence

The Bitter Lesson of Building Agentic RL in Terminal Environments

This article recounts the challenges of moving from single‑step RL with verifiable rewards to multi‑step agentic reinforcement learning in terminal environments, detailing infrastructure design, asynchronous pipelines, data quality checks, masking strategies, curriculum training, chunk‑based optimization, and practical lessons learned from large‑scale experiments.

Asynchronous Training · Credit Assignment · Environment Augmentation
0 likes · 33 min read
Feb 12, 2026 · Artificial Intelligence

GLM-5 Unleashed: How the New Chinese LLM Tackles Full‑Stack Architecture and Complex System Design

The article reviews the newly released GLM-5 model, highlighting its ability to generate end‑to‑end system designs, write and debug backend code, and solve large‑scale engineering problems through detailed prompts, positioning it alongside GPT‑5.3 and Claude Opus in the competitive LLM landscape.

AI programming · Backend Architecture · GLM-5
0 likes · 9 min read
Feb 6, 2026 · Artificial Intelligence

Skywork Desktop AI Agent on Windows: Hands‑On Review and Real‑World Use Cases

The article evaluates Skywork's newly released Windows desktop AI agent by testing it on news aggregation, file organization, PC purchase research with PDF generation, and comparing its features—such as native Windows support, multi‑model auto mode, 100+ built‑in skills, and local VM isolation—to Claude Cowork, while also noting pricing and download details.

Claude Cowork comparison · Product Review · Skywork
0 likes · 7 min read
Feb 4, 2026 · Artificial Intelligence

Efficient Long-Sequence Modeling: Linear & Sparse Attention, MegaKernels, RL Tricks

This article reviews recent 2025 advances in long‑sequence LLM inference, covering Kimi Linear attention, DuoAttention and DeepSeek Sparse Attention, MegaKernel and MPK designs for kernel‑level efficiency, reinforcement‑learning rollout optimizations, and the Tawa deep‑learning compiler framework.

Deep Learning Compiler · LLM optimization · Linear Attention
0 likes · 22 min read
Feb 4, 2026 · Artificial Intelligence

Mastering Reinforcement Learning: From Basics to Advanced Agentic RL Techniques

This comprehensive guide walks through reinforcement learning fundamentals, MDP modeling, value functions, Bellman equations, and key algorithms such as Q‑learning, REINFORCE, PPO, DPO, and GRPO, then contrasts LLM‑RL with Agentic‑RL and surveys leading industry frameworks and real‑world applications.

Artificial Intelligence · LLM · Machine Learning
0 likes · 42 min read
Feb 3, 2026 · Industry Insights

How One Developer Merges 600 Commits a Day with AI: Inside the Clawdbot Workflow

In this in‑depth interview, Peter Steinberger explains how AI agents let him submit and merge hundreds of commits daily, replace traditional code reviews with prompt‑driven requests, and redesign his development workflow around a closed‑loop validation system that reshapes modern software engineering.

AI Engineering · Agent workflow · Code Review
0 likes · 85 min read
Jan 28, 2026 · Artificial Intelligence

How to Deploy a 24/7 AI Assistant with Moltbot and MiniMax M2.1

This guide walks you through installing the open‑source Moltbot (formerly Clawdbot), configuring it with the MiniMax M2.1 large‑language model, and connecting it to messaging platforms such as Telegram, WhatsApp, and Feishu, enabling a continuously running AI assistant that can handle emails, code edits, and complex tasks.

AI Assistant · Automation · Feishu integration
0 likes · 8 min read
Jan 27, 2026 · Artificial Intelligence

Putting Kimi K2.5 and Kimi Code to the Test: Real‑World AI Agent Benchmarks

This article presents a hands‑on evaluation of Kimi K2.5 and its open‑source Kimi Code agent across a series of hard‑core prompts, covering Python API generation, cost‑optimized routing, multimodal ECharts visualization, massive‑scale SQL optimization, web‑search‑driven research, MoE explanation, and video‑to‑code workflows.

AI Agent · Kimi · Large Language Model
0 likes · 9 min read
Jan 26, 2026 · Artificial Intelligence

From Search Ads to Foundation Models: My Journey Building the EvoCUA GUI Agent

The author explains why he transitioned from search advertising algorithms to foundation model research, outlines the four typical activities of base‑model teams, and shares detailed technical insights, experimental practices, and scaling strategies that led the EvoCUA GUI Agent to achieve open‑source SOTA on OSWorld.

AI research · GUI agents · Model Scaling
0 likes · 17 min read
Jan 24, 2026 · Artificial Intelligence

What Advances Do GRPO, DAPO, GSPO, and SAPO Bring Over PPO?

After DPO, the typical research trajectory moves through GRPO, DAPO, GSPO, and SAPO, each introducing new optimization objectives, sampling strategies, and reward‑shaping techniques that aim to reduce memory usage, improve gradient stability, and enhance the efficiency of large‑model reinforcement learning.

DAPO · GRPO · GSPO
0 likes · 6 min read