Author

Baobao Algorithm Notes

Author of the BaiMian large model, offering technology and industry insights.

Latest from Baobao Algorithm Notes

Baobao Algorithm Notes

May 26, 2025 · Artificial Intelligence

When Should Large Language Models Think? 10 Cutting‑Edge Strategies to Boost Reasoning Efficiency

This article reviews ten recent papers that tackle the over‑thinking problem in large language models by shortening chain‑of‑thought reasoning and introducing dynamic early exit, adaptive thinking triggers, and reinforcement‑learning‑based training. It shows how models can maintain or improve accuracy while sharply reducing token usage and latency.

AI research · Model Pruning · adaptive inference

0 likes · 38 min read

Baobao Algorithm Notes

May 20, 2025 · Artificial Intelligence

Boosting RLHF Training Efficiency with Asynchronous vLLM and Ray Integration

This article explains how an asynchronous RLHF pipeline built on vLLM, Ray, and OpenRLHF dramatically reduces training bottlenecks by decoupling inference, environment interaction, and model updates, and provides detailed implementation code and design choices for scalable reinforcement learning.

OpenRLHF · RLHF · Ray

0 likes · 11 min read

Baobao Algorithm Notes

May 16, 2025 · Artificial Intelligence

Why Multi‑Turn LLM Evaluation Fails and How a User‑Simulator Can Fix It

The article explains that large language models lose up to 35% of their performance in multi‑turn conversations, critiques static single‑turn evaluation methods, and proposes a dynamic user simulator with loss‑masking techniques to generate realistic test turns and make assessment more reliable.

AI testing · LLM · RLHF

0 likes · 6 min read

Baobao Algorithm Notes

May 13, 2025 · Artificial Intelligence

How Qwen3 Achieves Multi-Stage Pretraining, Long-Context, and Thought-Controlled RL

The article details Qwen3's three‑phase pretraining pipeline, long‑context extensions, a cold‑start long‑chain‑of‑thought dataset, reinforcement‑learning fine‑tuning with custom rewards, and a two‑stage distillation process that yields versatile, thought‑controlled language models.

Qwen3 · distillation · long-context

0 likes · 15 min read

Baobao Algorithm Notes

May 13, 2025 · Artificial Intelligence

Why Decoder‑Only Models Dominate AI Today: Beyond the Low‑Rank Myth

The article explains why the once‑popular low‑rank argument is outdated and how decoder‑only architectures have become mainstream thanks to KV‑cache efficiency, open‑source projects like vLLM and sglang, and their impact on modern AI interview expectations.

KV Cache · decoder-only · open-source

0 likes · 3 min read

Baobao Algorithm Notes

May 12, 2025 · Artificial Intelligence

Why Dropout Is Dropped in Large‑Scale Model Training: Effects, Efficiency, Stability

Large‑scale model training now commonly omits dropout because its original scaling trick fails to match the training and inference distributions, which leads to poorer performance, higher computational cost, and instability. Alternative regularization such as normalization remains useful, as the article illustrates with practical observations and historical tricks.

AI stability · Dropout · Large Models

0 likes · 6 min read

Baobao Algorithm Notes

May 2, 2025 · Artificial Intelligence

Do Reinforcement Learning Techniques Really Boost LLM Reasoning? A Deep Dive into Recent Models

This article analyzes whether reinforcement learning enhances large language model reasoning, compares findings from DeepSeek-Math, a Tsinghua‑Shanghai Jiao‑Tong paper, and Qwen3, and outlines practical training pipelines—including Seed‑Thinking‑v1.5, DeepSeek‑R1, Kimi‑K1.5, and Qwen3—that aim to endow LLMs with robust reasoning capabilities.

Artificial Intelligence · LLM · Reasoning

0 likes · 12 min read

Baobao Algorithm Notes

Apr 28, 2025 · Artificial Intelligence

What Makes Qwen3 the Next Leap in Large Language Models?

The article announces Qwen3, detailing its flagship 235B model and smaller MoE models, benchmark performance, extensive multilingual support, expanded pretraining data, four‑stage post‑training, flexible thinking modes, deployment guides for SGLang, vLLM, and Ollama, and future plans toward AGI‑level capabilities.

AI research · Deployment · Qwen3

0 likes · 15 min read

Baobao Algorithm Notes

Apr 27, 2025 · Artificial Intelligence

How DeepSeek R1T‑Chimera Cuts Tokens by 40% Without Fine‑Tuning

The DeepSeek‑R1T‑Chimera model merges DeepSeek‑R1's reasoning with the V3‑0324 architecture, reusing most V3 weights and swapping in only the R1 routed experts. It achieves the same intelligence as R1 while reducing output tokens by about 40% and running faster, all without any fine‑tuning or distillation.

Artificial Intelligence · DeepSeek · LLM

0 likes · 5 min read

Baobao Algorithm Notes

Apr 27, 2025 · Artificial Intelligence

How Model Fusion Cut LLM Chain‑of‑Thought Length by 40% Without Fine‑Tuning

A small tech firm, tngtech, released an open‑source merged model called DeepSeek‑R1T‑Chimera that combines R1's reasoning with V3‑0324 without fine‑tuning, distillation, or prompting, achieving the same intelligence as R1 while reducing token output by 40% and speeding up inference.

Artificial Intelligence · DeepSeek · LLM

0 likes · 4 min read
