Author: Baobao Algorithm Notes

Author of the BaiMian large model, offering technology and industry insights.

Latest from Baobao Algorithm Notes

Oct 20, 2025 · Artificial Intelligence

Can Visual Tokens Compress Text? Inside DeepSeek-OCR’s Optical Compression

DeepSeek‑OCR renders text as images and compresses them with a novel visual encoder. It reaches about 97% OCR precision at roughly 10× token compression (and about 60% at 20×), and its 3B‑parameter model performs strongly on OmniDocBench across multilingual and multimodal tasks.

AI · DeepSeek · OCR

0 likes · 10 min read

Sep 28, 2025 · Artificial Intelligence

How Much GPU Memory Do LLMs Really Need? A Deep Dive into Training & Inference

This article breaks down the GPU memory requirements of large language models during training and inference, detailing the contributions of model weights, optimizer states, activations, KV cache, and activation recomputation, and provides concrete formulas, examples, and scaling insights for models like Qwen3 and DeepSeek V3.

GPU Memory · KV Cache · LLM

0 likes · 18 min read

Sep 23, 2025 · Artificial Intelligence

How LongCat-Flash-Thinking Sets New SOTA in Open‑Source AI Inference

LongCat-Flash-Thinking, the latest open‑source model from Meituan, introduces domain‑parallel RL training, a high‑throughput DORA infrastructure, and a dual‑path inference framework. Together these achieve state‑of‑the‑art performance on logical, mathematical, coding, and agentic tasks while maintaining top‑tier speed.

LongCat · RL training · benchmark

0 likes · 10 min read

Sep 22, 2025 · Artificial Intelligence

How to Add Special Tokens to LLMs Without Losing Performance

This guide explains why naïvely adding special tokens during supervised fine‑tuning can destabilize a large language model, and provides step‑by‑step strategies—including tokenizer updates, embedding resizing, smart initialization, and LoRA‑based PEFT—to integrate new tokens while preserving the model's original capabilities.

LLM · LoRA · special tokens

0 likes · 9 min read

Sep 10, 2025 · Artificial Intelligence

Qwen3-Next Unveiled: Sparse MoE, Hybrid Attention & Multi‑Token Prediction

A recent pull request to Hugging Face Transformers reveals Alibaba's upcoming Qwen3‑Next series and its ultra‑long‑context, parameter‑efficient design. The design combines a high‑sparsity MoE with a 1:50 activation ratio, a hybrid attention architecture that mixes gated attention with Gated DeltaNet, and Multi‑Token Prediction, and it promises tenfold throughput gains for contexts beyond 32K tokens.

AI Architecture · Hybrid Attention · Multi‑token prediction

0 likes · 8 min read

Sep 9, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Roots, Risks, and a New Evaluation Approach

The article analyzes OpenAI's study on language‑model hallucinations. It explains how statistical limits of pre‑training, combined with binary right/wrong evaluations that reward guessing, lead models to give confident false answers. It then covers the study's proposed confidence‑threshold scoring system, which rewards honest "I don't know" responses to improve reliability.

AI safety · Language Models · confidence threshold

0 likes · 8 min read

Sep 3, 2025 · Artificial Intelligence

How Atom-Searcher Boosts LLM Reasoning with Atomic Thought Rewards

Atom-Searcher introduces an atomic‑thought reinforcement‑learning framework that decomposes complex reasoning into fine‑grained units, uses a Reasoning Reward Model to assign step‑wise rewards, dynamically balances process and result incentives, and achieves state‑of‑the‑art performance on multiple LLM benchmarks.

Agentic Research · Atomic Thought · LLM

0 likes · 12 min read

Sep 2, 2025 · Artificial Intelligence

How LongCat‑Flash Achieves Record Speed and Efficiency for a 560B MoE Model

LongCat‑Flash is a 560‑billion‑parameter Mixture‑of‑Experts LLM. It combines a dynamic zero‑computation expert design, shortcut‑connected MoE communication, variance‑aligned scaling, and a three‑stage agent‑centric pre‑training pipeline. It delivers over 100 tokens per second (TPS) on H800 GPUs at a cost of $0.70 per million output tokens.

Artificial Intelligence · Inference Optimization · Large Language Model

0 likes · 23 min read

Aug 17, 2025 · Artificial Intelligence

Boost 7B LLM Math Reasoning Beyond GPT‑4o with a Simple Pass@k Reward

By replacing the traditional Pass@1 reward with a Pass@k formulation and a lightweight advantage computation, a 7B language model can dramatically improve its performance on math reasoning benchmarks, surpassing GPT‑4o while adding only a few lines of code and minimal training overhead.

Python · RLHF · reward engineering

0 likes · 7 min read

Aug 15, 2025 · Artificial Intelligence

Unlocking LLM Performance: Classic Deep RL Tricks Reimagined for Modern Training

This article systematically adapts classic deep reinforcement‑learning techniques to improve large language model training and inference. The techniques covered are multi‑step returns, TD(λ)/GAE, V‑trace corrections, uncertainty‑aware weighting, safety constraints, distributionally robust optimization, and value‑guided decoding. For each, it provides concrete formulas, implementation tips, and empirical results.

Deep RL · GAE · LLM

0 likes · 17 min read