Baobao Algorithm Notes
Author

Baobao Algorithm Notes

Author of the BaiMian large model, offering technology and industry insights.

294
Articles
0
Likes
147
Views
0
Comments
Recent Articles

Latest from Baobao Algorithm Notes

100 recent articles max
Baobao Algorithm Notes
Baobao Algorithm Notes
Aug 14, 2025 · Artificial Intelligence

Why Standard SFT Fails to Generalize and How One‑Line Dynamic Fine‑Tuning Fixes It

The article analyzes the poor generalization of supervised fine‑tuning (SFT) for large language models, reveals its gradient as a high‑variance inverse‑probability policy gradient, proposes a one‑line Dynamic Fine‑Tuning correction, and shows substantial gains on challenging math and offline RL benchmarks.

Dynamic Fine-TuningGeneralizationLLM alignment
0 likes · 7 min read
Why Standard SFT Fails to Generalize and How One‑Line Dynamic Fine‑Tuning Fixes It
Baobao Algorithm Notes
Baobao Algorithm Notes
Aug 4, 2025 · Artificial Intelligence

Why GPT‑OSS Chooses a 64‑Dimensional Attention Head and 2880 Hidden Size

This article analyzes the surprising design choices of the rumored GPT‑OSS 120B model, explaining the rationale behind a 64‑dimensional attention head, the equal hidden and intermediate sizes, and other quirky parameters such as MLP bias and KV‑sink SWA, backed by theoretical formulas and empirical benchmarks.

Attention HeadGPT-OSSMLP Ratio
0 likes · 13 min read
Why GPT‑OSS Chooses a 64‑Dimensional Attention Head and 2880 Hidden Size
Baobao Algorithm Notes
Baobao Algorithm Notes
Aug 1, 2025 · Artificial Intelligence

Unlocking Qwen3-Coder-30B: Features, Fast Start, and Agentic Coding Guide

The article introduces Qwen3‑Coder‑30B‑A3B‑Instruct (aka Qwen3‑Coder‑Flash), detailing its architecture, 256K‑to‑1M token context, agentic coding capabilities, installation steps with Transformers, sample code for tool use, optimal sampling parameters, and deployment tips across various runtimes.

AI coding assistantAgentic CodingLarge Language Model
0 likes · 6 min read
Unlocking Qwen3-Coder-30B: Features, Fast Start, and Agentic Coding Guide
Baobao Algorithm Notes
Baobao Algorithm Notes
Jul 29, 2025 · Artificial Intelligence

Qwen3‑30B‑A3B‑Instruct‑2507: New Instruction Model with Boosted General and Multilingual Skills

The Qwen3‑30B‑A3B‑Instruct‑2507 model, an updated non‑thinking version of Qwen3‑30B‑A3B, delivers significant gains in instruction following, reasoning, multilingual knowledge coverage, and 256K context length, and its performance is benchmarked against leading LLMs across a wide range of tasks.

Instruction TuningMixture‑of‑ExpertsQwen3
0 likes · 6 min read
Qwen3‑30B‑A3B‑Instruct‑2507: New Instruction Model with Boosted General and Multilingual Skills
Baobao Algorithm Notes
Baobao Algorithm Notes
Jul 28, 2025 · Industry Insights

Why AWS Bedrock AgentCore Signals a New Era for Agentic AI Infrastructure

The article analyzes AWS Bedrock AgentCore and related hardware and software requirements for Agentic AI, covering runtime isolation with microVMs, memory architectures, identity and gateway design, zero‑trust networking, and the challenges of multi‑tenant KVCache and context engineering.

AWS BedrockInfrastructureMemory Management
0 likes · 15 min read
Why AWS Bedrock AgentCore Signals a New Era for Agentic AI Infrastructure
Baobao Algorithm Notes
Baobao Algorithm Notes
Jul 18, 2025 · Artificial Intelligence

30+ Expert Q&A on Large Language Model Architecture, Training, and Deployment

This article compiles more than thirty interview‑style questions and detailed answers covering large‑model fundamentals such as encoder‑decoder trade‑offs, self‑attention versus RNN, context length, tokenization, embedding strategies, FlashAttention, RoPE, prompt design, retrieval‑augmented generation, safety measures, fine‑tuning, and model distillation, providing a comprehensive technical reference for practitioners.

attention mechanismretrieval-augmented generation
0 likes · 53 min read
30+ Expert Q&A on Large Language Model Architecture, Training, and Deployment
Baobao Algorithm Notes
Baobao Algorithm Notes
Jul 17, 2025 · Artificial Intelligence

How QK-Clip Tames MaxLogit Explosions in Trillion‑Parameter LLMs

The article introduces QK-Clip, a lightweight per‑head weight‑clipping technique that uses the MaxLogit signal to prevent uncontrolled logit growth in massive LLMs, explains its design, compares it with prior methods, and shows that it stabilizes training without harming model performance.

Attention stabilityLLM trainingMaxLogit
0 likes · 15 min read
How QK-Clip Tames MaxLogit Explosions in Trillion‑Parameter LLMs
Baobao Algorithm Notes
Baobao Algorithm Notes
Jul 16, 2025 · Artificial Intelligence

What Small Labs Reveal About RL Training: Multi‑Stage, Entropy, and Resource Strategies

The article analyzes Skywork OR1's technical report, detailing how small‑scale teams use GRPO‑based reinforcement learning with multi‑stage training, advantage‑mask variants, high‑temperature sampling, adaptive entropy loss, and resource‑allocation tricks to improve large language model performance while avoiding premature entropy collapse.

AI researchentropy controlmulti-stage training
0 likes · 21 min read
What Small Labs Reveal About RL Training: Multi‑Stage, Entropy, and Resource Strategies