Wu Shixiong's Large Model Academy

We regularly share large-model know-how to help you master the core skills (LLM, RAG, fine-tuning, deployment) from zero to job offer, tailored for career-switchers, autumn campus-recruitment candidates, and anyone seeking a stable large-model position.

109 articles · 0 likes · 108 views · 0 comments
Recent Articles

Apr 30, 2026 · Artificial Intelligence

When Is Claude Code’s Memory Injected into system_prompt? Interview Insight

The article explains that Claude Code loads persisted memory once at REPL startup via _build_system(), inserts it as the 10th segment of system_prompt, enforces a 200‑line limit on MEMORY.md, deliberately avoids side‑effects in get_memory_dir(), and only refreshes the prompt with the /model command.

Claude Code · Interview preparation · LLM
0 likes · 11 min read
Apr 29, 2026 · Interview Experience

ByteDance Interviewer Asks: What Rank r Do You Use for LoRA? I Said 64—He Said I'm Wasting GPU Memory

The article examines a common interview scenario where candidates are asked about LoRA rank selection, outlines two typical mistakes—guessing or staying silent—and presents a three‑step strategy of honest boundary setting, logical derivation, and asking a focused question, illustrating the approach with concrete LoRA calculations and a vLLM case study.

AI Engineering · LoRA · interview strategy
0 likes · 13 min read
Apr 28, 2026 · Artificial Intelligence

Why Bigger Context Fails for Deep Research Agents and How IterResearch Fixes It

Interviewers point out that simply enlarging the LLM’s context window cannot prevent forgetting early conclusions in long‑step Deep Research tasks; the article explains the ReAct context issues, introduces the IterResearch framework with evolving reports, and compares its accuracy, cost, and scalability against ReAct and ReSum.

Context Management · IterResearch · LLM
0 likes · 17 min read
Apr 27, 2026 · Artificial Intelligence

Can Your RAG Pass the Demo? Scaling to 5,000 Docs for Reliable Answers

The article walks through the practical challenges of turning a RAG demo into a production system for 5,000 insurance documents, covering knowledge‑base chunking, embedding model selection, recall‑threshold tuning, hybrid vector‑BM25 retrieval, intent‑aware query routing, prompt constraints, confidence scoring, and operational scaling, with concrete metrics and code examples.

Embedding · Hybrid Retrieval · RAG
0 likes · 16 min read
Apr 23, 2026 · Industry Insights

Should You Take a Tencent AI Internship? Key Factors to Consider

The article examines whether a Tencent AI internship is worth pursuing by analyzing the program’s growth stage, unique user ecosystem, mentorship structure, compensation model, and early‑year advantages, illustrated with real intern case studies, to help students decide what they aim to gain from the experience.

AI internship · AI research · Tech industry
0 likes · 14 min read
Apr 22, 2026 · Artificial Intelligence

How to Classify and Manage Agent Memories for Better Retrieval

This article dissects Claude Code's memory system, explains why unstructured memory degrades performance, introduces four distinct memory types with concrete examples and schema, shows how to handle expiration and retrieval strategies, and provides step‑by‑step implementation code to improve agent reliability.

Agent Memory · LLM · Python
0 likes · 19 min read
Apr 21, 2026 · Artificial Intelligence

When Should an LLM Agent Extract Memory? A Deep Dive into Trigger Strategies

The article analyzes why memory extraction in LLM‑driven agents incurs cost, compares four frameworks—Claude Code, Generative Agents, MemGPT, and Mem0—detailing their trigger mechanisms, concurrency handling, and trade‑offs, and offers practical guidance for choosing the right strategy in real‑time, social, or batch‑processing scenarios.

AI Engineering · Agent Design · LLM
0 likes · 18 min read
Apr 20, 2026 · Artificial Intelligence

How to Build Multi‑Step Reasoning Training Data for Deep Research Agents

Standard QA datasets fall short for deep research tasks because they lack the multi-step, dynamic reasoning those tasks require. This article explains why, outlines four data-construction techniques (SailorFog-QA, WebFrontier, WebShaper, E2HQA), and covers trajectory sampling, filtering, scale considerations, and interview-ready explanations.

AI agents · LLM training · Multi-step Reasoning
0 likes · 16 min read
Apr 17, 2026 · Backend Development

How Claude Code’s Memory System Works: From SHA‑256 Storage to Coalescing Extraction

This article dissects Claude Code’s Memory subsystem, explaining the distinction between Session logs and persistent Memory, the SHA‑256‑based storage layout, file indexing, four memory types, prompt injection steps, two write pathways, the ExtractionCoordinator’s coalescing strategy, and how to explain the design in interviews.

Backend Architecture · Claude Code · concurrency
0 likes · 19 min read