Author

Wu Shixiong's Large Model Academy

We regularly share practical large-model know-how to help you master the core skills (LLM, RAG, fine-tuning, deployment) from zero to job offer. The content is tailored for career switchers, autumn campus-recruitment candidates, and anyone seeking a stable large-model position.


Latest from Wu Shixiong's Large Model Academy


Wu Shixiong's Large Model Academy

Apr 30, 2026 · Artificial Intelligence

When Is Claude Code’s Memory Injected into system_prompt? Interview Insight

The article explains that Claude Code loads persisted memory once at REPL startup via _build_system(), inserts it as the 10th segment of system_prompt, enforces a 200-line limit on MEMORY.md, deliberately avoids side effects in get_memory_dir(), and refreshes the prompt only when the /model command is run.

Claude Code · Interview preparation · LLM

0 likes · 11 min read


Wu Shixiong's Large Model Academy

Apr 29, 2026 · Interview Experience

ByteDance Interviewer Asks: What Rank r Do You Use for LoRA? I Said 64—He Said I'm Wasting GPU Memory

The article examines a common interview scenario where candidates are asked about LoRA rank selection, outlines two typical mistakes—guessing or staying silent—and presents a three‑step strategy of honest boundary setting, logical derivation, and asking a focused question, illustrating the approach with concrete LoRA calculations and a vLLM case study.

AI Engineering · LoRA · interview strategy

0 likes · 13 min read


Wu Shixiong's Large Model Academy

Apr 28, 2026 · Artificial Intelligence

Why Bigger Context Fails for Deep Research Agents and How IterResearch Fixes It

Interviewers point out that simply enlarging the LLM’s context window cannot prevent forgetting early conclusions in long‑step Deep Research tasks; the article explains the ReAct context issues, introduces the IterResearch framework with evolving reports, and compares its accuracy, cost, and scalability against ReAct and ReSum.

Context Management · IterResearch · LLM

0 likes · 17 min read


Wu Shixiong's Large Model Academy

Apr 27, 2026 · Artificial Intelligence

Can Your RAG Pass the Demo? Scaling to 5,000 Docs for Reliable Answers

The article walks through the practical challenges of turning a RAG demo into a production system for 5,000 insurance documents, covering knowledge‑base chunking, embedding model selection, recall‑threshold tuning, hybrid vector‑BM25 retrieval, intent‑aware query routing, prompt constraints, confidence scoring, and operational scaling, with concrete metrics and code examples.

Embedding · Hybrid Retrieval · RAG

0 likes · 16 min read


Wu Shixiong's Large Model Academy

Apr 23, 2026 · Industry Insights

Should You Take a Tencent AI Internship? Key Factors to Consider

The article examines whether a Tencent AI internship is worth pursuing by analyzing the program’s growth stage, unique user ecosystem, mentorship structure, compensation model, and early‑year advantages, illustrated with real intern case studies, to help students decide what they aim to gain from the experience.

AI internship · AI research · Tech industry

0 likes · 14 min read


Wu Shixiong's Large Model Academy

Apr 22, 2026 · Artificial Intelligence

How to Classify and Manage Agent Memories for Better Retrieval

This article dissects Claude Code's memory system, explains why unstructured memory degrades performance, introduces four distinct memory types with concrete examples and schema, shows how to handle expiration and retrieval strategies, and provides step‑by‑step implementation code to improve agent reliability.

Agent Memory · LLM · Python

0 likes · 19 min read


Wu Shixiong's Large Model Academy

Apr 21, 2026 · Artificial Intelligence

When Should an LLM Agent Extract Memory? A Deep Dive into Trigger Strategies

The article analyzes why memory extraction in LLM‑driven agents incurs cost, compares four frameworks—Claude Code, Generative Agents, MemGPT, and Mem0—detailing their trigger mechanisms, concurrency handling, and trade‑offs, and offers practical guidance for choosing the right strategy in real‑time, social, or batch‑processing scenarios.

AI Engineering · Agent Design · LLM

0 likes · 18 min read


Wu Shixiong's Large Model Academy

Apr 20, 2026 · Artificial Intelligence

Why Java Skills Alone Won’t Cut It for LLM Application Engineering

The article debunks the myth that Java developers only need a bit of AI knowledge to succeed in LLM application roles, explaining the full engineering stack—from retrieval and prompt design to deployment and performance tuning—through real‑world examples, metrics, and interview‑ready advice.

AI Engineering · Interview preparation · LLM

0 likes · 13 min read


Wu Shixiong's Large Model Academy

Apr 20, 2026 · Artificial Intelligence

How to Build Multi‑Step Reasoning Training Data for Deep Research Agents

Standard QA datasets fall short for deep research tasks because they lack the multi-step, dynamic reasoning required. This article explains why, outlines four data-construction techniques (SailorFog-QA, WebFrontier, WebShaper, E2HQA), and covers trajectory sampling, filtering, scale considerations, and interview-ready explanations.

AI agents · LLM training · Multi-step Reasoning

0 likes · 16 min read


Wu Shixiong's Large Model Academy

Apr 17, 2026 · Backend Development

How Claude Code’s Memory System Works: From SHA‑256 Storage to Coalescing Extraction

This article dissects Claude Code’s Memory subsystem, explaining the distinction between Session logs and persistent Memory, the SHA‑256‑based storage layout, file indexing, four memory types, prompt injection steps, two write pathways, the ExtractionCoordinator’s coalescing strategy, and how to explain the design in interviews.

Backend Architecture · Claude Code · concurrency

0 likes · 19 min read
