Wu Shixiong's Large Model Academy
Author

Wu Shixiong's Large Model Academy

We continuously share large‑model know‑how, helping you master core skills—LLM, RAG, fine‑tuning, deployment—from zero to job offer, tailored for career‑switchers, autumn recruiters, and those seeking stable large‑model positions.

111
Articles
0
Likes
110
Views
0
Comments
Recent Articles

Latest from Wu Shixiong's Large Model Academy

100 recent articles max
Wu Shixiong's Large Model Academy
Wu Shixiong's Large Model Academy
Mar 22, 2026 · Artificial Intelligence

How to Overcome MinerU’s Top 9 Limitations for Reliable Document Parsing

This article examines MinerU’s strengths and nine critical shortcomings—such as reading order errors, split tables, merged cells, OCR misrecognition, formula handling, heading hierarchy loss, output inconsistency, hardware limits, and licensing issues—and provides concrete improvement strategies and interview‑ready talking points for engineers.

Document ParsingInterview TipsMinerU
0 likes · 12 min read
How to Overcome MinerU’s Top 9 Limitations for Reliable Document Parsing
Wu Shixiong's Large Model Academy
Wu Shixiong's Large Model Academy
Mar 21, 2026 · Artificial Intelligence

Step‑by‑Step Guide to Implementing a Hybrid Retrieval Function with RRF Fusion

This article breaks down the end‑to‑end retrieval function used in a RAG system, detailing each of the five stages—from request construction, hybrid vector + BM25 search, RRF fusion, cross‑encoder reranking, to threshold filtering—and provides concrete Python code, parameter choices, and performance insights.

Cross-EncoderElasticsearchHybrid Retrieval
0 likes · 13 min read
Step‑by‑Step Guide to Implementing a Hybrid Retrieval Function with RRF Fusion
Wu Shixiong's Large Model Academy
Wu Shixiong's Large Model Academy
Mar 20, 2026 · Artificial Intelligence

Mastering MinerU: Overcoming Its Top 9 Limitations for Reliable Document Parsing

This article examines MinerU's strengths and nine critical shortcomings—such as layout order errors, cross‑page table splits, merged‑cell failures, OCR misrecognition, and licensing issues—and provides concrete improvement strategies, interview‑ready resume bullets, and practical response frameworks for engineers.

LLMLayout AnalysisMinerU
0 likes · 13 min read
Mastering MinerU: Overcoming Its Top 9 Limitations for Reliable Document Parsing
Wu Shixiong's Large Model Academy
Wu Shixiong's Large Model Academy
Mar 19, 2026 · Artificial Intelligence

Making LLM Answers Trustworthy: Citation Attribution and Hallucination Detection

This article explains why simple prompt‑based citation is insufficient for Retrieval‑Augmented Generation, introduces a sentence‑level attribution pipeline, combines semantic similarity with NLI verification, and presents practical hallucination detection and structured JSON output to ensure answer reliability.

LLM ReliabilityNLIPrompt Engineering
0 likes · 10 min read
Making LLM Answers Trustworthy: Citation Attribution and Hallucination Detection
Wu Shixiong's Large Model Academy
Wu Shixiong's Large Model Academy
Mar 17, 2026 · Artificial Intelligence

Mastering Chunk Splitting for RAG: From Fixed Length to Semantic Segmentation

Chunk splitting, a critical yet often overlooked step in RAG pipelines, dramatically impacts retrieval recall and LLM output quality; this guide walks through three evolution stages—from naive fixed‑length splits to sentence‑aware overlaps and finally semantic, structure‑driven segmentation—complete with code, experiments, and practical pitfalls.

ChunkingLLMRAG
0 likes · 15 min read
Mastering Chunk Splitting for RAG: From Fixed Length to Semantic Segmentation