Practical Experience Building Zhihu Direct Answer: An AI‑Powered Search Product
This article presents a comprehensive overview of Zhihu Direct Answer, describing its AI‑driven search architecture, RAG framework, query understanding, retrieval, chunking, reranking, generation, evaluation mechanisms, engineering optimizations, and the professional edition, while sharing concrete performance‑boosting practices and future development plans.
The talk introduces Zhihu Direct Answer, an AI‑enhanced search product that combines community‑curated knowledge with large language models, highlighting its strengths in professional content, creator interaction, credibility, and multi‑source data integration.
It explains the Retrieval‑Augmented Generation (RAG) framework, where user queries first retrieve relevant passages from a knowledge base and then feed them to a large model to generate accurate, traceable answers, reducing hallucinations.
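The retrieve‑then‑generate loop can be sketched in a few lines. This is a toy illustration only: the keyword‑overlap retriever and prompt template below are stand‑ins for the embedding‑based recall and tuned generator the talk describes.

```python
def retrieve(query, corpus, k=2):
    """Rank passages by naive keyword overlap with the query (a stand-in
    for the learned semantic recall used in production)."""
    q_terms = set(query.lower().split())
    return sorted(corpus, key=lambda p: -len(q_terms & set(p.lower().split())))[:k]

def build_prompt(query, passages):
    """Ground the generator in numbered sources so answers stay traceable."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the sources below.\n{context}\nQ: {query}\nA:"

corpus = [
    "RAG retrieves relevant passages before generation.",
    "Reranking orders candidate passages by quality.",
]
passages = retrieve("what does RAG retrieve", corpus)
prompt = build_prompt("what does RAG retrieve", passages)
```

Because every cited passage is numbered in the prompt, the generated answer can point back to its sources, which is what makes the output traceable and curbs hallucination.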
For query handling, the team improved multi‑turn context understanding by fine‑tuning a model for query rewriting, semantic completion, and expansion, and integrated the rewriting step into the search engine itself to control relevance and diversity while lowering cost.
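As a rough sketch of that step, the code below builds the input a fine‑tuned rewriter might see (prior turns plus the current query, so pronouns and elided entities can be resolved) and does naive synonym expansion. The prompt format and synonym table are invented for illustration.

```python
# Hypothetical synonym table; a real system would learn expansions.
SYNONYMS = {"rag": ["retrieval-augmented generation"]}

def build_rewrite_input(history, query):
    """Concatenate prior turns so a fine-tuned rewriter can turn an
    elliptical follow-up into a standalone query."""
    turns = "\n".join(f"Turn {i + 1}: {t}" for i, t in enumerate(history))
    return f"{turns}\nCurrent query: {query}\nRewrite as a standalone query:"

def expand(query):
    """Naive synonym expansion to diversify recall."""
    variants = [query]
    for term, alts in SYNONYMS.items():
        if term in query.lower():
            variants += [query.lower().replace(term, a) for a in alts]
    return variants
```

Running the rewriter inside the search engine (rather than as a separate LLM call) is what lets the team keep latency and cost down while still controlling how aggressively queries are expanded.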
The retrieval layer employs a multi‑strategy approach: semantic recall using an embedding model fine‑tuned from BGE, tag‑based recall via a two‑stage LLM‑enhanced pipeline, and vector‑space alignment to handle mixtures of long and short documents, supported by embedding‑efficiency techniques such as Matryoshka representation learning, dense‑sparse hybrid retrieval, and 1‑bit quantization.
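Two of the efficiency tricks mentioned can be shown in pure Python: Matryoshka‑style prefix truncation (a properly trained model keeps short prefixes usable as embeddings) and 1‑bit sign quantization scored by Hamming similarity. The vectors and dimensions below are toy values.

```python
import math

def matryoshka_truncate(emb, dim):
    """Matryoshka-style compression: keep the leading `dim` coordinates
    and renormalize to unit length."""
    v = emb[:dim]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def binarize(emb):
    """1-bit quantization: keep only the sign of each coordinate."""
    return [1 if x > 0 else 0 for x in emb]

def hamming_sim(a, b):
    """Fraction of matching bits; a cheap proxy for cosine similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Both tricks trade a little recall for large savings: truncation shrinks index size linearly with dimension, and binarization turns float comparisons into bit operations.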
Chunking is tackled with both fixed‑window and generative methods; the final solution is a merge‑based chunker that ranks candidate chunks, merges them, and expands boundaries to balance latency, cost, and semantic completeness.
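The merge‑based idea — rank candidate windows, expand their boundaries, merge overlapping spans — might look like the sketch below over a sentence list. The per‑sentence scores are assumed to come from a retrieval model and are passed in directly.

```python
def merge_chunks(sentences, scores, k=2, expand=1):
    """Pick the top-k scored sentences, expand each by `expand` neighbors
    for semantic completeness, then merge overlapping spans."""
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    spans = sorted(
        (max(0, i - expand), min(len(sentences), i + expand + 1)) for i in top
    )
    merged = [spans[0]]
    for start, end in spans[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end:  # overlapping or adjacent: merge
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return [" ".join(sentences[s:e]) for s, e in merged]
```

Compared with fixed windows, this keeps high‑signal sentences together with their surrounding context while capping the total text sent to the model, which is the latency/cost/completeness balance the final solution aims for.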
Reranking focuses on key‑information perception, diversity control, and authority enhancement by weighting community votes, ensuring the final answer prioritizes high‑quality content.
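The authority‑enhancement idea can be sketched as a weighted blend of relevance and a log‑damped vote signal; the 0.8/0.2 weights and the 10,000‑vote normalizer are illustrative, not Zhihu's production values.

```python
import math

def rerank(candidates):
    """Order candidates by a blend of model relevance and community
    authority, with votes log-damped so viral answers don't dominate."""
    def score(c):
        authority = math.log1p(c["votes"]) / math.log1p(10_000)
        return 0.8 * c["relevance"] + 0.2 * authority
    return sorted(candidates, key=score, reverse=True)
```

The log damping matters: without it, a single heavily upvoted answer would crowd out more relevant but less popular content, defeating the diversity-control goal.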
Generation incorporates metadata enrichment, planning capabilities for multi‑step reasoning, and continuous model alignment using preference‑optimization and reinforcement‑learning techniques such as DPO and PPO, aiming for reliable and coherent outputs.
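For reference, the DPO objective on a single preference pair is compact enough to write out directly; the log‑probabilities below are placeholders for sequence log‑likelihoods under the policy and the frozen reference model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO on one preference pair:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))),
    where each argument is a sequence log-probability."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference the loss sits at log 2; it falls below that only when the policy raises the chosen answer's likelihood relative to the rejected one, which is what drives alignment without an explicit reward model.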
Evaluation combines automated scoring (LLM, preference models, bad‑case suites) with multi‑dimensional human reviews and A/B testing to guarantee product quality and reliability.
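A toy aggregation step for those multi‑dimensional reviews: average each dimension across reviewers and gate release on a threshold. The 5‑point scale and the 3.0 gate are assumptions for illustration.

```python
def aggregate_scores(reviews, gate=3.0):
    """Average per-dimension scores across reviewers (5-point scale
    assumed) and report whether every dimension clears the gate."""
    dims = reviews[0].keys()
    avg = {d: sum(r[d] for r in reviews) / len(reviews) for d in dims}
    return avg, all(v >= gate for v in avg.values())
```

Gating on every dimension separately, rather than on an overall average, keeps one strong dimension (say, fluency) from masking a failing one (say, factual accuracy).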
Engineering optimizations include DAG‑based task orchestration, full‑stack monitoring, model quantization for roughly 50% cost reduction, and vertical‑domain model specialization that preserves over 95% of baseline performance.
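The DAG orchestration idea can be sketched with the standard library's graphlib: declare each stage's predecessors and execute in topological order (a production scheduler would additionally run independent stages concurrently). The stage names below are illustrative.

```python
from graphlib import TopologicalSorter

def run_dag(tasks, deps):
    """Run each stage after its dependencies; `deps` maps stage ->
    predecessor set, and each task receives the results so far."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = tasks[name](results)
    return results

tasks = {
    "rewrite": lambda r: "standalone query",
    "retrieve": lambda r: ["passage-1", "passage-2"],
    "rerank": lambda r: sorted(r["retrieve"]),
    "generate": lambda r: f"answer grounded in {len(r['rerank'])} passages",
}
deps = {"retrieve": {"rewrite"}, "rerank": {"retrieve"}, "generate": {"rerank"}}
pipeline = run_dag(tasks, deps)
```

Declaring the pipeline as a dependency graph rather than a fixed call chain is what makes it easy to monitor each stage independently and to swap or parallelize stages without rewriting the flow.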
The professional edition adds high‑quality data sources (academic papers, journals), personal knowledge‑base management (PDF upload, intelligent parsing, directed Q&A), and deep‑reading capabilities.
Future directions aim at tighter integration with the Zhihu community, multimodal interaction, advanced reasoning (o1‑style), and continued specialization for research users.
Overall, the presentation shares practical insights and lessons learned from building a large‑scale AI search system, encouraging users to explore Zhihu Direct Answer’s possibilities.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.