Intelligent Question Answering in WeChat Search: Knowledge‑Graph QA and Document‑Based QA Techniques
This article introduces the intelligent question‑answering technology used in WeChat Search, covering background, knowledge‑graph‑based QA, document‑based QA, technical pipelines, key modules such as entity linking and relation recognition, and future research directions.
The talk opens with background on why traditional search falls short of directly answering user queries, motivating the need for intelligent question answering (QA) in the WeChat Search scenario.
Four types of user QA needs are described: factual (entity) queries, opinion (yes/no) queries, summary (long‑answer) queries, and list/step queries.
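The talk does not specify how queries are routed to these four types, but a first-pass router can be sketched with simple keyword heuristics. Everything here (the patterns, the `classify_query` name, the fallback to `factual`) is a hypothetical illustration, not the production classifier:

```python
import re

# Hypothetical keyword heuristics for coarse QA-need typing; a real system
# would use a trained intent classifier, not regexes.
QUERY_TYPE_PATTERNS = [
    ("opinion", re.compile(r"是不是|能不能|好不好|\bis it\b|\bcan it\b|\bshould\b")),  # yes/no
    ("list",    re.compile(r"步骤|怎么做|哪些|\bsteps\b|\bhow to\b|\blist of\b")),      # list/step
    ("summary", re.compile(r"为什么|原理|\bwhy\b|\bexplain\b")),                        # long answer
]

def classify_query(query: str) -> str:
    """Assign one of the four coarse QA-need types; 'factual' is the fallback."""
    q = query.lower()
    for label, pattern in QUERY_TYPE_PATTERNS:
        if pattern.search(q):
            return label
    return "factual"  # entity-style query
```

For example, "how to make dumplings" lands in the list/step bucket, while "who invented the telephone" falls through to factual.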
The technical roadmap is split into two main tracks: Knowledge‑Graph QA (KBQA) and Document‑Based QA (DocQA). KBQA relies on a structured knowledge base and semantic parsing, while DocQA uses retrieval plus machine reading comprehension over unstructured text.
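The two-track split implies a dispatch layer that prefers the structured KBQA path and falls back to DocQA when the knowledge base cannot answer. The talk does not describe this glue code; the sketch below is an assumed minimal routing scheme, with `kbqa`, `docqa`, and `has_kb_intent` as placeholder callables:

```python
from typing import Callable, Optional

def answer(query: str,
           kbqa: Callable[[str], Optional[str]],
           docqa: Callable[[str], str],
           has_kb_intent: Callable[[str], bool]) -> str:
    """Route a query: try structured KBQA first, fall back to retrieval+MRC DocQA."""
    if has_kb_intent(query):
        kb_answer = kbqa(query)
        if kb_answer is not None:  # KB found a confident answer
            return kb_answer
    return docqa(query)
```

The fallback ordering (KBQA before DocQA) is an assumption; a production system might run both tracks and arbitrate on confidence scores instead.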
For KBQA, the pipeline includes entity linking, relation recognition, topic entity detection, constraint identification, and query‑graph reasoning. Entity linking uses NER, candidate recall (both dictionary‑based and vector‑based), and a BERT‑based disambiguation model that also incorporates attribute information. Relation recognition builds a template library from seed triples and expands it via synonym patterns, then encodes relations with one‑hot and token embeddings for matching.
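The two-pronged candidate recall for entity linking (dictionary plus vector) can be sketched as the union of an alias-table lookup and a nearest-neighbour search over entity embeddings. The alias table, toy 2-d embeddings, and `recall_candidates` function below are illustrative stand-ins; the real system uses NER output as the mention and a BERT-based disambiguator to score the merged candidates:

```python
import math

# Toy alias dictionary and entity embeddings (illustrative only).
ALIAS_DICT = {"apple": ["Apple_Inc", "Apple_fruit"], "jobs": ["Steve_Jobs"]}
ENTITY_VECS = {"Apple_Inc": [0.9, 0.1], "Apple_fruit": [0.1, 0.9], "Steve_Jobs": [0.8, 0.2]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recall_candidates(mention: str, mention_vec, top_k: int = 2) -> set:
    """Union of dictionary-based recall and vector-based (nearest-neighbour) recall."""
    dict_hits = set(ALIAS_DICT.get(mention.lower(), []))
    vec_hits = sorted(ENTITY_VECS,
                      key=lambda e: -cosine(ENTITY_VECS[e], mention_vec))[:top_k]
    return dict_hits | set(vec_hits)
```

Dictionary recall catches exact aliases while vector recall covers paraphrases the alias table misses; the downstream disambiguation model then ranks the union, incorporating entity attribute information as the talk describes.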
Complex queries are handled by generating candidate query graphs, pruning them with a binary classifier, one‑hop filtering, and heuristic rules, and then ranking the surviving candidates with either BERT semantic matching or graph‑based encoders.
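The generate-prune-rank flow can be illustrated over a toy triple store. The heuristic pruning here is plain lexical overlap and the scorer is a stand-in for the BERT semantic-matching ranker; all names (`TRIPLES`, `candidate_graphs`, `answer_query`) are hypothetical:

```python
# Toy KB as (head, relation, tail) triples; illustrative only.
TRIPLES = [
    ("Yao_Ming", "height", "2.26m"),
    ("Yao_Ming", "spouse", "Ye_Li"),
    ("Ye_Li", "height", "1.90m"),
]

def candidate_graphs(topic: str):
    """Generate one-hop candidate query graphs rooted at the topic entity."""
    return [(h, r, t) for h, r, t in TRIPLES if h == topic]

def heuristic_prune(cands, query_tokens):
    """Keep candidates whose relation overlaps the query; a crude stand-in
    for the binary-classification and rule-based pruning in the talk."""
    kept = [c for c in cands if c[1] in query_tokens]
    return kept or cands  # never prune everything away

def answer_query(topic: str, query_tokens: set) -> str:
    cands = heuristic_prune(candidate_graphs(topic), query_tokens)
    # Stand-in scorer for BERT semantic matching: relation/query overlap.
    best = max(cands, key=lambda c: 1.0 if c[1] in query_tokens else 0.0)
    return best[2]
```

For multi-hop questions the candidate generator would chain triples (e.g. topic → spouse → height), which is where the combinatorial blow-up makes pruning essential.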
DocQA follows a retrieval‑plus‑MRC workflow: filter QA intent, retrieve relevant paragraphs, apply a multi‑paragraph reader, re‑rank answers, and output the final result. Challenges such as limited labeled data, training‑inference mismatch, and answer extraction for long or yes/no answers are addressed with large‑scale pre‑training, distant supervision, soft label smoothing, and cross‑paragraph normalization.
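Cross-paragraph normalization addresses the fact that per-paragraph softmaxes make span scores incomparable across paragraphs: each paragraph's best span gets a high probability regardless of how weak it is globally. A minimal sketch, assuming raw span logits are available per paragraph (the `cross_paragraph_best` helper is hypothetical):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_paragraph_best(paragraph_span_logits):
    """Normalize span logits jointly across ALL paragraphs, so scores are
    directly comparable, then pick the globally best span.
    Input: list of per-paragraph lists of span logits.
    Returns (paragraph_index, span_index, probability)."""
    flat = [(p_idx, s_idx, logit)
            for p_idx, spans in enumerate(paragraph_span_logits)
            for s_idx, logit in enumerate(spans)]
    probs = softmax([logit for _, _, logit in flat])
    (p_idx, s_idx, _), prob = max(zip(flat, probs), key=lambda t: t[1])
    return p_idx, s_idx, prob
```

A per-paragraph softmax would give the lone span in the second paragraph probability 1.0; the joint softmax correctly ranks it below the strong span in the first paragraph.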
The future outlook highlights three research directions: more robust parsing of complex multi‑intent queries, improving MRC stability through adversarial training, and extracting conditioned short‑answer entities across multiple passages.
A short Q&A section clarifies practical details about KBQA seed‑triple labeling, entity role identification, and negative‑sample selection.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.