
qa_match: An Open‑Source Deep Learning Based Question‑Answer Matching System

The article introduces qa_match, an open‑source lightweight QA matching tool built on TensorFlow that combines BiLSTM‑based domain classification, DSSM‑based intent matching, and a model‑fusion strategy to deliver accurate, multi‑type responses for intelligent customer service applications.

58 Tech

Open‑Source Project Series (Part 1)

Project name: qa_match

GitHub: https://github.com/wuba/qa_match

Description: qa_match is a lightweight question‑answer matching tool released by 58.com. It leverages deep learning to integrate domain and intent recognition for precise understanding of user queries.

Key Features (released March 2020)

Quickly build a QABot‑based QA system.

Developed on TensorFlow, offering high flexibility and strong extensibility.

Compact code with fast domain and intent recognition, achieving high accuracy.

Simplified training data format.

Supports saving the best‑performing model based on test‑set evaluation.

Effectively handles low‑information queries, word‑order variations, and semantically identical but differently phrased questions.

Why qa_match is Needed

QABot’s one‑question‑one‑answer service is the core of intelligent customer service, relying on a knowledge base built from manually curated, annotated, or machine‑extracted data. Queries must be matched to standard questions (intents) and their extended variations. Existing single‑model classification or retrieval methods do not perform well in complex, real‑world scenarios, prompting the development of a bagging‑style model‑fusion approach.

Algorithm Architecture

1. Domain Recognition – Uses a BiLSTM + attention model to classify each query, producing one of three outcomes: “Reject” (out of scope), “List Answer” (several candidate domains), or a hit on a specific domain (e.g., account, information). The BiLSTM captures bidirectional context, while attention highlights the salient features needed for precise domain detection.
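To make the attention step concrete, here is a minimal pure‑Python sketch of attention pooling over BiLSTM outputs: each hidden state is scored against a learned context vector, the scores are softmax‑normalized, and the states are combined into a single weighted representation. All names and dimensions are illustrative, not taken from the qa_match code.

```python
import math

def attention_pool(hidden_states, context):
    """Score each (Bi)LSTM output against a learned context vector,
    softmax the scores, and return the weighted sum of the states.
    A toy sketch of the attention layer described above."""
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context))
              for h in hidden_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return pooled, weights

# Toy example: three 2-d hidden states; the third aligns best with
# the context vector, so it should receive the largest weight.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(states, context=[1.0, 1.0])
```

In the real model the pooled vector would feed a softmax layer over the domain labels; here it simply shows how attention concentrates weight on the most relevant time steps.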

2. Intent Recognition – Employs a Deep Structured Semantic Model (DSSM) to match queries to standard questions. Positive samples are generated by pairing a query with its standard question; negatives are random standard questions. During inference, query vectors are compared with pre‑computed document vectors via cosine similarity, selecting the top‑N candidates.
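The retrieval step described above can be sketched in a few lines: standard‑question vectors are pre‑computed once, and at inference time the query vector is ranked against them by cosine similarity. The labels and vectors below are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_n(query_vec, doc_vecs, n=2):
    """Rank pre-computed standard-question vectors by cosine
    similarity to the query vector and keep the top n candidates."""
    scored = [(label, cosine(query_vec, v)) for label, v in doc_vecs.items()]
    return sorted(scored, key=lambda t: -t[1])[:n]

# Hypothetical pre-computed vectors for three standard questions.
docs = {
    "reset_password": [0.9, 0.1],
    "billing":        [0.1, 0.9],
    "login":          [0.8, 0.3],
}
candidates = top_n([1.0, 0.2], docs, n=2)
```

In qa_match the vectors come from the DSSM encoders rather than these toy 2‑d values, but the ranking logic is the same.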

3. Model Fusion – Combines domain and intent results. When domain confidence is high, the corresponding intent candidates are retained, and final answers are chosen based on both domain and intent scores. Fusion thresholds are tuned on a validation set to maximize F1.
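A hypothetical fusion rule along these lines might look as follows. The threshold values, labels, and return shapes are all assumptions for illustration; the actual qa_match thresholds are tuned on a validation set as noted above.

```python
def fuse(domain_score, intent_candidates,
         high_thresh=0.9, low_thresh=0.5):
    """Sketch of a bagging-style fusion rule: trust the intent
    ranking when the domain classifier is confident, fall back to a
    list answer when it is unsure, and reject when it is confident
    the query is out of scope. Thresholds are illustrative."""
    if domain_score >= high_thresh:
        # Confident domain: keep intents scored above the low threshold
        kept = [(i, s) for i, s in intent_candidates if s >= low_thresh]
        if kept:
            # Answer directly with the best-scoring intent
            return ("answer", max(kept, key=lambda t: t[1])[0])
        return ("list", [i for i, _ in intent_candidates])
    if domain_score < low_thresh:
        return ("reject", None)
    return ("list", [i for i, _ in intent_candidates])

decision = fuse(0.95, [("reset_password", 0.82), ("change_email", 0.40)])
```

With a high domain score and one strong intent candidate, the rule returns a direct answer; a low domain score would instead trigger a rejection.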

How to Train qa_match

Data preparation requires four files: domain‑label + question, intent‑label + question, standard‑question + question, and a mapping of domain‑label + intent‑label + domain + question. All questions are tokenized with spaces.
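As a sketch of what parsing one of these files could look like, the snippet below reads `label<TAB>question` lines into (label, tokens) pairs. The tab separator and field order are assumptions about the format; the only detail confirmed by the article is that questions are tokenized with spaces.

```python
def parse_label_lines(lines):
    """Parse 'label<TAB>question' lines into (label, tokens) pairs.
    Separator and field order are assumed, not confirmed by qa_match."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        label, question = line.split("\t", 1)
        pairs.append((label, question.split()))  # space-tokenized
    return pairs

sample = [
    "account\thow do i reset my password",
    "billing\twhere is my invoice",
]
pairs = parse_label_lines(sample)
```

Consult the repository's README for the authoritative file layouts before preparing real training data.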

Running the project involves three steps—domain classification, intent matching, and model fusion—executed via the provided run.sh script.

Fusion parameters are selected by evaluating F1 scores across thresholds on a test set; the best‑performing thresholds are then used in production.
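The threshold sweep described above can be sketched as a simple grid search: for each candidate threshold, count true/false positives and false negatives on held‑out (score, correct) pairs and keep the threshold with the best F1. The example data is made up.

```python
def f1(tp, fp, fn):
    """Standard F1 from true positives, false positives, false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def best_threshold(scored_examples, thresholds):
    """scored_examples: (score, is_correct) pairs from a held-out set.
    Predict 'answer' when score >= t; return the t maximizing F1.
    A toy sketch of the tuning loop, not qa_match's actual script."""
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        tp = sum(1 for s, ok in scored_examples if s >= t and ok)
        fp = sum(1 for s, ok in scored_examples if s >= t and not ok)
        fn = sum(1 for s, ok in scored_examples if s < t and ok)
        score = f1(tp, fp, fn)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1

examples = [(0.95, True), (0.9, True), (0.6, False), (0.4, False), (0.3, True)]
t, score = best_threshold(examples, [0.2, 0.5, 0.8])
```

Here the highest threshold wins: it sacrifices one low‑scoring correct answer but eliminates both false positives, which is exactly the precision/recall trade‑off the validation sweep is tuning.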

Future Plans

Release LSTM‑based pre‑trained models comparable to BERT in performance but with far fewer parameters.

Open a semi‑automatic knowledge‑base mining pipeline.

Provide TensorFlow 2.x and PyTorch versions of qa_match.

Contribution & Feedback

Developers are encouraged to submit PRs or Issues on the GitHub repository or email [email protected] for suggestions and bug reports.

Authors

Wang Yong – AI Lab Algorithm Architect at 58.com. Chen Lu – Senior Algorithm Engineer at 58.com.

Tags: AI, deep learning, open source, DSSM, BiLSTM, question answering, qa_match
Written by

58 Tech

Official tech channel of 58, a platform for tech innovation, sharing, and communication.
