qa_match V1.1: Upgraded Lightweight Deep Learning QA Matching Tool
This article introduces qa_match V1.1, an open‑source, Apache‑licensed lightweight question‑answer matching system. The new release adds a simple pre‑trained language model (SPTM) and support for one‑level knowledge bases; the article also covers the model architecture, training resources, performance benchmarks, future plans, and contribution guidelines.
Open‑Source Project Series (Part 11)
Project Name: qa_match
GitHub: https://github.com/wuba/qa_match
License: Apache License 2.0
qa_match is a lightweight, deep‑learning‑based QA matching tool released by 58.com. Version 1.0 was launched on March 9, 2020; version 1.1 adds the following features:
Uses a lightweight pre‑trained language model (SPTM, Simple Pre‑trained Model) for QA matching.
Supports one‑level structured knowledge bases (in addition to the original two‑level support), improving generality.
Why Upgrade to V1.1
V1.0 only supported two‑level knowledge bases. V1.1 introduces one‑level support and releases a Bi‑LSTM‑based pre‑trained language model to boost downstream QA tasks.
QA Knowledge Base Introduction
Knowledge bases are built via manual summarization, annotation, or automated mining. A one‑level KB contains standard questions (intents) and their paraphrases (expanded questions). A two‑level KB adds a domain layer grouping multiple intents. The article includes a diagram comparing the two structures.
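The two structures can be illustrated with a minimal sketch; the domain, intent, and paraphrase strings below are invented examples, not entries from the real 58.com knowledge base:

```python
# One-level KB: each standard question (intent) maps directly
# to its paraphrases (expanded questions).
one_level_kb = {
    "How do I reset my password?": [
        "forgot my password",
        "can't log in, need a new password",
    ],
    "How do I post a listing?": [
        "how to publish an ad",
        "steps to create a new listing",
    ],
}

# Two-level KB: a domain layer groups several intents,
# so lookup goes domain -> intent -> paraphrases.
two_level_kb = {
    "account": {
        "How do I reset my password?": [
            "forgot my password",
            "can't log in, need a new password",
        ],
    },
    "listings": {
        "How do I post a listing?": [
            "how to publish an ad",
        ],
    },
}
```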
Lightweight Pre‑trained Language Model (SPTM)
To handle abundant unlabeled data, SPTM follows the BERT pre‑training paradigm but removes the Next Sentence Prediction (NSP) task and replaces the Transformer encoder with LSTM for faster inference. The model architecture includes residual Bi‑LSTM layers that sum the input and output of each layer before feeding to the next.
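The residual wiring described above can be sketched as follows. A toy `tanh` projection stands in for a real Bi‑LSTM layer (implementing one is beyond this sketch); only the residual stacking, where each layer's input and output are summed before the next layer, follows the article's description:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_layer(x, w):
    # Stand-in for one Bi-LSTM layer: any sequence-to-sequence
    # transform whose output dimension matches its input works here.
    return np.tanh(x @ w)

def residual_stack(x, weights):
    # SPTM-style residual stacking: the input and output of each
    # layer are summed before being fed to the next layer.
    for w in weights:
        x = x + toy_layer(x, w)
    return x

seq_len, hidden = 5, 8
x = rng.normal(size=(seq_len, hidden))
weights = [rng.normal(size=(hidden, hidden)) * 0.1 for _ in range(3)]
out = residual_stack(x, weights)
print(out.shape)  # (5, 8)
```

The residual sums keep gradients flowing through the stacked layers, the same motivation as in Transformer-style residual connections.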
Pre‑training details:
Dataset size: 10 million sentences
Hardware: NVIDIA P40 (12 GB)
Steps: 500,000; batch size: 256
Training time: 215.69 hours
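Since SPTM drops NSP, pre-training reduces to a masked-language-model objective. A masking step might look like the sketch below; the 15% masking rate and the 80/10/10 replacement split are the standard BERT recipe and are an assumption here, since the article does not state SPTM's exact ratios:

```python
import random

MASK_ID = 0  # hypothetical id reserved for the [MASK] token

def mask_tokens(token_ids, vocab_size, mask_prob=0.15, seed=None):
    """BERT-style masking (assumed recipe): select ~15% of positions;
    replace 80% of those with [MASK], 10% with a random token, and
    leave 10% unchanged. Returns (masked inputs, targets), where a
    target of -1 means the position is not predicted."""
    rng = random.Random(seed)
    inputs, targets = list(token_ids), [-1] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue
        targets[i] = tok  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID
        elif roll < 0.9:
            inputs[i] = rng.randrange(1, vocab_size)
        # else: keep the original token as-is
    return inputs, targets

inp, tgt = mask_tokens([5, 9, 3, 7, 2, 8, 4, 6], vocab_size=100, seed=1)
```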
QA Matching
1. Two‑level KB automatic QA (V1.0) – combines an LSTM domain classifier with a DSSM intent‑matching model (see the original article).
2. One‑level KB automatic QA (V1.1)
Using the existing DSSM intent‑matching model.
Using the fine‑tuned SPTM model: the same architecture as in pre‑training but without masking; the training target is the ID of the standard question. Scoring follows the same strategy as DSSM to decide the answer type.
Both approaches use two scoring thresholds (x1, x2) to select the answer type: single answer, list answer, or reject.
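The two-threshold strategy can be sketched like this; the concrete threshold values and the rule for building the candidate list are illustrative assumptions, since the article only says that x1 and x2 separate the three outcomes:

```python
def select_answer(scores, x1=0.9, x2=0.6):
    """Map matching scores to an answer type (illustrative thresholds).
    scores: dict mapping standard-question id -> matching score.
    Returns ("single", id), ("list", ids), or ("reject", None)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best = ranked[0]
    if best >= x1:                 # confident match: one direct answer
        return ("single", best_id)
    if best >= x2:                 # uncertain: offer a candidate list
        return ("list", [qid for qid, s in ranked if s >= x2])
    return ("reject", None)        # no candidate scores high enough

print(select_answer({"q1": 0.95, "q2": 0.70}))  # ('single', 'q1')
```

In production the thresholds would be tuned on a validation set to trade precision against answer coverage.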
Effectiveness Examples
Evaluation on one‑level and two‑level KB datasets yields the following offline metrics and CPU inference latencies:
| Dataset | Model | Unique‑Answer Accuracy | Unique‑Answer Recall | Unique‑Answer F1 | CPU Inference Time |
| --- | --- | --- | --- | --- | --- |
| One‑level KB | DSSM | 0.8398 | 0.8326 | 0.8362 | 3 ms |
| One‑level KB | SPTM | 0.8841 | 0.9002 | 0.8921 | 16 ms |
| Two‑level KB | LSTM + DSSM fusion | 0.8957 | 0.9027 | 0.8992 | 18 ms |
Because list‑answer cases are rare in the demo data, the focus is on unique‑answer metrics.
Future Plans
Develop a semi‑automatic knowledge‑base mining pipeline combining human and machine methods.
Release TensorFlow 2.x or PyTorch versions of qa_match as needed.
How to Contribute & Provide Feedback
We welcome developers to submit PRs or issues at https://github.com/wuba/qa_match.git, or to reach us by email at [email protected].
Authors
He Rui – Senior Algorithm Engineer, AI Lab, 58.com
Wang Yong – Algorithm Architect, AI Lab, 58.com
Chen Lu – Senior Algorithm Engineer, AI Lab, 58.com
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.