qa_match V1.1: Upgraded Lightweight Deep Learning QA Matching Tool
This article introduces qa_match V1.1, an open‑source, Apache‑licensed lightweight question‑answer matching system. The new release adds a simple pre‑trained language model (SPTM) and support for one‑level knowledge bases; the article also covers the model architecture, training resources, performance benchmarks, future plans, and contribution guidelines.
Open‑Source Project Series (Part 11)
Project Name: qa_match
GitHub: https://github.com/wuba/qa_match
License: Apache License 2.0
qa_match is a lightweight, deep‑learning‑based QA matching tool released by 58.com. Version 1.0 was launched on March 9, 2020; version 1.1 adds the following features:
Uses a lightweight pre‑trained language model (SPTM, Simple Pre‑trained Model) for QA matching.
Supports one‑level structured knowledge bases (in addition to the original two‑level support), improving generality.
Why Upgrade to V1.1
V1.0 only supported two‑level knowledge bases. V1.1 introduces one‑level support and releases a Bi‑LSTM‑based pre‑trained language model to boost downstream QA tasks.
QA Knowledge Base Introduction
Knowledge bases are built via manual summarization, annotation, or automated mining. A one‑level KB contains standard questions (intents) and their paraphrases (expanded questions). A two‑level KB adds a domain layer grouping multiple intents. The article includes a diagram comparing the two structures.
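The two structures can be illustrated with a minimal sketch; the domain, intent, and paraphrase strings below are invented examples, not entries from the real 58.com knowledge base:

```python
# One-level KB: each standard question (intent) maps directly
# to its paraphrases (expanded questions).
one_level_kb = {
    "How do I reset my password?": [
        "forgot my password",
        "can't log in, need a new password",
    ],
    "How do I post a listing?": [
        "how to publish an ad",
        "steps to create a new listing",
    ],
}

# Two-level KB: a domain layer groups several intents,
# so lookup goes domain -> intent -> paraphrases.
two_level_kb = {
    "account": {
        "How do I reset my password?": [
            "forgot my password",
            "can't log in, need a new password",
        ],
    },
    "listings": {
        "How do I post a listing?": [
            "how to publish an ad",
        ],
    },
}
```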
Lightweight Pre‑trained Language Model (SPTM)
To handle abundant unlabeled data, SPTM follows the BERT pre‑training paradigm but removes the Next Sentence Prediction (NSP) task and replaces the Transformer encoder with LSTM for faster inference. The model architecture includes residual Bi‑LSTM layers that sum the input and output of each layer before feeding to the next.
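The residual wiring described above can be sketched as follows. A toy `tanh` projection stands in for a real Bi‑LSTM layer (implementing one is beyond this sketch); only the residual stacking, where each layer's input and output are summed before the next layer, follows the article's description:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_layer(x, w):
    # Stand-in for one Bi-LSTM layer: any sequence-to-sequence
    # transform whose output dimension matches its input works here.
    return np.tanh(x @ w)

def residual_stack(x, weights):
    # SPTM-style residual stacking: the input and output of each
    # layer are summed before being fed to the next layer.
    for w in weights:
        x = x + toy_layer(x, w)
    return x

seq_len, hidden = 5, 8
x = rng.normal(size=(seq_len, hidden))
weights = [rng.normal(size=(hidden, hidden)) * 0.1 for _ in range(3)]
out = residual_stack(x, weights)
print(out.shape)  # (5, 8)
```

The residual sums keep gradients flowing through the stacked layers, the same motivation as in Transformer-style residual connections.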
Pre‑training details:
Dataset size: 10 million sentences
Hardware: NVIDIA P40 (12 GB)
Steps: 500,000; batch size: 256
Training time: 215.69 hours
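Since SPTM drops NSP, pre-training reduces to a masked-language-model objective. A masking step might look like the sketch below; the 15% masking rate and the 80/10/10 replacement split are the standard BERT recipe and are an assumption here, since the article does not state SPTM's exact ratios:

```python
import random

MASK_ID = 0  # hypothetical id reserved for the [MASK] token

def mask_tokens(token_ids, vocab_size, mask_prob=0.15, seed=None):
    """BERT-style masking (assumed recipe): select ~15% of positions;
    replace 80% of those with [MASK], 10% with a random token, and
    leave 10% unchanged. Returns (masked inputs, targets), where a
    target of -1 means the position is not predicted."""
    rng = random.Random(seed)
    inputs, targets = list(token_ids), [-1] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue
        targets[i] = tok  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID
        elif roll < 0.9:
            inputs[i] = rng.randrange(1, vocab_size)
        # else: keep the original token as-is
    return inputs, targets

inp, tgt = mask_tokens([5, 9, 3, 7, 2, 8, 4, 6], vocab_size=100, seed=1)
```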
QA Matching
1. Two‑level KB automatic QA (V1.0) – combines an LSTM domain classifier with a DSSM intent‑matching model (see the original article).
2. One‑level KB automatic QA (V1.1)
Using the existing DSSM intent‑matching model.
Using the fine‑tuned SPTM model: the same architecture as in pre‑training but without masking; the training target is the ID of the standard question. Scoring follows the same strategy as DSSM to decide the answer type.
Both approaches use two scoring thresholds (x1, x2) to select the answer type: single answer, list answer, or reject.
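The two-threshold strategy can be sketched like this; the concrete threshold values and the rule for building the candidate list are illustrative assumptions, since the article only says that x1 and x2 separate the three outcomes:

```python
def select_answer(scores, x1=0.9, x2=0.6):
    """Map matching scores to an answer type (illustrative thresholds).
    scores: dict mapping standard-question id -> matching score.
    Returns ("single", id), ("list", ids), or ("reject", None)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best = ranked[0]
    if best >= x1:                 # confident match: one direct answer
        return ("single", best_id)
    if best >= x2:                 # uncertain: offer a candidate list
        return ("list", [qid for qid, s in ranked if s >= x2])
    return ("reject", None)        # no candidate scores high enough

print(select_answer({"q1": 0.95, "q2": 0.70}))  # ('single', 'q1')
```

In production the thresholds would be tuned on a validation set to trade precision against answer coverage.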
Effectiveness Examples
Evaluation on one‑level and two‑level KB datasets yields the following offline metrics and CPU inference latencies:
| Dataset | Model | Unique‑Answer Accuracy | Unique‑Answer Recall | Unique‑Answer F1 | CPU Inference Time |
| --- | --- | --- | --- | --- | --- |
| One‑level KB | DSSM | 0.8398 | 0.8326 | 0.8362 | 3 ms |
| One‑level KB | SPTM | 0.8841 | 0.9002 | 0.8921 | 16 ms |
| Two‑level KB | LSTM + DSSM fusion | 0.8957 | 0.9027 | 0.8992 | 18 ms |
Because list‑answer cases are rare in the demo data, the focus is on unique‑answer metrics.
Future Plans
Develop a semi‑automatic knowledge‑base mining pipeline combining human and machine methods.
Release TensorFlow 2.x or PyTorch versions of qa_match as needed.
How to Contribute & Provide Feedback
We welcome developers to submit PRs or issues at https://github.com/wuba/qa_match.git, or to reach us by email at [email protected].
Authors
He Rui – Senior Algorithm Engineer, AI Lab, 58.com
Wang Yong – Algorithm Architect, AI Lab, 58.com
Chen Lu – Senior Algorithm Engineer, AI Lab, 58.com
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.