Improving Text Matching Accuracy in Voice Assistants: Experiments with Siamese Networks, BERT Models, and Advanced Tricks
This article evaluates classic Siamese networks, several BERT-based pretrained models, and training tricks such as adversarial training, k-fold cross-validation, and model ensembling, on both a public sentence-similarity competition dataset and an internal voice-assistant standard-question matching dataset, ultimately raising voice-assistant accuracy from 97.23 % to 99.50 %.
Background – Text matching is a core natural-language-processing task and underpins dialogue management in voice assistants. The study uses a 40,000-sample voice-assistant dataset (balanced positive/negative pairs, split 8:1:1 into train/dev/test) to benchmark classic Siamese architectures against modern pretrained models.
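The 8:1:1 split mentioned above can be sketched as a small helper. This is an illustrative function, not the authors' pipeline; the seed and function name are assumptions.

```python
import random

def split_8_1_1(samples, seed=42):
    """Shuffle and split samples 8:1:1 into train/dev/test,
    mirroring the article's dataset protocol (illustrative only)."""
    rng = random.Random(seed)
    data = list(samples)
    rng.shuffle(data)
    n = len(data)
    n_train = int(n * 0.8)
    n_dev = int(n * 0.1)
    return (data[:n_train],
            data[n_train:n_train + n_dev],
            data[n_train + n_dev:])

# 40,000 samples -> 32,000 train / 4,000 dev / 4,000 test
train, dev, test = split_8_1_1(range(40000))
```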
Experimental Design – The authors reproduced top solutions from the Alibaba DAMO‑Lab COVID‑sentence‑pair competition, testing four stages: (1) classic Siamese networks (SiameseLSTM, ABCNN, BiMPM, ESIM); (2) Chinese BERT‑style pretrained models (bert_wwm_ext, roberta_wwm_large, ERNIE); (3) text‑matching tricks (adversarial training, k‑fold cross‑validation, model ensemble); (4) comparison on both competition and voice‑assistant datasets.
Classic Siamese Networks
| Strategy | Dataset | Accuracy |
| --- | --- | --- |
| SiameseLSTM | Competition | 0.8323 |
| SiameseLSTM | Voice Assistant | 0.9723 |
| ABCNN | Competition | 0.8528 |
| ABCNN | Voice Assistant | 0.9745 |
| BiMPM | Competition | 0.8916 |
| BiMPM | Voice Assistant | 0.9848 |
| ESIM | Competition | 0.9077 |
| ESIM | Voice Assistant | 0.9880 |
The progression from SiameseLSTM to ESIM shows steady gains, with attention mechanisms and richer matching strategies improving performance.
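The shared idea behind all of these architectures is to encode both sentences with the same weights and compare the resulting vectors. A minimal sketch, using mean-pooled toy word vectors and cosine similarity in place of a trained LSTM encoder (the embeddings and queries below are invented for illustration):

```python
import math

def encode(tokens, embeddings):
    """Mean-pool token vectors into one sentence vector.
    Stands in for the shared encoder of a Siamese network."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy 2-d embeddings (illustrative values only).
emb = {"open": [1.0, 0.0], "the": [0.1, 0.1], "light": [0.0, 1.0],
       "turn": [0.9, 0.1], "on": [0.2, 0.2], "lamp": [0.1, 0.9]}

q1 = encode(["open", "the", "light"], emb)
q2 = encode(["turn", "on", "lamp"], emb)
score = cosine(q1, q2)  # higher score -> queries more likely to match
```

A real SiameseLSTM replaces mean pooling with a BiLSTM, and ESIM adds cross-sentence attention between the two encoded sequences before comparison.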
BERT Series Pretrained Models
| Strategy | Dataset | Accuracy |
| --- | --- | --- |
| bert_wwm_ext | Competition | 0.9496 |
| ernie | Competition | 0.9499 |
| roberta_wwm_large | Competition | 0.9502 |
| bert_wwm_ext | Voice Assistant | 0.9938 |
| ernie | Voice Assistant | 0.9940 |
| roberta_wwm_large | Voice Assistant | 0.9930 |
All BERT-style models significantly outperform the classic Siamese networks, with a best competition accuracy of 95.02 % (roberta_wwm_large) and a best voice-assistant accuracy of 99.40 % (ernie).
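Unlike the Siamese approach, BERT-style models score a pair by feeding both sentences into one encoder as a single sequence. A sketch of that input layout, with naive whitespace tokenization standing in for the model's real WordPiece vocabulary:

```python
def build_pair_input(tokens_a, tokens_b, max_len=32):
    """Assemble a BERT-style sentence-pair input:
    [CLS] a [SEP] b [SEP], with segment ids 0 for a and 1 for b.
    Illustrative helper; a real tokenizer maps tokens to vocab ids."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    tokens = tokens[:max_len]
    segment_ids = segment_ids[:max_len]
    attention_mask = [1] * len(tokens)
    pad = max_len - len(tokens)
    tokens += ["[PAD]"] * pad
    segment_ids += [0] * pad
    attention_mask += [0] * pad
    return tokens, segment_ids, attention_mask

toks, segs, mask = build_pair_input("open the light".split(),
                                    "turn on the lamp".split(),
                                    max_len=12)
```

The classifier head then reads the final [CLS] representation to predict match/no-match, which lets self-attention compare the two sentences token by token from the very first layer.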
Tricks for Further Improvement
Adversarial training (FGM, PGD) – adds small gradient-based perturbations to the embedding layer during training, improving robustness.
K‑fold cross‑validation – trains multiple models on different folds and aggregates predictions.
Model ensemble (bagging) – combines predictions from multiple BERT variants.
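The core of the FGM trick listed above is a single perturbation step: move the embedding along the normalized gradient of the loss, r = epsilon * g / ||g||. A minimal sketch with plain lists standing in for tensors (the vectors and epsilon below are invented for illustration):

```python
import math

def fgm_perturb(embedding, grad, epsilon=1.0):
    """FGM step: add epsilon * g / ||g|| to the embedding, so the
    adversarial example stays within an L2 ball of radius epsilon."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0:
        return list(embedding)
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]

emb = [0.5, -0.2, 0.1]
grad = [0.3, 0.0, -0.4]
adv = fgm_perturb(emb, grad, epsilon=0.5)
```

In a real training loop the model is run a second time on the perturbed embeddings and both losses are backpropagated; PGD repeats this step several times with projection back into the epsilon-ball.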
| Strategy | Dataset | Accuracy |
| --- | --- | --- |
| ernie + external data + transfer | Competition | 0.9559 |
| ernie + external data + transfer + pgd | Competition | 0.9576 |
| ernie + external data + transfer + pgd | Voice Assistant | 0.9948 |
| ernie + 5-fold CV | Competition | 0.9515 |
| Model ensemble (bert_wwm_ext, ernie, roberta_wwm_large) | Competition | 0.9585 |
| Model ensemble (same three models) | Voice Assistant | 0.9950 |
These tricks consistently raise performance; the final ensemble reaches 95.85 % accuracy on the competition set and 99.50 % on the voice‑assistant set.
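Both the k-fold and bagging entries above reduce to the same inference-time operation: average per-example match probabilities across models (or fold members) and threshold the result. A sketch with hypothetical probabilities (the numbers below are invented, not the article's outputs):

```python
def ensemble_predict(prob_lists, threshold=0.5):
    """Bagging-style ensemble: average per-example probabilities
    from several models, then threshold into 0/1 labels."""
    n = len(prob_lists[0])
    avg = [sum(p[i] for p in prob_lists) / len(prob_lists)
           for i in range(n)]
    return [1 if p >= threshold else 0 for p in avg], avg

# Hypothetical match probabilities from three fine-tuned models.
bert_p    = [0.92, 0.40, 0.55]
ernie_p   = [0.88, 0.35, 0.48]
roberta_p = [0.95, 0.52, 0.45]
labels, avg = ensemble_predict([bert_p, ernie_p, roberta_p])
```

Averaging smooths out disagreements between individual models, which is why the third example flips to negative even though one model scores it above the threshold.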
Conclusion – BERT‑based models are the dominant solution for Chinese text matching in voice assistants. Incorporating adversarial training, k‑fold validation, and ensemble methods yields further gains, demonstrating a practical roadmap for deploying high‑accuracy NLU components.
Author – Yin Zilong, AI Lab algorithm engineer at 58.com, specializing in voice data analysis and intelligent writing.
References
ABCNN: https://arxiv.org/abs/1512.05193
BiMPM: https://arxiv.org/abs/1702.03814
ESIM: https://arxiv.org/abs/1609.06038
ERNIE 1.0: https://arxiv.org/abs/1904.09223
Chinese BERT-WWM: https://github.com/ymcui/Chinese-BERT-wwm
Adversarial training: https://zhuanlan.zhihu.com/p/91269728
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.