
Intelligent Hotel Post‑Sale QA System: Model Selection, Evaluation, and Engineering Optimization

This article describes the design, model selection, experimental evaluation, and engineering optimization of an AI‑driven post‑sale question‑answering system for hotel services, covering FAQ construction, intent detection, deep‑learning matching models such as DSSM, ESIM, and BERT, and their performance and latency trade‑offs.

Qunar Tech Salon

The article introduces a hotel post‑sale intelligent QA assistant built on Qunar's large‑scale travel platform, where thousands of users interact daily via chat and phone; the system organizes over 500 FAQ items, performs intent recognition, and routes queries to either self‑service answers or human agents to improve efficiency and user experience.
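The routing step described above can be sketched as a confidence check on an intent classifier: high-confidence queries go to self-service FAQ answers, low-confidence ones to a human agent. This is a minimal illustration only; the threshold value, intent labels, and function names are hypothetical, not taken from the article.

```python
def route_query(intent_scores: dict[str, float], threshold: float = 0.7) -> str:
    """Route to self-service when the top intent is confident enough,
    otherwise escalate to a human agent. Threshold is illustrative."""
    intent, score = max(intent_scores.items(), key=lambda kv: kv[1])
    if score < threshold:
        return "human_agent"
    return f"self_service:{intent}"

# A confident classifier output is handled automatically...
print(route_query({"refund": 0.92, "invoice": 0.05, "chitchat": 0.03}))
# ...while an ambiguous one is escalated.
print(route_query({"refund": 0.40, "invoice": 0.35, "chitchat": 0.25}))
```

In production the scores would come from the intent-recognition model; the escalation threshold is typically tuned on held-out traffic rather than fixed a priori.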

For the core QA task, the authors compare traditional classification approaches with retrieval‑based methods and several deep‑learning matching models, including DSSM, its variants, ESIM, and BERT‑based fine‑tuning. They adopt a baseline CNN‑LSTM model, enhance it with BERT for re‑ranking, and evaluate additional techniques such as ALBERT, TinyBERT, and Sentence‑BERT.
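The baseline-plus-BERT pipeline follows the common retrieve-then-rerank pattern: a cheap matcher recalls the top-k FAQ candidates, and a stronger model re-scores only that small set. The sketch below uses toy stand-ins, with Jaccard token overlap in place of the authors' CNN-LSTM retriever and a placeholder `rerank_score` in place of BERT; the structure, not the scoring functions, is the point.

```python
def token_overlap(query: str, doc: str) -> float:
    # Jaccard similarity over word sets -- a toy stand-in for the
    # baseline matching model.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)

def rerank_score(query: str, doc: str) -> float:
    # Placeholder for an expensive cross-encoder (e.g. BERT) score.
    return token_overlap(query, doc)

def answer(query: str, faqs: list[str], k: int = 5) -> str:
    # Stage 1: cheap recall of the top-k candidate FAQ items.
    candidates = sorted(faqs, key=lambda f: token_overlap(query, f),
                        reverse=True)[:k]
    # Stage 2: run the expensive scorer on the small candidate set only.
    return max(candidates, key=lambda f: rerank_score(query, f))
```

The reported top-1 and top-5 recall gains come from stage 2: the expensive model never sees the full FAQ set, only the k candidates, which bounds its cost per query.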

Experimental results on a balanced train/validation/test split show that the baseline + BERT pipeline achieves the highest accuracy (≈10 % absolute gain over baseline) with top‑1 recall improving by 26 % and top‑5 by 11 %, while maintaining reasonable robustness to newly added FAQ items. However, inference latency is a bottleneck: BERT requires >800 ms on CPU (≈160 ms on GPU) versus ~20 ms for the baseline, exceeding the 100 ms online budget.

To address latency, the authors explore GPU‑accelerated inference (TensorRT, FasterTransformer) and CPU‑side optimizations (cuBERT, model pruning, quantization, knowledge distillation). Despite attempts with reduced‑depth BERT, TinyBERT, and quant‑noise compression, speed gains were insufficient, leading them to adopt a Siamese Sentence‑BERT architecture with pooling to pre‑compute FAQ embeddings and achieve ~49 ms per query.
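The key property of the Siamese Sentence-BERT approach is that FAQ embeddings are computed once offline, so each online query costs a single encoder forward pass plus a cheap cosine-similarity scan. The sketch below captures that structure with a toy bag-of-words encoder standing in for Sentence-BERT; the vocabulary, vectors, and FAQ strings are invented for illustration.

```python
import math

# Offline: FAQ corpus and a fixed vocabulary (toy stand-in for a trained encoder).
faqs = ["how to cancel a booking", "how to get an invoice"]
vocab = sorted({tok for f in faqs for tok in f.lower().split()})

def encode(text: str) -> list[float]:
    # Toy bag-of-words embedding in place of a Sentence-BERT forward pass.
    toks = text.lower().split()
    vec = [float(toks.count(w)) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit length, so dot product == cosine

faq_vecs = [encode(f) for f in faqs]  # computed once, reused for every query

def best_faq(query: str) -> str:
    qv = encode(query)  # the only per-query encoder call
    scores = [sum(a * b for a, b in zip(qv, fv)) for fv in faq_vecs]
    return faqs[scores.index(max(scores))]
```

With a real Sentence-BERT encoder, this is exactly why latency drops from a full cross-encoder pass per FAQ pair to roughly one forward pass per query: the pairwise work collapses into vector dot products over pre-computed embeddings.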

The study concludes that while deep models dramatically improve answer accuracy, practical deployment in large‑scale internet services requires careful trade‑offs between model performance and computational cost, and that lightweight, pre‑computed embedding solutions offer a viable path for real‑time intelligent QA.

Tags: model optimization, AI, DSSM, NLP, BERT, question answering, hotel services
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
