Improving Text Representation and Clustering for Small‑Sample Scenarios in 58.com Used‑Car Customer Service with a Bi‑LSTM Pre‑trained Language Model and Deep Clustering
This article describes how text representation and clustering purity were improved in the small‑sample 58.com used‑car customer‑service scenario by introducing a Bi‑LSTM based pre‑trained language model and an improved Deep Embedded Clustering (DEC) algorithm, yielding significant gains in classification accuracy, silhouette score, and online answer rate.
Background – 58.com’s intelligent customer‑service system ("Bangbang") has been deployed across various business lines since 2017. In 2019 it was extended to C‑end users and B‑end merchants, forming the "Bangbang Merchant" version. The system saves the labor of hundreds of customer‑service staff and improves efficiency, but the small‑sample nature of the used‑car domain leads to weak text representation and low clustering purity.
Problem Statement – Two key challenges were identified: (1) how to obtain robust representations for queries in a small‑sample setting to capture diverse phrasings of the same intent, and (2) how to discover new user questions to improve coverage of the automated QA robot.
Bi‑LSTM Pre‑trained Language Model – Inspired by BERT’s masked‑LM task, a Bi‑LSTM encoder was pre‑trained on 40 million unlabeled sentences from the used‑car domain, retaining only the masked‑LM objective (BERT’s next‑sentence‑prediction task was dropped). To reduce computational cost, the Transformer was replaced with a Bi‑LSTM, and residual add‑norm blocks were added. The model was trained for 300 k iterations on a single NVIDIA Tesla P40 GPU (≈28 h). On a 26 k‑sample classification task, pre‑training raised Bi‑LSTM accuracy from 0.8107 to 0.8662, outperforming a generic Chinese BERT model (0.8487 after 5 epochs, 0.8530 after 10).
| Model | Acc (with pre‑training) | Acc (no pre‑training) |
|---|---|---|
| Bi‑LSTM | 0.8662 | 0.8107 |
| BERT | 0.8487 (5 epochs) / 0.8530 (10 epochs) | 0.7884 (5 epochs) / 0.8342 (10 epochs) |
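The article does not spell out the masking recipe, so the sketch below assumes BERT’s defaults (15 % of tokens selected; of those, 80 % replaced by `[MASK]`, 10 % by a random token, 10 % kept unchanged); the tiny vocabulary and function name are purely illustrative:

```python
import random

MASK = "[MASK]"
# Tiny illustrative vocabulary; the real model uses the full domain vocabulary.
VOCAB = ["price", "loan", "mileage", "transfer", "insurance"]

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style masking. Each token is selected with probability mask_prob;
    a selected token becomes [MASK] 80% of the time, a random vocabulary
    token 10% of the time, and stays unchanged 10% of the time. The model
    is trained to predict the original token at every selected position."""
    rng = rng or random.Random(0)
    masked, labels = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # prediction target at position i
            r = rng.random()
            if r < 0.8:
                masked[i] = MASK
            elif r < 0.9:
                masked[i] = rng.choice(VOCAB)
            # else: keep the original token (but still predict it)
    return masked, labels
```

The Bi‑LSTM encoder then reads the corrupted sequence and is optimized only on the cross‑entropy of the selected positions, which is what lets 40 million unlabeled sentences supervise the encoder.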
DEC Algorithm Description – DEC jointly learns feature representations and cluster assignments. It consists of two stages: (1) pre‑training an auto‑encoder to obtain initial embeddings, and (2) fine‑tuning the encoder together with soft cluster assignments by minimizing KL‑divergence between the learned distribution q and a target distribution p. The original DEC uses K‑means for initializing centroids.
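The two distributions in stage (2) can be written in a few lines of NumPy. This is a sketch of the standard formulation from Xie et al. (2016), not the authors’ exact code: q is a Student’s‑t kernel over distances between embeddings and centroids, and the target p sharpens q while normalizing by soft cluster frequency:

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Soft assignment q_ij: Student's t kernel between embedding z_i and
    centroid mu_j, normalized over clusters (Xie et al., 2016)."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Target p_ij = (q_ij^2 / f_j), normalized per sample, with soft
    cluster frequency f_j = sum_i q_ij. Squaring sharpens confident
    assignments; dividing by f_j counteracts large-cluster bias."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)
```

Fine‑tuning then minimizes KL(p ‖ q) by gradient descent on the encoder weights and the centroids jointly.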
Improvements to DEC – To better suit the small‑sample scenario, the authors replaced K‑means centroids with custom centroids computed as the mean vectors of all known expanded question variants for each standard question. This guides the clustering toward the manually curated distribution.
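The centroid customization is simple to sketch (function and variable names here are hypothetical, not from the article): given the curated mapping from each standard question to the embeddings of its expanded variants, each initial centroid is the per‑question mean vector:

```python
import numpy as np

def custom_centroids(variants_by_question):
    """Initial DEC centroids: for each standard question, the mean of the
    embeddings of its manually curated expanded-question variants, used
    in place of K-means centroids."""
    return np.stack([embs.mean(axis=0) for embs in variants_by_question.values()])
```

Because every centroid starts at the center of a human‑curated question group, the subsequent KL fine‑tuning pulls new queries toward the existing standard‑question taxonomy rather than toward arbitrary K‑means partitions.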
Experiments
Three experiments were conducted on a manually labeled small‑sample dataset:
1. K‑means + Word2Vec static representation
2. K‑means + Bi‑LSTM static representation
3. DEC + Bi‑LSTM pre‑trained model (dynamic representation)
| Method | Accuracy | Silhouette | Runtime |
|---|---|---|---|
| K‑means + Word2Vec | 0.354 | 0.047 | < 5 min |
| K‑means + Bi‑LSTM | 0.377 | 0.025 | < 5 min |
| DEC + Bi‑LSTM | 0.8437 | 0.142 | 30 min |
The DEC‑based approach achieved a much higher accuracy (0.8437) and silhouette score (0.142) than the K‑means baselines, albeit with longer runtime. Online evaluation showed the weekly answer‑rate improved from 79.71 % to 83.62 % after iteration.
| Metric | Before Iteration | After Iteration |
|---|---|---|
| Answer‑rate | 79.71 % | 83.62 % |
For the expanded‑question discovery task, the improved DEC increased precision from 98.11 % to 98.24 % and recall from 89.66 % to 92.27 %.
| | Precision | Recall |
|---|---|---|
| Before iteration | 98.11 % | 89.66 % |
| After iteration | 98.24 % | 92.27 % |
Conclusion & Outlook – By adapting the pre‑training task to the vertical domain and customizing DEC centroids, the authors achieved notable improvements in text representation, clustering purity, and downstream QA performance. Future work includes exploring transfer learning between online/offline data, designing more suitable representation networks, and incorporating self‑supervised tasks.
References
Xie, J., Girshick, R., & Farhadi, A. (2016). Unsupervised deep embedding for clustering analysis. ICML.
Aggarwal, C. C., & Zhai, C. (2012). A survey of text clustering algorithms. Mining Text Data.
Aljalbout, E., et al. (2018). Clustering with deep learning: Taxonomy and new methods. arXiv:1801.07648.
Devlin, J., et al. (2018). BERT: Pre‑training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
Vaswani, A., et al. (2017). Attention is all you need. NIPS.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.