Exploring Text Pre‑training Models for Dialogue Classification in Information Security: From TextCNN to RoBERTa and Knowledge Distillation
This article presents a systematic exploration of text pre‑training models for dialogue classification in information‑security scenarios, comparing a baseline TextCNN, an enhanced TextCNN_role, RoBERTa with domain‑adaptive pre‑training, and a distilled mini‑model, and discussing their performance, trade‑offs, and future directions.
The article introduces text pre‑training models for information‑security tasks, focusing on dialogue‑text classification for user‑reported content review. It frames the problem of identifying the roles of participants in rental‑related conversations as a multi‑label classification task.
As a baseline, a TextCNN model is employed, with role tokens ([Ra] for the browsing side and [Rb] for the posting side) appended to each utterance to inject speaker information. This model achieves a macro‑average F1 of 87.3 on a 20k training set.
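The role‑token scheme can be sketched as follows; the function name, role names, and exact token placement are illustrative, not the authors' published code:

```python
# Sketch of the role-token scheme: mark each utterance with [Ra]/[Rb]
# before concatenating the dialogue into one model input string.
# Names and token placement are illustrative assumptions.

ROLE_TOKENS = {"browser": "[Ra]", "poster": "[Rb]"}

def mark_roles(dialogue):
    """dialogue: list of (role, utterance) pairs -> single marked string."""
    parts = []
    for role, utterance in dialogue:
        parts.append(f"{ROLE_TOKENS[role]} {utterance}")
    return " ".join(parts)

dialogue = [
    ("browser", "Is the apartment still available?"),
    ("poster", "Yes, you can view it this weekend."),
]
print(mark_roles(dialogue))
# -> [Ra] Is the apartment still available? [Rb] Yes, you can view it this weekend.
```

The marked string is then tokenized and fed to the TextCNN as ordinary text, so speaker identity survives as a surface token.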
To improve role awareness, the authors enhance TextCNN by adding role embeddings (TextCNN_role), raising the macro‑average F1 to 88.4, a 1.1‑point gain.
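A minimal sketch of how role embeddings might be summed into the token embeddings before the convolutional layers; all dimensions, kernel sizes, and names here are hypothetical, since the article does not publish the architecture details:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNNRole(nn.Module):
    """TextCNN with a role embedding added to each token embedding.
    Sizes and kernel choices are illustrative, not the paper's config."""
    def __init__(self, vocab_size=30000, embed_dim=128, n_roles=2,
                 n_filters=64, kernel_sizes=(2, 3, 4), n_labels=10):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, embed_dim)
        self.role_emb = nn.Embedding(n_roles, embed_dim)  # role 0 = [Ra], 1 = [Rb]
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_labels)

    def forward(self, token_ids, role_ids):
        # Inject speaker identity by summing the two embedding tables.
        x = self.tok_emb(token_ids) + self.role_emb(role_ids)  # (batch, seq, dim)
        x = x.transpose(1, 2)                                  # (batch, dim, seq)
        # Per-kernel convolution + max-over-time pooling, then concatenate.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))               # multi-label logits

model = TextCNNRole()
logits = model(torch.randint(0, 30000, (2, 32)), torch.randint(0, 2, (2, 32)))
print(logits.shape)  # torch.Size([2, 10])
```

Summing a learned role embedding gives the convolutions access to speaker identity at every position, rather than only at the inserted [Ra]/[Rb] surface tokens.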
Moving to transformer‑based models, the authors fine‑tune RoBERTa on the same data, reaching an F1 of 89.3 and outperforming the improved TextCNN by 0.9 points. Recognizing the domain gap, they apply domain‑adaptive pre‑training (DAPT) and task‑adaptive pre‑training (TAPT) on rental‑dialogue data, producing RoBERTa_58Dialog, which lifts F1 by a further 4.5‑4.7 points.
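DAPT/TAPT continue masked‑language‑model pre‑training on in‑domain text before fine‑tuning. A toy sketch of the dynamic masking step is below; the 15 % masking rate and token name follow common RoBERTa defaults, not the authors' published settings:

```python
import random

MASK, MASK_RATE = "[MASK]", 0.15  # common RoBERTa-style defaults (assumed)

def dynamic_mask(tokens, rng):
    """Return (masked_tokens, mlm_labels); labels are None where no loss applies.
    Toy version of dynamic masking for DAPT/TAPT on domain text."""
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < MASK_RATE:
            masked.append(MASK)
            labels.append(tok)      # the model must predict the original token
        else:
            masked.append(tok)
            labels.append(None)     # no MLM loss at unmasked positions
    return masked, labels

tokens = "is the rental listing still available".split()
masked, labels = dynamic_mask(tokens, random.Random(1))
print(masked)  # ['[MASK]', 'the', 'rental', 'listing', 'still', 'available']
```

Because masks are resampled every epoch ("dynamic"), the same in‑domain sentence yields different MLM targets across passes, which is what lets relatively small rental‑dialogue corpora support continued pre‑training.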
Because large pre‑trained models incur high inference latency, the authors perform knowledge distillation, training a 4‑layer student model (58Dialog_mini) from the 12‑layer RoBERTa_58Dialog teacher. The distilled model loses only 1.7 F1 points but cuts inference time from 13.75 ms to 5.92 ms, a ≈57 % reduction in latency (≈2.3× speed‑up).
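A distillation objective typically mixes a soft‑target term (matching the teacher's temperature‑smoothed outputs) with the ordinary hard‑label loss. The sketch below assumes a standard temperature/weighting scheme and sigmoid outputs for the multi‑label setting; T and alpha are illustrative hyperparameters, not the authors' recipe:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KD + hard-label BCE for multi-label classification.
    T (temperature) and alpha (mixing weight) are illustrative values."""
    # Soft targets: per-label BCE against the teacher's smoothed probabilities
    # (sigmoid per label, since the task is multi-label, not softmax).
    soft = F.binary_cross_entropy_with_logits(
        student_logits / T, torch.sigmoid(teacher_logits / T)
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    # Hard targets: ordinary multi-label loss on the gold labels.
    hard = F.binary_cross_entropy_with_logits(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

torch.manual_seed(0)
s = torch.randn(4, 10)                     # student logits (batch=4, labels=10)
t = torch.randn(4, 10)                     # teacher logits
y = torch.randint(0, 2, (4, 10)).float()   # gold multi-hot labels
loss = distill_loss(s, t, y)
print(float(loss))
```

The student trains against both terms, so it inherits the teacher's smoothed decision boundaries while still fitting the gold labels, which is how most of the teacher's accuracy survives the 12‑layer‑to‑4‑layer compression.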
The article concludes with a summary of all model results, suggests exploring lighter architectures such as ALBERT or ELECTRA, expanding DAPT/TAPT data, and further optimizing distillation to balance accuracy and efficiency.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.