A Survey of Text Classification and Intent Recognition: Industrial and Research Perspectives
This article reviews recent developments in text classification and intent recognition, comparing industrial practices such as business‑coupled feature engineering with research trends like pretrained language models, and provides references and practical insights for building effective NLP solutions.
Background
To update my technical approach, I surveyed recent advances in text classification and intent recognition, topics that are closely related yet rarely covered together in depth in either industry or academia.
Industrial Situation
Strong Business Coupling
In industry, intent recognition is tightly linked to business features; models must incorporate signals such as click‑through rates, query length, and other domain‑specific attributes, as demonstrated in Meituan and Tencent search systems.
These external features are fused with semantic models through Wide & Deep-style architectures, blending business signals with language understanding.
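As a concrete illustration of this fusion, here is a minimal numpy sketch of a Wide & Deep-style scorer: the "wide" part is a linear model over hand-crafted business signals (e.g., click-through rate, query length), and the "deep" part is a small MLP over a dense semantic embedding of the query. All feature names, dimensions, and weights are invented for illustration, not taken from any of the cited systems.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def wide_and_deep(business_feats, query_embedding, params):
    """Score = sigmoid(wide linear term + deep MLP term)."""
    wide = business_feats @ params["w_wide"]               # linear over business signals
    hidden = np.tanh(query_embedding @ params["w_deep1"])  # one hidden layer for the sketch
    deep = hidden @ params["w_deep2"]
    return sigmoid(wide + deep + params["bias"])

params = {
    "w_wide": rng.normal(size=3),        # 3 toy business features: CTR, query length, freshness
    "w_deep1": rng.normal(size=(8, 4)),  # 8-dim semantic embedding -> 4 hidden units
    "w_deep2": rng.normal(size=4),
    "bias": 0.0,
}

score = wide_and_deep(
    business_feats=np.array([0.12, 5.0, 0.3]),  # toy values
    query_embedding=rng.normal(size=8),
    params=params,
)
print(round(float(score), 3))  # a probability in (0, 1)
```

The appeal of this split in practice is modularity: the wide side can be retrained cheaply as business features drift, while the semantic side evolves on its own schedule.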
Large-scale models that fuse multiple representations, like the KDD21 Taobao vector search system, illustrate a heavyweight approach that assembles diverse features into one big model, though such solutions may be unnecessary for upstream intent tasks.
Semantic Understanding
Semantic models provide robust, generalizable understanding of user queries, handling misspellings and colloquial language, and can be modularly integrated with business rules for flexible engineering.
Despite their power, pretrained models such as BERT are not universally adopted in intent recognition due to cost‑benefit considerations; often simpler models or rule‑based methods suffice for many upstream tasks.
Search‑as‑Classification
The “search‑as‑classification” idea treats intent detection as a lookup problem, e.g., matching queries against a dictionary, which works well for sparse or rapidly changing categories.
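The lookup idea above can be sketched in a few lines. The intent lexicon, phrases, and queries below are invented for illustration; a production system would typically use a trie or inverted index rather than a linear scan.

```python
# "Search-as-classification": intent detection as dictionary lookup.
INTENT_LEXICON = {
    "weather": "weather_query",
    "forecast": "weather_query",
    "movie tickets": "ticket_booking",
    "train tickets": "ticket_booking",
}

def lookup_intent(query, lexicon=INTENT_LEXICON, default="other"):
    """Return the intent of the longest lexicon phrase found in the query."""
    q = query.lower()
    hits = [phrase for phrase in lexicon if phrase in q]
    if not hits:
        return default
    return lexicon[max(hits, key=len)]  # prefer the most specific match

print(lookup_intent("book movie tickets for tonight"))  # ticket_booking
print(lookup_intent("what's the weather forecast"))     # weather_query
print(lookup_intent("hello there"))                     # other
```

Because the "model" is just a dictionary, adding or retiring a category is an entry edit rather than a retraining job, which is exactly why this works well for sparse or rapidly changing categories.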
Research Situation
Overview
Recent surveys (e.g., the 2020 review "From Shallow to Deep Learning") trace the shift from shallow, feature-engineered models to deep architectures such as CNNs, RNNs, and attention-based approaches, noting that pretrained models dominate benchmark leaderboards but may not reflect real-world constraints.
Pretrained Model Dominance
Large pretrained models achieve state‑of‑the‑art results on standard datasets, yet their superiority can be dataset‑dependent; smaller models like TextCNN may outperform them on domain‑specific data.
Relying solely on benchmark performance can lead to suboptimal technology choices, emphasizing the need for task‑oriented data collection.
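For reference, a TextCNN forward pass (in the style of Kim's 2014 architecture) is small enough to sketch directly: convolve filters of several widths over token embeddings, max-pool each feature map over time, and classify the concatenated features. All sizes, weights, and token IDs below are toy values, not from any cited system.

```python
import numpy as np

rng = np.random.default_rng(1)

emb_dim, n_classes = 16, 3
vocab = rng.normal(size=(100, emb_dim))           # toy embedding table

def textcnn_forward(token_ids, filters, w_out):
    x = vocab[token_ids]                          # (seq_len, emb_dim)
    pooled = []
    for width, w in filters:                      # w: (width * emb_dim, n_maps)
        windows = np.stack([
            x[i:i + width].ravel() for i in range(len(x) - width + 1)
        ])                                        # (n_windows, width * emb_dim)
        fmap = np.maximum(windows @ w, 0.0)       # ReLU feature maps
        pooled.append(fmap.max(axis=0))           # max-over-time pooling
    feats = np.concatenate(pooled)
    return feats @ w_out                          # class logits

filters = [(width, rng.normal(size=(width * emb_dim, 4))) for width in (2, 3, 4)]
w_out = rng.normal(size=(12, n_classes))          # 3 widths * 4 maps = 12 features

logits = textcnn_forward(rng.integers(0, 100, size=10), filters, w_out)
print(logits.shape)  # (3,)
```

The entire parameter count here is a few thousand weights, which is why such a model can be trained quickly on domain-specific data and sometimes beats a large pretrained model there.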
Other Text Classification Research
Studies explore attention-focused CNNs, gating mechanisms that incorporate side information such as statistical features, and training techniques such as R-Drop and adversarial training (FGM/PGD) that boost performance without heavyweight pretrained models.
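To make the FGM recipe concrete, here is a sketch on a toy logistic-regression "model" standing in for a classifier's embedding layer; the perturbation follows the FGM rule x_adv = x + eps * grad / ||grad||, and training would then use both the clean and adversarial losses. The model, data, and epsilon are toy assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_input_grad(x, y, w):
    """Logistic loss and its gradient with respect to the input x."""
    p = sigmoid(x @ w)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad_x = (p - y) * w  # dL/dx for logistic loss
    return loss, grad_x

def fgm_perturb(x, grad_x, eps=0.1):
    """FGM: step along the normalized input gradient."""
    norm = np.linalg.norm(grad_x)
    if norm == 0:
        return x
    return x + eps * grad_x / norm

rng = np.random.default_rng(2)
w = rng.normal(size=8)
x, y = rng.normal(size=8), 1.0

clean_loss, g = loss_and_input_grad(x, y, w)
x_adv = fgm_perturb(x, g)
adv_loss, _ = loss_and_input_grad(x_adv, y, w)
# A full training step would backprop on clean_loss + adv_loss.
print(adv_loss >= clean_loss)  # True
```

In text models the perturbation is applied to embeddings rather than raw tokens, which is why FGM slots into a classifier as a cheap add-on regularizer.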
Supplement
Additional high‑quality tutorials and code repositories for text classification are listed in the references.
Summary
Semantic understanding remains essential but should be made more universal and stable.
Business coupling can be achieved through feature engineering and rule‑based methods, not solely via deep models.
Pretrained models are not always necessary for intent recognition; downstream improvements often yield higher impact.
Current research trends focus on scenario‑specific challenges and specialized datasets.
Dataset design is a critical research direction for advancing text classification.
References
[1] Tencent Tech: Understanding Search Queries
[2] DNN+GBDT Query Category Prediction Fusion Model
[3] Daguan Data: User Search Intent Recognition
[4] Meituan Search: Query Understanding
[5] A Survey on Text Classification: From Shallow to Deep Learning
[6-8] Various Chinese blog translations of the survey
[9] Lite Transformer with Long-Short Range Attention
[10] 2021 AAAI Text Classification Papers
[11] ACT: An Attentive Convolutional Transformer for Efficient Text Classification
[12] Merging Statistical Feature via Adaptive Gate for Improved Text Classification
[13] Task-Aware Representation of Sentences for Generic Text Classification
[14] How to Fine-Tune BERT for Text Classification?
[15-17] Code repositories and articles on Chinese text classification
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.