Designing and Deploying an NLP Model for Airline Ticket Customer Service
This article describes the end‑to‑end development of a multi‑class NLP system for Ctrip airline ticket customer service, covering problem analysis, data preprocessing, sample balancing, model architecture (TextCNN and Bi‑GRU), training strategies, performance evaluation, and online customization to achieve high accuracy in intent recognition.
Background
Ctrip, a customer‑centric travel service provider, aims to improve the efficiency of its airline ticket online customer service by automatically summarizing a passenger's issue before a human agent takes over, thus reducing handling time and enhancing user experience.
Problem Definition
The task is a natural language processing (NLP) multi‑class classification problem with over 400 standard business intents and more than 300,000 chat records, exhibiting severe class imbalance (ratios up to 3000:1).
Text Pre‑processing
Chinese text is processed with simplified‑traditional conversion and word segmentation (recommended tools: HanLP, Jieba). English text undergoes case normalization, lemmatization, and typo correction. Special characters and emojis are filtered out while preserving punctuation for sentence segmentation.
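As a rough illustration of the cleanup step, the sketch below (a minimal, hypothetical helper, not Ctrip's production pipeline) lowercases English text and strips emojis and special symbols while keeping CJK characters and the punctuation later needed for sentence segmentation; real deployments would add a segmentation tool such as Jieba or HanLP on top of this.

```python
import re

def clean_utterance(text: str) -> str:
    """Minimal cleanup: lowercase Latin text, strip emojis and special
    symbols, keep CJK characters and sentence-ending punctuation."""
    text = text.lower()  # case normalization for English
    # Keep word characters, CJK ideographs, whitespace, and the punctuation
    # needed for sentence segmentation; drop everything else (emojis etc.).
    text = re.sub(r"[^\w\u4e00-\u9fff\s.,!?。，！？]", "", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_utterance("I NEED to change my ticket!! 😡😡"))
# "i need to change my ticket!!"
```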
Other Processing
Sample balancing is performed by oversampling minority classes with appropriate ratios. Texts are padded or truncated to a uniform length N determined from the length distribution of effective token sequences.
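A bare-bones sketch of both operations, assuming random duplication for oversampling (the article does not specify the exact scheme) and a `<PAD>` token for padding; `target_ratio` is an illustrative parameter:

```python
import random
from collections import Counter

def oversample(samples, labels, target_ratio=0.5, seed=42):
    """Randomly duplicate minority-class samples until each class holds
    at least target_ratio of the majority class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        need = max(0, int(majority * target_ratio) - n)
        pool = [s for s, y in zip(samples, labels) if y == cls]
        out_x += [rng.choice(pool) for _ in range(need)]
        out_y += [cls] * need
    return out_x, out_y

def pad_or_truncate(tokens, n, pad="<PAD>"):
    """Force every token sequence to the uniform length N."""
    return tokens[:n] + [pad] * max(0, n - len(tokens))
```

In practice the oversampling ratio is tuned per class, since blindly matching a 3000:1 minority to the majority would mostly replay the same few examples.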
Model Framework
Two stages are involved: text vectorization and model architecture. Vectorization uses either bag‑of‑words or dense word embeddings (Word2Vec, GloVe, BERT). The model structures evaluated are TextCNN (128 filters, kernel sizes 3–5, with convolution, max‑pooling, and fully connected layers) and Bi‑GRU (150 units per direction feeding a fully connected classifier).
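To make the TextCNN feature extraction concrete, here is a NumPy sketch of one forward pass under assumed shapes (20 tokens, 50‑dimensional embeddings, random weights): each branch convolves over the token axis with one kernel size, applies ReLU, then max‑over‑time pooling, and the three branches are concatenated before the fully connected softmax layer.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, emb_dim, n_filters = 20, 50, 128
x = rng.normal(size=(seq_len, emb_dim))  # one embedded utterance

def conv_maxpool(x, kernel_size, n_filters, rng):
    """One TextCNN branch: 1D convolution over the token axis, ReLU,
    then max-over-time pooling (one feature per filter)."""
    w = rng.normal(size=(n_filters, kernel_size, x.shape[1])) * 0.1
    windows = np.stack([x[i:i + kernel_size]
                        for i in range(x.shape[0] - kernel_size + 1)])
    feats = np.maximum(np.einsum("twd,fwd->tf", windows, w), 0)  # ReLU
    return feats.max(axis=0)  # max-over-time pooling

# Concatenate branches with kernel sizes 3, 4, 5, as in the setup above.
features = np.concatenate([conv_maxpool(x, k, n_filters, rng)
                           for k in (3, 4, 5)])
print(features.shape)  # (384,) -> input to the fully connected layer
```

A framework implementation (Keras or PyTorch) would learn `w` by backpropagation; the sketch only shows why the pooled feature vector has size 3 × 128 regardless of sequence length.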
Training Strategies
Static embeddings keep pre‑trained vectors fixed during training, suitable for small datasets. Dynamic embeddings allow the embedding matrix to be updated, yielding better performance on large datasets.
Model Performance
Training was performed on a single GPU. The results are shown in the table below:

| Model   | Main Structure                    | Training Time per Epoch | Accuracy |
|---------|-----------------------------------|-------------------------|----------|
| TextCNN | Filters=128, kernel sizes [3,4,5] | 104 s                   | 91.07 %  |
| LSTM    | Units=300                         | 953 s                   | 90.90 %  |
| Bi‑LSTM | Units=150                         | 1700 s                  | 92.30 %  |
| Bi‑GRU  | Units=150                         | 1440 s                  | 91.29 %  |
Although the comparison is not strictly fair due to differing parameter counts, TextCNN offers the fastest training speed with competitive accuracy.
Online Customization
In production, the model must filter out meaningless utterances (e.g., greetings, thanks) using a combination of regex keyword matching and a “meaninglessness” score based on edit‑distance similarity thresholds. Business‑specific adjustments, such as handling frequent intents like “ticket change” with contextual reasoning, raise the online accuracy from ~91 % to over 97 %.
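The filtering idea can be sketched as follows. This is a simplified stand‑in, not the production algorithm: it uses `difflib.SequenceMatcher` as an edit‑distance‑style similarity, and the template list and 0.8 threshold are hypothetical; the real system combines this with the regex keyword matching described above.

```python
from difflib import SequenceMatcher

# Hypothetical small-talk templates; a deployment would maintain a much
# larger, curated list.
TEMPLATES = ["你好", "谢谢", "hello", "thanks", "thank you", "ok"]

def is_meaningless(utterance: str, threshold: float = 0.8) -> bool:
    """Flag an utterance as meaningless if it is sufficiently similar
    (edit-distance-style ratio) to any small-talk template."""
    u = utterance.strip().lower()
    return any(SequenceMatcher(None, u, t).ratio() >= threshold
               for t in TEMPLATES)

print(is_meaningless("Thank you!"))                        # True
print(is_meaningless("I need to change my flight date"))   # False
```

Utterances flagged this way are dropped before classification, so the intent model only sees content-bearing messages.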
Summary
Key take‑aways include maintaining a domain‑specific lexicon for segmentation, analyzing and rebalancing training samples, preferring dynamic embeddings, using TextCNN as a strong baseline, applying dropout to avoid over‑fitting, and continuously tailoring models to the airline ticket business scenario.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.