Intelligent Customer Service at Meituan: Dialogue Framework, Intent Understanding, Knowledge Discovery, and Emotion Recognition
Meituan's intelligent customer service is built on a dialogue interaction framework that combines machine‑learning‑driven intent mining and understanding, emotion detection, dialogue management, knowledge discovery, and multi‑task learning to handle both simple QA and complex multi‑turn tasks across its diverse service ecosystem.
Intelligent customer service is an AI system that interacts with users in natural language, analyzing intent to provide personalized assistance.
1. Dialogue Framework
When a user enters the service interface, the problem is first understood and then routed to the appropriate backend service. The framework consists of two main parts: offline training and knowledge‑base construction, and online real‑time processing.
The online pipeline first runs basic preprocessing on the user's message: tokenization, semantic tagging, sentiment analysis, and named‑entity recognition (NER). It then proceeds to intent understanding (domain classification, intent detection, and slot extraction). Once the intent is identified, dialogue management handles dialogue state tracking (DST) and decision making, and finally invokes the business service APIs.
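The online stages described above can be sketched as a chain of functions that each enrich a shared turn object. This is only an illustration: the Turn class, the stage functions, and the keyword rule are hypothetical placeholders, not Meituan's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Everything the pipeline knows about one user utterance (hypothetical)."""
    text: str
    features: dict = field(default_factory=dict)
    domain: str = ""
    intent: str = ""
    action: str = ""


def extract_features(turn: Turn) -> Turn:
    # Stand-in for tokenization, tagging, sentiment analysis, and NER.
    turn.features["tokens"] = turn.text.lower().split()
    return turn


def understand_intent(turn: Turn) -> Turn:
    # Stand-in for domain classification and intent detection (toy keyword rule).
    if "refund" in turn.features["tokens"]:
        turn.domain, turn.intent = "delivery", "request_refund"
    else:
        turn.domain, turn.intent = "unknown", "unknown"
    return turn


def manage_dialogue(turn: Turn) -> Turn:
    # Stand-in for state tracking and decision making.
    if turn.intent == "request_refund":
        turn.action = "call_refund_api"
    else:
        turn.action = "ask_clarification"
    return turn


PIPELINE = [extract_features, understand_intent, manage_dialogue]


def run_pipeline(text: str) -> Turn:
    """Pass one utterance through every stage in order."""
    turn = Turn(text=text)
    for stage in PIPELINE:
        turn = stage(turn)
    return turn
```

Keeping each stage behind the same signature makes it easy to swap a rule for a trained model without touching the rest of the chain.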
2. Intent Understanding
Meituan serves many business scenarios. Users may enter a specific service window (clear domain) or the general portal (ambiguous domain). The system first performs domain classification, then intent classification, handling both single‑turn QA and multi‑turn task‑oriented dialogs.
Example of single‑turn QA:
U: Meituan delivery time
S: Delivery follows the merchant's business hours. If the merchant is open, ordering and delivery are available.
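Section 2.2 notes that QA‑type intents are handled by retrieval and similarity ranking. A minimal sketch of that idea, assuming a bag‑of‑words cosine similarity (the article does not say which similarity model is used) and a hypothetical in‑memory knowledge base:

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words token counts for a whitespace-tokenized string."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_answers(query, kb, top_k=1):
    """Return the top_k (standard question, answer) pairs most similar to query."""
    q = bow(query)
    ranked = sorted(kb, key=lambda question: cosine(q, bow(question)), reverse=True)
    return [(question, kb[question]) for question in ranked[:top_k]]


# Hypothetical knowledge base: standard question -> answer.
KB = {
    "meituan delivery time": "Delivery follows the merchant's business hours.",
    "how to become a merchant": "Placeholder answer for merchant onboarding.",
}
```

Calling rank_answers("delivery time", KB) ranks the delivery‑time entry first, because it shares two tokens with the query while the other entry shares none.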
Example of multi‑turn task:
U: How to become a Meituan merchant?
Task‑oriented dialogs collect required slots and invoke corresponding APIs.
2.1 Domain Classification
Large volumes of business data are collected for training. Challenges include noisy labels (cross‑domain queries) and overlapping domains. An active‑learning loop is used: data collection → labeling → model training → prediction → human correction → retraining.
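One round of such a loop might look like the sketch below. It assumes uncertainty sampling (only low‑confidence predictions go to a human) and uses a toy token‑vote classifier in place of the real model; the function names, the threshold, and the ask_human callback are all hypothetical.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Count how often each token co-occurs with each domain label."""
    counts = defaultdict(Counter)
    for text, domain in labeled:
        for token in text.lower().split():
            counts[token][domain] += 1
    return counts


def predict(model, text):
    """Return (domain, confidence); confidence is the winning share of votes."""
    votes = Counter()
    for token in text.lower().split():
        votes.update(model.get(token, {}))
    if not votes:
        return None, 0.0
    domain, top = votes.most_common(1)[0]
    return domain, top / sum(votes.values())


def active_learning_round(labeled, unlabeled, ask_human, threshold=0.7):
    """Train, predict, send uncertain queries to a human, then retrain."""
    model = train(labeled)
    still_unlabeled = []
    for text in unlabeled:
        _, confidence = predict(model, text)
        if confidence < threshold:
            labeled.append((text, ask_human(text)))  # human correction step
        else:
            still_unlabeled.append(text)
    return train(labeled), still_unlabeled
```

Queries the model is already confident about stay in the unlabeled pool, so annotation effort concentrates on the cross‑domain and overlapping cases that cause the noise.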
Experiments show that TextCNN runs within 10 ms for a query of about 15 characters, while BERT takes about 70 ms. Given this latency gap, TextCNN is used in production.
2.2 Intent Classification
Intents fall into two categories: QA‑type intents, which are handled by retrieval and similarity ranking, and task‑type intents, which are handled by rule‑based grammars and machine‑learning models. For task‑type intents, multi‑task learning jointly models intent classification and slot filling using a Bi‑LSTM with a CRF layer.
Example: "Help me find a suitable Sichuan restaurant for 10 people tomorrow noon" → intent = reservation, slots = time, party size, cuisine.
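A sequence tagger such as a Bi‑LSTM with a CRF layer typically emits one tag per token, and the slots are then read off the tag sequence. The sketch below shows only that decoding step, assuming the common BIO tagging scheme (the article does not name the scheme); the tags are hand‑written stand‑ins for model output and the slot names are illustrative.

```python
def decode_slots(tokens, tags):
    """Merge BIO-tagged tokens into a {slot_name: text} dictionary."""
    slots, name, words = {}, None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and name == tag[2:]:
            words.append(token)  # continue the currently open span
            continue
        if name is not None:
            slots[name] = " ".join(words)  # close the previous span
        if tag.startswith("B-"):
            name, words = tag[2:], [token]  # open a new span
        else:
            name, words = None, []  # "O" tag or stray "I-" tag: no open span
    if name is not None:
        slots[name] = " ".join(words)
    return slots


TOKENS = "Help me find a suitable Sichuan restaurant for 10 people tomorrow noon".split()
# Hand-written tags standing in for the tagger's output.
TAGS = ["O", "O", "O", "O", "O", "B-cuisine", "O", "O",
        "B-party_size", "I-party_size", "B-time", "I-time"]

result = {"intent": "reservation", "slots": decode_slots(TOKENS, TAGS)}
```

Here result["slots"] contains cuisine = "Sichuan", party_size = "10 people", and time = "tomorrow noon", matching the three slots in the example.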
2.3 Dialogue State Tracking (DST)
DST uses the session context and NLU outputs (potentially multiple intents) together with auxiliary information (order, portal entry) to disambiguate the current domain and intent. If the domain is unclear, the system asks a clarification question.
Example: "Beef soup spilled" → domain = delivery, intent = "apply for food‑damage compensation", then follow the compensation workflow.
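The disambiguation step can be illustrated with a small rescoring function. It is a sketch under stated assumptions: the additive boosts, the margin, and the candidate scores are invented for illustration, and the article does not describe how the auxiliary information is actually weighted.

```python
def track_state(candidates, entry_domain=None, order_domain=None, margin=0.2):
    """Pick a domain and intent from NLU candidates, or ask for clarification.

    candidates: list of (domain, intent, score) tuples from NLU.
    """
    if not candidates:
        return {"action": "clarify", "question": "What can I help you with?"}
    rescored = []
    for domain, intent, score in candidates:
        if domain == entry_domain:
            score += 0.3  # hypothetical boost: user entered via this domain's portal
        if domain == order_domain:
            score += 0.3  # hypothetical boost: user has a recent order in this domain
        rescored.append((score, domain, intent))
    rescored.sort(reverse=True)
    best = rescored[0]
    # Strongest candidate that belongs to a different domain, if any.
    rival = next((c for c in rescored[1:] if c[1] != best[1]), None)
    if rival is not None and best[0] - rival[0] < margin:
        options = sorted([best[1], rival[1]])
        return {"action": "clarify",
                "question": "Is this about " + " or ".join(options) + "?"}
    return {"action": "proceed", "domain": best[1], "intent": best[2]}


CANDIDATES = [
    ("delivery", "apply_food_damage_compensation", 0.5),
    ("dine_in", "complain_about_dish", 0.45),
]
```

With a recent delivery order (order_domain="delivery") the first candidate wins clearly and the system proceeds. Without any auxiliary information the two domains are too close, so the function returns a clarification question instead.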
3. Knowledge Discovery
The system combines supervised NLU with unsupervised clustering to extract new knowledge from logs. Human operators review and approve the generated knowledge.
3.1 Human‑in‑the‑Loop
When AI cannot resolve an issue, the conversation is handed over to a human agent. Unsupervised clustering (K‑means) groups similar queries, and the resulting clusters are used to build task trees.
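To make the clustering step concrete, here is a small pure‑Python K‑means over bag‑of‑words vectors. It is a sketch only: the vectorization, the naive initialization from the first k points, and the sample queries are assumptions, and a production system would more likely use a library implementation with sentence embeddings.

```python
def vectorize(queries):
    """Bag-of-words count vectors over a shared, sorted vocabulary."""
    vocab = sorted({t for q in queries for t in q.lower().split()})
    return [[q.lower().split().count(t) for t in vocab] for q in queries]


def kmeans(points, k, iterations=10):
    """Plain K-means; returns one cluster index per input point."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


QUERIES = [
    "soup spilled in bag",
    "refund not received",
    "soup spilled everywhere",
    "refund still not received",
]
labels = kmeans(vectorize(QUERIES), k=2)
```

On these four queries the spill complaints end up in one cluster and the refund complaints in the other, which is the kind of grouping a reviewer could then turn into a task tree.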
Task trees (e.g., "How to apply for food‑damage compensation") define required slots such as delivery method, damage reason, and application status, which are collected via dialog or backend APIs.
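A task definition of this kind could be represented as a mapping from each required slot to the place its value comes from. In the sketch below, the assignment of slots to "backend" or "dialog" is a guess made for illustration, as are the field names and the order record.

```python
# Hypothetical task definition: slot name -> where its value is looked up.
TASK = {
    "name": "apply_food_damage_compensation",
    "slots": {
        "delivery_method": "backend",     # assumed to be available on the order
        "application_status": "backend",  # assumed to be available on the order
        "damage_reason": "dialog",        # assumed to require asking the user
    },
}


def resolve_slots(task, order, answers):
    """Fill each slot from its source; return (filled slots, slots still to ask)."""
    filled, to_ask = {}, []
    for slot, source in task["slots"].items():
        value = order.get(slot) if source == "backend" else answers.get(slot)
        if value is None:
            to_ask.append(slot)  # missing from its source: ask the user
        else:
            filled[slot] = value
    return filled, to_ask


order = {"delivery_method": "platform_rider", "application_status": "not_applied"}
filled, to_ask = resolve_slots(TASK, order, answers={})
```

Here only damage_reason remains in to_ask, so the dialog needs a single question before the compensation workflow can run.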
4. Emotion Recognition
Voice data from the customer hotline is processed to extract features (FFT, Mel‑filterbanks) and fed into models (LR, SVM, CNN, LSTM, VGGish with attention) to predict emotional states. Weak‑label learning addresses the problem that a single label may not reflect moment‑by‑moment emotion changes.
Model performance, ranked from weakest to strongest: MFCC+LSTM < MFCC+CNN < VGGish+feature‑level attention < VGGish+decision‑level attention.
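One common reading of decision‑level attention for weakly labeled audio is that the model predicts emotion probabilities for each short segment and then combines them with learned attention weights into a single clip‑level prediction. The article gives no architectural details, so the sketch below is only an illustration of that pooling step, with made‑up segment probabilities and attention scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(segment_probs, attention_scores):
    """Attention-weighted average of per-segment class probabilities."""
    weights = softmax(attention_scores)
    n_classes = len(segment_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, segment_probs))
        for c in range(n_classes)
    ]


# Made-up values: two segments, classes = [calm, angry].
SEGMENT_PROBS = [[0.9, 0.1], [0.1, 0.9]]
uniform = attention_pool(SEGMENT_PROBS, [0.0, 0.0])   # both segments weighted equally
focused = attention_pool(SEGMENT_PROBS, [0.0, 5.0])   # second segment dominates
```

With equal scores the result is a plain average of the segments. With a high score on the second segment the clip‑level prediction leans toward "angry", which is how a brief emotional moment can drive the label for a whole call.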
5. Outlook
Multi‑turn context modeling and intent recommendation.
Multimodal voice‑text fusion, weak‑label learning, and emotion risk detection.
Topic extraction from dialog history, script recommendation, and agent assistance.
The above components aim to optimize both the user‑side (ToC) and the agent‑side (ToB) of Meituan's intelligent customer service loop.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.