
How Vipshop Built an AI‑Powered Sentiment Analysis System for Real‑Time Customer Feedback

Vipshop's in‑house sentiment monitoring platform integrates web‑scraped reviews, WeChat comments and internal service messages, applying lexical sentiment scoring, dictionary‑based Chinese word segmentation, TF‑IDF keyword ranking and lightweight classification to deliver real‑time insights, alerts and actionable reports for thousands of daily user comments.

Vipshop Quality Engineering

Background and Current Situation

Vipshop, a fast‑growing Chinese e‑commerce platform, receives a massive volume of user comments and feedback every day from Weibo, WeChat, forums, app stores and internal customer service. These opinions contain valuable suggestions, experience reports and demand signals, but the current way of collecting and analyzing them suffers from fragmented sources, high manual cost, slow alerting and poor long‑term monitoring.

Comments are scattered across channels; manual crawling is slow and expensive.

The large volume of unstructured comments makes classification and semantic analysis inefficient.

Issue alerts are delayed, so losses occur before problems are detected.

Long‑term monitoring is ineffective, and its output has poor readability and a poor user experience.

Overall Introduction

Existing commercial monitoring systems (e.g., Tencent Penguin Wind, Baidu Yiqing, Qimai Data) can crawl app‑store, forum and Weibo comments but are limited in data sources and customization. Vipshop therefore developed its own sentiment monitoring system to integrate richer data sources and custom functions.

Current data sources include app‑store reviews, WeChat public account comments and internal customer‑service messages, which are persisted for later text‑mining analysis.

The system processes the collected comments through sentiment judgment, word segmentation, classification and word‑frequency analysis. Its architecture consists of three modules: data collection, analysis and application. This article focuses on the analysis module.

System Architecture

The overall architecture includes the three core modules mentioned above. Detailed implementation of sentiment analysis, text segmentation, word‑frequency analysis and classification is described below.

Sentiment Analysis

Traditional sentiment classification relies on a sentiment lexicon. Our approach builds a three‑part lexicon (positive, negative and interference words) and assigns a weight to each term: positive words receive a positive weight, negative words a negative weight, and interference words are excluded from scoring. A comment's polarity is determined by linearly summing the weights of the matched terms and comparing the total against a threshold.
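A minimal Python sketch of the linear lexicon scoring described above follows. The lexicon entries, their weights, the 0.5 threshold and the neutral band between -0.5 and 0.5 are all hypothetical, since the production lexicon is not published. The sketch also assumes interference words are handled by blanking them out of the raw text before matching, so that 好多 ("many") is not counted as the positive word 好 ("good"); the real system may treat them differently.

```python
# Hypothetical three-part lexicon, for illustration only.
POSITIVE = {"满意": 1.5, "好": 1.0}      # "satisfied", "good"
NEGATIVE = {"闪退": -2.0, "慢": -1.0}    # "crash", "slow"
INTERFERENCE = ["好多", "只好"]          # "many", "have to": contain 好 but carry no sentiment


def sentiment_score(text):
    # Assumption: blank out interference words so their characters
    # cannot be matched as sentiment words.
    for word in INTERFERENCE:
        text = text.replace(word, " ")
    # Linear sum: every occurrence of a lexicon term adds its weight.
    total = 0.0
    for lexicon in (POSITIVE, NEGATIVE):
        for term, weight in lexicon.items():
            total += weight * text.count(term)
    return total


def polarity(text, threshold=0.5):
    # Compare the summed score against a (hypothetical) symmetric threshold.
    score = sentiment_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

With these toy entries, `polarity("物流慢，好多次闪退")` ("delivery is slow, crashed many times") scores -3.0 and is negative, while `polarity("好多人在买")` ("lots of people are buying") is neutral because 好多 is removed before 好 can match.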

Text Segmentation

Chinese word segmentation is performed using a dictionary‑based N‑shortest‑path algorithm (similar to the DAG approach of NLPIR). The algorithm builds a directed acyclic graph of every candidate word the dictionary matches in a sentence, uses statistical word weights (e.g., derived from TF‑IDF) as edge weights, and keeps the N shortest paths through the graph as candidate segmentations.
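The following simplified sketch illustrates N‑shortest‑path segmentation over a word DAG. It assumes a hypothetical toy dictionary of word counts and uses negative log relative frequency as the edge cost instead of the TF‑IDF‑based weights mentioned above. It keeps the N best individual paths; NLPIR‑style implementations may instead keep every path whose length is among the N best lengths.

```python
import heapq
import math

# Hypothetical toy dictionary: word -> corpus count (illustrative numbers only).
WORD_FREQ = {"会": 30, "员": 2, "会员": 40, "俱": 1, "乐": 10, "部": 5,
             "俱乐部": 20, "乐部": 1}
TOTAL = sum(WORD_FREQ.values())
MAX_WORD_LEN = max(len(w) for w in WORD_FREQ)


def edge_cost(word):
    # Frequent words are cheaper; unknown single characters get a smoothed count of 1.
    return -math.log(WORD_FREQ.get(word, 1) / TOTAL)


def n_shortest_segmentations(text, n=2):
    # paths[j] holds up to n (cost, words) pairs covering the prefix text[:j].
    paths = [[] for _ in range(len(text) + 1)]
    paths[0] = [(0.0, [])]
    for j in range(1, len(text) + 1):
        candidates = []
        for i in range(max(0, j - MAX_WORD_LEN), j):
            word = text[i:j]
            # DAG edge i -> j exists for dictionary words; single characters are always allowed.
            if word in WORD_FREQ or j - i == 1:
                cost = edge_cost(word)
                candidates.extend((c + cost, words + [word]) for c, words in paths[i])
        # Keep only the n cheapest partial paths ending at position j.
        paths[j] = heapq.nsmallest(n, candidates, key=lambda p: p[0])
    return paths[-1]
```

With this toy dictionary, `n_shortest_segmentations("会员俱乐部", n=2)` ranks 会员 | 俱乐部 ("membership | club") first and 会 | 员 | 俱乐部 second. Pruning to n paths per position is safe in a DAG because any of the n best full paths must extend one of the n best prefixes.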

Word‑Frequency Analysis

Based on the segmentation results, the system calculates term frequency (TF) and inverse document frequency (IDF) to identify keywords whose occurrence rises significantly between consecutive periods. The workflow computes each word's share of all words in each period, selects the top‑n terms, and ranks them by how much their share increased over the previous period, producing a list of hotspot keywords.
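A sketch of the period‑over‑period hotspot ranking, under stated assumptions: comments are already segmented into token lists, IDF uses a smoothed formula computed over both periods together, each word's period score is its share of all tokens multiplied by its IDF, and only words whose score rose are returned. The function names and the `top_n` default are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    # Assumed smoothed IDF over all comments: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log((1 + n) / (1 + d)) + 1 for w, d in df.items()}


def weighted_proportions(docs, idf):
    # Each word's share of all tokens in the period, weighted by its IDF.
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: (c / total) * idf[w] for w, c in counts.items()} if total else {}


def hotspot_keywords(prev_docs, curr_docs, top_n=3):
    idf = idf_weights(prev_docs + curr_docs)
    prev = weighted_proportions(prev_docs, idf)
    curr = weighted_proportions(curr_docs, idf)
    # Top-n terms of the current period by weighted share.
    top = sorted(curr, key=curr.get, reverse=True)[:top_n]
    # Rank those terms by their increase over the previous period.
    rises = [(w, curr[w] - prev.get(w, 0.0)) for w in top]
    return sorted([r for r in rises if r[1] > 0], key=lambda r: r[1], reverse=True)
```

If the previous period mostly mentions 物流 ("logistics") and the current one is dominated by 闪退 ("crash"), 闪退 tops the result while 物流, whose share fell, is dropped.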

Classification Analysis

Comments are automatically classified into predefined categories (e.g., purchase flow, payment, product quality, logistics, membership, marketing) using a lightweight keyword‑weight algorithm. The algorithm sums the weights of matched keywords per category and selects the category with the highest total weight, achieving over 90% accuracy.
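The keyword‑weight classifier can be sketched as below. The categories are a subset of those listed above, and every keyword and weight is a hypothetical placeholder. Comments that match no keyword fall back to an assumed "other" label.

```python
# Hypothetical category -> {keyword: weight} table; real keyword lists are not published.
CATEGORY_KEYWORDS = {
    "payment": {"支付": 2.0, "付款": 2.0, "退款": 1.0},       # pay, payment, refund
    "logistics": {"物流": 2.0, "快递": 2.0, "发货": 1.5},     # logistics, courier, shipping
    "product quality": {"质量": 2.0, "破损": 1.5},             # quality, damaged
    "membership": {"会员": 2.0, "俱乐部": 1.0},                # member, club
}


def classify(tokens, default="other"):
    # Sum the weights of matched keywords for each category.
    scores = {cat: sum(kw.get(t, 0.0) for t in tokens)
              for cat, kw in CATEGORY_KEYWORDS.items()}
    # Pick the highest-scoring category; fall back if nothing matched.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

For example, `classify(["快递", "发货", "慢"])` returns "logistics" (3.5 points). Summing weights rather than counting hits lets one strong keyword outweigh several weak ones.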

Practical Effects

Since launch, the system has served nearly 300 internal users across product, development, testing and finance teams, helping them understand user needs, improve experience, and detect online issues such as app crashes, membership‑club access problems, and customer‑service failures.

Future Plans

Upcoming iterations will add keyword‑frequency alerting via email, allowing users to subscribe to specific terms and receive real‑time notifications of emerging issues.
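One way the planned subscription alerting could decide whom to notify is sketched below. This is a hypothetical design, not the team's implementation: the `Subscription` record, the per‑period `min_count` threshold and the example addresses are illustrative assumptions, and actual email delivery is left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Subscription:
    email: str
    term: str
    min_count: int  # hypothetical: alert once the term appears this often in a period


def pending_alerts(subscriptions, period_tokens):
    # Count every segmented token seen in the current period.
    counts = Counter(period_tokens)
    # Return (recipient, term, count) for each subscription whose threshold is reached.
    return [(s.email, s.term, counts[s.term])
            for s in subscriptions if counts[s.term] >= s.min_count]
```

The returned tuples could then be handed to whatever mail service the platform uses.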
