
UGC Sentiment Analysis Solutions and Applications in Taobao

This article presents a comprehensive overview of Taobao's user‑generated content sentiment analysis pipeline, covering task definition, challenges, model architecture with RoBERTa‑based extraction, sentiment‑knowledge pre‑training, graph augmentation, personalized ranking, business impact metrics, and future research directions.

Taobao generates massive daily user comments, making it difficult for shoppers to browse all reviews; therefore, efficiently understanding user opinions through UGC sentiment analysis and summarizing attribute‑level sentiment is essential.

The UGC sentiment task is defined as extracting triples of (attribute, sentiment word, sentiment polarity) from comments, e.g., identifying "fabric" as the attribute, "nice" as the sentiment word, and labeling it as positive.
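To make the task definition concrete, here is a toy illustration of the triple format. The keyword lookup below is a stand-in for the learned extraction model, not Taobao's implementation; the lexicon entries are made up.

```python
# Illustrative only: a toy representation of the (attribute, sentiment word,
# polarity) triples described above. The rule-based lookup stands in for the
# neural extractor.
from typing import NamedTuple, List

class SentimentTriple(NamedTuple):
    attribute: str      # e.g. "fabric"
    opinion: str        # e.g. "nice"
    polarity: str       # "positive" / "negative" / "neutral"

def toy_extract(comment: str) -> List[SentimentTriple]:
    # Hypothetical keyword lexicon standing in for the trained model.
    lexicon = {("fabric", "nice"): "positive",
               ("zipper", "broken"): "negative"}
    triples = []
    for (attr, op), pol in lexicon.items():
        if attr in comment and op in comment:
            triples.append(SentimentTriple(attr, op, pol))
    return triples

print(toy_extract("The fabric feels nice but the zipper arrived broken"))
# → both a positive and a negative triple
```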

Key challenges include domain‑specific attribute variations, long‑tail expressions, sentiment polarity shifts across categories, severe class imbalance for negative samples, and cross‑domain inconsistencies.

The proposed pipeline uses supervised training: comments are fed to an attribute‑and‑sentiment‑word extraction model, followed by normalization to align synonymous attribute‑sentiment pairs, classification of polarity, viewpoint generation, aggregation, and active learning for continuous improvement.
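The stages above can be sketched as a chain of functions. Every function body here is a placeholder (the canonicalization map and labels are invented for illustration); only the stage ordering follows the article.

```python
# Minimal sketch of the pipeline stages named in the article: extraction ->
# normalization -> polarity classification -> aggregation. Bodies are toys.
from collections import Counter

def extract(comment):           # attribute + sentiment-word extraction
    return [("fabric", "nice")]

def normalize(pairs):           # align synonymous attribute-sentiment pairs
    canon = {("fabric", "nice"): ("material", "good")}
    return [canon.get(p, p) for p in pairs]

def classify_polarity(pairs):   # attach a polarity label to each pair
    return [(attr, op, "positive") for attr, op in pairs]

def aggregate(triples):         # summarize opinions per attribute
    return Counter((attr, pol) for attr, _, pol in triples)

def run_pipeline(comments):
    triples = []
    for c in comments:
        triples += classify_polarity(normalize(extract(c)))
    return aggregate(triples)

print(run_pipeline(["the fabric is nice"]))
```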

The extraction model employs a RoBERTa backbone continually pre‑trained on e‑commerce data, a BiLSTM layer for richer sequence features, a multi‑gate mixture‑of‑experts (MMoE) network with shared and domain‑private experts, an attention layer for dynamic feature selection, and a CRF layer to enforce label consistency, achieving an F1 score of 0.8668.
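A sequence‑labeling extractor of this kind emits per‑token tags that must be decoded back into attribute and opinion spans. The sketch below assumes a BIO tag scheme with `ASP` (attribute) and `OPI` (opinion word) labels; the exact scheme used at Taobao is not stated in the article.

```python
# Decode BIO tags (assumed scheme: B-/I- prefixes, ASP/OPI labels) emitted by
# a token-level extractor into (label, span text) pairs.
def decode_bio(tokens, tags):
    spans, cur = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur:
                spans.append(cur)
            cur = (tag[2:], [tok])          # start a new span
        elif tag.startswith("I-") and cur and cur[0] == tag[2:]:
            cur[1].append(tok)              # continue the current span
        else:
            if cur:
                spans.append(cur)           # close any open span on "O"
            cur = None
    if cur:
        spans.append(cur)
    return [(label, " ".join(toks)) for label, toks in spans]

tokens = ["the", "fabric", "is", "very", "nice"]
tags   = ["O", "B-ASP", "O", "B-OPI", "I-OPI"]
print(decode_bio(tokens, tags))  # [('ASP', 'fabric'), ('OPI', 'very nice')]
```

The CRF layer's role in the real model is to make inconsistent tag sequences (e.g. `I-OPI` with no preceding `B-OPI`) unlikely before this decoding step ever runs.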

To incorporate sentiment knowledge, the pre‑training adds category embeddings, sentiment masking, and a multi‑task loss covering attribute‑sentiment prediction, MLM, sentiment‑word prediction, category prediction, sentence‑level polarity, and POS tagging, boosting macro‑F1 from 0.9306 to 0.9543.
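A common way to combine such multi-task objectives is a weighted sum of the per-task losses. The task names below come from the article; the loss values and uniform weights are invented for illustration.

```python
# Hedged sketch: combine the six pre-training task losses as a weighted sum.
# Loss values are toy numbers; uniform weights are an assumption.
task_losses = {
    "attr_sentiment": 0.42,     # attribute-sentiment prediction
    "mlm": 1.10,                # masked language modeling
    "sentiment_word": 0.35,     # sentiment-word prediction
    "category": 0.28,           # category prediction
    "sentence_polarity": 0.21,  # sentence-level polarity
    "pos_tagging": 0.50,        # part-of-speech tagging
}
weights = {task: 1.0 for task in task_losses}

total_loss = sum(weights[t] * loss for t, loss in task_losses.items())
print(round(total_loss, 2))  # → 2.86
```

In practice the per-task weights would be tuned (or learned), since MLM loss typically dominates the other objectives early in training.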

A sentiment knowledge graph is constructed by linking similar attributes, opinion words, and attribute‑sentiment pairs using static embeddings and KNN; graph‑enhanced pre‑training further improves long‑tail and negative case performance, raising F1 from ~0.91 to ≥0.93.
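The KNN linking step can be sketched as follows: each phrase's nearest neighbors under cosine similarity of its static embedding become its graph edges. The embeddings below are toy vectors, not real model output.

```python
# Sketch of KNN-based graph linking: connect each phrase to its top-k nearest
# neighbors by cosine similarity of static embeddings. Vectors are toy data.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

embeddings = {
    "fabric":   [0.90, 0.10, 0.00],
    "material": [0.85, 0.15, 0.05],   # near-synonym of "fabric"
    "shipping": [0.00, 0.20, 0.95],   # unrelated attribute
}

def knn_edges(query, k=1):
    scores = sorted(
        ((cosine(embeddings[query], vec), name)
         for name, vec in embeddings.items() if name != query),
        reverse=True)
    return [name for _, name in scores[:k]]

print(knn_edges("fabric"))  # → ['material']
```

Edges built this way let rare ("long-tail") expressions inherit signal from well-observed neighbors during graph-enhanced pre-training.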

For personalized impression‑word ranking, a DIN model incorporates user demographics, item price and category, interaction history, and extracted impression words as features.
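DIN's core mechanism is to weight each history item's embedding by its relevance to the candidate before pooling, so the user representation adapts to what is being ranked. The toy version below uses a raw dot product as the attention score; the real DIN learns a small scoring network, and all vectors here are invented.

```python
# Toy sketch of DIN-style attention pooling: each history item's embedding is
# weighted by its (here: dot-product) relevance to the candidate item, then
# summed. A real DIN learns the scoring function with an MLP.
def din_pool(candidate, history):
    # Unnormalized relevance of each past interaction to the candidate.
    scores = [sum(c * h for c, h in zip(candidate, item)) for item in history]
    # Score-weighted sum over history, one value per embedding dimension.
    return [sum(s * item[d] for s, item in zip(scores, history))
            for d in range(len(candidate))]

candidate = [1.0, 0.0]                 # item (or impression word) being ranked
history = [[1.0, 0.0], [0.0, 1.0]]     # one relevant, one irrelevant click
print(din_pool(candidate, history))    # → [1.0, 0.0]
```

The irrelevant history item receives a score of 0 and contributes nothing, which is exactly the behavior that lets the model surface different impression words for different users.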

Business experiments show significant gains: impression‑word CTR ↑456%, UCTR ↑250%; search result CTR ↑2.7%; mini‑detail page PV click‑through ↑2.46%; short‑video full‑screen PV click‑through ↑2.61%, among others.

Future work aims to further improve negative‑sentiment detection, handle complex multi‑entity expressions, and develop an end‑to‑end (aspect, opinion word, sentiment polarity) triple extraction model with rule integration and attention enhancements.

Tags: e-commerce, personalization, deep learning, sentiment analysis, UGC, knowledge graph, pretrained models
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
