
UGC Sentiment Analysis Solutions and Applications in Taobao

This article presents a comprehensive overview of Taobao's user‑generated content sentiment analysis pipeline, covering task definition, challenges, model architecture with RoBERTa‑based extraction, sentiment‑knowledge pre‑training, graph augmentation, personalized ranking, business impact metrics, and future research directions.

Taobao generates massive daily user comments, making it difficult for shoppers to browse all reviews; therefore, efficiently understanding user opinions through UGC sentiment analysis and summarizing attribute‑level sentiment is essential.

The UGC sentiment task is defined as extracting triples of (attribute, sentiment word, sentiment polarity) from comments, e.g., identifying "fabric" as the attribute, "nice" as the sentiment word, and labeling it as positive.
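To make the task definition concrete, here is a toy illustration of the triple format. The keyword lookup below is a stand-in for the learned extraction model, not Taobao's implementation; the lexicon entries are made up.

```python
# Illustrative only: a toy representation of the (attribute, sentiment word,
# polarity) triples described above. The rule-based lookup stands in for the
# neural extractor.
from typing import NamedTuple, List

class SentimentTriple(NamedTuple):
    attribute: str      # e.g. "fabric"
    opinion: str        # e.g. "nice"
    polarity: str       # "positive" / "negative" / "neutral"

def toy_extract(comment: str) -> List[SentimentTriple]:
    # Hypothetical keyword lexicon standing in for the trained model.
    lexicon = {("fabric", "nice"): "positive",
               ("zipper", "broken"): "negative"}
    triples = []
    for (attr, op), pol in lexicon.items():
        if attr in comment and op in comment:
            triples.append(SentimentTriple(attr, op, pol))
    return triples

print(toy_extract("The fabric feels nice but the zipper arrived broken"))
# → both a positive and a negative triple
```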

Key challenges include domain‑specific attribute variations, long‑tail expressions, sentiment polarity shifts across categories, severe class imbalance for negative samples, and cross‑domain inconsistencies.

The proposed pipeline uses supervised training: comments are fed to an attribute‑and‑sentiment‑word extraction model, followed by normalization to align synonymous attribute‑sentiment pairs, classification of polarity, viewpoint generation, aggregation, and active learning for continuous improvement.
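The stages above can be sketched as a chain of functions. Every function body here is a placeholder (the canonicalization map and labels are invented for illustration); only the stage ordering follows the article.

```python
# Minimal sketch of the pipeline stages named in the article: extraction ->
# normalization -> polarity classification -> aggregation. Bodies are toys.
from collections import Counter

def extract(comment):           # attribute + sentiment-word extraction
    return [("fabric", "nice")]

def normalize(pairs):           # align synonymous attribute-sentiment pairs
    canon = {("fabric", "nice"): ("material", "good")}
    return [canon.get(p, p) for p in pairs]

def classify_polarity(pairs):   # attach a polarity label to each pair
    return [(attr, op, "positive") for attr, op in pairs]

def aggregate(triples):         # summarize opinions per attribute
    return Counter((attr, pol) for attr, _, pol in triples)

def run_pipeline(comments):
    triples = []
    for c in comments:
        triples += classify_polarity(normalize(extract(c)))
    return aggregate(triples)

print(run_pipeline(["the fabric is nice"]))
```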

The extraction model employs a RoBERTa backbone continually pre‑trained on e‑commerce data, a BiLSTM layer for richer sequence features, a multi‑gate mixture‑of‑experts (MMoE) network with shared and domain‑private experts, an attention layer for dynamic feature selection, and a CRF layer to enforce label consistency, achieving an F1 score of 0.8668.
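A sequence‑labeling extractor of this kind emits per‑token tags that must be decoded back into attribute and opinion spans. The sketch below assumes a BIO tag scheme with `ASP` (attribute) and `OPI` (opinion word) labels; the exact scheme used at Taobao is not stated in the article.

```python
# Decode BIO tags (assumed scheme: B-/I- prefixes, ASP/OPI labels) emitted by
# a token-level extractor into (label, span text) pairs.
def decode_bio(tokens, tags):
    spans, cur = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur:
                spans.append(cur)
            cur = (tag[2:], [tok])          # start a new span
        elif tag.startswith("I-") and cur and cur[0] == tag[2:]:
            cur[1].append(tok)              # continue the current span
        else:
            if cur:
                spans.append(cur)           # close any open span on "O"
            cur = None
    if cur:
        spans.append(cur)
    return [(label, " ".join(toks)) for label, toks in spans]

tokens = ["the", "fabric", "is", "very", "nice"]
tags   = ["O", "B-ASP", "O", "B-OPI", "I-OPI"]
print(decode_bio(tokens, tags))  # [('ASP', 'fabric'), ('OPI', 'very nice')]
```

The CRF layer's role in the real model is to make inconsistent tag sequences (e.g. `I-OPI` with no preceding `B-OPI`) unlikely before this decoding step ever runs.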

To incorporate sentiment knowledge, the pre‑training adds category embeddings, sentiment masking, and a multi‑task loss covering attribute‑sentiment prediction, MLM, sentiment‑word prediction, category prediction, sentence‑level polarity, and POS tagging, boosting macro‑F1 from 0.9306 to 0.9543.
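A common way to combine such multi-task objectives is a weighted sum of the per-task losses. The task names below come from the article; the loss values and uniform weights are invented for illustration.

```python
# Hedged sketch: combine the six pre-training task losses as a weighted sum.
# Loss values are toy numbers; uniform weights are an assumption.
task_losses = {
    "attr_sentiment": 0.42,     # attribute-sentiment prediction
    "mlm": 1.10,                # masked language modeling
    "sentiment_word": 0.35,     # sentiment-word prediction
    "category": 0.28,           # category prediction
    "sentence_polarity": 0.21,  # sentence-level polarity
    "pos_tagging": 0.50,        # part-of-speech tagging
}
weights = {task: 1.0 for task in task_losses}

total_loss = sum(weights[t] * loss for t, loss in task_losses.items())
print(round(total_loss, 2))  # → 2.86
```

In practice the per-task weights would be tuned (or learned), since MLM loss typically dominates the other objectives early in training.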

A sentiment knowledge graph is constructed by linking similar attributes, opinion words, and attribute‑sentiment pairs using static embeddings and KNN; graph‑enhanced pre‑training further improves long‑tail and negative case performance, raising F1 from ~0.91 to ≥0.93.
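The KNN linking step can be sketched as follows: each phrase's nearest neighbors under cosine similarity of its static embedding become its graph edges. The embeddings below are toy vectors, not real model output.

```python
# Sketch of KNN-based graph linking: connect each phrase to its top-k nearest
# neighbors by cosine similarity of static embeddings. Vectors are toy data.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

embeddings = {
    "fabric":   [0.90, 0.10, 0.00],
    "material": [0.85, 0.15, 0.05],   # near-synonym of "fabric"
    "shipping": [0.00, 0.20, 0.95],   # unrelated attribute
}

def knn_edges(query, k=1):
    scores = sorted(
        ((cosine(embeddings[query], vec), name)
         for name, vec in embeddings.items() if name != query),
        reverse=True)
    return [name for _, name in scores[:k]]

print(knn_edges("fabric"))  # → ['material']
```

Edges built this way let rare ("long-tail") expressions inherit signal from well-observed neighbors during graph-enhanced pre-training.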

For personalized impression‑word ranking, a DIN model incorporates user demographics, item price and category, interaction history, and extracted impression words as features.
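DIN's core mechanism is to weight each history item's embedding by its relevance to the candidate before pooling, so the user representation adapts to what is being ranked. The toy version below uses a raw dot product as the attention score; the real DIN learns a small scoring network, and all vectors here are invented.

```python
# Toy sketch of DIN-style attention pooling: each history item's embedding is
# weighted by its (here: dot-product) relevance to the candidate item, then
# summed. A real DIN learns the scoring function with an MLP.
def din_pool(candidate, history):
    # Unnormalized relevance of each past interaction to the candidate.
    scores = [sum(c * h for c, h in zip(candidate, item)) for item in history]
    # Score-weighted sum over history, one value per embedding dimension.
    return [sum(s * item[d] for s, item in zip(scores, history))
            for d in range(len(candidate))]

candidate = [1.0, 0.0]                 # item (or impression word) being ranked
history = [[1.0, 0.0], [0.0, 1.0]]     # one relevant, one irrelevant click
print(din_pool(candidate, history))    # → [1.0, 0.0]
```

The irrelevant history item receives a score of 0 and contributes nothing, which is exactly the behavior that lets the model surface different impression words for different users.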

Business experiments show significant gains: impression‑word CTR ↑456%, UCTR ↑250%; search result CTR ↑2.7%; mini‑detail page PV click‑through ↑2.46%; short‑video full‑screen PV click‑through ↑2.61%, among others.

Future work aims to further improve negative‑sentiment detection, handle complex multi‑entity expressions, and develop an end‑to‑end (aspect, opinion word, sentiment polarity) triple extraction model with rule integration and attention enhancements.

Tags: e-commerce, personalization, deep learning, sentiment analysis, UGC, knowledge graph, pretrained models
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
