
Interest Tagging System for Xianyu: Data‑Driven User Profiling

The Xianyu interest‑tagging system profiles post‑95 users by matching expert‑supplied and hot‑search keywords against product text and weighting user actions in a behavior‑statistics pipeline, with TF‑IDF applied to long‑tail interests. Phase 1 produced more than twenty tags that cover over half the target cohort, and applying them has doubled click‑through rates for interest‑aligned live streams.

Xianyu Technology

Background: On Xianyu, about 30% of users are under 25. Understanding the interests of this post‑95 generation (users born after 1995) is crucial for fine‑grained operations. The goal is to build a distinctive interest‑tag system using data mining.

Challenges: Interest expressions are highly flexible and often unique (e.g., niche hobbies, broad styles). Capturing them accurately and efficiently is the main difficulty.

Key considerations: (1) Flexible expression: interests span work, study, and lifestyle and cannot be fully described by structured labels. (2) Uniqueness: many interests are identified by domain‑specific terms (e.g., “JK series”, “blind‑box series”). (3) Speed: because the system is being built from scratch (a 0‑to‑1 scenario), tags must be constructed quickly and in batches.

Methodology: Two common tag‑generation approaches were examined: model prediction and behavior statistics. Model prediction offers high accuracy but requires large labeled samples and incurs high computational cost. Behavior statistics is simple and interpretable but generalizes poorly.

Chosen solution: A behavior‑statistics‑centric pipeline with customizable tags. Keywords are matched against product/content text to retrieve relevant items; user actions on these items (view, collect, interact, purchase) are weighted by type, frequency, and recency to compute a user‑item score. For long‑tail interests, TF‑IDF is applied to combine keyword relevance with overall user activity, treating each interest as a “term” and each user as a “document”.
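The matching and weighting steps described above can be sketched as follows. This is a minimal illustration, not the team's implementation: the action weights, the 30‑day half‑life, the `Action` record, and the function names are all assumptions made for the example. Frequency is captured by summing over repeated actions, and recency by an exponential decay.

```python
from dataclasses import dataclass

# Hypothetical weights: actions that signal stronger intent count more.
ACTION_WEIGHTS = {"view": 1.0, "collect": 2.0, "interact": 3.0, "purchase": 5.0}
HALF_LIFE_DAYS = 30.0  # assumed: an action loses half its weight every 30 days


@dataclass
class Action:
    user_id: str
    item_id: str
    action_type: str  # expected to be a key of ACTION_WEIGHTS
    days_ago: float   # age of the action in days


def match_items(keywords, item_texts):
    """Return ids of items whose text contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return {
        item_id
        for item_id, text in item_texts.items()
        if any(k in text.lower() for k in lowered)
    }


def recency_decay(days_ago, half_life=HALF_LIFE_DAYS):
    """Exponential decay: 1.0 for an action today, 0.5 after one half-life."""
    return 0.5 ** (days_ago / half_life)


def user_interest_scores(actions, matched_items):
    """Sum type-weighted, recency-decayed actions per user over matched items."""
    scores = {}
    for a in actions:
        if a.item_id not in matched_items:
            continue  # the action is unrelated to this interest
        weight = ACTION_WEIGHTS.get(a.action_type, 0.0) * recency_decay(a.days_ago)
        scores[a.user_id] = scores.get(a.user_id, 0.0) + weight
    return scores


# Illustrative data only.
items = {"i1": "JK uniform skirt, like new", "i2": "Used bicycle", "i3": "Hanfu set"}
matched = match_items(["jk", "hanfu"], items)
actions = [
    Action("u1", "i1", "view", 0),
    Action("u1", "i1", "purchase", 30),
    Action("u2", "i2", "view", 0),
    Action("u2", "i3", "collect", 0),
]
scores = user_interest_scores(actions, matched)
```

Plain substring matching is the simplest possible retrieval step. Short keywords can match unrelated text, so a production system would likely need tokenization or category filters on top of it.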

Implementation details: The score is TF‑IDF = TF × IDF, where TF reflects how frequently a user interacts with items matching an interest and IDF reflects how rare that interest is across all users. Because IDF down‑weights interests that almost every user touches, the resulting score identifies users with genuine interest in a domain.
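One way to read the interest‑as‑term, user‑as‑document framing is sketched below. The exact TF and IDF definitions are not given in the article, so both are assumptions here: TF is taken as an interest's share of a user's total scored activity, and IDF uses a smoothed logarithm so that an interest shared by every user still gets a non‑zero weight. The function name `interest_tfidf` is hypothetical.

```python
import math


def interest_tfidf(user_scores):
    """Weight per-user interest scores by TF-IDF.

    user_scores: {user_id: {interest: behavior score}}
    TF  (assumed) = the interest's share of the user's total score.
    IDF (assumed) = log((1 + N) / (1 + df)) + 1, where N is the number of
                    users and df the number of users active on the interest.
    """
    n_users = len(user_scores)

    # Document frequency: how many users show any activity on each interest.
    users_per_interest = {}
    for scores in user_scores.values():
        for interest, score in scores.items():
            if score > 0:
                users_per_interest[interest] = users_per_interest.get(interest, 0) + 1

    result = {}
    for user, scores in user_scores.items():
        total = sum(s for s in scores.values() if s > 0)
        result[user] = {}
        for interest, score in scores.items():
            if score <= 0:
                continue
            tf = score / total
            idf = math.log((1 + n_users) / (1 + users_per_interest[interest])) + 1
            result[user][interest] = tf * idf
    return result


# Illustrative data only: "anime" is common, "hanfu" is long-tail.
weighted = interest_tfidf({
    "u1": {"anime": 8.0, "hanfu": 2.0},
    "u2": {"anime": 5.0},
    "u3": {"anime": 4.0},
})
```

In this toy input, the long‑tail interest receives a larger IDF than the common one, so a user's modest activity on it is not drowned out by activity on a popular interest.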

Keyword selection: Keywords are a mix of “typical keywords” supplied by product experts and hot‑search terms derived from recent user behavior. For broad domains, leaf‑category names from the internal taxonomy are used, and related terms are expanded via association graphs.
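The keyword‑assembly step might look like the sketch below. The article does not describe how its association graphs are built or traversed, so the graph structure (a dictionary of weighted neighbors), the one‑hop traversal, the `min_weight` threshold, and the sample terms are all assumptions for illustration.

```python
def expand_keywords(seeds, association_graph, min_weight=0.5):
    """One-hop expansion: add neighbors whose edge weight is at least min_weight."""
    expanded = set(seeds)
    for seed in seeds:
        for neighbor, weight in association_graph.get(seed, {}).items():
            if weight >= min_weight:
                expanded.add(neighbor)
    return expanded


# Hypothetical inputs: expert-supplied keywords, hot-search terms, and a
# small association graph with made-up edge weights.
expert_keywords = ["hanfu"]
hot_search_terms = ["blind box"]
graph = {
    "hanfu": {"ming-style": 0.8, "hairpin": 0.6, "costume": 0.3},
    "blind box": {"figure": 0.7},
}

seeds = set(expert_keywords) | set(hot_search_terms)
keywords = expand_keywords(seeds, graph)
```

Limiting expansion to one hop with a weight threshold keeps the keyword set close to the seed terms. Deeper traversal would broaden recall at the cost of pulling in loosely related terms.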

Results: Phase 1 produced 20+ tags covering popular post‑95 interests (JK, lolita, hanfu, anime, etc.). Over 50% of the target cohort received at least one interest tag, enabling precise audience segmentation. Applying the tags to live‑stream recommendation doubled click‑through rates for interest‑aligned streams and boosted exposure for long‑tail creators.

Future work: Automate keyword discovery, incorporate semantic vector representations, and enrich user behavior signals from community, local, and entertainment scenarios.

Tags: e-commerce, data mining, user profiling, behavior analytics, interest tagging, TF-IDF