Real-time Product Matching and User Profiling System for Personalized Item Selection
The paper introduces a product-matching and user-profiling system that builds themed collections by comparing newly posted items against cold-start samples. Matching runs in two stages: exact checks using edit distance and pHash, followed by semantic checks using doc2vec and OCR-based embeddings. Sellers of matched items are then profiled with RFM metrics and clustering to highlight attributes such as recent C2C sales volume. The system achieves about 80% precision in a license-plate bidding scenario, and the paper outlines future improvements in multimodal fusion.
Background: Operations and product teams need to select items and target user groups for promotional activities. Traditional methods rely on metric‑based or feature‑based selection and basic demographic profiling.
To support more personalized selection, a unified matching approach is adopted, focusing on system and algorithm design.
System design: Built on a real-time computation platform, the system seeds each themed product collection with cold-start samples. When a new item is posted, its title, description, and images are compared against those samples. Matched items are added back to the collection, so the collection grows through a feedback loop.
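The feedback loop can be illustrated with a minimal Python sketch. The class name `ThemedCollection`, the method `on_new_item`, and the word-overlap matcher are hypothetical stand-ins chosen for illustration; the source does not describe the platform's actual interfaces.

```python
from typing import Callable, Iterable, List


class ThemedCollection:
    """Toy collection that grows as newly posted items match existing members."""

    def __init__(self, seeds: Iterable[str], is_match: Callable[[str, str], bool]):
        self.items: List[str] = list(seeds)  # cold-start samples
        self.is_match = is_match             # pluggable similarity test

    def on_new_item(self, item: str) -> bool:
        """Add the item if it matches any current member; report whether it did."""
        if any(self.is_match(item, known) for known in self.items):
            self.items.append(item)  # the match becomes a sample for later items
            return True
        return False


def shares_word(a: str, b: str) -> bool:
    """Placeholder matcher (assumption): two texts match if they share a word."""
    return bool(set(a.split()) & set(b.split()))


collection = ThemedCollection(["license plate bidding"], shares_word)
first = collection.on_new_item("plate auction help")   # matches the seed
second = collection.on_new_item("auction tips")        # matches only the added item
third = collection.on_new_item("used bicycle")         # matches nothing
```

The second item matches only because the first one was added back to the collection, which is the effect the feedback loop is meant to produce. It also shows why the matcher needs good precision: a false match becomes a sample that can pull in further false matches.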
Algorithm design: Similarity detection runs in two stages. The first stage checks for exact similarity, using edit distance for text and perceptual hash (pHash) distance for images; if both distances are below 10%, the items are considered identical. Otherwise, the second stage evaluates semantic similarity: text is embedded with a doc2vec model, and text extracted from images via OCR is embedded as well. A cosine similarity above 0.8 in either modality triggers a match.
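A minimal Python sketch of the two-stage check follows. It makes several assumptions the source does not confirm: the 10% threshold is read as a ratio (edit distance divided by the longer title's length, Hamming distance divided by a 64-bit hash length), and the pHash values and embeddings are taken as precomputed inputs instead of calling a real doc2vec or OCR model. The `Item` fields and function names are illustrative only.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


@dataclass
class Item:
    title: str
    phash: int                 # 64-bit perceptual hash, computed elsewhere
    text_vec: Sequence[float]  # text embedding (e.g. from doc2vec)
    ocr_vec: Sequence[float]   # embedding of OCR-extracted image text


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def text_distance_ratio(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def phash_distance_ratio(h1: int, h2: int, bits: int = 64) -> float:
    """Fraction of differing bits between two hashes (Hamming distance / bits)."""
    return bin(h1 ^ h2).count("1") / bits


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match(new: Item, sample: Item, exact_max: float = 0.10, semantic_min: float = 0.8) -> str:
    # Stage 1: both text and image must be near-identical.
    if (text_distance_ratio(new.title, sample.title) < exact_max
            and phash_distance_ratio(new.phash, sample.phash) < exact_max):
        return "identical"
    # Stage 2: either modality may trigger a semantic match.
    if (cosine(new.text_vec, sample.text_vec) > semantic_min
            or cosine(new.ocr_vec, sample.ocr_vec) > semantic_min):
        return "semantic"
    return "none"
```

Stage 1 requires both modalities to agree, while stage 2 accepts either one, mirroring the "both distances" and "any modality" wording above.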
User profiling: For matched items, the system collects seller attributes (age, gender, purchasing power) and behavior metrics based on RFM (recency, frequency, monetary value). Feature relevance analysis, clustering, and decision-tree slicing then identify key attributes such as recent C2C sales volume, which distinguishes professional sellers.
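As a rough illustration of the RFM step, the Python sketch below derives recency, frequency, and monetary value per seller from a list of completed sales. The input layout `(seller_id, sale_date, amount)` and the function name `rfm` are assumptions for this example; the source does not specify how the metrics are computed or stored.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Sale = Tuple[str, date, float]  # (seller_id, sale_date, amount), hypothetical layout


def rfm(sales: Iterable[Sale], today: date) -> Dict[str, Tuple[int, int, float]]:
    """Return {seller_id: (recency_in_days, frequency, monetary_total)}."""
    last_sale: Dict[str, date] = {}
    frequency: Dict[str, int] = defaultdict(int)
    monetary: Dict[str, float] = defaultdict(float)
    for seller, day, amount in sales:
        # Track the most recent sale date seen for each seller.
        last_sale[seller] = max(last_sale.get(seller, day), day)
        frequency[seller] += 1
        monetary[seller] += amount
    return {
        seller: ((today - last_sale[seller]).days, frequency[seller], monetary[seller])
        for seller in last_sale
    }


sales = [
    ("a", date(2024, 1, 1), 10.0),
    ("a", date(2024, 1, 10), 5.0),
    ("b", date(2023, 12, 1), 100.0),
]
profile = rfm(sales, today=date(2024, 1, 11))
```

The resulting tuples can serve as input features for the relevance analysis, clustering, and decision-tree slicing described above.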
Results show ~80% precision for the “license‑plate bidding” scenario and reveal that sellers with higher recent sales are more likely to publish related services.
Future work includes improving multimodal feature extraction and exploring deeper multimodal fusion for higher matching accuracy.
Xianyu Technology
Official account of the Xianyu technology team