Real-time Product Matching and User Profiling System for Personalized Item Selection

The paper introduces a product‑matching and user‑profiling system that builds themed collections by comparing new items to cold‑start samples with a two‑stage similarity pipeline: exact edit‑distance and pHash checks, followed by doc2vec and OCR‑based embeddings. It then profiles sellers with RFM and clustering to highlight attributes such as recent C2C sales volume. The system achieves about 80% precision in a license‑plate bidding scenario, and the paper outlines future multimodal fusion improvements.

Xianyu Technology

Sep 2, 2021

Background: Operations and product teams need to select items and target user groups for promotional activities. Traditional methods rely on metric‑based or feature‑based selection and basic demographic profiling.

To support more personalized selection, a unified matching approach is adopted, focusing on system and algorithm design.

System design: The system runs on a real‑time computation platform and seeds each themed product collection with cold‑start samples. When a new item is posted, its title, description, and images are compared to the samples; matching items are added back to the collection, forming a feedback loop.
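The feedback loop described above can be sketched in a few lines. This is a minimal illustration, not the system's actual implementation: the names `Item`, `ThemedCollection`, `on_new_item`, and the toy matcher `shares_two_words` are all hypothetical, and the real matcher would be the similarity pipeline rather than a word-overlap check.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Item:
    item_id: str
    title: str


@dataclass
class ThemedCollection:
    """A themed collection seeded with cold-start samples (hypothetical structure)."""
    theme: str
    is_match: Callable[[Item, Item], bool]  # pluggable similarity check
    samples: List[Item] = field(default_factory=list)

    def on_new_item(self, item: Item) -> bool:
        """Compare a newly posted item to the samples; keep it if any sample matches."""
        if any(self.is_match(item, sample) for sample in self.samples):
            # Feedback loop: the matched item becomes a sample for later comparisons.
            self.samples.append(item)
            return True
        return False


def shares_two_words(a: Item, b: Item) -> bool:
    """Toy stand-in for the real matcher: two titles match if they share two words."""
    words_a = set(a.title.lower().split())
    words_b = set(b.title.lower().split())
    return len(words_a & words_b) >= 2


collection = ThemedCollection(
    theme="license-plate bidding",
    is_match=shares_two_words,
    samples=[Item("s1", "license plate bidding")],
)
collection.on_new_item(Item("n1", "license plate auction service"))  # matches s1
collection.on_new_item(Item("n2", "auction service agent"))          # matches n1 only
```

In this sketch the second item matches only because the first one was added back to the collection, which is the effect the feedback loop is meant to have.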

Algorithm design: Similarity detection runs in two stages. The first stage checks for near‑exact duplicates, using edit distance for text and a perceptual hash (pHash) for images; if both distances are below 10%, the items are considered identical. Otherwise, the second stage evaluates semantic similarity: text is embedded with a doc2vec model, text extracted from images by OCR is embedded as well, and a cosine similarity above 0.8 on either modality triggers a match.
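The two‑stage decision can be sketched as follows, using only the Python standard library. Several points are assumptions rather than details from the source: "below 10%" is read as a normalized distance under 0.10 (edit distance divided by the longer text length, Hamming distance divided by a 64‑bit hash width), the pHash values and the doc2vec and OCR embeddings are taken as precomputed inputs, and the names `Listing` and `match` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence

EXACT_MAX_DISTANCE = 0.10   # assumed reading of "below 10%"
SEMANTIC_MIN_COSINE = 0.8   # threshold stated in the text


@dataclass
class Listing:
    text: str                  # title plus description
    phash: int                 # precomputed 64-bit perceptual hash (assumed width)
    text_vec: Sequence[float]  # precomputed doc2vec embedding
    ocr_vec: Sequence[float]   # precomputed embedding of OCR-extracted image text


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def normalized_hamming(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """Fraction of differing bits between two perceptual hashes."""
    return bin(hash_a ^ hash_b).count("1") / bits


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def match(a: Listing, b: Listing) -> str:
    """Return 'identical', 'semantic', or 'none' for a pair of listings."""
    # Stage 1: both text and image distances must fall under the threshold.
    if (normalized_edit_distance(a.text, b.text) < EXACT_MAX_DISTANCE
            and normalized_hamming(a.phash, b.phash) < EXACT_MAX_DISTANCE):
        return "identical"
    # Stage 2: either modality exceeding the cosine threshold is enough.
    if (cosine(a.text_vec, b.text_vec) > SEMANTIC_MIN_COSINE
            or cosine(a.ocr_vec, b.ocr_vec) > SEMANTIC_MIN_COSINE):
        return "semantic"
    return "none"
```

Running the cheap exact checks first means the embedding comparison is only needed for pairs that are not near‑duplicates.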

User profiling: For matched items, the system collects seller attributes (age, gender, purchasing power) and behavior metrics (RFM: recency, frequency, monetary value). Feature relevance analysis, clustering, and decision‑tree slicing then identify key attributes such as recent C2C sales volume, which distinguishes professional sellers.
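As a rough illustration of the RFM metrics mentioned above, the sketch below computes recency, frequency, and monetary value per seller from a list of sale records. The `Sale` record, the `rfm` function, and the 30‑day window are assumptions made for the example; the source does not specify how the metrics are computed or which window defines "recent" sales.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class Sale:
    seller_id: str
    day: date
    amount: float


def rfm(sales: List[Sale], today: date,
        window_days: int = 30) -> Dict[str, Tuple[int, int, float]]:
    """Return {seller_id: (recency_days, frequency, monetary)} over a recent window.

    recency_days is the age in days of the seller's most recent sale,
    frequency is the number of sales in the window,
    monetary is the total sale amount in the window.
    """
    result: Dict[str, Tuple[int, int, float]] = {}
    for sale in sales:
        age = (today - sale.day).days
        if age < 0 or age > window_days:
            continue  # ignore future-dated records and sales outside the window
        recency, frequency, monetary = result.get(
            sale.seller_id, (window_days + 1, 0, 0.0))
        result[sale.seller_id] = (min(recency, age), frequency + 1,
                                  monetary + sale.amount)
    return result


sales = [
    Sale("A", date(2021, 8, 30), 100.0),
    Sale("A", date(2021, 8, 20), 50.0),
    Sale("A", date(2021, 7, 1), 75.0),   # outside the 30-day window
    Sale("B", date(2021, 8, 5), 20.0),
]
profile = rfm(sales, today=date(2021, 9, 1))
```

The frequency and monetary components over a short window correspond to the "recent sales volume" signal; these per‑seller tuples could then feed the clustering and decision‑tree steps.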

Results show ~80% precision for the “license‑plate bidding” scenario and reveal that sellers with higher recent sales are more likely to publish related services.

Future work includes improving multimodal feature extraction and exploring deeper multimodal fusion for higher matching accuracy.


Machine Learning user profiling product selection real-time matching Similarity Detection

