Video Clustering Techniques for Personalized Recommendation in Meipai
Meipai's personalized recommendation system uses massive user-behavior data to build behavior-driven video clusters. The approach evolved from TopicModel through Item2vec and Keyword Propagation to a DSSM deep model. The resulting clusters raise ranking AUC, increase UI diversity, and support similar-video search, niche discovery, and feature engineering.
In Meipai's personalized recommendation system, user behavior is leveraged to understand video content and construct clustering models, moving beyond simple tag statistics derived from item attributes.
The ninth Meitu Tech Salon featured a presentation by Bai Yang on a behavior‑driven video clustering solution and its practical applications in Meipai.
Application scenarios of video clustering:
UI layout: when two videos belonging to the same cluster appear among the six videos displayed on the homepage, a "cluster dispersion" operation is performed to avoid showing similar videos consecutively, thereby increasing result diversity.
Similar‑video retrieval: clustering enables fast identification of videos belonging to the same cluster for similarity search.
Discovery of niche or short‑term hot videos, aiding product operation strategies.
Extension of recommendation strategies: clusters are used to recall videos that match user‑interested clusters.
Feature engineering: cluster IDs are added as features to the ranking model, significantly improving AUC.
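The "cluster dispersion" step in the UI layout scenario can be sketched as a greedy reorder. This is a minimal illustration, not Meipai's implementation: the function name `disperse_clusters`, the video IDs, the cluster IDs, and the greedy rule are all assumptions, and only the page size of six comes from the description above.

```python
from typing import Dict, List


def disperse_clusters(
    ranked: List[str],
    cluster_of: Dict[str, int],
    page_size: int = 6,
) -> List[str]:
    """Fill one page from a ranked list, avoiding adjacent same-cluster videos.

    Hypothetical greedy rule: at each slot, take the highest-ranked remaining
    video whose cluster differs from the previous slot's cluster. If every
    remaining video shares that cluster, fall back to the highest-ranked one.
    """
    remaining = list(ranked)
    page: List[str] = []

    def cluster(video: str):
        # A video without a known cluster is treated as its own cluster.
        return cluster_of.get(video, video)

    while remaining and len(page) < page_size:
        if not page:
            pick = remaining[0]
        else:
            prev = cluster(page[-1])
            pick = next((v for v in remaining if cluster(v) != prev), remaining[0])
        remaining.remove(pick)
        page.append(pick)
    return page


# Toy data (hypothetical IDs): v1 and v2 are ranked first and share cluster 10.
ranked_videos = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
clusters = {"v1": 10, "v2": 10, "v3": 20, "v4": 20, "v5": 30, "v6": 10, "v7": 40}
homepage = disperse_clusters(ranked_videos, clusters)
print(homepage)
```

The greedy rule keeps the ranking order wherever it can and only moves a video when it would otherwise sit next to one from the same cluster. A production system would likely also weigh how much ranking score is given up by each swap.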
Traditional content-based methods (visual, audio, textual, and key-frame analysis) have limitations: they need prior knowledge, cover content incompletely, and can describe videos inaccurately. Meipai therefore adopts a user-behavior-based approach, in which user and video profiles ("portraits") are derived from clicks, plays, likes, and similar signals.
Key challenges:
Massive data volume (terabyte-scale daily user behavior).
Rapid model updates required for newly uploaded videos.
Interpretability of clusters (e.g., understanding that a cluster represents "food & beauty").
Evolution of clustering solutions:
1. TopicModel – a classic NLP topic model applied to user-behavior documents, where the videos each user has watched form one document. It produces coarse-grained clusters that are easy to interpret, but it requires weeks of data and the coarse granularity is itself a limitation. The model is evaluated with metrics such as perplexity and topic coherence, and the resulting clusters are fed into the ranking model.
2. Item2vec – a Word2vec-style embedding in which videos are treated as words and user sessions as sentences. It yields finer-grained clusters and can be integrated end-to-end, but it is unstable (cluster IDs change from one training run to the next) and less accurate for low-frequency videos.
3. Keyword Propagation – a semi‑supervised graph‑based method. Nodes represent videos; edges are weighted by co‑view counts. Initially, videos with reliable textual keywords keep them, while others receive unique IDs. Labels are then propagated through the graph, increasing coverage (≈95% of videos obtain keywords), correcting erroneous descriptions, and discovering niche clusters that lack a predefined keyword.
4. DSSM (Deep Structured Semantic Model) – a weakly supervised deep model originally designed for search and repurposed here for recommendation. Positive samples are user clicks; negative samples are exposures without clicks. The model combines bag-of-words inputs with an LSTM for context and learns a 128-dimensional semantic space in which user vectors (Q) and video vectors (D) are matched by cosine similarity. DSSM improves AUC by about 1.3% over the previous pipeline and requires only a few days of behavior data.
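The matching step of the DSSM stage can be illustrated with a small sketch. It assumes the user vector Q and the video vectors D have already been produced by a trained model, and it shows neither the network nor its training. The 3-dimensional toy vectors stand in for the 128-dimensional embeddings, and the video IDs and the helper names `cosine` and `rank_videos` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(q: Sequence[float], d: Sequence[float]) -> float:
    """Cosine similarity between a user vector q and a video vector d."""
    dot = sum(x * y for x, y in zip(q, d))
    norm = math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(y * y for y in d))
    # Return 0.0 for a zero vector instead of dividing by zero.
    return dot / norm if norm else 0.0


def rank_videos(
    user_vec: Sequence[float],
    video_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate video against the user and keep the top_k."""
    scored = [(vid, cosine(user_vec, vec)) for vid, vec in video_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical values, 3 dimensions instead of 128).
user_q = [1.0, 0.0, 1.0]
videos_d = {
    "cooking": [0.9, 0.1, 0.8],
    "dance": [0.0, 1.0, 0.0],
    "baking": [1.0, 0.0, 0.5],
}
ranking = rank_videos(user_q, videos_d)
print([vid for vid, _ in ranking])
```

Because cosine similarity ignores vector length, a video is scored by the direction of its embedding relative to the user's, not by its magnitude. The same video vectors could also be grouped into clusters, although the clustering algorithm used for that is not specified here.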
Empirical results show progressive AUC gains: TopicModel (+0.1%), Item2vec (+0.9%), DSSM (+1.3%). The final pipeline combines the four stages to balance simplicity, granularity, stability, and supervision.
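As an illustration of the Keyword Propagation stage, the following is a minimal sketch of label propagation over a co-view graph. It is not Meipai's implementation. The function name `propagate_keywords`, the placeholder ID format, the rule that seed keywords stay fixed, the weighted-majority update, and the toy graph are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Tuple


def propagate_keywords(
    co_views: Dict[Tuple[str, str], int],
    seed_keywords: Dict[str, str],
    max_iters: int = 10,
) -> Dict[str, str]:
    """Spread keywords from seed videos to the rest of a co-view graph."""
    # Build an undirected weighted adjacency map from co-view counts.
    neighbors: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (a, b), weight in co_views.items():
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    # Seeds keep their keyword; every other video starts with a unique ID.
    labels = {v: seed_keywords.get(v, f"id:{v}") for v in neighbors}

    for _ in range(max_iters):
        changed = False
        for video in sorted(neighbors):  # fixed order keeps runs reproducible
            if video in seed_keywords:
                continue  # assumption: reliable keywords are never overwritten
            votes: Dict[str, int] = defaultdict(int)
            for other, weight in neighbors[video].items():
                votes[labels[other]] += weight
            # Largest total co-view weight wins; ties break alphabetically.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[video]:
                labels[video] = best
                changed = True
        if not changed:
            break
    return labels


# Toy graph (hypothetical): a chain a-b-c-d-e plus an isolated pair x-y.
co_view_counts = {
    ("a", "b"): 5, ("b", "c"): 3, ("c", "d"): 4, ("d", "e"): 6, ("x", "y"): 2,
}
seeds = {"a": "food", "e": "dance"}
result = propagate_keywords(co_view_counts, seeds)
print(result)
```

In this toy graph, videos b, c, and d inherit a keyword from whichever seed they are more strongly co-viewed with. The pair x and y has no seed nearby, so it ends up sharing one placeholder ID, which mirrors how a niche cluster without a predefined keyword can still be discovered.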
Future directions:
Hierarchical clustering to capture multi‑level semantics (e.g., food → noodles, cake, hotpot).
Real‑time clustering to assign new videos to clusters as soon as minimal behavior is observed.
Further accuracy improvements by incorporating richer user and video portrait features into DSSM.
About the speaker: Bai Yang is a Senior Algorithm Engineer in the Meitu Cloud Business Unit with four years of machine learning experience, and leads video clustering research for Meipai.
Meitu Technology
Curating Meitu's technical expertise, valuable case studies, and innovation insights. We deliver quality technical content to foster knowledge sharing between Meitu's tech team and outstanding developers worldwide.