Video Recommendation System: Framework, Topic Clustering, and Related Video Retrieval
The paper proposes a video recommendation framework that combines recall and ranking modules. A multi-modal topic clustering approach integrates audio, visual, and textual features via NeXtVLAD, PCA, and K-Means to generate unified video representations and improve candidate selection. The approach boosts click-through rate and viewing time while addressing cold-start and semantic relevance challenges.
This paper presents a comprehensive video recommendation system framework, focusing on topic clustering and related video retrieval. The system architecture consists of two core modules: recall and ranking. The recall module filters hundreds of millions of items down to a set of relevant candidates, while the ranking module sorts those candidates precisely based on user profiles and contextual features.
The paper discusses traditional recall methods including behavior recall (collaborative filtering), semantic recall (using metadata like titles, tags, and categories), and visual recall (based on video content similarity). Each method has limitations, particularly in handling cold-start problems where new users or content lack sufficient interaction data.
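To make behavior recall and its cold-start limitation concrete, here is a minimal sketch of item-based collaborative filtering over co-watch counts. It is an illustration only, not the paper's implementation: the names `watch_log`, `item_similarities`, and `behavior_recall`, and the toy data, are all assumptions.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(watch_log):
    """Cosine similarity between videos, computed from co-watch counts.

    watch_log maps a user id to the set of video ids that user watched
    (hypothetical data layout, chosen for illustration).
    """
    views = defaultdict(int)     # video -> number of users who watched it
    co_views = defaultdict(int)  # (video_a, video_b) -> users who watched both
    for videos in watch_log.values():
        for v in videos:
            views[v] += 1
        for a, b in combinations(sorted(videos), 2):
            co_views[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), n in co_views.items():
        score = n / sqrt(views[a] * views[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def behavior_recall(sims, video_id, k=3):
    """Return up to k videos most strongly co-watched with video_id."""
    neighbors = sims.get(video_id, {})
    # Sort by descending similarity, breaking ties by video id.
    return sorted(neighbors, key=lambda v: (-neighbors[v], v))[:k]


watch_log = {
    "u1": {"v1", "v2", "v3"},
    "u2": {"v1", "v2"},
    "u3": {"v2", "v3"},
}
sims = item_similarities(watch_log)
print(behavior_recall(sims, "v1"))     # ['v2', 'v3']
print(behavior_recall(sims, "v_new"))  # [] : no interactions, so nothing to recall
```

The empty result for `v_new` is the cold-start problem in miniature: a video with no interaction history has no co-watch neighbors, so behavior recall alone cannot surface it.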
The proposed solution introduces a multi-modal topic clustering approach that combines audio features, video frame features, and text information. The system uses the NeXtVLAD algorithm to aggregate frame-level features from the different modalities into a unified video-level representation. Video titles are incorporated through word vectorization, with training on NetEase news data, to enhance semantic understanding.
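The idea of turning frame-level features from several modalities into one video-level vector can be sketched as follows. This is a deliberately simplified stand-in: it uses mean pooling where the paper uses NeXtVLAD, and the function names, feature dimensions, and random inputs are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale a vector to unit length so no modality dominates by magnitude."""
    return x / (np.linalg.norm(x) + eps)


def fuse_video(frame_feats, audio_feats, title_vec):
    """Build one video-level vector from per-modality inputs.

    frame_feats: (num_frames, d_visual) frame-level visual features
    audio_feats: (num_segments, d_audio) segment-level audio features
    title_vec:   (d_text,) title embedding, e.g. averaged word vectors
    """
    # ASSUMPTION: mean pooling replaces NeXtVLAD aggregation in this sketch.
    visual = l2_normalize(frame_feats.mean(axis=0))
    audio = l2_normalize(audio_feats.mean(axis=0))
    text = l2_normalize(title_vec)
    # Concatenate the modality vectors into a single representation.
    return np.concatenate([visual, audio, text])


rng = np.random.default_rng(0)
video_vec = fuse_video(
    frame_feats=rng.normal(size=(30, 8)),
    audio_feats=rng.normal(size=(30, 4)),
    title_vec=rng.normal(size=6),
)
print(video_vec.shape)  # (18,)
```

Normalizing each modality before concatenation is one common design choice; a learned aggregation such as NeXtVLAD would instead train the pooling and fusion jointly.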
The clustering process applies PCA dimensionality reduction followed by K-Means clustering, producing more than 3,600 topic categories that cover 96% of the videos in the recommendation pool. Related video retrieval is implemented by grouping videos within the same topic cluster. A/B testing showed improvements in click-through rate and viewing duration.
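The pipeline described here (PCA, then K-Means, then retrieval within a topic cluster) can be sketched in plain NumPy. This is a toy illustration under stated assumptions, not the paper's code: the data is synthetic, `k` is 2 rather than 3,600+, the farthest-first initialization is chosen only to keep the example deterministic, and all function names are hypothetical. In practice a library implementation such as scikit-learn's `PCA` and `KMeans` would be a more typical choice.

```python
import numpy as np


def pca_reduce(x, n_components):
    """Project rows of x onto the top principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def farthest_first_centers(x, k):
    """Deterministic init: start at row 0, then repeatedly add the row
    farthest from every center chosen so far."""
    chosen = [0]
    min_dist = np.linalg.norm(x - x[0], axis=1)
    for _ in range(1, k):
        nxt = int(min_dist.argmax())
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[nxt], axis=1))
    return x[chosen]


def kmeans(x, k, n_iter=50):
    """Plain Lloyd's algorithm; returns one cluster label per row."""
    centers = farthest_first_centers(x, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def related_videos(query, reduced, labels, top_n=5):
    """Return same-topic video indices, nearest first, excluding the query."""
    candidates = np.flatnonzero(labels == labels[query])
    candidates = candidates[candidates != query]
    dists = np.linalg.norm(reduced[candidates] - reduced[query], axis=1)
    return candidates[np.argsort(dists)][:top_n].tolist()


# Synthetic video-level features: two well-separated "topics" of 20 videos each.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=0.0, size=(20, 16)),
    rng.normal(loc=10.0, size=(20, 16)),
])
reduced = pca_reduce(features, n_components=2)
labels = kmeans(reduced, k=2)
print(related_videos(0, reduced, labels))
```

Restricting candidates to the query's own cluster is what keeps retrieval cheap at scale: only videos sharing a topic label are compared, rather than the whole pool.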
The paper identifies two current challenges: videos that are visually similar but not semantically related, and topics that are too broad. Future work includes incorporating more semantic information and implementing hierarchical clustering for finer topic granularity.
NetEase Media Technology Team