Cold Start Strategies for New Content in Baidu Feed Recommendation
This article presents Baidu's comprehensive approach to cold‑starting new content in its large‑scale feed recommendation system, covering the definition and challenges of content cold start, algorithmic practices, ID feature optimizations, traffic control mechanisms, experimental design, and key Q&A insights.
Cold start for new resources (newly published content) is a central problem in Baidu's feed recommendation ecosystem, and the article draws on Baidu's production experience in addressing it.
The core challenges are threefold: new items have few or no interactions, so their ID features are sparse and poorly trained; the Matthew effect keeps routing exposure to content that is already popular; and the system must balance fairness (equal exposure for every new item) against merit (exposure weighted by quality).
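One simple way to picture the fairness-versus-merit trade-off is a linear blend of an equal split and a quality-proportional split. This is an illustrative sketch, not Baidu's allocation rule: the function name `allocate_exposure`, the blending weight `alpha`, and the use of a single scalar quality score per item are all assumptions.

```python
def allocate_exposure(quality, total, alpha):
    """Split `total` impressions across new items (hypothetical helper).

    alpha = 0.0 gives pure fairness (every item gets the same share);
    alpha = 1.0 gives pure merit (shares proportional to quality scores).
    Values in between blend the two, and the shares always sum to `total`.
    """
    n = len(quality)
    if n == 0:
        return []
    fair_share = total / n
    q_sum = sum(quality)
    allocation = []
    for q in quality:
        # Fall back to the equal split if every quality score is zero.
        merit_share = total * q / q_sum if q_sum > 0 else fair_share
        allocation.append((1 - alpha) * fair_share + alpha * merit_share)
    return allocation


print(allocate_exposure([0.9, 0.5, 0.1], total=1500, alpha=0.5))
```

With `alpha=0.5`, the example gives shares of 700, 500, and 300 impressions: half of each item's equal 500-impression share plus half of its quality-proportional share (900, 500, and 100).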
On the algorithm side, the practices include content-based recall that matches new items to users through user profiles, tags, and classifications; author-based look-alike methods; and multimodal CLIP embeddings fed through a projection network that maps these prior (content-derived) representations into the behavior-driven embedding space, with the mapping trained by pairwise learning.
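The projection idea can be sketched as a linear map trained with a BPR-style pairwise loss. This is a minimal NumPy illustration, not Baidu's implementation: the linear form of `W`, the synthetic data standing in for CLIP and behavior embeddings, and the helper names `train_projection` and `pairwise_auc` are all assumptions. Items with behavior history supervise the map, and held-out items play the role of cold-start content whose behavior embedding must be inferred from the prior alone.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_projection(X, B, lr=1e-3, steps=5000, seed=0):
    """Learn a linear map W (d_b x d_x) from prior embeddings X (e.g. CLIP)
    to behavior embeddings B using a pairwise (BPR-style) loss: item i's
    projected prior should score its own behavior vector above item j's."""
    rng = np.random.default_rng(seed)
    n, d_x = X.shape
    W = rng.normal(scale=0.01, size=(B.shape[1], d_x))
    for _ in range(steps):
        i, j = rng.integers(n, size=2)
        if i == j:
            continue
        diff = B[i] - B[j]                # positive minus negative behavior vector
        delta = diff @ (W @ X[i])         # s(i, i) - s(i, j)
        # loss = -log(sigmoid(delta)); dloss/dW = (sigmoid(delta) - 1) * outer(diff, x_i)
        W -= lr * (sigmoid(delta) - 1.0) * np.outer(diff, X[i])
    return W


def pairwise_auc(W, X, B):
    """Fraction of pairs (i, j != i) where item i's projected prior scores
    its own behavior vector above item j's."""
    S = (X @ W.T) @ B.T                   # S[i, j] = score(prior_i, behavior_j)
    own = np.diag(S)[:, None]
    wins = (own > S).sum()                # diagonal never counts: own > own is False
    n = S.shape[0]
    return wins / (n * (n - 1))


# Synthetic stand-ins: behavior vectors are a noisy linear function of priors.
rng = np.random.default_rng(42)
d_x, d_b = 32, 16
A = rng.normal(scale=1 / np.sqrt(d_x), size=(d_b, d_x))
X = rng.normal(size=(400, d_x))
B = X @ A.T + 0.1 * rng.normal(size=(400, d_b))

W = train_projection(X[:300], B[:300])    # warm items with behavior history
W_random = np.random.default_rng(1).normal(scale=0.01, size=(d_b, d_x))
auc_before = pairwise_auc(W_random, X[300:], B[300:])  # held-out "cold" items
auc_after = pairwise_auc(W, X[300:], B[300:])
print(f"held-out pairwise AUC: random {auc_before:.3f}, trained {auc_after:.3f}")
```

A pairwise objective fits this use: the projected vector only needs to rank a new item's true behavior neighbors above others, not reproduce the behavior embedding exactly.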
For ID features, the article outlines three optimization paradigms: ID dropout (e.g., DropoutNet), contrastive learning, and ID generation through embedding initialization or generative models. It also describes traffic control mechanisms that divide exposure into base traffic and boost traffic, using time and distribution targets to steer how much exposure each new item receives.
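ID dropout can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not DropoutNet's exact architecture: combining the ID embedding and the projected content vector by addition, the projection matrix `W_content`, and the function name `item_representation` are hypothetical simplifications. What it shows is that randomly zeroing an item's ID embedding during training makes the model practice the situation a brand-new item is in at serving time.

```python
import numpy as np


def item_representation(id_emb, content_emb, W_content, dropout_rate, rng, training=True):
    """Combine a learned ID embedding with projected content features.

    During training, each item's whole ID embedding is zeroed with
    probability `dropout_rate`, forcing the model to rank from content alone
    part of the time. This sketch does not rescale kept embeddings: the aim
    is to simulate a missing ID rather than to regularize activations.
    """
    content_part = content_emb @ W_content.T
    if training:
        keep = rng.random(id_emb.shape[0]) >= dropout_rate  # one draw per item
        id_emb = id_emb * keep[:, None]
    return id_emb + content_part


rng = np.random.default_rng(0)
id_emb = rng.normal(size=(5, 4))         # 5 items, 4-dim ID embeddings
content = rng.normal(size=(5, 6))        # 6-dim content features per item
W_content = rng.normal(size=(4, 6))

train_batch = item_representation(id_emb, content, W_content, dropout_rate=0.5, rng=rng)
# A cold item at serving time has no trained ID embedding, so pass zeros.
cold = item_representation(np.zeros((1, 4)), content[:1], W_content, 0.0, rng, training=False)
```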
Cold-start traffic allocation also accounts for user selection: it prefers experienced users or key opinion leaders (KOLs), identified through quality-based labeling or through their collaborative-filtering success rates, so that early exposure maximizes downstream propagation.
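User selection for seed traffic can be illustrated by ranking candidate users on a smoothed success rate. Everything here is an assumption for illustration: the `SeedUserStats` fields, defining a "success" as an early-clicked item that later performed well, the Beta-style shrinkage toward `prior_rate`, and the flat `kol_bonus` for users carrying a key-opinion-leader label.

```python
from dataclasses import dataclass


@dataclass
class SeedUserStats:
    user_id: str
    early_clicks: int       # cold-start items this user clicked
    later_successes: int    # of those, items that later performed well
    is_kol: bool = False    # carries a key-opinion-leader quality label


def smoothed_success_rate(stats, prior_rate=0.1, prior_weight=20):
    """Shrink a user's success rate toward `prior_rate` so that users with
    only a few clicks cannot top the ranking on luck alone."""
    return (stats.later_successes + prior_rate * prior_weight) / (
        stats.early_clicks + prior_weight
    )


def select_seed_users(users, k, kol_bonus=0.05):
    """Return the ids of the k users to receive a new item's first impressions."""
    scored = [
        (smoothed_success_rate(u) + (kol_bonus if u.is_kol else 0.0), u.user_id)
        for u in users
    ]
    scored.sort(reverse=True)
    return [user_id for _, user_id in scored[:k]]


users = [
    SeedUserStats("a", early_clicks=2, later_successes=2),
    SeedUserStats("b", early_clicks=200, later_successes=60),
    SeedUserStats("c", early_clicks=50, later_successes=5, is_kol=True),
    SeedUserStats("d", early_clicks=100, later_successes=10),
]
print(select_seed_users(users, k=2))
```

Shrinkage matters here: user `a` has a perfect 2-for-2 record, but with so few clicks its smoothed rate (about 0.18) stays below user `b`'s (about 0.28), which rests on 200 clicks.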
Experiment design uses content isolation to measure the impact of cold-start strategies accurately, evaluating metrics such as CTR, completion rate, quality ratio, and author activity. The Q&A section addresses practical concerns about tower classification, assessing a resource's potential, exposure timing, and experimental validity.
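A content-isolation setup can be sketched by splitting new items, rather than users, into disjoint pools with a salted hash, then computing metrics per pool. This is one reading of the idea under stated assumptions: the salt string, the 50/50 split, the `(item_id, clicked, completed)` event format, and defining completion rate as completions per click are all hypothetical, and only CTR and completion rate are shown.

```python
import hashlib
from collections import defaultdict


def content_bucket(item_id, salt="coldstart-exp-1", treatment_share=0.5):
    """Deterministically assign a new item to one content pool."""
    digest = hashlib.sha256(f"{salt}:{item_id}".encode("utf-8")).hexdigest()
    position = int(digest[:8], 16) / 2**32  # roughly uniform in [0, 1)
    return "treatment" if position < treatment_share else "control"


def bucket_metrics(events):
    """Aggregate (item_id, clicked, completed) impression records per pool."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0, "completions": 0})
    for item_id, clicked, completed in events:
        t = totals[content_bucket(item_id)]
        t["impressions"] += 1
        t["clicks"] += int(clicked)
        t["completions"] += int(completed)
    return {
        pool: {
            "ctr": t["clicks"] / t["impressions"],
            # Hypothetical definition: completions per click.
            "completion_rate": t["completions"] / t["clicks"] if t["clicks"] else 0.0,
        }
        for pool, t in totals.items()
    }


events = [
    ("item-1", True, True),
    ("item-1", False, False),
    ("item-2", True, False),
    ("item-3", False, False),
]
print(bucket_metrics(events))
```

Hashing the item ID keeps each item's assignment stable across requests, and changing the salt reshuffles items for a new experiment.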
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.