
Decentralized Distribution in Xiaohongshu Recommendation System: Sideinfo, Multi‑modal Fusion, Interest Exploration and Future Directions

This article presents Xiaohongshu's technical solutions for decentralized content distribution, covering the definition of the problem, fast and accurate learning, side‑information modeling, graph‑based multi‑modal fusion, interest exploration and protection, and future research directions such as generative recommendation and large‑model driven user profiling.

DataFunTalk

Jul 31, 2024


Xiaohongshu, a rapidly growing UGC community with billions of daily exposures, faces a decentralization challenge: the most exposed and active content dominates distribution, leaving long‑tail creators and niche interests under‑served.

Problem definition: Decentralized distribution must improve both the speed (learning fast) and the quality (learning well) of the recommendation pipeline, so that new notes and emerging user interests are captured early.

Core issues:

Learning fast: the system must quickly detect long‑tail content and user interests to avoid cold‑start loss.

Learning well: each stage (recall, coarse‑ranking, fine‑ranking, post‑ranking) must faithfully expose long‑tail signals rather than over‑optimizing for popular items.

Solution 1 – Strengthening side‑information usage:

Sideinfo is decoupled from note‑id embeddings and modeled with separate sequence encoders; attention scores for ID and sideinfo are computed independently and then combined with a residual connection, boosting sideinfo influence in both recall and ranking.
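The decoupled attention described above can be sketched as follows. This is a minimal illustration, not the production model: the function names, the scaled dot-product scoring, and the choice to add the two weight vectors and renormalize are all assumptions made for the example, since the source only states that the two score sets are computed independently and combined with a residual connection.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_weights(query, keys):
    # Scaled dot-product attention over one embedding space.
    scale = math.sqrt(len(query))
    return softmax([dot(query, k) / scale for k in keys])

def decoupled_attention(target_id, target_side, seq_id, seq_side):
    """Combine ID and sideinfo attention over a user's behavior sequence.

    Hypothetical reading of the residual combination: the sideinfo
    weights are added on top of the ID weights, then renormalized.
    """
    id_w = attention_weights(target_id, seq_id)        # ID path
    side_w = attention_weights(target_side, seq_side)  # sideinfo path
    combined = [a + b for a, b in zip(id_w, side_w)]
    total = sum(combined)
    return [c / total for c in combined]

# A cold note whose ID embedding is still all zeros: the ID path is
# uninformative (uniform), so the sideinfo path decides the weighting.
weights = decoupled_attention(
    target_id=[0.0, 0.0],
    target_side=[1.0, 0.0],
    seq_id=[[0.0, 0.0], [0.0, 0.0]],
    seq_side=[[1.0, 0.0], [0.0, 1.0]],
)
```

In this toy case the first sequence position shares sideinfo with the target and therefore receives the larger weight, even though the ID embeddings carry no signal yet.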

Solution 2 – Graph model fusion of sideinfo:

The graph contains collaborative‑filtering (CF) edges that connect notes to each other and sideinfo edges that connect each note to its sideinfo; the Swing algorithm is used to mitigate exposure bias. Causal‑inference based back‑door adjustment reduces the popularity bias of super‑nodes, while multiple meta‑paths, each with a dedicated MLP, preserve diverse graph walks. Real‑time Swing computation is achieved with Flink streaming.
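For readers unfamiliar with Swing, here is a small offline sketch of its basic form: two notes are similar when many user pairs interacted with both, and each user pair counts less the more other items the two users share. The function name, the `alpha` smoothing default, and the toy data are assumptions for illustration; user-activity weighting and the real-time Flink implementation mentioned in the talk are not shown.

```python
from collections import defaultdict
from itertools import combinations

def swing_similarity(user_items, alpha=1.0):
    """Basic Swing score for every item pair.

    user_items maps a user to the set of items the user interacted with.
    sim(i, j) = sum over user pairs (u, v) who both touched i and j of
                1 / (alpha + |items(u) & items(v)|)
    """
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sims = defaultdict(float)
    for i, j in combinations(sorted(item_users), 2):
        common_users = sorted(item_users[i] & item_users[j])
        for u, v in combinations(common_users, 2):
            overlap = len(user_items[u] & user_items[v])
            sims[(i, j)] += 1.0 / (alpha + overlap)
    return dict(sims)

clicks = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
}
sims = swing_similarity(clicks)
```

Only the pair ("a", "b") receives a score here, because it is the only item pair co-clicked by at least two users; a pair seen by a single user contributes nothing, which is part of why Swing is less sensitive to one-off exposure than plain co-occurrence counts.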

Solution 3 – Full‑link multi‑modal signal fusion:

Multi‑modal embeddings (text + image) are aligned with each other (intra‑domain alignment) and with the item tower (cross‑domain alignment). The AlignRec framework integrates the multi‑modal encodings at the recall, coarse‑ranking, fine‑ranking and post‑ranking stages, using feature crossing, attention‑based similarity search, and graph‑based aggregation to improve long‑tail coverage.
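The source does not say which objective performs the alignment. One common choice for pulling two embedding spaces together is an InfoNCE-style contrastive loss, sketched below under that assumption: each note's multi-modal embedding should score highest against its own item-tower embedding and lower against the other notes in the batch. The function names and the temperature value are illustrative.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(modal_embs, item_embs, temperature=0.1):
    """Mean InfoNCE loss; row i of each input describes the same note."""
    a = [l2_normalize(v) for v in modal_embs]
    b = [l2_normalize(v) for v in item_embs]
    loss = 0.0
    for i, anchor in enumerate(a):
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature
                  for cand in b]
        m = max(logits)
        # Numerically stable log-sum-exp over all candidates in the batch.
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(a)

modal = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(modal, [[1.0, 0.0], [0.0, 1.0]])
misaligned_loss = info_nce(modal, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is close to zero when matching rows point the same way and large when they are swapped, so minimizing it moves the multi-modal space toward the item tower's space.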

Solution 4 – Interest exploration and link protection:

Explicit multi‑interest modeling builds a global interest pool and selects top‑k interests for each user, replacing implicit clustering. Evolution‑Strategy based exploration adds Gaussian noise to recall vectors and re‑injects high‑rank but un‑exposed items. Multi‑queue protection splits the recall‑to‑ranking pipeline into independent queues, each with custom ranking and auto‑tuned cut‑offs.
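The perturbation step of the exploration idea can be sketched as follows: sample several noisy copies of the user's recall vector, retrieve the top items for each, and keep only the ones the user has not been shown. All names, the dot-product retrieval, and the parameter defaults are assumptions for the example; a full evolution strategy would also feed engagement back into the noise distribution, which is not shown.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query, item_vecs, k):
    ranked = sorted(item_vecs, key=lambda item: dot(query, item_vecs[item]),
                    reverse=True)
    return ranked[:k]

def explore(user_vec, item_vecs, exposed, k=2, n_samples=8, sigma=0.5, seed=0):
    """Return high-ranking, not-yet-exposed items found via noisy recall."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_samples):
        # Gaussian noise on the recall vector widens the retrieved region.
        noisy = [x + rng.gauss(0.0, sigma) for x in user_vec]
        for item in top_k(noisy, item_vecs, k):
            if item not in exposed and item not in candidates:
                candidates.append(item)
    return candidates

items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
no_noise = explore([1.0, 0.0], items, exposed={"a"}, sigma=0.0)
with_noise = explore([1.0, 0.0], items, exposed={"a"}, sigma=0.5)
```

With `sigma=0.0` the function reduces to ordinary recall minus the exposed items; raising `sigma` trades relevance for coverage of interests the user vector does not yet express.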

Solution 5 – Large‑model driven potential interest mining:

User profiles are generated offline by a proprietary 7B model (tomato‑7B) using chain‑of‑thought prompts. The profiles are then mapped into the tag space via SimCSE, and the resulting tags boost relevant notes in the post‑ranking stage.
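The last two steps of this pipeline, mapping a profile into tag space and boosting matching notes, can be sketched as below. The sketch assumes the profile text and the tags have already been embedded by a sentence encoder such as SimCSE, so it only takes vectors as input. The function names, the similarity threshold, and the multiplicative boost are hypothetical; the source does not describe how the boost is applied.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def map_to_tags(profile_vec, tag_vecs, top_n=2, min_sim=0.5):
    """Pick the tags whose embeddings are closest to the profile embedding."""
    scored = sorted(((cosine(profile_vec, vec), tag)
                     for tag, vec in tag_vecs.items()), reverse=True)
    return [tag for sim, tag in scored[:top_n] if sim >= min_sim]

def boost_scores(ranked, note_tags, user_tags, boost=1.2):
    """Multiply the score of notes that carry any of the user's mined tags."""
    wanted = set(user_tags)
    boosted = [(note, score * boost if note_tags.get(note, set()) & wanted else score)
               for note, score in ranked]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)

tag_vecs = {"camping": [1.0, 0.0], "hiking": [0.9, 0.1], "baking": [0.0, 1.0]}
user_tags = map_to_tags([1.0, 0.0], tag_vecs)
reranked = boost_scores(
    ranked=[("n1", 0.50), ("n2", 0.45)],
    note_tags={"n1": {"baking"}, "n2": {"camping"}},
    user_tags=user_tags,
)
```

In the toy data the profile maps to the two outdoor tags, and the camping note overtakes the baking note after the boost.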

Future outlook:

Generative recommendation for higher diversity.

Interactive multi‑turn search powered by large models.

Multi‑modal user profiling and end‑to‑end joint training.

Global signal modeling via full‑app graph and domain‑wide pre‑training.

The presentation concludes with an invitation to further discuss these techniques and follow Xiaohongshu’s technical community.

