Advances in Information‑Flow Recommendation: Pre‑trained Models and Multimodal User‑Interface Modeling
This article reviews Huawei Noah's Ark Lab's work on modern information‑flow recommendation, covering the evolution from collaborative filtering to deep learning, the application of BERT‑based pre‑training for news ranking, multimodal user‑interface modeling, practical deployment challenges, and future research directions.
The talk, presented by Zhu Jieming from Huawei Noah's Ark Lab and edited by Zhang Aoyu (AWS), introduces the rapid development of recommendation systems from early collaborative filtering to current deep‑learning‑centric approaches, highlighting the growing difficulty of optimizing large models under inference constraints.
Huawei's Noah's Ark Lab, with sub‑labs spanning computer vision, speech, recommendation and search, decision reasoning, AI theory, and AI systems, conducts both fundamental AI research and product‑oriented technology empowerment, collaborating with partners in over ten countries and with 25 universities.
In the information‑flow recommendation scenario, Huawei applies its technology to diverse multimodal feeds such as phone home‑screen news, browser article/video waterfalls, and video‑app recommendations, emphasizing the shift toward multimodal, heterogeneous content.
The evolution of recommendation techniques is traced: early 2000s collaborative filtering, 2010s generalized linear models (FTRL, FM, BPR, RankSVM), and from 2015 onward deep learning models like YouTubeDNN, Wide&Deep, DeepFM, and DIN, with performance gains driven by larger datasets and GPU advances.
Since 2018, the pre‑training + fine‑tuning paradigm popularized by BERT in NLP and CV has been adopted to boost recommendation performance. The UNBERT model concatenates the user's click history with the candidate news into a single sequence, uses segment IDs and the [CLS] token for token‑level matching, pools the per‑news vectors, and adds a transformer layer on top for news (sentence)‑level matching; the whole model is trained on click‑through (CTR) data.
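The input construction described above can be sketched as follows (a minimal illustration, not Huawei's actual code; function and variable names are our own): history titles and the candidate title share one token sequence, with segment ID 0 marking the user side and 1 marking the candidate side.

```python
# Sketch of UNBERT-style input construction: clicked-news titles and the
# candidate title are concatenated into one sequence, with segment IDs
# distinguishing history (0) from candidate (1) so the BERT encoder can
# perform token-level matching through the [CLS] representation.

def build_unbert_input(history_titles, candidate_title, max_len=64):
    """history_titles: list of token lists for previously clicked news.
    candidate_title: token list for the candidate news.
    Returns (tokens, segment_ids), truncated to max_len."""
    tokens, segments = ["[CLS]"], [0]
    for title in history_titles:           # user side: segment 0
        tokens += title + ["[SEP]"]
        segments += [0] * (len(title) + 1)
    tokens += candidate_title + ["[SEP]"]  # news side: segment 1
    segments += [1] * (len(candidate_title) + 1)
    return tokens[:max_len], segments[:max_len]

tokens, segments = build_unbert_input(
    [["stocks", "rally"], ["rain", "expected"]],
    ["tech", "earnings", "beat"],
)
print(tokens)    # ['[CLS]', 'stocks', 'rally', '[SEP]', ...]
print(segments)  # [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

The pooled per‑news vectors for the sentence‑level transformer layer would then be derived from the token positions belonging to each `[SEP]`‑delimited span.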
Experimental results on the Microsoft MIND dataset show that UNBERT and its improved version MINER achieve significant offline AUC gains over baselines (NRMS, NAML) and generalize better to cold‑start items. To serve online, the model size is reduced (to BERT‑mini), knowledge distillation is explored, and news embeddings are compressed to 50 dimensions.
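The distillation step mentioned above can be sketched with a standard temperature‑softened KL objective (the temperature and loss form here are generic assumptions, not the talk's exact recipe): a small student such as BERT‑mini is trained to match the teacher's output distribution.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over a 1-D logit vector."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

# The loss is zero when the student reproduces the teacher exactly
# and grows as the two distributions diverge.
same = distill_loss([2.0, 0.5], [2.0, 0.5])
diff = distill_loss([2.0, 0.5], [0.5, 2.0])
print(round(same, 6), diff > same)  # → 0.0 True
```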
For practical deployment, UNBERT is combined with a DCN‑based CTR model, caching news embeddings to meet latency requirements, and dimensionality reduction is performed via a learned fully‑connected layer rather than PCA.
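A minimal sketch of this serving setup (dimensions, names, and the random weights are illustrative assumptions): news embeddings from the BERT tower are projected to a low‑dimensional vector by a learned fully‑connected layer rather than PCA, and cached by news ID so the online CTR model pays only a dictionary lookup per request.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((768, 50)) * 0.02   # learned projection (trained offline)
b = np.zeros(50)

cache = {}  # news_id -> 50-d vector, populated once per news item

def news_embedding(news_id, raw_768):
    """Return the cached 50-d vector, computing it on first access."""
    if news_id not in cache:
        cache[news_id] = raw_768 @ W + b    # FC layer stands in for PCA
    return cache[news_id]

vec = news_embedding("news_42", rng.standard_normal(768))
print(vec.shape)           # → (50,)
print("news_42" in cache)  # → True
```

Because the FC layer is trained jointly with the CTR objective, the reduced vectors keep click‑relevant information that an unsupervised projection like PCA could discard.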
Beyond text, a multimodal user‑interface modeling approach captures visual impressions of news cards (image layout, size, typography) using pretrained ResNet/CLIP features on patches and whole cards, integrating local and global impressions into the CTR model.
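The local‑plus‑global feature flow can be sketched as below. This is a toy stand‑in: the real system feeds patches and whole cards through pretrained ResNet/CLIP encoders, whereas here mean‑pooling plays the encoder's role so only the data flow is shown; the grid size and function names are assumptions.

```python
import numpy as np

def card_visual_features(card, grid=(2, 2)):
    """card: (H, W, 3) image array of a news card.
    Cuts the card into a grid of patches (local impressions), adds a
    whole-card feature (global impression), and concatenates both."""
    H, W, _ = card.shape
    gh, gw = grid
    ph, pw = H // gh, W // gw
    local = []
    for i in range(gh):
        for j in range(gw):
            patch = card[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            local.append(patch.mean(axis=(0, 1)))  # per-patch "feature"
    global_feat = card.mean(axis=(0, 1))           # whole-card "feature"
    return np.concatenate(local + [global_feat])

card = np.random.rand(64, 96, 3)
feats = card_visual_features(card)
print(feats.shape)  # → (15,): 4 patches x 3 channels + 3 global
```

The concatenated vector is what would be fed, alongside text features, into the CTR model.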
Offline experiments on MIND demonstrate that incorporating visual card representations yields notable AUC improvements (up to several percentage points) over text‑only baselines.
Deployment challenges include limited image data coverage, engineering effort for real‑time image processing, and the need to fuse multimodal embeddings efficiently. Future work focuses on faster pre‑training and fine‑tuning pipelines, embedding‑only adaptation, and better utilization of contextual visual information.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.