News Recommendation Algorithms: Architecture, Recall, and Ranking Techniques

This article explains the architecture of news recommendation systems: the two‑stage recall and ranking process, recall methods such as content‑based recall, collaborative filtering, and matrix factorization, and ranking models including LR, GBDT, FM, and wide‑and‑deep DNNs.

Sohu Tech Products

In the mobile internet era, users face an explosion of information and need recommendation systems to quickly find content of interest; this article introduces the principles behind news recommendation algorithms.

01. News Recommendation Algorithm Architecture

The core of news recommendation consists of two stages: a retrieval (recall) stage and a ranking stage. The recall stage must handle millions of articles with low latency, while the ranking stage applies more resource‑intensive models to a much smaller candidate set.

Recall Stage – Based on users' long‑term and short‑term behavior, a small candidate set (hundreds to thousands of articles) is selected from a massive pool.

Ranking Stage – Precise personalized scoring is performed on the recall set, and the top‑scoring articles are presented to the user.

The system includes several key components:

User profile: demographic attributes, historical and short‑term behavior, interests, and preferences.

Feature engineering: article category, topics, keywords, content analysis, and statistical features.

Recall algorithms: collaborative filtering (item‑CF, user‑CF), topic models, content‑based recall, matrix factorization, etc.

Ranking algorithms: models such as LR, GBDT, FM, and various DNNs.

Rerank step: adjusts the ranked list for diversity, freshness, and serendipity, and applies product rules.
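The flow through these components can be sketched as a small pipeline. This is a minimal illustration, not the production design: the `recommend` function, the article dictionary layout, and the per-category cap used as the rerank rule are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Article = Dict[str, object]  # hypothetical record, e.g. {"id": ..., "category": ...}
User = Dict[str, object]     # hypothetical user profile


def recommend(
    user: User,
    recallers: List[Callable[[User], List[Article]]],
    score: Callable[[User, Article], float],
    top_k: int = 10,
    max_per_category: int = 2,
) -> List[Article]:
    # Recall: merge candidates from every strategy, de-duplicating by id.
    candidates: Dict[str, Article] = {}
    for recall in recallers:
        for article in recall(user):
            candidates.setdefault(str(article["id"]), article)

    # Ranking: score the (much smaller) candidate set with the heavier model.
    ranked = sorted(candidates.values(), key=lambda a: score(user, a), reverse=True)

    # Rerank: an assumed diversity rule that caps the articles shown per category.
    result: List[Article] = []
    per_category: Dict[str, int] = {}
    for article in ranked:
        category = str(article["category"])
        if per_category.get(category, 0) >= max_per_category:
            continue
        per_category[category] = per_category.get(category, 0) + 1
        result.append(article)
        if len(result) == top_k:
            break
    return result
```

Each recaller can be cheap and independent, which is why multiple recall strategies are usually merged before the single, more expensive scoring pass.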

02. Recall Algorithms

Content‑Based Recall – Uses semantic or lexical features of articles (keywords, topics, categories, source) to retrieve relevant items.
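As a rough sketch of content-based recall, the example below matches a user's keyword profile against article keywords with cosine similarity. The function names and the use of raw keyword counts as weights are assumptions; a real system would more likely use TF-IDF weights, topic vectors, or an inverted index.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Cosine similarity between two sparse keyword-weight vectors.
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_recall(
    user_keywords: List[str],
    articles: Dict[str, List[str]],
    top_n: int = 2,
) -> List[Tuple[str, float]]:
    # User profile: how often each keyword appears in the user's reading history.
    profile = dict(Counter(user_keywords))
    scored = []
    for doc_id, keywords in articles.items():
        similarity = cosine(profile, dict(Counter(keywords)))
        if similarity > 0:
            scored.append((doc_id, similarity))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

For a user whose history contains "nba" twice plus "finals" and "ai", an article tagged "nba" and "finals" scores higher than one tagged "ai" and "chips", and an article with no shared keywords is not recalled at all.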

Collaborative Filtering Recall

• User‑CF: finds similar users and recommends items they liked.

• Item‑CF: computes similarity between items based on co‑click behavior and recommends similar items.

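CF is effective, but it needs a large volume of interaction data, so it handles new users and newly published articles poorly (the cold‑start problem), and its recommendations offer limited interpretability.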
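Item-CF based on co-click behavior can be sketched as follows. The similarity used here, co-click count divided by the square root of the product of each item's click count, is one common choice and is an assumption of this example; the function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set


def item_similarity(clicks: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    # clicks maps each user id to the set of item ids that user clicked.
    item_users: Dict[str, int] = defaultdict(int)
    co_clicks: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in clicks.values():
        for i in items:
            item_users[i] += 1
        for i, j in combinations(sorted(items), 2):
            co_clicks[i][j] += 1
            co_clicks[j][i] += 1
    # Normalize co-click counts so very popular items do not dominate.
    return {
        i: {j: c / math.sqrt(item_users[i] * item_users[j]) for j, c in neighbours.items()}
        for i, neighbours in co_clicks.items()
    }


def item_cf_recall(
    user_items: Set[str],
    sim: Dict[str, Dict[str, float]],
    top_n: int = 3,
) -> List[str]:
    # Score unseen items by summing their similarity to items the user clicked.
    scores: Dict[str, float] = defaultdict(float)
    for i in user_items:
        for j, s in sim.get(i, {}).items():
            if j not in user_items:
                scores[j] += s
    return sorted(scores, key=lambda j: (-scores[j], j))[:top_n]
```

The similarity table can be computed offline, so serving a request only requires a few dictionary lookups per clicked item.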

Matrix Factorization Recall – Decomposes the sparse user‑item rating matrix into low‑dimensional user and item vectors (U and V). The process includes preparing the rating matrix, random initialization, prediction, error calculation, gradient descent updates, and iteration until convergence. The resulting vectors are used for fast inner‑product recall.

Common algorithms: SVD, SVD++, timeSVD++, and ALS (for example, the implementation in Spark MLlib).
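The training loop described above (random initialization, prediction, error calculation, gradient updates, iteration) can be sketched in plain Python. The hyperparameters, the L2 regularization term, and the function names are assumptions chosen for a toy example rather than values from the article.

```python
import random
from typing import List, Set, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating)


def rmse(ratings: List[Rating], U: List[List[float]], V: List[List[float]]) -> float:
    # Root mean squared error over the observed ratings only.
    squared = 0.0
    for u, i, r in ratings:
        pred = sum(a * b for a, b in zip(U[u], V[i]))
        squared += (r - pred) ** 2
    return (squared / len(ratings)) ** 0.5


def factorize(ratings: List[Rating], n_users: int, n_items: int, k: int = 2,
              lr: float = 0.02, reg: float = 0.02, epochs: int = 1000, seed: int = 0):
    rng = random.Random(seed)
    # Random initialization of the user (U) and item (V) vectors.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))  # prediction error
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Stochastic gradient step with L2 regularization.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def recall_by_inner_product(U, V, user: int, seen: Set[int], top_n: int = 2) -> List[int]:
    # Rank unseen items by the inner product of user and item vectors.
    scores = [(sum(a * b for a, b in zip(U[user], v)), i)
              for i, v in enumerate(V) if i not in seen]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_n]]
```

At serving time only `recall_by_inner_product` runs; with millions of items the exhaustive scan shown here would typically be replaced by an approximate nearest-neighbor index.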

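Hotspot Recall – Serves as a fallback when personalized recall returns too few candidates, for example for new users. Popular items are ranked using statistics such as the Wilson score interval, Bayesian smoothing of click‑through rates, and time decay.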
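The three hotness statistics named above can be sketched as follows. The way they are combined in `hot_score`, the 24-hour half-life, and the prior of 1 click in 20 impressions are assumptions for illustration only.

```python
import math


def wilson_lower_bound(clicks: int, impressions: int, z: float = 1.96) -> float:
    # Lower bound of the Wilson score interval for the click-through rate.
    # It penalizes items whose high CTR rests on very few impressions.
    if impressions == 0:
        return 0.0
    p = clicks / impressions
    z2 = z * z
    center = p + z2 / (2 * impressions)
    margin = z * math.sqrt(p * (1 - p) / impressions + z2 / (4 * impressions ** 2))
    return (center - margin) / (1 + z2 / impressions)


def smoothed_ctr(clicks: int, impressions: int, alpha: float = 1.0, beta: float = 19.0) -> float:
    # Bayesian smoothing with a Beta(alpha, beta) prior (assumed prior CTR of 5%).
    return (clicks + alpha) / (impressions + alpha + beta)


def time_decay(age_hours: float, half_life_hours: float = 24.0) -> float:
    # Exponential decay: the weight halves every half_life_hours.
    return 0.5 ** (age_hours / half_life_hours)


def hot_score(clicks: int, impressions: int, age_hours: float) -> float:
    # One possible combination: confidence-adjusted CTR discounted by age.
    return wilson_lower_bound(clicks, impressions) * time_decay(age_hours)
```

With these definitions, an article with 9 clicks in 10 impressions scores below one with 900 clicks in 1,000 impressions, even though both have a raw CTR of 90%.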

03. Ranking Algorithms

Features for Ranking Models

Relevance features: semantic or lexical matches (category, topic, keyword, source).

Context features: user location, time, device, network status.

Popularity features: article CTR, category hotness, topic hotness.

Mining features: collaborative, clustering, similarity‑based signals.

Ranking Models

LR (Logistic Regression): simple linear model requiring well‑engineered features.

GBDT (Gradient Boosted Decision Trees): captures non‑linear feature interactions automatically.

FM & FFM (Factorization Machines and Field‑aware FM): model pairwise feature interactions; FFM adds field awareness for richer representations.
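To make the pairwise interaction term concrete, the sketch below scores one sample with a second-order FM. It relies on the standard identity that the sum over feature pairs of ⟨v_i, v_j⟩·x_i·x_j equals half of the squared sum minus the sum of squares per latent dimension, which brings the cost down from quadratic to linear in the number of features. The parameter values in the usage note are made up, and the output is a raw score that would be passed through a sigmoid for CTR prediction.

```python
from typing import List


def fm_predict(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    # Linear part: bias plus per-feature weights.
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    # Pairwise part, computed in O(k * n) using the sum-of-squares identity.
    pairwise = 0.0
    for f in range(len(V[0])):
        total = sum(V[i][f] * x[i] for i in range(len(x)))
        total_of_squares = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (total * total - total_of_squares)
    return linear + pairwise


def fm_predict_naive(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    # Reference version: explicit O(k * n^2) loop over all feature pairs.
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score
```

Both functions return the same score for the same inputs, for example `x = [1.0, 0.0, 1.0, 0.5]` with two-dimensional latent vectors; only the cost differs.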

DNN (Deep Neural Network): captures complex patterns but may over‑generalize for users with sparse interaction histories, recommending items that are only loosely relevant.

Wide & Deep: combines a linear “wide” part (memorization) with a deep neural part (generalization) to leverage both strengths.

The wide component uses sparse and crossed features (e.g., article category ID, topic ID, exposure position), while the deep component maps discrete features (UserID, DocID, location, keyword IDs) to dense embedding vectors and fuses them with continuous statistical features before the hidden layers.
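A forward pass through such a model can be sketched without any framework. Everything here is an assumption for illustration: the class name, the single hidden layer, the random untrained weights, and the string encoding of wide features. In this sketch ID features go through embedding lookups and continuous features are concatenated to the embeddings as-is.

```python
import math
import random
from typing import Dict, List


class WideAndDeep:
    def __init__(self, vocab_sizes: Dict[str, int], emb_dim: int = 4,
                 n_dense: int = 2, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)

        def rand() -> float:
            return rng.uniform(-0.1, 0.1)

        # Wide part: one weight per sparse or crossed feature (default 0).
        self.wide_w: Dict[str, float] = {}
        # Deep part: one embedding table per ID feature, then one hidden layer.
        self.emb = {name: [[rand() for _ in range(emb_dim)] for _ in range(size)]
                    for name, size in vocab_sizes.items()}
        in_dim = emb_dim * len(vocab_sizes) + n_dense
        self.W1 = [[rand() for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rand() for _ in range(hidden)]
        self.bias = 0.0

    def predict(self, wide_feats: List[str], ids: Dict[str, int], dense: List[float]) -> float:
        # Wide logit: sum of weights of the active sparse/crossed features.
        wide_logit = sum(self.wide_w.get(f, 0.0) for f in wide_feats)
        # Deep input: concatenated embeddings followed by continuous features.
        x = [v for name in self.emb for v in self.emb[name][ids[name]]] + list(dense)
        h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)  # ReLU layer
             for row, b in zip(self.W1, self.b1)]
        deep_logit = sum(wi * hi for wi, hi in zip(self.w2, h))
        # Joint prediction: both logits are summed before the sigmoid.
        return 1.0 / (1.0 + math.exp(-(wide_logit + deep_logit + self.bias)))
```

A call such as `WideAndDeep({"user": 3, "doc": 5}).predict(["category=sports"], {"user": 1, "doc": 2}, [0.3, 0.7])` returns a click probability between 0 and 1. In training, both parts would be optimized jointly against the same loss.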

Summary

The article uses news recommendation as a case study, first describing the two‑stage architecture of recall and ranking, then detailing various recall methods and common ranking models. In practice, ranking models are moving toward deep learning, but successful systems still require careful feature engineering, multi‑strategy recall, diversity handling, cold‑start solutions, and multi‑objective optimization to meet personalized user needs.
