
Pairwise Ranking Factorization Machines (PRFM) for Feed Recommendation in Tencent Shield

This article presents Pairwise Ranking Factorization Machines (PRFM), a pairwise‑learning extension of Factorization Machines that replaces Tencent Shield's pointwise binary‑classification pipeline. PRFM generates user‑item‑item training triples and optimizes a cross‑entropy loss over them, achieving roughly a 5% relative UV click‑through gain on the HandQ anime feed. The article also covers offline ranking metrics, hyper‑parameter tuning, and planned informed‑sampling improvements.

Tencent Cloud Developer

Tencent Shield's open recommendation system usually casts recommendation problems as binary classification tasks, but for list recommendation scenarios the problem is closer to a ranking task. This article introduces the pairwise learning approach combined with recommendation algorithms, specifically the Pairwise Ranking Factorization Machines (PRFM) algorithm, and shares its application in the HandQ anime Feed recommendation scenario.

1. Overview

In the current Shield recommendation pipeline, the problem is formalized as a binary classification task: for each user‑item pair, a click is treated as a positive sample (label = 1) and a non‑click as a negative sample (label = 0). The model is trained to assign higher scores to all positive samples than to negative ones. This is known as the Pointwise method.

The Pointwise method has a clear limitation: it ignores the relative ordering of items for the same user and instead tries to enforce a global ordering across all samples. For example, it is sufficient that user X's score for Dragon Ball is higher than their score for Detective Conan; how Dragon Ball's score compares with the score of an item shown to a different user is irrelevant.

The Pairwise method addresses this shortcoming by constructing training triples <user, item1, item2>, where item1 is a clicked item and item2 is an unclicked item. The training objective is to ensure that, for the same user, the score of item1 is higher than that of item2. This better reflects implicit feedback, where clicks indicate a relative preference rather than an absolute like/dislike.

The PRFM algorithm is a concrete implementation of the Pairwise approach that uses Factorization Machines (FM) as the scoring model. Compared with a Pointwise FM baseline, PRFM achieves roughly a 5% relative improvement in UV click‑through rate on the HandQ anime Feed.

2. PRFM Algorithm Details

FM is chosen because it reduces feature‑engineering effort compared with linear models and has been shown to outperform well‑tuned LR in production. For each user, many <item 1 , item 2 > pairs can be generated, but to keep the training set manageable we randomly sample 100 item pairs per user.
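The pair‑generation step described above can be sketched as follows. This is an illustrative sketch, not the production pipeline: the function name, item IDs, and fixed seed are all my own assumptions; only the "enumerate clicked × unclicked pairs, then randomly sample 100 per user" logic comes from the article.

```python
import random

def build_pairwise_samples(clicked, unclicked, max_pairs=100, seed=42):
    """Build <item_i, item_j> training pairs for one user, where item_i was
    clicked and item_j was exposed but not clicked. All candidate pairs are
    enumerated, then randomly sampled down to max_pairs to keep the training
    set manageable (the article uses 100 pairs per user)."""
    pairs = [(i, j) for i in clicked for j in unclicked]
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    if len(pairs) > max_pairs:
        pairs = rng.sample(pairs, max_pairs)
    return pairs

# Example: 2 clicked x 3 unclicked items -> 6 candidate pairs, sampled to 4
pairs = build_pairwise_samples(["a", "b"], ["c", "d", "e"], max_pairs=4)
```

Each sampled pair, combined with the user, yields one <user, item1, item2> training triple.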

The algorithm consists of the following components:

Feature vector composed of user, item, and context features.

Scoring function based on FM.

Loss function defined as cross‑entropy over the pairwise samples, encouraging the score of the clicked item to be larger than that of the unclicked item.

The mathematical formulation (illustrated in the original figures) defines the loss for each triple <u, i, j> as the cross‑entropy between the predicted preference probability and the ground‑truth label that user u prefers item i over item j.
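Since the original figures are not reproduced here, the following sketch assumes the standard FM scoring function and the common logistic form of the pairwise cross‑entropy, L(u, i, j) = −log σ(f(x_i) − f(x_j)); the notation and function names are mine, not taken from the article.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Standard FM score: w0 + <w, x> plus pairwise feature interactions,
    computed with the O(n*k) identity
    0.5 * sum_f [ (sum_i V[i,f] * x_i)^2 - sum_i (V[i,f] * x_i)^2 ]."""
    linear = w0 + w @ x
    xv = x @ V                   # shape (factor,)
    x2v2 = (x ** 2) @ (V ** 2)   # shape (factor,)
    return linear + 0.5 * np.sum(xv ** 2 - x2v2)

def pairwise_loss(x_i, x_j, w0, w, V):
    """Cross-entropy loss for a triple <u, i, j> with ground truth
    'u prefers i over j': -log sigmoid(f(x_i) - f(x_j)).
    x_i and x_j are the full feature vectors (user, item, context)."""
    diff = fm_score(x_i, w0, w, V) - fm_score(x_j, w0, w, V)
    return -np.log(1.0 / (1.0 + np.exp(-diff)))
```

Note that when the two items score equally the loss is log 2, and it decreases toward zero as the clicked item's score pulls ahead, which is exactly the ordering behavior the Pairwise objective is meant to encourage.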

3. Offline Evaluation Metrics and Parameter Tuning

Unlike classification tasks that commonly use AUC, ranking performance is measured with metrics such as Precision@k, MAP (Mean Average Precision), and NDCG@k. The article explains each metric with illustrative examples.
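The three metrics can be sketched as follows for binary relevance (the implementations are minimal illustrations, not the evaluation code used in the article; MAP is simply the mean of average precision across users):

```python
import numpy as np

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for r in ranked[:k] if r in relevant) / k

def average_precision(ranked, relevant):
    """Mean of Precision@k taken at each rank k where a relevant item appears.
    MAP is this value averaged over all users."""
    hits, score = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / k
    return score / max(len(relevant), 1)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG with log2 position discounting,
    normalized by the DCG of an ideal ranking."""
    dcg = sum(1.0 / np.log2(i + 2) for i, r in enumerate(ranked[:k]) if r in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0
```

For a ranking ["a", "b", "c", "d"] with relevant items {"a", "c"}, Precision@2 is 0.5 and the average precision is (1/1 + 2/3) / 2 ≈ 0.83.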

Key hyper‑parameters of PRFM (identical to those of Pointwise FM) include the standard deviation of the Gaussian initialization (init_std), the L2 regularization coefficient (reg), and the latent factor dimension (factor). Offline tuning on the HandQ Feed data led to the following optimal values: init_std = 0.005, reg = 0.0001, factor = 100.

4. Future Improvement Plans

The current sampling strategy selects 100 random item pairs per user. Research suggests that more informed sampling—e.g., ranking item pairs by the positional gap between the clicked and unclicked items in the exposure list and selecting the top‑gap pairs—can further boost model performance. Future work will explore various sampling strategies and share the findings.
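One plausible reading of the gap‑based strategy described above can be sketched as follows (the function, the use of absolute positional gap, and the tie‑breaking are my assumptions; the article only specifies ranking pairs by positional gap in the exposure list and keeping the top‑gap pairs):

```python
def gap_sampled_pairs(exposure, clicked, max_pairs=100):
    """Informed-sampling sketch: enumerate <clicked, unclicked> pairs, rank them
    by the positional gap between the two items in the exposure list, and keep
    the pairs with the largest gaps. The intuition is that a large gap between
    a clicked item and an unclicked one carries a stronger preference signal
    than a pair of adjacent items."""
    pos = {item: idx for idx, item in enumerate(exposure)}
    unclicked = [it for it in exposure if it not in clicked]
    scored = [(i, j, abs(pos[i] - pos[j])) for i in clicked for j in unclicked]
    scored.sort(key=lambda t: -t[2])  # largest positional gap first
    return [(i, j) for i, j, _ in scored[:max_pairs]]
```

This replaces only the sampling step; triple construction and the PRFM objective stay unchanged.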

For the full details and original figures, please refer to the source article.

Tags: Ranking, recommendation systems, machine learning, Factorization Machines, pairwise learning
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
