
Pairwise Ranking Factorization Machines (PRFM) for Feed Recommendation in Tencent Shield

This article presents Pairwise Ranking Factorization Machines (PRFM), a pairwise‑learning extension of Factorization Machines that replaces Tencent Shield's pointwise binary‑classification pipeline. PRFM generates user‑item‑item training triples and optimizes a cross‑entropy loss over them, achieving roughly a 5% relative UV click‑through gain on the HandQ anime feed. The article also covers offline ranking metrics, hyper‑parameter tuning, and planned informed‑sampling improvements.

Tencent Cloud Developer

Tencent Shield's open recommendation system usually casts recommendation problems as binary classification tasks, but for list recommendation scenarios the problem is closer to a ranking task. This article introduces the pairwise learning approach combined with recommendation algorithms, specifically the Pairwise Ranking Factorization Machines (PRFM) algorithm, and shares its application in the HandQ anime Feed recommendation scenario.

1. Overview

In the current Shield recommendation pipeline, the problem is formalized as a binary classification task: for each user‑item pair, a click is treated as a positive sample (label = 1) and a non‑click as a negative sample (label = 0). The model is trained to assign higher scores to all positive samples than to negative ones. This is known as the Pointwise method.

The Pointwise method has a clear limitation: it ignores the relative ordering of items for the same user and instead tries to enforce a global ordering across all samples. For example, it is sufficient that user X's score for Dragon Ball is higher than their score for Detective Conan; how Dragon Ball's score compares with the score of an item shown to a different user is irrelevant.

The Pairwise method addresses this shortcoming by constructing training triples <user, item1, item2>, where item1 is a clicked item and item2 is an unclicked item. The training objective is to ensure that, for the same user, the score of item1 is higher than that of item2. This better reflects implicit feedback, where clicks indicate a relative preference rather than an absolute like/dislike.

The PRFM algorithm is a concrete implementation of the Pairwise approach that uses Factorization Machines (FM) as the scoring model. Compared with a Pointwise FM baseline, PRFM achieves roughly a 5% relative improvement in UV click‑through rate on the HandQ anime Feed.

2. PRFM Algorithm Details

FM is chosen because it reduces feature‑engineering effort compared with linear models and has been shown to outperform well‑tuned LR in production. For each user, many <item 1 , item 2 > pairs can be generated, but to keep the training set manageable we randomly sample 100 item pairs per user.
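The pair‑generation step described above can be sketched as follows. This is an illustrative sketch, not the production pipeline: the function name, item IDs, and fixed seed are all my own assumptions; only the "enumerate clicked × unclicked pairs, then randomly sample 100 per user" logic comes from the article.

```python
import random

def build_pairwise_samples(clicked, unclicked, max_pairs=100, seed=42):
    """Build <item_i, item_j> training pairs for one user, where item_i was
    clicked and item_j was exposed but not clicked. All candidate pairs are
    enumerated, then randomly sampled down to max_pairs to keep the training
    set manageable (the article uses 100 pairs per user)."""
    pairs = [(i, j) for i in clicked for j in unclicked]
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    if len(pairs) > max_pairs:
        pairs = rng.sample(pairs, max_pairs)
    return pairs

# Example: 2 clicked x 3 unclicked items -> 6 candidate pairs, sampled to 4
pairs = build_pairwise_samples(["a", "b"], ["c", "d", "e"], max_pairs=4)
```

Each sampled pair, combined with the user, yields one <user, item1, item2> training triple.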

The algorithm consists of the following components:

Feature vector composed of user, item, and context features.

Scoring function based on FM.

Loss function defined as cross‑entropy over the pairwise samples, encouraging the score of the clicked item to be larger than that of the unclicked item.

The mathematical formulation (illustrated in the original figures) defines the loss for each triple <u, i, j> as the cross‑entropy between the predicted preference probability and the ground‑truth label that user u prefers item i over item j.
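Since the original figures are not reproduced here, the following sketch assumes the standard FM scoring function and the common logistic form of the pairwise cross‑entropy, L(u, i, j) = −log σ(f(x_i) − f(x_j)); the notation and function names are mine, not taken from the article.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Standard FM score: w0 + <w, x> plus pairwise feature interactions,
    computed with the O(n*k) identity
    0.5 * sum_f [ (sum_i V[i,f] * x_i)^2 - sum_i (V[i,f] * x_i)^2 ]."""
    linear = w0 + w @ x
    xv = x @ V                   # shape (factor,)
    x2v2 = (x ** 2) @ (V ** 2)   # shape (factor,)
    return linear + 0.5 * np.sum(xv ** 2 - x2v2)

def pairwise_loss(x_i, x_j, w0, w, V):
    """Cross-entropy loss for a triple <u, i, j> with ground truth
    'u prefers i over j': -log sigmoid(f(x_i) - f(x_j)).
    x_i and x_j are the full feature vectors (user, item, context)."""
    diff = fm_score(x_i, w0, w, V) - fm_score(x_j, w0, w, V)
    return -np.log(1.0 / (1.0 + np.exp(-diff)))
```

Note that when the two items score equally the loss is log 2, and it decreases toward zero as the clicked item's score pulls ahead, which is exactly the ordering behavior the Pairwise objective is meant to encourage.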

3. Offline Evaluation Metrics and Parameter Tuning

Unlike classification tasks that commonly use AUC, ranking performance is measured with metrics such as Precision@k, MAP (Mean Average Precision), and NDCG@k. The article explains each metric with illustrative examples.
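The three metrics can be sketched as follows for binary relevance (the implementations are minimal illustrations, not the evaluation code used in the article; MAP is simply the mean of average precision across users):

```python
import numpy as np

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for r in ranked[:k] if r in relevant) / k

def average_precision(ranked, relevant):
    """Mean of Precision@k taken at each rank k where a relevant item appears.
    MAP is this value averaged over all users."""
    hits, score = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / k
    return score / max(len(relevant), 1)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG with log2 position discounting,
    normalized by the DCG of an ideal ranking."""
    dcg = sum(1.0 / np.log2(i + 2) for i, r in enumerate(ranked[:k]) if r in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0
```

For a ranking ["a", "b", "c", "d"] with relevant items {"a", "c"}, Precision@2 is 0.5 and the average precision is (1/1 + 2/3) / 2 ≈ 0.83.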

Key hyper‑parameters of PRFM (identical to those of Pointwise FM) include the standard deviation of the Gaussian initialization (init_std), the L2 regularization coefficient (reg), and the latent factor dimension (factor). Offline tuning on the HandQ Feed data led to the following optimal values: init_std = 0.005, reg = 0.0001, factor = 100.

4. Future Improvement Plans

The current sampling strategy selects 100 random item pairs per user. Research suggests that more informed sampling—e.g., ranking item pairs by the positional gap between the clicked and unclicked items in the exposure list and selecting the top‑gap pairs—can further boost model performance. Future work will explore various sampling strategies and share the findings.
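One plausible reading of the gap‑based strategy described above can be sketched as follows (the function, the use of absolute positional gap, and the tie‑breaking are my assumptions; the article only specifies ranking pairs by positional gap in the exposure list and keeping the top‑gap pairs):

```python
def gap_sampled_pairs(exposure, clicked, max_pairs=100):
    """Informed-sampling sketch: enumerate <clicked, unclicked> pairs, rank them
    by the positional gap between the two items in the exposure list, and keep
    the pairs with the largest gaps. The intuition is that a large gap between
    a clicked item and an unclicked one carries a stronger preference signal
    than a pair of adjacent items."""
    pos = {item: idx for idx, item in enumerate(exposure)}
    unclicked = [it for it in exposure if it not in clicked]
    scored = [(i, j, abs(pos[i] - pos[j])) for i in clicked for j in unclicked]
    scored.sort(key=lambda t: -t[2])  # largest positional gap first
    return [(i, j) for i, j, _ in scored[:max_pairs]]
```

This replaces only the sampling step; triple construction and the PRFM objective stay unchanged.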

For the full details and original figures, please refer to the source article.

Tags: Ranking, recommendation systems, machine learning, Factorization Machines, pairwise learning
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
