
Introduction to User Behavior and Collaborative Filtering in Recommendation Systems

This article explains user behavior concepts and feedback types, introduces collaborative filtering methods including user‑based, item‑based and latent factor models, discusses similarity measures, power‑law distributions, and practical considerations such as negative sampling, providing a comprehensive overview for building recommendation systems.

Architecture Digest

User Behavior Introduction

Recommendation driven by user behavior is known in academia as collaborative filtering. User feedback falls into two main categories: explicit feedback (e.g., ratings, likes/dislikes), where users state a preference directly, and implicit feedback (e.g., page views), where the preference has to be inferred. Either kind can also be classified as positive or negative.

Representing diverse online behaviors—such as browsing, purchasing, commenting, and rating—in a unified way is challenging; a possible representation is illustrated in the accompanying diagram.
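One way to make the idea of a unified representation concrete is a single record type shared by all behaviors. The sketch below is only an illustration: the field names (user_id, item_id, behavior, timestamp, weight, content) and the EXPLICIT_BEHAVIORS set are assumptions, not taken from the diagram.

```python
from dataclasses import dataclass

# Hypothetical grouping: behaviors that state a preference directly.
EXPLICIT_BEHAVIORS = {"rate", "like", "dislike"}


@dataclass(frozen=True)
class BehaviorEvent:
    """One user action on one item, whatever the kind of action."""
    user_id: str
    item_id: str
    behavior: str          # e.g. "browse", "purchase", "comment", "rate"
    timestamp: float       # seconds since the epoch
    weight: float = 1.0    # e.g. the rating value; 1.0 when not applicable
    content: str = ""      # e.g. the comment text; empty when not applicable

    @property
    def is_explicit(self) -> bool:
        """True when the behavior counts as explicit feedback."""
        return self.behavior in EXPLICIT_BEHAVIORS


events = [
    BehaviorEvent("u1", "item42", "browse", 1700000000.0),
    BehaviorEvent("u1", "item42", "rate", 1700000300.0, weight=4.0),
    BehaviorEvent("u2", "item42", "comment", 1700000600.0, content="arrived quickly"),
]
explicit = [e for e in events if e.is_explicit]
```

Keeping every behavior in one record shape means later steps, such as counting interactions per user, do not need to special-case each behavior type.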

User Behavior Analysis

Two variables are defined:

User activity: total number of items a user has interacted with.

Item popularity: total number of users who have interacted with an item.

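Both user activity and item popularity follow a power‑law (long‑tail) distribution: a small number of users and items account for most interactions, while the majority have very few. More active users also tend to browse less popular, niche items.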
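The two variables can be computed directly from an interaction log. The sketch below assumes the log is a list of (user, item) pairs, which is an illustrative format rather than anything the article prescribes; repeated interactions with the same item are counted once. Under a power law, the number of items with popularity k falls off roughly as k to a negative power, so the histogram built at the end would look close to a straight line on log‑log axes for real data.

```python
from collections import Counter, defaultdict


def activity_and_popularity(interactions):
    """Return (user_activity, item_popularity) from (user, item) pairs."""
    items_by_user = defaultdict(set)
    users_by_item = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)   # sets ignore repeated interactions
        users_by_item[item].add(user)
    user_activity = {u: len(items) for u, items in items_by_user.items()}
    item_popularity = {i: len(users) for i, users in users_by_item.items()}
    return user_activity, item_popularity


# Toy log, made up for illustration; u3 interacts with "a" twice.
log = [("u1", "a"), ("u1", "b"), ("u1", "c"),
       ("u2", "a"),
       ("u3", "a"), ("u3", "b"), ("u3", "a")]

user_activity, item_popularity = activity_and_popularity(log)

# How many items have each popularity value (the distribution to inspect).
popularity_histogram = Counter(item_popularity.values())
```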

Recommendation algorithms based solely on user behavior are generally called collaborative filtering. Research has produced various approaches, such as neighborhood‑based methods, latent factor models, and graph‑based random walk algorithms.

Neighborhood‑Based Algorithms

Neighborhood methods are divided into two major categories:

User‑based collaborative filtering: recommends items liked by users with similar interests.

Item‑based collaborative filtering: recommends items similar to those the target user already likes.

User‑Based Collaborative Filtering

The process involves two steps:

Identify a set of users whose interests are similar to the target user.

Recommend items liked by this set that the target user has not yet encountered.
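The two steps above can be sketched as follows. This is a minimal illustration, assuming each user is described by the set of items they interacted with and that similarity is the set‑based cosine (shared items divided by the geometric mean of the two set sizes); the function names and the parameters k (neighbors) and n (recommendations) are hypothetical.

```python
from math import sqrt


def user_similarity(items_a, items_b):
    """Cosine similarity between two users' item sets."""
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / sqrt(len(items_a) * len(items_b))


def recommend_user_cf(target, user_items, k=2, n=3):
    """Recommend up to n unseen items using the k most similar users."""
    seen = user_items[target]
    # Step 1: find the k users most similar to the target.
    neighbors = sorted(
        ((user_similarity(seen, items), user)
         for user, items in user_items.items() if user != target),
        reverse=True,
    )[:k]
    # Step 2: score items the target has not seen by summed neighbor similarity.
    scores = {}
    for sim, user in neighbors:
        if sim <= 0:
            continue
        for item in user_items[user] - seen:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Toy data, made up for illustration.
user_items = {
    "alice": {"a", "b", "c"},
    "bob": {"a", "b", "d"},
    "carol": {"a", "e"},
    "dave": {"f"},
}
recommendations = recommend_user_cf("alice", user_items)
```

Here bob shares two items with alice and carol shares one, so bob's unseen item ranks ahead of carol's.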

Similarity can be measured using Euclidean distance, Pearson correlation, cosine similarity, or the Tanimoto coefficient. The choice of measure changes which users count as neighbors, and therefore which items are recommended.
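For reference, the four measures can be written in a few lines each. The sketch below assumes two users are represented as equal‑length rating vectors over the same items; converting Euclidean distance to a similarity via 1 / (1 + distance) is one common convention, chosen here as an assumption.

```python
from math import sqrt


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_similarity(a, b):
    """Map Euclidean distance into (0, 1]; identical vectors give 1."""
    distance = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def cosine_similarity(a, b):
    """Cosine of the angle between the vectors; 0 if either is all zeros."""
    denom = sqrt(dot(a, a)) * sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def pearson_correlation(a, b):
    """Cosine similarity after subtracting each vector's mean."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def tanimoto_coefficient(a, b):
    """Extended Jaccard; on 0/1 vectors this is shared items / union size."""
    ab = dot(a, b)
    denom = dot(a, a) + dot(b, b) - ab
    return ab / denom if denom else 0.0


# Hypothetical ratings by two users for the same three items.
ratings_u = [1, 2, 3]
ratings_v = [2, 4, 6]
cos_uv = cosine_similarity(ratings_u, ratings_v)
euc_uv = euclidean_similarity(ratings_u, ratings_v)
```

The two vectors point in the same direction, so cosine and Pearson treat the users as perfectly similar, while the Euclidean measure penalizes the difference in rating scale. This is the kind of disagreement that makes the choice of measure matter.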

Item‑Based Collaborative Filtering

Item‑based collaborative filtering evaluates similarity between items based on user ratings, then recommends items similar to those the user previously liked.
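A minimal sketch of this idea follows. As a simplifying assumption it uses binary interactions rather than rating values: two items are similar when many of the same users interacted with both, normalized by the items' popularity. The function names and the parameter n are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(user_items):
    """Co-occurrence cosine similarity between every pair of items."""
    co_counts = defaultdict(lambda: defaultdict(int))
    item_counts = defaultdict(int)
    for items in user_items.values():
        for item in items:
            item_counts[item] += 1
        for i, j in combinations(sorted(items), 2):
            co_counts[i][j] += 1
            co_counts[j][i] += 1
    return {
        i: {j: c / sqrt(item_counts[i] * item_counts[j]) for j, c in row.items()}
        for i, row in co_counts.items()
    }


def recommend_item_cf(target, user_items, sims, n=3):
    """Score unseen items by their similarity to items the user already has."""
    seen = user_items[target]
    scores = defaultdict(float)
    for item in seen:
        for other, weight in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += weight
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Toy data, made up for illustration.
user_items = {
    "alice": {"a", "b", "c"},
    "bob": {"a", "b", "d"},
    "carol": {"a", "e"},
    "dave": {"f"},
}
sims = item_similarities(user_items)
recommendations = recommend_item_cf("alice", user_items, sims)
```

Because the item‑item similarities depend only on the interaction data, they can be computed once offline and reused for every user.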

Comparison of UserCF and ItemCF

In e‑commerce, the number of users usually far exceeds the number of items, so the item‑item similarity matrix that ItemCF needs is much smaller than the user‑user matrix that UserCF needs, making ItemCF computationally cheaper. For non‑social sites, the relationship between pieces of content is a stronger recommendation signal than similarity between users. In social networks, UserCF (especially when combined with social information) can provide more explainable recommendations.
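The cost argument can be made concrete with a quick count of how many pairwise similarities each method has to consider in the worst case. The user and item counts below are hypothetical, and real systems only compute pairs that actually co‑occur, so treat the figures as an upper bound.

```python
from math import comb


def similarity_pairs(n):
    """Number of distinct unordered pairs among n users or items."""
    return comb(n, 2)


# Hypothetical site: one million users, ten thousand items.
n_users, n_items = 1_000_000, 10_000

user_pairs = similarity_pairs(n_users)   # pairs UserCF may need to compare
item_pairs = similarity_pairs(n_items)   # pairs ItemCF may need to compare
ratio = user_pairs / item_pairs
```

With these numbers the user‑user comparison space is roughly ten thousand times larger than the item‑item one.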

Latent Factor Model (LFM)

Latent semantic models originated in text mining (e.g., LSI, pLSA, and LDA topic models) and were later adapted for recommendation via matrix factorization. Traditional SVD is computationally expensive on large datasets; Funk‑SVD (also called the Latent Factor Model) scales better.

Matrix factorization represents the user‑item interaction matrix R as the product of two lower‑dimensional matrices: P (user‑topic) and Q (topic‑item). Each entry R_ij denotes user i's interest in item j. Missing values can be initialized with the average rating.

When the matrix is large, SVD becomes slow, so gradient descent is used to learn P and Q. The update rules (shown in the diagrams) involve three hyper‑parameters:

Number of latent factors F.

Learning rate alpha.

Regularization parameter lambda.
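For readers without the diagrams, the standard Funk‑SVD formulation is written out below. It is the usual textbook form and may differ in notation from the original figures. K is an assumed symbol for the set of observed (i, j) pairs, e_ij is the prediction error, and the factor of 2 from the derivative is absorbed into alpha.

```latex
\begin{aligned}
\hat{R}_{ij} &= \sum_{f=1}^{F} P_{if}\, Q_{fj},
\qquad e_{ij} = R_{ij} - \hat{R}_{ij} \\[4pt]
C &= \sum_{(i,j) \in K} \Bigl[\, e_{ij}^{2}
   + \lambda \bigl( \lVert P_{i} \rVert^{2} + \lVert Q_{j} \rVert^{2} \bigr) \Bigr] \\[4pt]
P_{if} &\leftarrow P_{if} + \alpha \bigl( e_{ij}\, Q_{fj} - \lambda\, P_{if} \bigr) \\
Q_{fj} &\leftarrow Q_{fj} + \alpha \bigl( e_{ij}\, P_{if} - \lambda\, Q_{fj} \bigr)
\end{aligned}
```

Each update moves the factors in the direction that reduces the error on one observed entry, while the lambda terms shrink them toward zero to limit overfitting.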

Since only positive interactions are observed (implicit feedback), negative samples (R_ij = 0) must be generated. The sampling strategy keeps the numbers of positive and negative samples roughly equal for each user, and draws negatives from popular items the user has not interacted with: a popular item the user ignored more plausibly signals disinterest, whereas an obscure item may simply be undiscovered.
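The sampling strategy and the gradient‑descent training can be sketched together. This is a minimal illustration under stated assumptions, not a reference implementation: interactions are given as item sets per user, negatives are drawn in proportion to item popularity, factors start at small random values (the 0.1 scale is arbitrary), and all function names and default hyper‑parameter values are hypothetical.

```python
import random
from collections import Counter


def sample_negatives(user_items, rng):
    """Label each user's items 1.0 and add about as many popular unseen items as 0.0."""
    popularity = Counter(i for items in user_items.values() for i in items)
    # Each item appears once per interaction, so popular items are drawn more often.
    pool = sorted(popularity.elements())
    samples = {}
    for user, positives in user_items.items():
        labeled = {item: 1.0 for item in sorted(positives)}
        unseen = set(pool) - positives
        target = len(positives) + min(len(positives), len(unseen))
        while len(labeled) < target:
            item = rng.choice(pool)
            if item not in labeled:       # skip positives and repeats
                labeled[item] = 0.0
        samples[user] = labeled
    return samples


def predict(P, Q, user, item):
    """Predicted interest: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def squared_error(samples, P, Q):
    """Total squared error over all labeled (user, item) samples."""
    return sum((r - predict(P, Q, u, i)) ** 2
               for u, labeled in samples.items() for i, r in labeled.items())


def init_factors(samples, F, rng):
    """Small random starting values for the F latent factors."""
    items = sorted({i for labeled in samples.values() for i in labeled})
    P = {u: [0.1 * rng.random() for _ in range(F)] for u in sorted(samples)}
    Q = {i: [0.1 * rng.random() for _ in range(F)] for i in items}
    return P, Q


def train_lfm(samples, P, Q, alpha=0.05, lam=0.01, epochs=200):
    """Stochastic gradient descent on the regularized squared error, in place."""
    for _ in range(epochs):
        for u, labeled in samples.items():
            for i, r in labeled.items():
                err = r - predict(P, Q, u, i)
                for f in range(len(P[u])):
                    p, q = P[u][f], Q[i][f]
                    P[u][f] += alpha * (err * q - lam * p)
                    Q[i][f] += alpha * (err * p - lam * q)
    return P, Q


# Toy data, made up for illustration.
rng = random.Random(0)
user_items = {"A": {"a", "b", "d"}, "B": {"a", "c"}, "C": {"b", "e"}, "D": {"c", "d"}}
samples = sample_negatives(user_items, rng)
P, Q = init_factors(samples, F=2, rng=rng)
loss_before = squared_error(samples, P, Q)
train_lfm(samples, P, Q)
loss_after = squared_error(samples, P, Q)
```

After training, ranking a user's unseen items by predict(P, Q, user, item) gives a Top‑N list. On real data the negatives would typically be resampled as the interaction data changes.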

Summary

This article introduced the fundamentals of user behavior, explicit/implicit feedback, and positive/negative feedback. It then described two major families of recommendation algorithms: neighborhood‑based methods and latent semantic models. The next article will demonstrate how to apply these algorithms using the Surprise library.

References

Using LFM (Latent Factor Model) for Top‑N recommendation: http://blog.csdn.net/harryhuang1990/article/details/9924377

Recommendation System Practice

Source: https://www.zybuluo.com/zhuanxu/note/985025