
Overview of Recommendation System Architecture, Algorithms, and Evaluation

This article provides a comprehensive introduction to recommendation systems, covering their definition, overall offline and online architectures, feature engineering, collaborative filtering, latent semantic models, ranking algorithms, and evaluation methods including A/B testing and offline metrics.

Architecture Digest

1. What is a Recommendation System

Recommendation systems provide personalized product information and suggestions to users of e‑commerce platforms, helping them decide what to purchase. As the number of items grows, users face information overload, which can lead to churn. Personalized recommendation systems address this by mining large volumes of behavior data to deliver individualized decision support.

2. Overall Architecture

Recommendation systems can be built using offline training, online training, or a hybrid of both. The choice depends on how quickly recommendations must reflect new user behavior.

2.1 Offline Recommendation

Offline training uses historical data (e.g., the past several weeks) to model long‑term user interests. A typical offline architecture consists of five modules: data collection, offline training, online storage, real‑time recommendation, and A/B testing.

Data Collection: collects, validates, cleans, and converts raw business data into training samples, which are written to offline storage.

Offline Training: uses distributed storage and computation to perform sampling, feature engineering, model training, and similarity calculation.

Online Storage: stores the model and feature data for low‑latency serving (typically within tens of milliseconds).

Real‑time Recommendation: retrieves user features, invokes the model, and ranks results. This is usually a two‑step process: recall selects a candidate set from millions of items, and ranking then orders those candidates.

A/B Testing: evaluates new algorithms against the baseline in production.

After recall and ranking, the API returns the final recommendation list.
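The recall‑then‑rank flow can be sketched in a few lines of Python. This is a minimal illustration rather than the article's implementation: the function names, the tag‑overlap recall rule, and the toy scores in `model_scores` are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Set


def recall(user_tags: Set[str], item_tags: Dict[str, Set[str]], limit: int) -> List[str]:
    """Cheap candidate selection: keep items sharing at least one tag with the user."""
    candidates = [item for item, tags in item_tags.items() if tags & user_tags]
    return sorted(candidates)[:limit]  # sorted only to keep the sketch deterministic


def rank(candidates: List[str], score: Callable[[str], float], top_n: int) -> List[str]:
    """More expensive step: score every candidate and keep the best top_n."""
    return sorted(candidates, key=score, reverse=True)[:top_n]


def recommend(user_tags, item_tags, score, limit=100, top_n=2):
    """Recall a candidate set first, then rank only those candidates."""
    return rank(recall(user_tags, item_tags, limit), score, top_n)


item_tags = {
    "phone": {"electronics"},
    "laptop": {"electronics", "office"},
    "novel": {"books"},
    "desk": {"office", "furniture"},
}
# Hypothetical model scores; in a real system these come from the stored ranking model.
model_scores = {"phone": 0.2, "laptop": 0.9, "novel": 0.7, "desk": 0.5}

result = recommend({"electronics", "office"}, item_tags, model_scores.get)
print(result)  # ['laptop', 'desk']
```

Note that "novel" never reaches the ranking step even though its score is high: recall decides what the ranker is allowed to see, which is why recall quality bounds the quality of the final list.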

2.2 Online Training

Online training continuously updates the model with real‑time feedback (clicks, likes, etc.), making it suitable for high‑dimensional, large‑scale, low‑latency scenarios such as advertising. It processes samples, features, and model updates in a streaming fashion, allowing the system to scale without massive offline storage.

Sample Processing: real‑time deduplication, filtering, and sampling of incoming data.

Real‑time Feature Construction: joins each incoming sample with its features to form training examples for streaming training.

Streaming Training: incrementally updates the model, optionally initializing from an offline model.

Model Storage and Loading: models are stored in a parameter server and loaded dynamically by the online service.
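The streaming steps above can be illustrated with an online logistic regression that takes one gradient step per incoming event. This is a sketch under stated assumptions: the class name, the feature names, the learning rate, and the in‑memory `seen` set used for deduplication are hypothetical, and a production system would keep weights in a parameter server rather than a Python dict.

```python
import math
from typing import Dict, Iterable, Optional, Tuple


class OnlineLogisticRegression:
    def __init__(self, learning_rate: float = 0.1,
                 initial_weights: Optional[Dict[str, float]] = None):
        self.learning_rate = learning_rate
        # Optionally warm-start from weights produced by an offline model.
        self.weights: Dict[str, float] = dict(initial_weights or {})

    def predict(self, features: Dict[str, float]) -> float:
        z = sum(self.weights.get(name, 0.0) * value for name, value in features.items())
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: Dict[str, float], label: int) -> None:
        """One SGD step on the log loss for a single streamed sample."""
        error = self.predict(features) - label
        for name, value in features.items():
            current = self.weights.get(name, 0.0)
            self.weights[name] = current - self.learning_rate * error * value


def train_on_stream(model: OnlineLogisticRegression,
                    stream: Iterable[Tuple[str, Dict[str, float], int]]) -> OnlineLogisticRegression:
    seen = set()
    for event_id, features, label in stream:
        if event_id in seen:  # real-time deduplication of repeated events
            continue
        seen.add(event_id)
        model.update(features, label)
    return model


sports = {"bias": 1.0, "is_sports": 1.0}
other = {"bias": 1.0, "is_sports": 0.0}
stream = [("e1", sports, 1), ("e1", sports, 1), ("e2", other, 0)]  # e1 arrives twice

model = train_on_stream(OnlineLogisticRegression(), stream)
print(model.predict(sports) > model.predict(other))  # True
```

Because each update touches only the features present in the sample, the cost per event stays small even when the full feature space is very large.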

3. Feature Data

Training a recommendation model requires constructing user feature vectors from behavior data. Important considerations include:

Types of user actions (view, click, purchase, etc.) and their relative importance.

Recency of actions (recent behavior carries more weight).

Frequency of actions (multiple interactions indicate stronger interest).

Item popularity (over‑popular items may be down‑weighted).

Data denoising (removing fraudulent records and samples with missing values).

Balancing positive and negative samples.

Feature combination (e.g., count, ratio, statistical features).
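The first three considerations (action type, recency, frequency) can be combined into a single interest score per item. The sketch below is illustrative only: the action weights, the 7‑day half‑life, and the `(item, action, day)` event format are assumptions, not values given in the article.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed relative importance of each action type.
ACTION_WEIGHTS = {"view": 1.0, "click": 2.0, "purchase": 5.0}
# Assumed half-life: an action loses half its weight every 7 days.
HALF_LIFE_DAYS = 7.0


def interest_scores(events: Iterable[Tuple[str, str, int]], now_day: int) -> Dict[str, float]:
    """Sum action weights per item, decayed by how long ago each action happened."""
    scores = defaultdict(float)
    for item, action, day in events:
        age_days = now_day - day
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
        # Summing over events means repeated interactions raise the score (frequency).
        scores[item] += ACTION_WEIGHTS[action] * decay
    return dict(scores)


events = [
    ("shoes", "view", 0),       # two weeks old: weight 1.0 * 0.25
    ("shoes", "purchase", 14),  # today: weight 5.0 * 1.0
    ("book", "click", 14),      # today: weight 2.0 * 1.0
]
scores = interest_scores(events, now_day=14)
print(scores)  # {'shoes': 5.25, 'book': 2.0}
```

A popularity penalty could be added by dividing each item's score by a function of its global interaction count; it is left out here to keep the example short.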

4. Collaborative Filtering Algorithms

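Collaborative filtering, introduced in 1992, recommends items from the collective behavior of users rather than from item content. It can be item‑based (ItemCF) or user‑based (UserCF), and both variants produce a top‑N recommendation list.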

4.1 Item‑Based Collaborative Filtering

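ItemCF recommends items similar to those a user has already liked. Item similarity is derived from user behavior: the more users who interacted with both items, the more similar the items are considered. It can be computed from raw co‑occurrence counts, as a cosine similarity that normalizes by each item's popularity, or with an additional penalty on very popular items.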
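One widely used cosine‑normalized co‑occurrence similarity is w(i, j) = |N(i) ∩ N(j)| / sqrt(|N(i)| · |N(j)|), where N(i) is the set of users who interacted with item i. The notation and the code below are an assumed, minimal sketch of that idea; the function names and the toy `user_items` data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarity(user_items):
    """Return sim[i][j] = |N(i) & N(j)| / sqrt(|N(i)| * |N(j)|)."""
    item_users = defaultdict(int)                       # |N(i)|
    co_counts = defaultdict(lambda: defaultdict(int))   # |N(i) & N(j)|
    for items in user_items.values():
        for i in items:
            item_users[i] += 1
        for i, j in combinations(sorted(items), 2):
            co_counts[i][j] += 1
            co_counts[j][i] += 1
    return {
        i: {j: c / math.sqrt(item_users[i] * item_users[j]) for j, c in related.items()}
        for i, related in co_counts.items()
    }


def recommend_items(user, user_items, sim, top_n=3):
    """Score unseen items by summing their similarity to the user's items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for i in seen:
        for j, w in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += w
    return sorted(scores, key=lambda j: (-scores[j], j))[:top_n]


user_items = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"a"},
}
sim = item_similarity(user_items)
print(recommend_items("u4", user_items, sim))  # ['b', 'c']
```

Dividing by the square root of both items' user counts is what keeps a very popular item from appearing similar to everything.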

4.2 User‑Based Collaborative Filtering

UserCF computes similarity between users based on shared item interactions, then recommends items from the most similar users.
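A minimal sketch of this idea follows, assuming cosine similarity over each user's item set and a fixed number of neighbors `k`. Both choices, along with the function names and sample data, are assumptions for illustration.

```python
import math
from collections import defaultdict


def user_similarity(user_items, u, v):
    """Cosine similarity between two users' item sets."""
    shared = user_items[u] & user_items[v]
    if not shared:
        return 0.0
    return len(shared) / math.sqrt(len(user_items[u]) * len(user_items[v]))


def recommend_from_neighbors(user, user_items, k=2, top_n=3):
    """Recommend unseen items liked by the k most similar users."""
    neighbors = sorted(
        ((user_similarity(user_items, user, v), v) for v in user_items if v != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, neighbor in neighbors:
        if similarity <= 0:
            continue  # ignore users with nothing in common
        for item in user_items[neighbor] - user_items[user]:
            scores[item] += similarity
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


user_items = {
    "alice": {"a", "b"},
    "bob": {"a", "b", "c"},
    "carol": {"b", "d"},
    "dave": {"e"},
}
print(recommend_from_neighbors("alice", user_items))  # ['c', 'd']
```

Here "c" outranks "d" because bob shares two items with alice while carol shares only one, and dave contributes nothing because he has no overlap.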

4.3 Matrix Factorization

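Matrix factorization (e.g., SVD) decomposes the sparse, high‑dimensional user‑item interaction matrix into low‑dimensional latent factor vectors for users and items, giving a more compact representation. A missing score is then predicted from the dot product of the corresponding user and item vectors.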
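As a rough illustration, the sketch below factorizes a small, sparse user‑item rating matrix into latent factors. It uses stochastic gradient descent on the observed ratings rather than an exact SVD, which is a common substitute when most entries are missing; the hyperparameters, the `(user, item, rating)` triples, and the function names are assumptions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings only."""
    squared_error = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(squared_error / len(ratings))


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=300, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factor vectors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# (user index, item index, rating); unobserved pairs are simply absent.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]

P0, Q0 = factorize(ratings, n_users=3, n_items=3, epochs=0)  # untrained baseline
P, Q = factorize(ratings, n_users=3, n_items=3)
print(rmse(ratings, P, Q) < rmse(ratings, P0, Q0))  # True
```

After training, `dot(P[u], Q[i])` can be evaluated for any user‑item pair, including pairs that were never observed, which is what makes the factors usable for recommendation.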

5. Latent Semantic Models

Latent factor models (LFM) map users and items into a shared latent space, capturing hidden interests. Related techniques include topic models such as LSA and LDA, as well as various matrix factorization methods.

6. Ranking Algorithms

After recall, ranking orders the candidate items. Common models include Logistic Regression (LR), GBDT, GBDT+LR, GBDT+FM, and DNN+GBDT+FM. The input to Logistic Regression is a single vector formed by concatenating the user, item, and context features, for example:

[3, 1] (user features) + [4, 0] (item features) + [0, 1, 1] (context) → [3, 1, 4, 0, 0, 1, 1].
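The concatenation above, followed by the Logistic Regression scoring step, might look like this in Python. The weights and bias are made‑up values for illustration; a trained model would supply them.

```python
import math


def concat_features(user, item, context):
    """Join user, item, and context features into one input vector."""
    return list(user) + list(item) + list(context)


def predict_click_probability(x, weights, bias):
    """Logistic Regression: sigmoid of the weighted sum of the features."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


x = concat_features([3, 1], [4, 0], [0, 1, 1])
print(x)  # [3, 1, 4, 0, 0, 1, 1]

# Hypothetical learned parameters, one weight per position in x.
weights = [0.1, -0.2, 0.05, 0.3, 0.0, 0.4, -0.1]
bias = -0.5
p = predict_click_probability(x, weights, bias)
print(round(p, 3))  # 0.525
```

Candidates from the recall stage would each be scored this way and sorted by the resulting probability.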

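GBDT builds an ensemble of decision trees, each trained to correct the errors of the trees before it. In GBDT+LR, the leaf that each tree assigns to a sample is used as a categorical feature for a Logistic Regression layer. GBDT+FM replaces that LR layer with Factorization Machines to handle high‑dimensional sparse features, and DNN+GBDT+FM further adds deep neural networks for richer representations.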
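To show what the Factorization Machine part contributes, the sketch below computes a second‑order FM score: a linear term plus pairwise feature interactions whose strength is the dot product of per‑feature latent vectors. It covers only the FM scoring formula, not the GBDT stage or training, and the parameter values are invented for the example.

```python
def fm_score(x, w0, w, V):
    """FM score using the linear-time form of the pairwise term.

    sum_{i<j} <V[i], V[j]> x_i x_j
      = 0.5 * sum_f [ (sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2 ]
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        weighted = [V[i][f] * x[i] for i in range(len(x))]
        total = sum(weighted)
        pairwise += 0.5 * (total * total - sum(v * v for v in weighted))
    return linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Same score computed by looping over every feature pair, for comparison."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            interaction = sum(a * b for a, b in zip(V[i], V[j]))
            score += interaction * x[i] * x[j]
    return score


# Hypothetical parameters: 3 features, latent dimension 2.
x = [1, 0, 1]
w0 = 0.1
w = [0.2, 0.3, -0.1]
V = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]]

print(abs(fm_score(x, w0, w, V) - fm_score_naive(x, w0, w, V)) < 1e-9)  # True
```

Because interactions are expressed through latent vectors, a pair of features that never co‑occurred in training still gets a non‑trivial interaction weight, which is why FM copes with sparse inputs better than explicit cross features.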

7. Evaluation and Testing

7.1 A/B Testing

New models are compared against the baseline using online A/B tests, which randomly split users into groups and measure metrics such as click‑through rate, dwell time, and revenue.
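A common way to implement the split is to hash each user ID so that a user always lands in the same group, then compare click‑through rates between groups. The sketch below assumes that approach plus a two‑proportion z‑test for the comparison; the experiment name, traffic share, and click counts are hypothetical.

```python
import hashlib
import math


def assign_group(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control' for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


def ctr_z_score(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> float:
    """Two-proportion z-score for the CTR difference (group B minus group A)."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / standard_error


group = assign_group("user-42", "new-ranker")
print(group in ("treatment", "control"))  # True

# Hypothetical results: baseline CTR 5.00%, new model CTR 5.75%.
z = ctr_z_score(clicks_a=1000, views_a=20000, clicks_b=1150, views_b=20000)
print(z > 1.96)  # True
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user in the treatment group of one test is not systematically in the treatment group of another.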

7.2 Offline Metrics

Before online testing, offline metrics like accuracy, coverage, diversity, novelty, and user‑centric (UC) scores are computed.
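Two of these metrics are easy to compute from a held‑out test set. The sketch below interprets "accuracy" as precision and recall of the top‑N list, which is an assumption about what the article means, and computes coverage as the share of the catalog that appears in at least one list. All names and data are illustrative.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of recommendation lists against held-out interactions."""
    hits = total_recommended = total_relevant = 0
    for user, recs in recommended.items():
        liked = relevant.get(user, set())
        hits += len(set(recs) & liked)
        total_recommended += len(recs)
        total_relevant += len(liked)
    precision = hits / total_recommended if total_recommended else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall


def coverage(recommended, catalog):
    """Fraction of catalog items that were recommended to at least one user."""
    shown = {item for recs in recommended.values() for item in recs}
    return len(shown & catalog) / len(catalog)


recommended = {"u1": ["a", "b"], "u2": ["b", "c"]}
relevant = {"u1": {"a"}, "u2": {"c", "d"}}   # held-out items each user actually liked
catalog = {"a", "b", "c", "d", "e"}

precision, recall = precision_recall(recommended, relevant)
print(precision, round(recall, 3))     # 0.5 0.667
print(coverage(recommended, catalog))  # 0.6
```

High precision with low coverage usually means the system keeps recommending the same popular items, which is why the two are reported together.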

8. Cold‑Start Problem

Cold start occurs when there is insufficient data for new users, new items, or a brand‑new system. Solutions include leveraging account information, device identifiers, explicit user preferences, item content, expert annotations, and hybrid strategies.
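One simple hybrid strategy for new users is a fallback chain: use explicitly stated preferences when they exist, and fall back to overall popularity otherwise. The sketch below assumes that strategy; the `preferred_categories` profile field, the log format, and the function name are hypothetical.

```python
from collections import Counter


def cold_start_recommend(user_profile, item_categories, interaction_log, top_n=3):
    """Recommend for a user with no history, from stated preferences or popularity."""
    popularity = Counter(item for _, item in interaction_log)
    # Most popular first; ties broken alphabetically to keep results deterministic.
    ranked = sorted(item_categories, key=lambda item: (-popularity[item], item))

    preferred = set(user_profile.get("preferred_categories", []))
    if preferred:
        matching = [item for item in ranked if item_categories[item] in preferred]
        if matching:
            return matching[:top_n]
    return ranked[:top_n]  # no usable preferences: fall back to global popularity


item_categories = {"a": "books", "b": "games", "c": "books", "d": "music"}
interaction_log = [("u1", "a"), ("u2", "a"), ("u1", "b"),
                   ("u3", "b"), ("u4", "b"), ("u2", "d")]

print(cold_start_recommend({"preferred_categories": ["books"]},
                           item_categories, interaction_log))   # ['a', 'c']
print(cold_start_recommend({}, item_categories, interaction_log))  # ['b', 'a', 'd']
```

Item "c" has no interactions at all, yet it still reaches the user who asked for books: using item content (here, the category) is also how new items get exposure before any behavior data exists.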

9. References

Recommendation System Practice – Xiang Liang (https://www.lanzous.com/i6362bi)

Recommendation Systems and Deep Learning (https://www.jianguoyun.com/p/DS32RkwQj4G5BxjCte4B)

Meituan Machine Learning Practice (https://www.jianguoyun.com/p/DQYaGLAQj4G5BxjFte4B)
