
Personalized Recommendation System Architecture and Techniques at Youzan

Youzan’s personalized recommendation platform combines a four‑layer architecture—data, storage, service, and application—with multi‑dimensional real‑time, offline, and cold‑start recall algorithms, Wide&Deep ranking, HBase/Druid storage, and configurable scene strategies to boost user conversion, traffic monetization, and future scalability.

Youzan Coder

Personalized recommendation has become a core business intelligence platform built on massive data mining, helping e‑commerce sites provide fully personalized decision support and information services. Youzan’s micro‑mall uses a personalized recommendation system with scenario‑based entry points to improve user conversion and maximize traffic monetization.

1. Scenario Introduction

The system integrates recommendation entry points on seven fixed pages (product detail, cart, order list, logistics, etc.) and also provides plugins for shop decoration (micro‑pages, personal center, calendar sign‑in) and activity pages (bargain, flash sale, reward for good reviews).

2. Overall Architecture

The Youzan personalized recommendation system consists of four layers: data, storage, service, and application.

Data layer: comprises an offline part and a real‑time part. The offline part generates recall item sets along three dimensions and integrates DMP profile data for model building. The real‑time part performs recall from users’ real‑time behavior.

Storage layer: offline recall data is stored in HBase and real‑time data in Druid, with a Redis cache in front for performance.

Service layer: provides external APIs. According to scene‑configured recall strategies, it performs recall, ranking, and real‑time filtering.

Application layer: directly calls the service layer to obtain recommendation results and display products.
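The service-layer flow described above (recall, then ranking, then real-time filtering) can be sketched roughly as follows. This is a minimal illustration, not Youzan's implementation: the names `recommend`, `rank_by_score`, and `is_allowed`, and the choice to keep the best score per item across sources, are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical type: a recall source maps a user ID to {item_id: score}.
RecallSource = Callable[[str], Dict[str, float]]


def recommend(
    user_id: str,
    recall_sources: Iterable[RecallSource],
    rank: Callable[[str, Dict[str, float]], List[str]],
    is_allowed: Callable[[str], bool],
    top_n: int = 10,
) -> List[str]:
    """Recall candidates, rank them, then filter at request time."""
    candidates: Dict[str, float] = {}
    for source in recall_sources:
        for item_id, score in source(user_id).items():
            # Assumption: keep the best score seen for an item across sources.
            candidates[item_id] = max(score, candidates.get(item_id, 0.0))
    ranked = rank(user_id, candidates)
    # Real-time filtering step; the concrete rule is supplied by the caller.
    return [item for item in ranked if is_allowed(item)][:top_n]


def rank_by_score(user_id: str, candidates: Dict[str, float]) -> List[str]:
    """Trivial stand-in ranker: sort by recall score, highest first."""
    return sorted(candidates, key=lambda item: (-candidates[item], item))
```

A scene would pass in the recall sources named in its configuration, and the ranker could be swapped for a model-based one without changing the flow.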

3. Recall

Recall is performed in the data layer by analyzing multi‑dimensional user behaviors (browse, add‑to‑cart, purchase, search, consult) to produce various recall sources for real‑time, offline, and cold‑start recall.

3.1 Real‑time Recall

Similar items are fetched from an offline similarity table based on real‑time browsing and consulting behavior, scored, and sorted for recommendation. Ongoing research includes additional models and richer behavior dimensions such as favorites and add‑to‑cart.
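A minimal sketch of this lookup-and-score step is shown below. The similarity table contents, the behavior weights, and the function name `realtime_recall` are hypothetical; the article does not state how scores are actually combined.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical offline similarity table: item -> [(similar_item, similarity)].
SIMILARITY_TABLE: Dict[str, List[Tuple[str, float]]] = {
    "phone": [("case", 0.9), ("charger", 0.8)],
    "case": [("screen_film", 0.7), ("charger", 0.5)],
}

# Assumed behavior weights; the real values are not given in the article.
BEHAVIOR_WEIGHT = {"browse": 1.0, "consult": 2.0}


def realtime_recall(
    events: List[Tuple[str, str]], top_n: int = 5
) -> List[Tuple[str, float]]:
    """events: (behavior, item_id) pairs from the user's recent activity."""
    seen = {item for _, item in events}
    scores: Dict[str, float] = defaultdict(float)
    for behavior, item in events:
        weight = BEHAVIOR_WEIGHT.get(behavior, 0.0)
        for similar, similarity in SIMILARITY_TABLE.get(item, []):
            if similar not in seen:  # skip items the user has just interacted with
                scores[similar] += weight * similarity
    # Highest score first; ties broken by item ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```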

3.2 Offline Recall

3.2.1 CF (Collaborative Filtering)

Item‑CF computes similarity scores between items based on co‑occurrence in browsing, add‑to‑cart, or purchase events, generating an item similarity table. User‑CF computes similarity between users based on shared item interactions, enabling recommendation of items liked by similar users.

Effective offline recall requires tuning time windows, decay factors, normalization, and sharding strategies.
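As an illustration of the Item-CF idea, the sketch below builds an item similarity table from co-occurrence counts, normalized by item popularity (cosine-style). The normalization choice and the function name `item_cf` are assumptions; time windows and decay factors mentioned above are deliberately left out to keep the example short.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set


def item_cf(user_items: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Build an item -> {item: similarity} table from per-user item sets."""
    item_count: Dict[str, int] = defaultdict(int)
    co_count: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for item in items:
            item_count[item] += 1
        # Every pair of items touched by the same user co-occurs once.
        for a, b in combinations(sorted(items), 2):
            co_count[a][b] += 1
            co_count[b][a] += 1
    # sim(a, b) = co(a, b) / sqrt(count(a) * count(b))
    return {
        a: {b: c / math.sqrt(item_count[a] * item_count[b]) for b, c in row.items()}
        for a, row in co_count.items()
    }
```

Dividing by the square root of the two popularity counts keeps very popular items from dominating every similarity list.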

3.2.2 ClassPreference

Recalls items by analyzing users’ category preferences across the entire platform and combining them with hot‑selling items in those categories.
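One way to picture this: count the categories a user has interacted with, take the top few, and pull the best sellers from each. The sketch below assumes a precomputed hot-item list per category; the function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def class_preference_recall(
    interacted_categories: List[str],   # category of each item the user touched
    hot_items: Dict[str, List[str]],    # category -> items, best sellers first
    top_categories: int = 2,
    per_category: int = 2,
) -> List[str]:
    """Recall hot items from the user's most frequently visited categories."""
    preferred = Counter(interacted_categories).most_common(top_categories)
    recalled: List[str] = []
    for category, _count in preferred:
        recalled.extend(hot_items.get(category, [])[:per_category])
    return recalled
```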

3.2.3 FP‑Growth

Uses the FP‑Growth algorithm to mine frequent itemsets (e.g., basket analysis) from browsing and add‑to‑cart sessions, identifying items frequently viewed or added together.
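To show what the mined output looks like, the sketch below finds frequent item pairs by brute-force counting. This is not FP-Growth itself (there is no FP-tree here); it is a simplified stand-in that, for itemsets of size two, yields the same frequent pairs FP-Growth would return at the same support threshold. The function name and session data are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_pairs(
    sessions: List[Set[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Count item pairs per session and keep those seen at least min_support times."""
    counts: Counter = Counter()
    for session in sessions:
        for pair in combinations(sorted(session), 2):
            counts[frozenset(pair)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

Brute-force counting grows quadratically with session size, which is the cost FP-Growth's compressed tree is designed to avoid on large data.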

3.2.4 Product&Word2Vec

Embeds product titles with Word2Vec, computes cosine similarity, and narrows the candidate set using product‑level words to focus on semantically similar items.
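The title-similarity step can be sketched as below, assuming word vectors are already trained. The three-dimensional toy vectors stand in for a real Word2Vec model, averaging word vectors into a title vector is one common choice rather than a confirmed detail, and the product-word narrowing step is not shown.

```python
import math
from typing import Dict, List, Sequence

# Toy 3-dimensional word vectors standing in for a trained Word2Vec model.
WORD_VECTORS: Dict[str, List[float]] = {
    "red": [1.0, 0.0, 0.0],
    "blue": [0.9, 0.1, 0.0],
    "dress": [0.0, 1.0, 0.0],
    "skirt": [0.0, 0.9, 0.1],
    "kettle": [0.0, 0.0, 1.0],
}


def title_vector(tokens: Sequence[str]) -> List[float]:
    """Average the vectors of the known tokens in a title."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0
```

With these toy vectors, a "red dress" title scores much closer to "blue skirt" than to "kettle", which is the behavior the recall source relies on.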

3.2.5 QueryWords

Links search queries with clicked items, builds a query‑item relationship, and recommends items based on users’ query preferences, while carefully managing the time window of query behavior.
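A rough sketch of the query-to-item link with a time window follows. The seven-day default, the click-count scoring, and the function name `query_recall` are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DAY_SECONDS = 86_400


def query_recall(
    click_log: List[Tuple[str, str]],      # (query, clicked_item) pairs, all users
    user_queries: List[Tuple[int, str]],   # (unix_timestamp, query) for one user
    now: int,
    window_days: int = 7,
    top_n: int = 3,
) -> List[str]:
    """Recommend items most often clicked for the user's recent queries."""
    query_items: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for query, item in click_log:
        query_items[query][item] += 1
    scores: Dict[str, int] = defaultdict(int)
    for timestamp, query in user_queries:
        if now - timestamp > window_days * DAY_SECONDS:
            continue  # query is outside the time window
        for item, clicks in query_items.get(query, {}).items():
            scores[item] += clicks
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```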

3.2.6 Seq&Word2Vec

Analyzes user behavior sequences within a time window, constructs directed graphs of item transitions, performs random walks to generate item sequences, and feeds them to Word2Vec to obtain item vectors for similarity calculation.
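The graph construction and random-walk steps might look like the sketch below. The edge weighting by transition count and the helper names are assumptions; the generated walks would then be passed to a Word2Vec trainer as if they were sentences, which is not shown here.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_graph(sequences: List[List[str]]) -> Dict[str, Dict[str, int]]:
    """Directed graph: graph[src][dst] = number of observed src -> dst transitions."""
    graph: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            if src != dst:  # ignore repeated views of the same item
                graph[src][dst] += 1
    return {src: dict(dsts) for src, dsts in graph.items()}


def random_walk(
    graph: Dict[str, Dict[str, int]], start: str, length: int, rng: random.Random
) -> List[str]:
    """Walk up to `length` nodes, choosing each step in proportion to edge weight."""
    walk = [start]
    while len(walk) < length and walk[-1] in graph:
        neighbors = graph[walk[-1]]
        step = rng.choices(list(neighbors), weights=list(neighbors.values()))[0]
        walk.append(step)
    return walk
```

Walking the graph rather than using raw sessions lets rarely visited items appear in more training sequences than they do in the original logs.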

3.2.7 Others

Additional recall algorithms consider geographic, age, and gender dimensions.

3.3 Cold‑Start Recall

3.3.1 shopHot

Recalls items based on shop‑level hot‑selling products, with category‑level diversification to improve recommendation diversity.
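One simple way to diversify by category is to cap how many items each category may contribute, as sketched below. The cap-based rule and the name `shop_hot` are assumptions; the article does not say which diversification rule is used.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def shop_hot(
    items: List[Tuple[str, str, int]],  # (item_id, category, sales) for one shop
    top_n: int,
    max_per_category: int = 2,
) -> List[str]:
    """Pick the shop's best sellers while limiting items per category."""
    picked: List[str] = []
    per_category: Dict[str, int] = defaultdict(int)
    for item_id, category, _sales in sorted(items, key=lambda x: (-x[2], x[0])):
        if len(picked) >= top_n:
            break
        if per_category[category] < max_per_category:
            picked.append(item_id)
            per_category[category] += 1
    return picked
```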

3.3.2 linUCB

Applies the linear Upper Confidence Bound (linUCB) algorithm to address cold‑start for new items. It models the expected reward of an item as a linear function of user and item features, selecting items with the highest upper confidence bound and updating parameters iteratively.
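The select-and-update loop can be illustrated with the disjoint form of LinUCB, where each item keeps its own parameters. This is a minimal pure-Python sketch under that assumption; the class and function names are hypothetical, and the inverse matrix is maintained with the Sherman-Morrison identity to avoid a linear-algebra dependency.

```python
import math
from typing import Dict, List

Vector = List[float]


class LinUCBArm:
    """One arm (item) of disjoint LinUCB; stores A^-1 and b directly."""

    def __init__(self, dim: int) -> None:
        # A starts as the identity matrix, so A^-1 is the identity as well.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x: Vector) -> Vector:
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def ucb(self, x: Vector, alpha: float) -> float:
        """Estimated reward plus exploration bonus for context x."""
        theta = self._a_inv_times(self.b)  # theta = A^-1 b
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, self._a_inv_times(x))))
        return mean + alpha * width

    def update(self, x: Vector, reward: float) -> None:
        """Apply A += x x^T and b += reward * x, updating A^-1 in place."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                # Sherman-Morrison; valid because A^-1 stays symmetric.
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select(arms: Dict[str, LinUCBArm], x: Vector, alpha: float = 1.0) -> str:
    """Return the arm with the highest upper confidence bound."""
    return max(arms, key=lambda name: arms[name].ucb(x, alpha))
```

A new item starts with a wide confidence interval, so it is tried even with no history; as feedback arrives the interval narrows and the estimate takes over, which is what makes the method suit cold-start.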

4. Ranking

The ranking module currently runs two main models: a baseline voting‑style model that sums scores from multiple recall sources, and a Wide&Deep model.
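The baseline can be pictured as a simple score sum across recall sources, as in the sketch below. The source names and the unweighted sum are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def vote_rank(recall_results: Dict[str, Dict[str, float]]) -> List[str]:
    """recall_results: source name -> {item_id: score}. Returns items, best first."""
    totals: Dict[str, float] = defaultdict(float)
    for scores in recall_results.values():
        for item_id, score in scores.items():
            totals[item_id] += score  # an item recalled by several sources adds up
    return sorted(totals, key=lambda item: (-totals[item], item))
```

An item returned by several recall sources accumulates score from each, so agreement between sources acts like a vote.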

Wide&Deep combines wide (memorization) features for deterministic recommendations and deep (generalization) features for modeling sparse and implicit behaviors. The final layer merges Wide and Deep features and feeds them into a logistic regression model.
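For reference, the joint prediction in the Wide & Deep paper cited below can be written as follows, where x is the raw feature vector, φ(x) its cross-product transformations, a^(l_f) the final hidden-layer activations of the deep part, b a bias term, and σ the sigmoid function. How Youzan chooses its wide and deep features is not stated in the article.

```latex
P(Y = 1 \mid \mathbf{x}) =
  \sigma\!\left(
    \mathbf{w}_{\text{wide}}^{\top}\,[\mathbf{x}, \phi(\mathbf{x})]
    + \mathbf{w}_{\text{deep}}^{\top}\, a^{(l_f)}
    + b
  \right),
\qquad
a^{(l+1)} = f\!\left(W^{(l)} a^{(l)} + b^{(l)}\right)
```

Both parts are trained jointly, so the wide weights only need to correct what the deep part gets wrong.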

Reference: "Wide & Deep Learning for Recommender Systems".

5. Storage Table Design

Offline recall results are stored in HBase with three types of RowKey designs for user‑item, item‑item, and shop‑item mappings, enabling efficient management of recall outputs with a small number of tables.

Each table receives parallel imports from different recall algorithms to avoid a single algorithm failure causing a full re‑import.
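The idea of one row per entity with independent writes per algorithm can be sketched with an in-memory stand-in for an HBase table. Everything concrete here is hypothetical: the key layout, the hash prefix (a common way to spread sequential IDs across regions), and using the algorithm name as the column qualifier are assumptions, not the actual Youzan schema.

```python
import hashlib
from typing import Dict, List

# In-memory stand-in for an HBase table: row key -> {column qualifier: value}.
Table = Dict[str, Dict[str, List[str]]]


def row_key(kind: str, entity_id: str) -> str:
    """kind is "user", "item", or "shop"; the layout is a hypothetical example."""
    salt = hashlib.md5(entity_id.encode("utf-8")).hexdigest()[:4]
    return f"{salt}|{kind}|{entity_id}"


def write_recall(
    table: Table, kind: str, entity_id: str, algorithm: str, items: List[str]
) -> None:
    """Write one algorithm's recall list into its own column of the entity's row."""
    table.setdefault(row_key(kind, entity_id), {})[algorithm] = items
```

Because each algorithm writes only its own column, re-importing one algorithm's output leaves the others untouched.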

6. Scenario Strategy Configuration

Each recommendation scene configures real‑time, offline, and cold‑start recall strategies. For example, the product detail page uses similarity‑based real‑time recall, offline frequent‑item mining, and shop‑hot cold‑start.

The configuration for the example above takes roughly the following form (for reference only):
{
real‑time:
Item-CF
offline:
FPGrowth
cold‑start:
ShopHot
}
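In code, a per-scene strategy table for the product detail page example could be represented roughly as below. The scene key, stage names, and the lookup helper are hypothetical; only the three algorithm names come from the example.

```python
from typing import Dict, List

# Hypothetical scene -> stage -> recall algorithms mapping.
SCENE_STRATEGIES: Dict[str, Dict[str, List[str]]] = {
    "product_detail": {
        "realtime": ["Item-CF"],
        "offline": ["FPGrowth"],
        "cold_start": ["ShopHot"],
    },
}


def strategies_for(scene: str, stage: str) -> List[str]:
    """Return the recall algorithms configured for a scene and stage, if any."""
    return SCENE_STRATEGIES.get(scene, {}).get(stage, [])
```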

Additional split‑testing strategies based on unique user IDs and configurable ranking algorithm parameters (feature names, version, scene, split) are also supported.
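A common way to split traffic on a unique user ID is to hash the ID into a fixed number of buckets, so that a user always lands in the same group. The sketch below assumes that approach; the bucket count, experiment names, and function names are illustrative and not taken from the article.

```python
import hashlib


def bucket(user_id: str, experiment: str, buckets: int = 100) -> int:
    """Map a user deterministically to a bucket in [0, buckets)."""
    # Including the experiment name gives each experiment an independent split.
    digest = hashlib.md5(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Assign "treatment" to roughly treatment_share of users, else "control"."""
    return "treatment" if bucket(user_id, experiment) < treatment_share * 100 else "control"
```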

7. Current Status and Outlook

Split‑testing strategies need more sophisticated designs as recommendation scenarios grow.

The ranking stage will explore more cutting‑edge algorithms and business‑driven adjustments.

Future plans include extending the system to other Youzan businesses such as retail, selection, and distribution.

Performance improvements are targeted across all stages: faster offline recall generation, larger recall candidate sets, and quicker ranking response.

This article does not go into detailed technical implementations; interested readers are encouraged to follow subsequent posts.

Extended Reading

Best Practices of Data Platform at Youzan

Metadata System Practice in Youzan Data Warehouse

How We Redesigned the NSQ – Other Features and Future Plans
