Migrating Youku Tudou Video Recommendation System from Offline to Online Sorting
This article describes how Youku Tudou redesigned its video recommendation architecture by moving ranking from offline to online processing. It compares the two architectures and covers the benefits, the challenges, feature handling, offline evaluation, and the weight-fusion technique that led to a successful launch after more than two months of development.
With videos and playback counts both numbering in the billions, Youku Tudou relies on a recommendation system to deliver high-quality video suggestions. To improve recommendation effectiveness and reduce the cost of testing new algorithms, the team decided to migrate the system from an offline-sorting architecture to an online-sorting one.
The new architecture shifts the ranking stage to the online layer, requiring adjustments to data preparation, storage, API encapsulation, and real‑time feature handling, as illustrated in the migration diagram.
Advantages of the online‑sorting architecture include:
A/B testing different models needs little additional development effort, because no separate sorted recommendation list has to be stored for each model.
Separation of ranking data from specific models, allowing lightweight traffic‑strategy adjustments by selecting the model at request time.
Ability to leverage features that are only available in the online environment, enabling one‑pass ranking of candidate videos.
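The request-time model selection described above can be sketched roughly as follows. This is a minimal illustration, not Youku Tudou's actual code: the hash-based bucketing policy, the linear scoring, and the names `pick_model`, `rank_online`, and `fetch_features` are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Dict, List

Weights = Dict[str, float]   # feature name -> model weight
Features = Dict[str, float]  # feature name -> feature value


def pick_model(user_id: str, models: Dict[str, Weights], experiment_share: float) -> str:
    """Assign a user to 'experiment' or 'control' using a stable hash (assumed policy)."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # same user always lands in the same bucket
    name = "experiment" if bucket < experiment_share * 100 else "control"
    return name if name in models else "control"


def rank_online(
    user_id: str,
    candidates: List[str],
    fetch_features: Callable[[str, str], Features],
    models: Dict[str, Weights],
    experiment_share: float = 0.05,
) -> List[str]:
    """Rank candidate videos in one pass with the model chosen for this request."""
    weights = models[pick_model(user_id, models, experiment_share)]

    def score(video_id: str) -> float:
        # Features are fetched at request time, so online-only signals can be used.
        features = fetch_features(user_id, video_id)
        return sum(weights.get(name, 0.0) * value for name, value in features.items())

    # No pre-sorted list is stored per model; only the weights differ between arms.
    return sorted(candidates, key=score, reverse=True)
```

Because only the weight table changes between the control and experiment arms, adjusting the traffic split in this sketch is a matter of changing `experiment_share` rather than regenerating stored lists.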
Challenges introduced by the online approach are:
APIs must sustain a much higher volume of real‑time data requests (candidate video sets, detailed feature values, model weights, etc.).
The online logic must compute ranking results within a very short latency.
Code and services must meet stringent performance requirements so that user experience does not degrade.
Key learnings from the migration:
1. Feature name handling: Different training dates may produce varying feature sets; the new system encodes all feature names globally with unique identifiers, allowing seamless addition of new features.
2. Model effectiveness evaluation: Before launching AB tests, the team performs offline AUC evaluations using a day of data for training and another day for testing; if the experimental model’s global AUC surpasses the control, it is likely to outperform in online small‑traffic tests.
3. Model weight fusion: By merging feature weights from models trained on different dates, the system expands the effective feature set, mitigating cases where online candidate videos lack features present in the latest model and improving click‑through performance.
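The three learnings above can be sketched together in a few lines. This is an illustrative sketch under stated assumptions, not the team's implementation: the article does not say which model family is used or how conflicting weights are resolved, so the logistic scoring and the "newer model overrides older" rule are assumptions, and `FeatureRegistry`, `fuse_weights`, `score`, and `auc` are hypothetical names. The feature names in the usage lines are made up.

```python
import math
from typing import Dict, List


class FeatureRegistry:
    """Hypothetical global encoder: every feature name gets one stable integer ID."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def encode(self, name: str) -> int:
        # Known names keep their ID; new names get the next free one,
        # so models trained on different dates share a single ID space.
        return self._ids.setdefault(name, len(self._ids))


def fuse_weights(models_oldest_first: List[Dict[int, float]]) -> Dict[int, float]:
    """Merge weights across training dates; newer models override older ones (assumed rule)."""
    fused: Dict[int, float] = {}
    for weights in models_oldest_first:
        fused.update(weights)
    return fused


def score(weights: Dict[int, float], features: Dict[int, float]) -> float:
    """Logistic score (assumed model family); features unknown to the model add nothing."""
    z = sum(weights.get(fid, 0.0) * value for fid, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage with made-up feature names: two models trained on different dates.
registry = FeatureRegistry()
day1 = {registry.encode("video_category=anime"): 0.8,
        registry.encode("user_age_bucket=2"): -0.3}
day2 = {registry.encode("user_age_bucket=2"): -0.1,
        registry.encode("video_uploader=u42"): 0.5}
fused = fuse_weights([day1, day2])  # covers features from both dates
```

In an offline check of the kind the article describes, one would compute `auc` on a held-out day for both the control weights and the experimental weights, and only send the experiment to small-traffic testing if its AUC is higher. The pairwise AUC here is quadratic in the sample count and is meant only to show the definition; a production evaluation would use a rank-based or library implementation.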
After more than two months of intensive development and testing, the new online‑sorting recommendation system was successfully launched, with ongoing plans to explore further optimizations for an even better user viewing experience.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.