
Evolution and Experience of iQIYI's Machine Learning Platform

iQIYI’s Machine Learning Platform evolved from the specialized Javis deep‑learning system into a unified, low‑threshold solution for algorithm engineers, analysts, and developers, adding visual pipeline building, multi‑framework scheduling, automatic hyper‑parameter tuning, parameter‑server training, and scalable online prediction, dramatically boosting business efficiency and detection performance.

iQIYI Technical Product Team

Before building a dedicated machine learning platform, iQIYI already had a mature deep learning platform, Javis, which targeted advanced algorithm engineers and required submitting code to a specialized compute cluster, creating a high barrier to entry.

Beyond deep learning, many smaller business units needed support for machine learning, data mining, and data analysis, which Javis did not provide. This led to the need for an independent algorithm and engineering platform to serve these diverse use cases.

The platform aims to serve algorithm engineers, data analysts, and business development engineers by providing efficient offline and real‑time prediction services, lowering the cost of using machine learning, and improving the efficiency of algorithm integration while leveraging the data‑center capabilities for model sharing and standardization.

Version 1.0 focused on solving the siloed, "smoke-stack" problem in which each business unit built its own algorithm platform. It introduced asynchronous distributed scheduling of algorithms on Spark ML, significantly improving algorithm-integration efficiency.
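The idea of asynchronous scheduling can be illustrated with a small, framework-free sketch: independent algorithm tasks are submitted concurrently and collected as they complete, rather than run serially. The functions below are hypothetical stand-ins; the platform would dispatch real Spark ML jobs to a cluster instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in "algorithms"; real jobs would run on Spark ML.
def train_lr(data):
    return ("lr", sum(data) / len(data))

def train_gbdt(data):
    return ("gbdt", max(data))

def schedule(algorithms, data, workers=4):
    """Submit independent algorithm tasks asynchronously and collect
    results as they finish, instead of running them one by one."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(algo, data) for algo in algorithms]
        for fut in as_completed(futures):
            name, value = fut.result()
            results[name] = value
    return results

print(schedule([train_lr, train_gbdt], [1.0, 2.0, 3.0]))
```

The same pattern scales to many business units sharing one scheduler, since each task is an opaque unit of work to the pool.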

Version 2.0 added a visual front‑end that allowed users to drag‑and‑drop components to build machine learning pipelines, introduced an independent scheduling service with task monitoring and automatic retry, and decoupled the execution engine from any specific algorithm framework, enabling support for Spark ML, XGBoost, and graph algorithms. It also integrated with iQIYI’s big‑data platform Babel and the Gear scheduling service.
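The monitor-and-retry behavior of the independent scheduling service can be sketched in a few lines. This is a toy version under assumed semantics (fixed retry budget, linear backoff), not the platform's actual scheduler code:

```python
import time

def run_with_retry(task, max_retries=3, backoff=0.01):
    """Run a pipeline task, retrying on failure: a toy version of a
    scheduling service's automatic-retry behavior."""
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # exhausted the retry budget; surface the error
            time.sleep(backoff * attempt)  # simple linear backoff

# A task that fails twice before succeeding, to exercise the retries.
calls = {"n": 0}
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retry(flaky_step))  # succeeds on the third attempt
```

A production scheduler would also record each attempt for task monitoring; here the retry loop is the essential part.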

Version 3.0 completed the functional roadmap by providing online prediction services, automatic hyper‑parameter tuning, a parameter‑server for large‑scale model training, and an API layer for external platform integration. The platform now offers an end‑to‑end workflow from feature engineering to model training, evaluation, and both offline and online prediction.

The platform also includes an automatic hyper‑parameter tuning system that supports multi‑round iterative optimization, works across Spark, Python, and custom frameworks, and incorporates algorithms such as random search, grid search, Bayesian optimization, and a proprietary genetic algorithm.
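As a concrete illustration of multi-round iterative tuning, here is a minimal random-search sketch that narrows the search range around the best point after each round. The zoom-in scheme and parameters are illustrative assumptions, not the platform's proprietary algorithm:

```python
import random

def random_search(objective, space, rounds=3, samples_per_round=20, seed=0):
    """Multi-round random search: each round samples the space, keeps
    the best point found so far, then narrows the range around it."""
    rng = random.Random(seed)
    lo, hi = space
    best_x, best_y = None, float("inf")
    for _ in range(rounds):
        for _ in range(samples_per_round):
            x = rng.uniform(lo, hi)
            y = objective(x)
            if y < best_y:
                best_x, best_y = x, y
        width = (hi - lo) / 4
        lo, hi = best_x - width, best_x + width  # zoom in around the best
    return best_x, best_y

# Toy objective with its minimum at x = 2.
x, y = random_search(lambda x: (x - 2) ** 2, (-10, 10))
```

Grid search enumerates the space instead of sampling it, and Bayesian optimization replaces the uniform sampler with a surrogate model; both fit the same evaluate-then-refine loop.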

To handle data‑scale challenges, the platform adopted a parameter‑server architecture (Tencent Angel) to improve training efficiency on datasets up to billions of records, reducing network overhead and achieving more than a 50% speedup for large‑scale models compared with pure Spark ML.
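The parameter-server pattern itself is simple: workers pull the current weights, compute gradients on their own data shard, and push updates back to a central server. The in-process sketch below illustrates the architecture only; the platform uses Tencent Angel, not code like this:

```python
import threading

class ParameterServer:
    """Minimal in-process parameter server: holds the shared weights
    and applies pushed gradient updates under a lock."""
    def __init__(self, dim, lr=0.1):
        self.weights = [0.0] * dim
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return list(self.weights)

    def push(self, grads):
        with self.lock:
            for i, g in enumerate(grads):
                self.weights[i] -= self.lr * g

def worker(ps, shard, steps=50):
    # Each worker fits the single weight toward its shard's values via
    # the gradient of squared error: grad = w - x.
    for _ in range(steps):
        w = ps.pull()
        for x in shard:
            ps.push([w[0] - x])

ps = ParameterServer(dim=1)
shards = [[1.0, 2.0], [3.0, 4.0]]  # two workers, two data shards
threads = [threading.Thread(target=worker, args=(ps, s)) for s in shards]
for t in threads: t.start()
for t in threads: t.join()
```

The efficiency win at scale comes from workers exchanging only gradients and weight slices with the server, rather than shuffling full model state between executors as a pure Spark ML job would.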

Model management is achieved through custom model files and PMML, enabling a unified prediction component that can load models from different frameworks and include preprocessing logic, thus simplifying online prediction deployment.
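A unified prediction component can be pictured as a loader registry plus shared preprocessing: each framework registers how to load its model format, and prediction always runs preprocessing before delegating to the loaded model. All names here are illustrative, not the platform's actual API:

```python
class UnifiedPredictor:
    """Sketch of a unified prediction component: framework-specific
    loaders register by format name, and predict() applies shared
    preprocessing before delegating to the loaded model."""
    loaders = {}

    @classmethod
    def register(cls, fmt):
        def deco(fn):
            cls.loaders[fmt] = fn
            return fn
        return deco

    def __init__(self, fmt, source, preprocess=None):
        self.model = self.loaders[fmt](source)
        self.preprocess = preprocess or (lambda x: x)

    def predict(self, features):
        return self.model(self.preprocess(features))

# Example "format": a linear model stored as a list of coefficients.
# A real deployment would register loaders for PMML, Spark ML, etc.
@UnifiedPredictor.register("linear")
def load_linear(coefs):
    return lambda xs: sum(c * x for c, x in zip(coefs, xs))

p = UnifiedPredictor("linear", [0.5, 2.0],
                     preprocess=lambda xs: [x / 10 for x in xs])
print(p.predict([10, 5]))  # 0.5 * 1.0 + 2.0 * 0.5 = 1.5
```

Bundling the preprocessing with the model in this way is what lets a single serving component host models from different training frameworks.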

The online prediction system supports both local (jar‑packaged) and cloud (Docker‑based) deployment modes, offering HTTP and RPC (Dubbo) interfaces, with push and pull strategies for model updates to cover all update scenarios.
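The pull strategy for model updates reduces to a version check: the serving instance periodically polls the model store and reloads only when the version has advanced. The sketch below uses an in-memory store as a stand-in for the real repository (which would sit behind HTTP or shared storage); the push strategy inverts the flow, with the store notifying instances:

```python
class ModelStore:
    """Stand-in for the model repository; publishing bumps the version."""
    def __init__(self):
        self.version = 0
        self.blob = None

    def publish(self, blob):
        self.version += 1
        self.blob = blob

class PullUpdater:
    """Pull strategy: check the store's version on each poll and
    reload the model only when it has changed."""
    def __init__(self, store):
        self.store = store
        self.loaded_version = -1
        self.model = None

    def poll(self):
        if self.store.version != self.loaded_version:
            self.model = self.store.blob          # reload the new model
            self.loaded_version = self.store.version
            return True                           # an update happened
        return False

store = ModelStore()
store.publish("model-v1")
updater = PullUpdater(store)
print(updater.poll())  # True: first load
print(updater.poll())  # False: nothing changed
```

Combining both strategies covers the full range of update scenarios: push for latency-sensitive rollouts, pull as a simple self-healing fallback.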

A practical case study is the anti‑cheat business, which processes tens of millions of logs daily and achieves peak online prediction throughput of tens of thousands of queries per second, improving detection efficiency by over 80%.

Tags: machine learning, platform engineering, AI, big data, auto-tuning, online prediction