
Tencent's Taiji Machine Learning Platform: End-to-End MLOps for Advertising

Tencent's Taiji machine learning platform is a cloud‑native system built on a distributed parameter server. It provides end‑to‑end MLOps for advertising by integrating data ingestion, feature engineering, model training, evaluation, deployment, and monitoring. It supports models with up to billions of parameters while improving efficiency, scalability, and resource management.

Tencent Advertising Technology

Tencent built the Taiji machine learning platform as a one‑stop solution covering data import, feature engineering, model training, and online serving, so that users can focus on AI business problems rather than engineering challenges.

The platform adopts a distributed parameter‑server architecture (AngelPS) that separates parameter storage and computation, allowing it to support 10 TB‑scale model training, TB‑scale inference, and minute‑level model deployment, with GPU acceleration, multi‑level disaster recovery, and real‑time monitoring.
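The separation of parameter storage from computation can be illustrated with a toy, single‑process sketch. This is not AngelPS code: the class names, the modulo routing rule, the plain SGD update, and the linear model are all assumptions chosen for illustration. Storage shards only hold and update values, while the worker pulls the weights it needs, computes gradients, and pushes them back.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterShard:
    """Storage node: holds a slice of the parameters and applies updates."""
    values: dict = field(default_factory=dict)

    def pull(self, keys):
        # Unseen keys default to 0.0, as a sparse table might behave.
        return {k: self.values.get(k, 0.0) for k in keys}

    def push(self, grads, lr):
        # Plain SGD update (an assumption; real systems use richer optimizers).
        for k, g in grads.items():
            self.values[k] = self.values.get(k, 0.0) - lr * g


class ParameterServer:
    """Routes each integer key to a shard by modulo (hypothetical routing rule)."""

    def __init__(self, num_shards):
        self.shards = [ParameterShard() for _ in range(num_shards)]

    def _shard(self, key):
        return self.shards[key % len(self.shards)]

    def pull(self, keys):
        out = {}
        for k in keys:
            out.update(self._shard(k).pull([k]))
        return out

    def push(self, grads, lr=0.1):
        for k, g in grads.items():
            self._shard(k).push({k: g}, lr)


def worker_step(ps, features, label):
    """Compute node: pull needed weights, compute gradients, push them back."""
    weights = ps.pull(features.keys())
    pred = sum(weights[k] * x for k, x in features.items())
    err = pred - label
    # Gradient of the squared loss 0.5 * err**2 for a linear model.
    grads = {k: err * x for k, x in features.items()}
    ps.push(grads)
    return err
```

Because the worker touches only the keys present in its sample, a model far larger than any one worker's memory can still be trained; the shards scale storage independently of the compute nodes.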

Taiji has passed several milestones since its inception in 2015. Version 1.0 launched as a full‑process ML platform. In 2018 it added deep‑learning acceleration; in 2019 it integrated with Tencent Cloud for multi‑environment support; in 2020 it moved to a cloud‑native architecture serving core AI business scenarios; and in 2022 it introduced the Taiji Advertising One‑Stop Platform to streamline ad model iteration.

In advertising, Taiji implements MLOps through a unified lifecycle management product that handles feature data, model code, training environments, and deployment. This cuts the workflow from more than 60 steps to fewer than 7 and shortens feature‑to‑model cycles; for example, feature iteration time drops from 20 days to 5 days.

Key capabilities include:

Full‑lifecycle model management with security‑guaranteed development, packaging, and deployment.

Feature management offering registration, offline/real‑time ingestion, and versioned updates.

Model training module with an online IDE, long‑term seed model support, and visual task monitoring.

Model inference module with automated consistency checks, sandbox testing, and rapid A/B experimentation.

Workspace for cross‑team collaboration and customized resource allocation.
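To make the feature management capability concrete, here is a minimal sketch of a registry that supports registration and versioned updates. It is an illustration only, not Taiji's API: `FeatureRegistry`, `FeatureVersion`, the `source` values, and the example feature name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureVersion:
    name: str
    version: int
    source: str      # "offline" or "realtime" ingestion path
    definition: str  # free-form description of how the feature is computed


class FeatureRegistry:
    """Keeps every registered version so older models can still resolve theirs."""

    def __init__(self):
        self._versions = {}  # feature name -> list of FeatureVersion

    def register(self, name, source, definition):
        if source not in ("offline", "realtime"):
            raise ValueError(f"unknown source: {source}")
        history = self._versions.setdefault(name, [])
        # Re-registering an existing name creates the next version number.
        fv = FeatureVersion(name, len(history) + 1, source, definition)
        history.append(fv)
        return fv

    def get(self, name, version=None):
        history = self._versions[name]
        if version is None:
            return history[-1]  # latest version
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1]


registry = FeatureRegistry()
registry.register("user_ctr_7d", "offline", "7-day click-through rate per user")
registry.register("user_ctr_7d", "realtime", "7-day CTR from a streaming aggregate")
```

Keeping old versions immutable means a model trained against version 1 can keep requesting it by number after the feature definition changes.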

The platform also schedules resources dynamically on Kubernetes, automatically assigning exploratory tasks to mixed‑resource pools and production tasks to stable resources. It employs an HBO optimizer to improve resource utilization and provides failover capabilities.
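The placement rule described above can be sketched as a small function. This is a simplified model under stated assumptions: it does not call Kubernetes, it does not model the HBO optimizer or failover, and the `Pool` and `Task` types, the GPU‑count capacity model, and the production‑first ordering are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    free_gpus: int


@dataclass
class Task:
    name: str
    kind: str  # "exploratory" or "production"
    gpus: int


def schedule(tasks, mixed, stable):
    """Return {task name: pool name or None}, placing production tasks first."""
    placement = {}
    # sorted() is stable, so tasks keep their order within each kind.
    for task in sorted(tasks, key=lambda t: t.kind != "production"):
        pool = stable if task.kind == "production" else mixed
        if task.gpus <= pool.free_gpus:
            pool.free_gpus -= task.gpus
            placement[task.name] = pool.name
        else:
            placement[task.name] = None  # stays pending until capacity frees up
    return placement


tasks = [
    Task("explore-a", "exploratory", 2),
    Task("prod-ctr", "production", 8),
    Task("explore-b", "exploratory", 4),
]
result = schedule(tasks, Pool("mixed", 4), Pool("stable", 8))
```

Here `prod-ctr` lands in the stable pool, `explore-a` in the mixed pool, and `explore-b` stays pending because only 2 GPUs remain in the mixed pool.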

Future directions focus on intelligent automation: enhancing feature exploration efficiency, building feature and model libraries, and integrating AutoML for automatic model optimization, further reducing manual effort and boosting productivity.
