
XGBoost Serving: An Open‑Source High‑Performance Inference System for GBDT and GBDT+FM Models

XGBoost Serving is an open‑source, high‑performance inference system built on TensorFlow Serving. It adds dedicated servables for pure GBDT, GBDT+FM binary‑classification, and GBDT+FM multi‑classification models, provides automatic version lifecycle management and gRPC/HTTP APIs, and cuts tail latency by up to 50%. After successful deployment in iQIYI's recommendation platform, it is now available on GitHub.

iQIYI Technical Product Team

To fill the gap in the community for production‑ready inference of GBDT, GBDT+FM binary‑classification, and GBDT+FM multi‑classification models, iQIYI designed and developed a flexible, high‑performance inference system called XGBoost Serving, which has been deployed in multiple internal business scenarios and is now open‑sourced.

Background: In 2014 Facebook introduced the GBDT+LR (Logistic Regression) model, which combines GBDT‑generated features with LR and yields at least a 3% improvement over using either model alone. Modern recommendation and advertising systems rely on high‑dimensional sparse features (e.g., video IDs). Because GBDT does not natively support such sparse features, GBDT+LR requires manual feature crossing in the LR part, increasing computational complexity. iQIYI replaces the LR component with Factorization Machines (FM), which automatically handle high‑dimensional sparse features and second‑order interactions, achieving at least a 4% gain over GBDT+LR.
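For intuition, an FM scores a sample with a bias, linear weights, and pairwise interactions between latent vectors; the pairwise sum can be computed in O(kn) time via the identity ½·Σ_f[(Σᵢ v_{i,f}xᵢ)² − Σᵢ v_{i,f}²xᵢ²]. The sketch below (hypothetical names, not XGBoost Serving's actual API) scores a sparse binary feature vector, which is exactly the form GBDT leaf features and high‑dimensional ID features take:

```python
import math

def fm_score(active, w0, w, v):
    """FM binary-classification score for a sparse binary feature vector,
    given as the list `active` of feature ids with x_i = 1 (all others 0).
    w0: bias; w: dict of linear weights; v: dict of latent vectors."""
    k = len(next(iter(v.values())))  # latent dimension
    score = w0 + sum(w.get(i, 0.0) for i in active)
    # O(k * |active|) pairwise-interaction trick:
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    for f in range(k):
        s = sum(v[i][f] for i in active if i in v)
        s_sq = sum(v[i][f] ** 2 for i in active if i in v)
        score += 0.5 * (s * s - s_sq)
    return 1.0 / (1.0 + math.exp(-score))  # sigmoid for binary classification
```

The multi‑class (softmax) variant keeps one such score per class and normalizes with a softmax instead of a sigmoid.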

Existing serving solutions fall short: TensorFlow Serving supports only TensorFlow models and has no roadmap for other frameworks, while other servers such as zoltar and KFServing's XGBoost server support only single‑model or single‑version deployment, lack lifecycle management, and, not being implemented in C++, cannot match C++‑level performance. iQIYI therefore built XGBoost Serving on top of TensorFlow Serving, adding three new servables (XGBoost Servable, alphaFM Servable, and alphaFM_softmax Servable) to support pure GBDT, GBDT+FM binary‑classification, and GBDT+FM multi‑classification inference respectively.

The project is open‑sourced on GitHub at https://github.com/iqiyi/xgboost-serving .

Deployment in iQIYI's recommendation platform: After a GBDT+FM binary‑classification model is trained, it is assigned a version number and its model files are placed in a configured directory. XGBoost Serving detects the new version, loads the appropriate servables, and automatically starts the gRPC, HTTP, and Metrics services. To update a model, a new versioned directory is simply added; the system manages the version lifecycle automatically. Compared with in‑engine deployment under the same load, this architecture eliminates manual model lifecycle management and reduces P99 tail latency by at least 50%.
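The version‑detection step can be illustrated with a minimal sketch (a hypothetical helper, not the project's actual Source implementation): a file‑system Source treats each integer‑named subdirectory under the configured model path as a model version and aspires to the newest one:

```python
from pathlib import Path

def aspired_version(base_dir):
    """Return the highest numeric version subdirectory under base_dir,
    mimicking how a file-system Source picks the version to serve.
    Non-numeric entries (e.g. stray files) are ignored."""
    versions = [int(p.name) for p in Path(base_dir).iterdir()
                if p.is_dir() and p.name.isdigit()]
    return max(versions) if versions else None
```

Publishing an update then amounts to dropping a new `<version>/` directory into the model path; the server notices it on the next poll.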

System architecture: XGBoost Serving builds on TensorFlow Serving's core concepts (Servables, Servable Versions, Loaders, Sources, Aspired Versions, and Managers) by implementing new servables for GBDT and FM models. The architecture is organized into five layers: Source, which polls the file system for new versions; Servables (XGBoost, alphaFM, alphaFM_softmax); AspiredVersionsManager, which decides which versions to load or unload using the AvailabilityPreservingPolicy; Predictors (XGBoost Predictor, alphaFM Predictor, alphaFM_softmax Predictor); and Servers (gRPC, HTTP, Metrics). Each layer's responsibilities and interactions are described in detail.
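The effect of the AvailabilityPreservingPolicy can be sketched as follows (a simplified model, not the actual TensorFlow Serving code): during a version transition, any aspired version is loaded before a stale version is unloaded, so at least one version stays available to serve traffic:

```python
def next_action(aspired, loaded):
    """One availability-preserving step: load any aspired-but-missing
    version before unloading anything; unload a stale version only when
    another version remains loaded (or nothing is aspired at all)."""
    to_load = sorted(set(aspired) - set(loaded))
    if to_load:
        return ("load", to_load[0])
    to_unload = sorted(set(loaded) - set(aspired))
    if to_unload and (len(loaded) > len(to_unload) or not aspired):
        return ("unload", to_unload[0])
    return None  # steady state: loaded versions match aspired versions
```

For example, moving from version 1 to version 2 yields a "load 2" step first and an "unload 1" step only afterwards, never a window with no loaded version.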

Servable details: XGBoost Servable handles pure GBDT models exported by XGBoost; alphaFM Servable supports GBDT+FM binary‑classification models, requiring a GBDT model, an FM binary‑classification model, and a FeatureMapping file; alphaFM_softmax Servable supports GBDT+FM multi‑classification models, requiring analogous components.

Predictor details: XGBoost Predictor computes leaf indices and values; alphaFM Predictor combines leaf indices (converted to feature IDs) with FM binary‑classification inference; alphaFM_softmax Predictor performs a similar pipeline for multi‑class FM inference.
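The leaf‑index‑to‑feature‑ID conversion above can be sketched as an offset scheme (an illustrative assumption; the real mapping is defined by the model's FeatureMapping file): each tree's leaf index is shifted by the total leaf count of the preceding trees, so every (tree, leaf) pair gets a globally unique FM feature ID:

```python
def leaves_to_feature_ids(leaf_indices, trees_leaf_counts):
    """Convert per-tree GBDT leaf indices into global FM feature ids.
    leaf_indices[t] is the leaf the sample fell into in tree t;
    trees_leaf_counts[t] is the number of leaves in tree t."""
    ids, offset = [], 0
    for leaf, n_leaves in zip(leaf_indices, trees_leaf_counts):
        ids.append(offset + leaf)  # unique id for this (tree, leaf) pair
        offset += n_leaves
    return ids
```

The resulting IDs form the sparse binary feature vector fed to the FM (or softmax FM) stage.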

Server details: the gRPC Server provides the online inference APIs; the HTTP Server offers model management endpoints (e.g., model status queries); and the Metrics Server exposes monitoring data such as QPS and latency distributions.

Summary: This article presented the background, practice, features, architecture, and implementation of XGBoost Serving, an open‑source inference system that bridges the community's gap for production‑grade GBDT and GBDT+FM model serving. It has been successfully applied in iQIYI's internal services and is now available on GitHub for the broader community.

Tags: GBDT, Machine Learning, Open Source, Model Inference, Factorization Machines, Serving Architecture, XGBoost Serving
Written by iQIYI Technical Product Team
The technical product team of iQIYI